Results 11 – 20 of 230
Unsupervised Search-based Structured Prediction
, 2009
Abstract

Cited by 54 (1 self)
We describe an adaptation and application of a search-based structured prediction algorithm, “Searn”, to unsupervised learning problems. We show that it is possible to reduce unsupervised learning to supervised learning and demonstrate a high-quality unsupervised shift-reduce parsing model. We additionally show a close connection between unsupervised Searn and expectation maximization. Finally, we demonstrate the efficacy of a semi-supervised extension. The key idea that enables this is an application of the predict-self idea for unsupervised learning.
A unified and discriminative model for query refinement
 In SIGIR ’08
, 2008
Abstract

Cited by 51 (2 self)
This paper addresses the issue of query refinement, which involves reformulating ill-formed search queries in order to enhance the relevance of search results. Query refinement typically includes a number of tasks such as spelling error correction, word splitting, word merging, phrase segmentation, word stemming, and acronym expansion. In previous research, such tasks were addressed separately or through employing generative models. This paper proposes employing a unified and discriminative model for query refinement. Specifically, it proposes a Conditional Random Field (CRF) model suitable for the problem, referred to as Conditional Random Field for Query Refinement (CRF-QR). Given a sequence of query words, CRF-QR predicts a sequence of refined query words as well as the corresponding refinement operations. In that sense, CRF-QR differs greatly from conventional CRF models. Two types of CRF-QR models, namely a basic model and an extended model, are introduced. One merit of employing CRF-QR is that different refinement tasks can be performed simultaneously, and thus the accuracy of refinement can be enhanced. Furthermore, the advantages of discriminative models over generative models can be fully leveraged. Experimental results demonstrate that CRF-QR can significantly outperform baseline methods. Furthermore, when CRF-QR is used in web search, a significant improvement in relevance can be obtained.
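The operation sequence CRF-QR predicts can be thought of as a small edit program applied to the query. A minimal sketch of applying such operations (the operation names and the example query are illustrative, not the paper's actual label set):

```python
# Toy "apply the predicted refinement operations" step: given query words
# and one predicted operation per word, produce the refined query.
# Operation names ("keep", "merge", "correct:...") are invented for
# illustration; the paper defines its own operation inventory.

def apply_ops(words, ops):
    out = []
    for w, op in zip(words, ops):
        if op == "keep":
            out.append(w)
        elif op == "merge":                  # merge with the previous word
            out[-1] = out[-1] + w
        elif op.startswith("correct:"):      # spelling correction
            out.append(op.split(":", 1)[1])
    return out

print(apply_ops(["machin", "learn", "ing"],
                ["correct:machine", "keep", "merge"]))
```

The CRF's job is to predict the `ops` sequence jointly over the whole query, which is why tasks such as correction and merging can reinforce one another.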
What HMMs can do
, 2002
Abstract

Cited by 50 (5 self)
Since their inception over thirty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems — today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial analyzes HMMs by exploring a novel way in which an HMM can be defined, namely in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in searching for a model to supersede the HMM for ASR, rather than trying to correct for HMM limitations in the general case, new models should be found based on their potential for better parsimony, computational requirements, and noise insensitivity.
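The random-variable definition treats an HMM as a hidden state chain with per-state emissions, and quantities such as the observation likelihood follow directly from its conditional-independence assumptions. A minimal forward-algorithm sketch with made-up toy parameters (not from the paper):

```python
# Forward algorithm for a discrete HMM: computes p(x_1..x_T) using the
# factorization implied by the model's conditional independences,
#   p(q, x) = p(q_1) p(x_1|q_1) * prod_t p(q_t|q_{t-1}) p(x_t|q_t).
# All parameter values below are toy numbers for illustration.

def forward(pi, A, B, obs):
    """pi[i]: initial prob, A[i][j]: transition prob, B[i][o]: emission prob."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)  # marginal likelihood of the observation sequence

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]   # rows sum to 1
B = [[0.5, 0.5], [0.1, 0.9]]   # rows sum to 1
print(forward(pi, A, B, [0, 1, 0]))
```

Summing this quantity over all possible observation sequences of a fixed length gives exactly 1, which is a quick sanity check that the assumptions define a proper distribution.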
Using combinatorial optimization within max-product belief propagation
 Advances in Neural Information Processing Systems (NIPS)
, 2007
Abstract

Cited by 48 (6 self)
In general, the problem of computing a maximum a posteriori (MAP) assignment in a Markov random field (MRF) is computationally intractable. However, in certain subclasses of MRF, an optimal or close-to-optimal assignment can be found very efficiently using combinatorial optimization algorithms: certain MRFs with mutual exclusion constraints can be solved using bipartite matching, and MRFs with regular potentials can be solved using minimum cut methods. However, these solutions do not apply to the many MRFs that contain such tractable components as subnetworks, but also other non-complying potentials. In this paper, we present a new method, called COMPOSE, for exploiting combinatorial optimization for subnetworks within the context of a max-product belief propagation algorithm. COMPOSE uses combinatorial optimization for computing exact max-marginals for an entire subnetwork; these can then be used for inference in the context of the network as a whole. We describe highly efficient methods for computing max-marginals for subnetworks corresponding both to bipartite matchings and to regular networks. We present results on both synthetic and real networks encoding correspondence problems between images, which involve both matching constraints and pairwise geometric constraints. We compare to a range of current methods, showing that the ability of COMPOSE to transmit information globally across the network leads to improved convergence, decreased running time, and higher-scoring assignments.
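The quantity COMPOSE extracts from a subnetwork is a max-marginal: for each variable and each value, the best total score of any joint assignment consistent with that choice. For a bipartite-matching subnetwork this can be illustrated by brute force (computing it efficiently is the paper's contribution; the score matrix below is invented):

```python
# Brute-force max-marginals for a small bipartite-matching subnetwork:
# mm[i][j] = best total score of any perfect matching that assigns
# variable i the value j. COMPOSE obtains these with combinatorial
# algorithms; this exhaustive version only shows the quantity computed.
from itertools import permutations

def matching_max_marginals(S):
    n = len(S)
    mm = [[float("-inf")] * n for _ in range(n)]
    for perm in permutations(range(n)):        # perm[i] = value of variable i
        score = sum(S[i][perm[i]] for i in range(n))
        for i in range(n):
            mm[i][perm[i]] = max(mm[i][perm[i]], score)
    return mm

S = [[3, 1, 0], [1, 4, 2], [0, 2, 5]]          # made-up pairwise scores
mm = matching_max_marginals(S)
# The MAP matching score appears as the max entry of every row.
print(mm)
```

Passing these row tables outward as messages is what lets the rest of the network reason about the matching component without enumerating matchings itself.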
Learning Gaussian conditional random fields for low-level vision
 In Proc. of CVPR
, 2007
Abstract

Cited by 46 (3 self)
Markov Random Field (MRF) models are a popular tool for vision and image processing. Gaussian MRF models are particularly convenient to work with because they can be implemented using matrix and linear algebra routines. However, recent research has focused on discrete-valued and non-convex MRF models because Gaussian models tend to over-smooth images and blur edges. In this paper, we show how to train a Gaussian Conditional Random Field (GCRF) model that overcomes this weakness and can outperform the non-convex Field of Experts model on the task of denoising images. A key advantage of the GCRF model is that the parameters of the model can be optimized efficiently on relatively large images. The competitive performance of the GCRF model and the ease of optimizing its parameters make the GCRF model an attractive option for vision and image processing applications.
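Because a Gaussian MRF energy is quadratic in the pixel values, denoising reduces to solving a linear system. A 1-D sketch of that idea using Gauss-Seidel sweeps (the smoothness weight and signal are invented, and this fixed uniform penalty is the plain Gaussian-MRF case whose edge-blurring the learned, data-dependent GCRF weights are designed to avoid):

```python
# 1-D Gaussian-MRF denoising sketch: minimize the quadratic energy
#   E(x) = sum_i (x_i - y_i)^2 + lam * sum_i (x_i - x_{i+1})^2
# by Gauss-Seidel sweeps. Because E is quadratic, the optimum is the
# solution of a linear system -- the matrix-routine convenience the
# abstract mentions. lam and the signal below are illustrative values.

def denoise(y, lam=2.0, sweeps=200):
    x = list(y)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            num, den = y[i], 1.0
            if i > 0:
                num += lam * x[i - 1]; den += lam
            if i < n - 1:
                num += lam * x[i + 1]; den += lam
            x[i] = num / den           # exact coordinate-wise minimizer
    return x

noisy = [0.0, 0.1, 1.9, 0.2, 0.0]
print(denoise(noisy))
```

Each coordinate update solves its own one-variable quadratic exactly, so every sweep lowers the energy; the spike at the middle sample gets spread out, which is precisely the over-smoothing a learned GCRF counteracts.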
Word alignment via quadratic assignment
 In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference
, 2006
Abstract

Cited by 45 (6 self)
Recently, discriminative word alignment methods have achieved state-of-the-art accuracies by extending the range of information sources that can be easily incorporated into aligners. The chief advantage of a discriminative framework is the ability to score alignments based on arbitrary features of the matching word tokens, including orthographic form, predictions of other models, lexical context, and so on. However, the proposed bipartite matching model of Taskar et al. (2005), despite being tractable and effective, has two important limitations. First, it is limited by the restriction that words have fertility of at most one. More importantly, first-order correlations between consecutive words cannot be directly captured by the model. In this work, we address these limitations by enriching the model form. We give estimation and inference algorithms for these enhancements. Our best model achieves a relative AER reduction of 25% over the basic matching formulation, outperforming intersected IBM Model 4 without using any overly compute-intensive features. By including predictions of other models as features, we achieve an AER of 3.8 on the standard Hansards dataset.
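The first-order enrichment rewards alignments in which consecutive source words map to consecutive target words, which a plain matching objective cannot express. A brute-force sketch over a toy score matrix (all numbers invented; the paper gives tractable estimation and inference algorithms instead of enumeration):

```python
# Quadratic-assignment flavor of word alignment: score an alignment by
# per-pair match scores plus a first-order bonus whenever consecutive
# source words align to consecutive target words. Scores are made up.
from itertools import permutations

def best_alignment(S, bonus=1.0):
    n = len(S)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):        # perm[i] = target of source i
        score = sum(S[i][perm[i]] for i in range(n))
        score += bonus * sum(perm[i + 1] == perm[i] + 1 for i in range(n - 1))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

S = [[2, 1, 0], [0, 2, 1], [1, 0, 2]]          # toy match scores
print(best_alignment(S))
```

The bonus term is what makes the objective a quadratic assignment rather than a bipartite matching, and it is the source of the hardness the paper works around.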
Solving Multiclass Support Vector Machines with LaRank
 In 24th International Conference on Machine Learning
, 2007
Abstract

Cited by 45 (3 self)
Optimization algorithms for large-margin multiclass recognizers are often too costly to handle ambitious problems with structured outputs and exponential numbers of classes. Optimization algorithms that rely on the full gradient are not effective because, unlike the solution, the gradient is not sparse and is very large. The LaRank algorithm sidesteps this difficulty by relying on a randomized exploration inspired by the perceptron algorithm. We show that this approach is competitive with gradient-based optimizers on simple multiclass problems. Furthermore, a single LaRank pass over the training examples delivers test error rates that are nearly as good as those of the final solution.
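The perceptron-style updates that LaRank builds on can be shown in their simplest form. This is the plain multiclass perceptron on a toy separable problem, not LaRank itself (no dual coefficients, margins, or randomized step selection):

```python
# Multiclass perceptron: predict with argmax_c w_c . x and, on a mistake,
# move weight toward the true class and away from the predicted one.
# The three-point, two-feature dataset below is invented for illustration.

def train(data, n_classes, epochs=10):
    dim = len(data[0][0])
    W = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in data:
            pred = max(range(n_classes),
                       key=lambda c: sum(W[c][d] * x[d] for d in range(dim)))
            if pred != y:
                for d in range(dim):
                    W[y][d] += x[d]        # reinforce the true class
                    W[pred][d] -= x[d]     # penalize the mistaken class
    return W

data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
W = train(data, 3)
print(W)
```

LaRank keeps this mistake-driven flavor but works in the SVM dual, which is what makes a single pass land so close to the converged solution.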
Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion
Abstract

Cited by 43 (3 self)
We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly nontrivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work.
Decision Tree Fields
Abstract

Cited by 43 (8 self)
This paper introduces a new formulation for discrete image labeling tasks, the Decision Tree Field (DTF), that combines and generalizes random forests and conditional random fields (CRFs), which have been widely used in computer vision. In a typical CRF model the unary potentials are derived from sophisticated random forest or boosting-based classifiers; however, the pairwise potentials are assumed to (1) have a simple parametric form with a pre-specified and fixed dependence on the image data, and (2) be defined on the basis of a small and fixed neighborhood. In contrast, in the DTF, local interactions between multiple variables are determined by means of decision trees evaluated on the image data, allowing the interactions to be adapted to the image content. This results in powerful graphical models which are able to represent complex label structure. Our key technical contribution is to show that the DTF model can be trained efficiently and jointly using a convex approximate likelihood function, enabling us to learn over a million free model parameters. We show experimentally that for applications which have a rich and complex label structure, our model achieves excellent results.
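The defining move in a DTF is that the potential linking two neighboring pixels is itself selected by a decision tree evaluated on the local image data, so the interaction adapts to content. A one-split caricature (the threshold and potential values are invented; real DTFs learn deep trees and the leaf parameters jointly):

```python
# Caricature of a DTF pairwise factor: a single decision-tree split on
# local contrast chooses which potential table applies to a pixel pair.
# Threshold and table values are made up for illustration.

def pairwise_potential(intensity_a, intensity_b):
    if abs(intensity_a - intensity_b) < 0.1:
        # low-contrast leaf: strongly encourage equal labels
        return {"same": 2.0, "diff": 0.0}
    else:
        # high-contrast (edge) leaf: permit a label change
        return {"same": 0.0, "diff": 1.0}

print(pairwise_potential(0.50, 0.55))   # smooth region
print(pairwise_potential(0.10, 0.90))   # likely edge
```

A fixed-form CRF would apply one table everywhere; routing each pixel pair through a tree is what lets the DTF tighten smoothing inside regions while relaxing it across edges.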
Efficient parameter estimation for RNA secondary structure prediction
 BIOINFORMATICS
Abstract

Cited by 38 (9 self)
Motivation: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. Results: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our constraint generation approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods.
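The iterative scheme described above is a constraint-generation (cutting-plane) loop: solve with the constraints collected so far, look for a constraint the new solution violates, add it, and repeat until none remains. A generic sketch on a deliberately trivial one-parameter problem (the objective and constraint pool are invented, standing in for the paper's energy parameters and structure-derived constraints):

```python
# Generic constraint-generation loop. The "model" is a single parameter
# theta minimized subject to theta >= b for every b in a pool of candidate
# bounds; only violated bounds are ever added to the active set.

def solve(active):
    # minimize theta subject to theta >= b for all active bounds b
    return max([0.0] + list(active))

def most_violated(theta, pool):
    worst = max(pool, key=lambda b: b - theta)
    return worst if worst > theta + 1e-9 else None

def constraint_generation(pool):
    active = set()
    while True:
        theta = solve(active)            # solve with current constraints
        v = most_violated(theta, pool)   # search for a violated constraint
        if v is None:
            return theta, active         # feasible for the whole pool
        active.add(v)                    # tighten and re-solve

theta, active = constraint_generation([1.5, 3.0, 2.2, -1.0])
print(theta, sorted(active))
```

The payoff mirrors the paper's: the final solution satisfies every constraint in the pool even though only a tiny active subset was ever added, which is what makes training on thousands of structures tractable.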