Results 1–10 of 130
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
, 2010
"... ..."
Contrastive estimation: Training log-linear models on unlabeled data
 In Proc. of ACL
, 2005
"... Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and namedentity extraction (McCallum and Li, 2003). CRFs are loglinear, allowing the incorporation of arbitrary features into the model. To train on unlabele ..."
Abstract

Cited by 157 (16 self)
Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem—POS tagging given a tagging dictionary and unlabeled text—contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features.
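As a concrete illustration (not code from the paper), the contrastive objective for one sentence can be sketched in a few lines: the log-probability of the observed sentence's labelings relative to all labelings of all sentences in its neighborhood. The feature layout and function names below are hypothetical assumptions.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a))) over a flat array."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def ce_objective(theta, feats_obs, feats_neighborhood):
    """Contrastive-estimation log-odds for one example.

    feats_obs: (num_labelings, d) feature vectors pairing the observed
        sentence with each candidate label sequence.
    feats_neighborhood: (num_neighbors, num_labelings, d), the same for
        every sentence in the neighborhood, which is assumed to contain
        the observed sentence itself.
    Returns log p(x | N(x)) <= 0 under the log-linear model theta.
    """
    num = logsumexp(feats_obs @ theta)                      # sum over labelings of x
    den = logsumexp((feats_neighborhood @ theta).ravel())   # sum over neighborhood
    return num - den
```

Because the neighborhood includes the observed sentence, the denominator dominates the numerator and the objective is always nonpositive; training maximizes it with respect to theta.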
Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
"... Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichletmultinomial regression (DMR) topic model that includes a loglinear prior on ..."
Abstract

Cited by 99 (1 self)
Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
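The log-linear prior described above is compact enough to sketch: each document's Dirichlet parameters are exponentiated linear functions of its metadata features. A minimal numpy sketch under that assumption (names hypothetical, not the authors' code):

```python
import numpy as np

def dmr_alphas(X, Lam):
    """Per-document Dirichlet parameters under a DMR-style log-linear prior.

    X:   (D, F) observed document features (e.g. one-hot author/venue).
    Lam: (F, K) regression weights, one column per topic.
    Returns (D, K) positive parameters alpha[d, k] = exp(x_d . lambda_k),
    so each document gets its own Dirichlet prior over topics.
    """
    return np.exp(X @ Lam)
```

Exponentiation keeps every alpha strictly positive, as a Dirichlet requires, while letting features shift topic prevalence per document.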
Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation
"... We describe a new approach to SMT adaptation that weights outofdomain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting b ..."
Abstract

Cited by 63 (5 self)
We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.
Optimizing costly functions with simple constraints: A limited-memory projected quasi-Newton algorithm
 Proc. of Conf. on Artificial Intelligence and Statistics
, 2009
"... An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus lowrank quadratic approximation to the function. The quadratic approximation is construct ..."
Abstract

Cited by 51 (9 self)
An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
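As a rough structural sketch only: the method alternates minimizing a local quadratic model with staying inside the constraint set. The toy below substitutes a plain projected gradient step for the paper's diagonal-plus-low-rank quasi-Newton model, keeping just the step-then-project skeleton over box constraints; it is not the proposed algorithm, and the step size is an illustrative assumption.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box lo <= x <= hi."""
    return np.clip(x, lo, hi)

def projected_gradient(f_grad, x0, lo, hi, step=0.1, iters=200):
    """Bare-bones projected gradient for min f(x) s.t. lo <= x <= hi.

    f_grad(x) returns (f(x), grad f(x)). The paper's method replaces
    the plain gradient step below with the minimizer of a limited-memory
    quasi-Newton quadratic model over the constraints; this sketch keeps
    only the project-after-step structure.
    """
    x = project_box(x0, lo, hi)
    for _ in range(iters):
        _, g = f_grad(x)
        x = project_box(x - step * g, lo, hi)
    return x
```

Projection is cheap for boxes, which matches the paper's setting of expensive function evaluations but inexpensive projections.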
A Fast Dual Algorithm for Kernel Logistic Regression
, 2002
"... This paper gives a new iterative algorithm for kernel logistic regression. It is based ..."
Abstract

Cited by 45 (0 self)
This paper gives a new iterative algorithm for kernel logistic regression. It is based
Analysis and generalizations of the linearized Bregman method
 SIAM J. Imaging Sci.
, 2010
"... Abstract. This paper analyzes and improves the linearized Bregman method for solving the basis pursuit and related sparse optimization problems. The analysis shows that the linearized Bregman method has the exact regularization property; namely, it converges to an exact solution of the basis pursuit ..."
Abstract

Cited by 39 (10 self)
This paper analyzes and improves the linearized Bregman method for solving the basis pursuit and related sparse optimization problems. The analysis shows that the linearized Bregman method has the exact regularization property; namely, it converges to an exact solution of the basis pursuit problem whenever its smoothing parameter α is greater than a certain value. The analysis is based on showing that the linearized Bregman algorithm is equivalent to gradient descent applied to a certain dual formulation. This result motivates generalizations of the algorithm enabling the use of gradient-based optimization techniques such as line search, Barzilai–Borwein, limited memory BFGS (L-BFGS), nonlinear conjugate gradient, and Nesterov's methods. In the numerical simulations, the two proposed implementations, one using Barzilai–Borwein steps with nonmonotone line search and the other using L-BFGS, gave more accurate solutions in much shorter times than the basic implementation of the linearized Bregman method with a so-called kicking technique. Key words. Bregman, linearized Bregman, compressed sensing, ℓ1-minimization, basis pursuit
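The basic iteration analyzed here is short enough to sketch. Below is a minimal numpy version of linearized Bregman for Au = b built on the soft-thresholding (shrink) operator; the step size and iteration count are illustrative assumptions, and the Barzilai–Borwein/L-BFGS generalizations discussed in the paper are not shown.

```python
import numpy as np

def shrink(v, mu):
    """Soft-thresholding, the proximal map of mu * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def linearized_bregman(A, b, alpha, step, iters=3000):
    """Basic linearized Bregman iteration for sparse solutions of Au = b.

    Iterates v <- v + step * A^T (b - A u), u <- shrink(v, alpha).
    The paper's key observation is that this is gradient descent on a
    smooth dual problem, which licenses swapping in BB steps, L-BFGS, etc.
    """
    n = A.shape[1]
    v = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        v = v + step * (A.T @ (b - A @ u))
        u = shrink(v, alpha)
    return u
```

A step below 2 / ||A||² (spectral norm squared) keeps the dual gradient descent stable; the plain iteration can stall for long stretches, which is what the kicking technique mentioned above addresses.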
Single Image Depth Estimation From Predicted Semantic Labels
"... We consider the problem of estimating the depth of each pixel in a scene from a single monocular image. Unlike traditional approaches [18, 19], which attempt to map from appearance features to depth directly, we first perform a semantic segmentation of the scene and use the semantic labels to guide ..."
Abstract

Cited by 37 (0 self)
We consider the problem of estimating the depth of each pixel in a scene from a single monocular image. Unlike traditional approaches [18, 19], which attempt to map from appearance features to depth directly, we first perform a semantic segmentation of the scene and use the semantic labels to guide the 3D reconstruction. This approach provides several advantages: By knowing the semantic class of a pixel or region, depth and geometry constraints can be easily enforced (e.g., “sky” is far away and “ground” is horizontal). In addition, depth can be more readily predicted by measuring the difference in appearance with respect to a given semantic class. For example, a tree will have more uniform appearance in the distance than it does close up. Finally, the incorporation of semantic features allows us to achieve state-of-the-art results with a significantly simpler model than previous works.
Integrating Visual and Range Data for Robotic Object Detection
"... Abstract. The problem of object detection and recognition is a notoriously difficult one, and one that has been the focus of much work in the computer vision and robotics communities. Most work has concentrated on systems that operate purely on visual inputs (i.e., images) and largely ignores other ..."
Abstract

Cited by 36 (3 self)
The problem of object detection and recognition is a notoriously difficult one, and one that has been the focus of much work in the computer vision and robotics communities. Most work has concentrated on systems that operate purely on visual inputs (i.e., images) and largely ignores other sensor modalities. However, despite the great progress made down this track, the goal of high accuracy object detection for robotic platforms in cluttered real-world environments remains elusive. Instead of relying on information from the image alone, we present a method that exploits the multiple sensor modalities available on a robotic platform. In particular, our method augments a 2D object detector with 3D information from a depth sensor to produce a “multimodal object detector.” We demonstrate our method on a working robotic system and evaluate its performance on a number of common household/office objects.
Guiding unsupervised grammar induction using contrastive estimation
 In Proc. of IJCAI Workshop on Grammatical Inference Applications
, 2005
"... We describe a novel training criterion for probabilistic grammar induction models, contrastive estimation [Smith and Eisner, 2005], which can be interpreted as exploiting implicit negative evidence and includes a wide class of likelihoodbased objective functions. This criterion is a generalization ..."
Abstract

Cited by 33 (8 self)
We describe a novel training criterion for probabilistic grammar induction models, contrastive estimation [Smith and Eisner, 2005], which can be interpreted as exploiting implicit negative evidence and includes a wide class of likelihood-based objective functions. This criterion is a generalization of the function maximized by the Expectation-Maximization algorithm [Dempster et al., 1977]. CE is a natural fit for log-linear models, which can include arbitrary features but for which EM is computationally difficult. We show that, using the same features, log-linear dependency grammar models trained using CE can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task). The selection of an implicit negative evidence class—a “neighborhood”—appropriate to a given task has strong implications, but with a good neighborhood one can target the objective of grammar induction to a specific application.