Results 1–10 of 88
Contrastive estimation: Training log-linear models on unlabeled data
In Proc. of ACL, 2005
Abstract

Cited by 131 (15 self)
Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem—POS tagging given a tagging dictionary and unlabeled text—contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features.
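The objective described in this abstract can be sketched numerically. For a log-linear model with unnormalized score u(x, y) = exp(θ · f(x, y)), contrastive estimation maximizes the probability of the observed input x relative to a neighborhood N(x) of perturbed versions serving as implicit negative evidence. The sketch below is illustrative only, not the authors' implementation; the function and array names are invented for this example:

```python
import numpy as np

def ce_log_likelihood(theta, feats_x, feats_neighborhood):
    """Contrastive estimation objective for one example (toy sketch).

    theta:              (num_feats,) log-linear weights.
    feats_x:            (num_labelings, num_feats) features f(x, y) for every
                        labeling y of the observed input x.
    feats_neighborhood: (num_neighbors, num_labelings, num_feats) features for
                        every labeling of each neighbor x' in N(x), where the
                        neighborhood includes x itself.
    Returns log p(x | N(x)) = log [ sum_y u(x,y) / sum_{x',y} u(x',y) ].
    """
    log_num = np.logaddexp.reduce(feats_x @ theta)
    log_den = np.logaddexp.reduce((feats_neighborhood @ theta).ravel())
    return log_num - log_den  # non-positive when x is in its own neighborhood
```

Because x belongs to its own neighborhood, the objective is always non-positive, and maximizing it moves probability mass away from the perturbed neighbors and toward the observed input.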
Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
Abstract

Cited by 60 (1 self)
Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
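The log-linear prior on document-topic distributions can be made concrete: each document's Dirichlet hyperparameters are an exponentiated linear function of its metadata features. A minimal sketch under that reading of the abstract (the names are illustrative, not taken from the paper's code):

```python
import numpy as np

def dmr_alpha(doc_features, lam, bias):
    """Per-document Dirichlet hyperparameters under a DMR-style prior.

    doc_features: (num_docs, num_feats) observed metadata x_d, e.g. indicator
                  features for author or publication venue.
    lam:          (num_feats, num_topics) regression weights lambda_k.
    bias:         (num_topics,) intercept b_k giving the default prior.
    Returns alpha with alpha[d, k] = exp(x_d . lambda_k + b_k), always > 0,
    so each row parameterizes a valid asymmetric Dirichlet prior over that
    document's topic distribution.
    """
    return np.exp(doc_features @ lam + bias)
```

With all-zero features the prior falls back to exp(bias), so documents with no metadata still receive a well-defined Dirichlet prior.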
A Fast Dual Algorithm for Kernel Logistic Regression
2002
Abstract

Cited by 38 (0 self)
This paper gives a new iterative algorithm for kernel logistic regression. It is based …
Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation
Abstract

Cited by 35 (1 self)
We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.
Integrating Visual and Range Data for Robotic Object Detection
Abstract

Cited by 28 (3 self)
The problem of object detection and recognition is a notoriously difficult one, and one that has been the focus of much work in the computer vision and robotics communities. Most work has concentrated on systems that operate purely on visual inputs (i.e., images) and largely ignores other sensor modalities. However, despite the great progress made down this track, the goal of high accuracy object detection for robotic platforms in cluttered real-world environments remains elusive. Instead of relying on information from the image alone, we present a method that exploits the multiple sensor modalities available on a robotic platform. In particular, our method augments a 2d object detector with 3d information from a depth sensor to produce a “multimodal object detector.” We demonstrate our method on a working robotic system and evaluate its performance on a number of common household/office objects.
Guiding unsupervised grammar induction using contrastive estimation
In Proc. of IJCAI Workshop on Grammatical Inference Applications, 2005
Abstract

Cited by 25 (7 self)
We describe a novel training criterion for probabilistic grammar induction models, contrastive estimation [Smith and Eisner, 2005], which can be interpreted as exploiting implicit negative evidence and includes a wide class of likelihood-based objective functions. This criterion is a generalization of the function maximized by the Expectation-Maximization algorithm [Dempster et al., 1977]. CE is a natural fit for log-linear models, which can include arbitrary features but for which EM is computationally difficult. We show that, using the same features, log-linear dependency grammar models trained using CE can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task). The selection of an implicit negative evidence class—a “neighborhood”—appropriate to a given task has strong implications, but with a good neighborhood one can target the objective of grammar induction to a specific application.
Optimizing costly functions with simple constraints: A limited-memory projected quasi-Newton algorithm
Proc. of Conf. on Artificial Intelligence and Statistics, 2009
Abstract

Cited by 25 (7 self)
An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
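The abstract describes each iteration as minimizing a diagonal plus low-rank quadratic model over the original constraints. As a rough illustration of the "step, then stay feasible" structure such methods share, here is a much simpler relative, a projected Barzilai-Borwein gradient loop. This is not the paper's algorithm, and all names are invented for the sketch:

```python
import numpy as np

def projected_bb_minimize(grad, project, x0, iters=200):
    """Projected Barzilai-Borwein gradient loop (simplified stand-in).

    The PQN algorithm in the paper builds a diagonal-plus-low-rank quadratic
    model from limited-memory quasi-Newton updates and minimizes it over the
    constraints; this sketch keeps only the cheaper skeleton shared by such
    methods: take a scaled gradient step, then project back onto the set.
    """
    x = project(np.asarray(x0, dtype=float))
    g = grad(x)
    step = 1.0
    for _ in range(iters):
        x_new = project(x - step * g)   # step, then project onto constraints
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        step = (s @ s) / sy if sy > 1e-12 else 1.0   # BB1 step length
        x, g = x_new, g_new
    return x

# Example: minimize ||x - c||^2 over the box [0, 1]^3, whose solution
# is simply c clipped to the box.
c = np.array([1.5, -0.3, 0.4])
x_star = projected_bb_minimize(lambda x: 2.0 * (x - c),
                               lambda x: np.clip(x, 0.0, 1.0),
                               np.zeros(3))
```

The example uses a box constraint precisely because projection onto it is a one-line clip, matching the paper's setting where projection is far cheaper than evaluating the objective.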
Information-theoretic semantic multimedia indexing
In ACM Conference on Image and Video Retrieval, 2007
Abstract

Cited by 22 (10 self)
To solve the problem of indexing collections with diverse text documents, image documents, or documents with both text and images, one needs a model that supports heterogeneous types of documents. In this paper, we show how information theory supplies us with the tools necessary to develop a single model for text, image, and text/image retrieval. In our approach, for each possible query keyword we estimate a maximum entropy model based exclusively on preprocessed continuous features. The common continuous feature space for text and visual data is constructed by using a minimum description length criterion to find the feature-space representation that is optimal from an information-theoretic point of view. We evaluate our approach in three experiments: text-only retrieval, image-only retrieval, and combined text and image retrieval.
Analysis and generalizations of the linearized Bregman method
SIAM J. Imaging Sci., 2010
Abstract

Cited by 22 (5 self)
This paper analyzes and improves the linearized Bregman method for solving the basis pursuit and related sparse optimization problems. The analysis shows that the linearized Bregman method has the exact regularization property; namely, it converges to an exact solution of the basis pursuit problem whenever its smoothing parameter α is greater than a certain value. The analysis is based on showing that the linearized Bregman algorithm is equivalent to gradient descent applied to a certain dual formulation. This result motivates generalizations of the algorithm enabling the use of gradient-based optimization techniques such as line search, Barzilai–Borwein, limited-memory BFGS (L-BFGS), nonlinear conjugate gradient, and Nesterov’s methods. In the numerical simulations, the two proposed implementations, one using Barzilai–Borwein steps with nonmonotone line search and the other using L-BFGS, gave more accurate solutions in much shorter times than the basic implementation of the linearized Bregman method with a so-called kicking technique.
Key words. Bregman, linearized Bregman, compressed sensing, ℓ1-minimization, basis pursuit
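The equivalence to dual gradient descent noted in the abstract makes the basic iteration very short. A minimal numpy sketch, assuming the standard two-line form of the linearized Bregman iteration (soft-thresholding of an accumulated dual variable), without the kicking, line-search, or L-BFGS accelerations the paper develops:

```python
import numpy as np

def shrink(v, a):
    """Soft-thresholding operator: sign(v) * max(|v| - a, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - a, 0.0)

def linearized_bregman(A, b, alpha=5.0, iters=500):
    """Basic linearized Bregman iteration for
        min  alpha*||u||_1 + 0.5*||u||_2^2   s.t.  A u = b,
    viewed as gradient descent on a dual formulation. By the exact
    regularization property, for alpha large enough the limit also solves
    the basis pursuit problem  min ||u||_1  s.t.  A u = b.
    """
    tau = 1.0 / np.linalg.norm(A, 2) ** 2   # step below 2 / ||A||_2^2
    v = np.zeros(A.shape[1])                # accumulated dual variable
    u = np.zeros(A.shape[1])
    for _ in range(iters):
        v = v + tau * (A.T @ (b - A @ u))   # dual gradient step
        u = shrink(v, alpha)                # primal recovery by shrinkage
    return u
```

On a trivial instance where A selects the first two coordinates, the iteration recovers the sparse solution whose free coordinate is zero.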
BART: A modular toolkit for coreference resolution
In Association for Computational Linguistics (ACL) Demo Session, 2008
Abstract

Cited by 21 (2 self)
Developing a full coreference system able to run all the way from raw text to semantic interpretation is a considerable engineering effort. Accordingly, there is very limited availability of off-the-shelf tools for researchers whose interests are not primarily in coreference, or for others who want to concentrate on a specific aspect of the problem. We present BART, a highly modular toolkit for developing coreference applications. In the Johns Hopkins workshop on using lexical and encyclopedic knowledge for entity disambiguation, the toolkit was used to extend a reimplementation of the Soon et al. (2001) proposal with a variety of additional syntactic and knowledge-based features, and to experiment with alternative resolution processes, preprocessing tools, and classifiers.