Results 1  10
of
170
Shallow Parsing with Conditional Random Fields
, 2003
"... Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluati ..."
Abstract

Cited by 570 (8 self)
 Add to MetaCart
Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base nounphrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximumentropy models.
Word representations: A simple and general method for semisupervised learning
 In ACL
, 2010
"... If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and ch ..."
Abstract

Cited by 202 (3 self)
 Add to MetaCart
(Show Context)
If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near stateoftheart supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for offtheshelf use in existing NLP systems, as well as our code, here:
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
 IN ICML
, 2004
"... In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain cond ..."
Abstract

Cited by 166 (12 self)
 Add to MetaCart
(Show Context)
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain conditional random fields (CRFs) in which each time slice contains a set of state variables and edgesa distributed state representation as in dynamic Bayesian networks (DBNs)and parameters are tied across slices. Since exact
Largescale machine learning with stochastic gradient descent
 in COMPSTAT
, 2010
"... Abstract. During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs ..."
Abstract

Cited by 147 (1 self)
 Add to MetaCart
Abstract. During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of smallscale and largescale learning problems. The largescale case involves the computational complexity of the underlying optimization algorithm in nontrivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for largescale problems. In particular, second order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.
Accelerated training of conditional random fields with stochastic gradient methods
 In ICML
, 2006
"... We apply Stochastic MetaDescent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than lim ..."
Abstract

Cited by 138 (6 self)
 Add to MetaCart
(Show Context)
We apply Stochastic MetaDescent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limitedmemory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques. 1.
Posterior Regularization for Structured Latent Variable Models
"... We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model co ..."
Abstract

Cited by 134 (8 self)
 Add to MetaCart
(Show Context)
We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multiview learning, crosslingual dependency grammar induction, unsupervised partofspeech induction, and bitext word alignment. 1
Named Entity Recognition in Tweets: An Experimental Study
, 2011
"... People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by rebu ..."
Abstract

Cited by 126 (11 self)
 Add to MetaCart
(Show Context)
People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by rebuilding the NLP pipeline beginning with partofspeech tagging, through chunking, to namedentity recognition. Our novel TNER system doubles F1 score compared with the Stanford NER system. TNER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms cotraining, increasing F1 by 25 % over ten common entity types. Our NLP tools are available at:
The importance of syntactic parsing and inference in semantic role labeling
 COMPUTATIONAL LINGUISTICS
, 2008
"... We present a general framework for semantic role labeling. The framework combines a machine learning technique with an integer linear programming based inference procedure, which incorporates linguistic and structural constraints into a global decision process. Within this framework, we study the ro ..."
Abstract

Cited by 92 (26 self)
 Add to MetaCart
(Show Context)
We present a general framework for semantic role labeling. The framework combines a machine learning technique with an integer linear programming based inference procedure, which incorporates linguistic and structural constraints into a global decision process. Within this framework, we study the role of syntactic parsing information in semantic role labeling. We show that full syntactic parsing information is, by far, most relevant in identifying the argument, especially, in the very first stage—the pruning stage. Surprisingly, the quality of the pruning stage cannot be solely determined based on its recall and precision. Instead, it depends on the characteristics of the output candidates that determine the difficulty of the downstream problems. Motivated by this observation, we propose an effective and simple approach of combining different semantic role labeling systems through joint inference, which significantly improves its performance. Our system has been evaluated in the CoNLL2005 shared task on semantic role labeling, and achieves the highest F1 score among 19 participants.
Integer linear programming inference for conditional random fields
 Proc. of ICML
, 2005
"... Inference in Conditional Random Fields and Hidden Markov Models is done using the Viterbi algorithm, an efficient dynamic programming algorithm. In many cases, general (nonlocal and nonsequential) constraints may exist over the output sequence, but cannot be incorporated and exploited in a natu ..."
Abstract

Cited by 89 (18 self)
 Add to MetaCart
Inference in Conditional Random Fields and Hidden Markov Models is done using the Viterbi algorithm, an efficient dynamic programming algorithm. In many cases, general (nonlocal and nonsequential) constraints may exist over the output sequence, but cannot be incorporated and exploited in a natural way by this inference procedure. This paper proposes a novel inference procedure based on integer linear programming (ILP) and extends CRF models to naturally and efficiently support general constraint structures. For sequential constraints, this procedure reduces to simple linear programming as the inference process. Experimental evidence is supplied in the context of an important NLP problem, semantic role labeling. 1.
Semantic role labeling via integer linear programming inference
 In Proceedings of COLING04
, 2004
"... We present a system for the semantic role labeling task. The system combines a machine learning technique with an inference procedure based on integer linear programming that supports the incorporation of linguistic and structural constraints into the decision process. The system is tested on the da ..."
Abstract

Cited by 85 (23 self)
 Add to MetaCart
(Show Context)
We present a system for the semantic role labeling task. The system combines a machine learning technique with an inference procedure based on integer linear programming that supports the incorporation of linguistic and structural constraints into the decision process. The system is tested on the data provided in the CoNLL2004 shared task on semantic role labeling and achieves very competitive results. 1