Results 1 - 10
of
49
Guiding semi-supervision with constraint-driven learning
- In Proc. of the Annual Meeting of the ACL
, 2007
"... Over the last few years, two of the main research directions in machine learning of natural language processing have been the study of semi-supervised learning algorithms as a way to train classifiers when the labeled data is scarce, and the study of ways to exploit knowledge and global information ..."
Abstract
-
Cited by 32 (8 self)
- Add to MetaCart
Over the last few years, two of the main research directions in machine learning of natural language processing have been the study of semi-supervised learning algorithms as a way to train classifiers when the labeled data is scarce, and the study of ways to exploit knowledge and global information in structured learning tasks. In this paper, we suggest a method for incorporating domain knowledge in semi-supervised learning algorithms. Our novel framework unifies and can exploit several kinds of task specific constraints. The experimental results presented in the information extraction domain demonstrate that applying constraints helps the model to generate better feedback during learning, and hence the framework allows for high performance learning with significantly less training data than was possible before on these tasks. 1
The importance of syntactic parsing and inference in semantic role labeling
- COMPUTATIONAL LINGUISTICS
, 2008
"... We present a general framework for semantic role labeling. The framework combines a machine learning technique with an integer linear programming based inference procedure, which incorporates linguistic and structural constraints into a global decision process. Within this framework, we study the ro ..."
Abstract
-
Cited by 28 (13 self)
- Add to MetaCart
We present a general framework for semantic role labeling. The framework combines a machine learning technique with an integer linear programming based inference procedure, which incorporates linguistic and structural constraints into a global decision process. Within this framework, we study the role of syntactic parsing information in semantic role labeling. We show that full syntactic parsing information is, by far, most relevant in identifying the argument, especially, in the very first stage—the pruning stage. Surprisingly, the quality of the pruning stage cannot be solely determined based on its recall and precision. Instead, it depends on the characteristics of the output candidates that determine the difficulty of the downstream problems. Motivated by this observation, we propose an effective and simple approach of combining different semantic role labeling systems through joint inference, which significantly improves its performance. Our system has been evaluated in the CoNLL-2005 shared task on semantic role labeling, and achieves the highest F1 score among 19 participants.
A robust combination strategy for semantic role labeling
- Journal of Artificial Intelligence Research
, 2005
"... This paper focuses on semantic role labeling using automatically-generated syntactic information. A simple and robust strategy for system combination is presented, which allows to partially recover from input parsing errors and to significantly boost results of individual systems. This combination s ..."
Abstract
-
Cited by 25 (7 self)
- Add to MetaCart
This paper focuses on semantic role labeling using automatically-generated syntactic information. A simple and robust strategy for system combination is presented, which allows to partially recover from input parsing errors and to significantly boost results of individual systems. This combination scheme is also very flexible since the individual systems are not required to provide any information other than their solution. Extensive experimental evaluation in the CoNLL-2005 shared task framework supports our previous claims. The proposed architecture outperforms the best results reported in that evaluation exercise.
On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing
- In Proc. EMNLP
, 2010
"... This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approa ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger. 1
A new perceptron algorithm for sequence labeling with non-local features
- In Proceedings of EMNLP
, 2007
"... We cannot use non-local features with current major methods of sequence labeling such as CRFs due to concerns about complexity. We propose a new perceptron algorithm that can use non-local features. Our algorithm allows the use of all types of non-local features whose values are determined from the ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
We cannot use non-local features with current major methods of sequence labeling such as CRFs due to concerns about complexity. We propose a new perceptron algorithm that can use non-local features. Our algorithm allows the use of all types of non-local features whose values are determined from the sequence and the labels. The weights of local and non-local features are learned together in the training process with guaranteed convergence. We present experimental results from the CoNLL 2003 named entity recognition (NER) task to demonstrate the performance of the proposed algorithm. 1
Modeling Discriminative Global Inference
- PROCEEDINGS OF THE FIRST IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC)
, 2007
"... Many recent advances in complex domains such as Natural Language Processing (NLP) have taken a discriminative approach in conjunction with the global application of structural and domain specific constraints. We introduce LBJ, a new modeling language for specifying exact inference systems of this ty ..."
Abstract
-
Cited by 16 (12 self)
- Add to MetaCart
Many recent advances in complex domains such as Natural Language Processing (NLP) have taken a discriminative approach in conjunction with the global application of structural and domain specific constraints. We introduce LBJ, a new modeling language for specifying exact inference systems of this type, combining ideas from machine learning, optimization, First Order Logic (FOL), and Object Oriented Programming (OOP). Expressive constraints are specified declaratively as arbitrary FOL formulas over functions and objects. The language's run-time library translates them to a mathematical programming representation from which an exact solution is computed. In addition, the compiler leverages an existing OOP language: objects and functions are grouped as the OOP objects and methods that encapsulate the user's data.
Polyhedral Outer Approximations with Application to Natural Language Parsing
"... Recent approaches to learning structured predictors often require approximate inference for tractability; yet its effects on the learned model are unclear. Meanwhile, most learning algorithms act as if computational cost was constant within the model class. This paper sheds some light on the first i ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
Recent approaches to learning structured predictors often require approximate inference for tractability; yet its effects on the learned model are unclear. Meanwhile, most learning algorithms act as if computational cost was constant within the model class. This paper sheds some light on the first issue by establishing risk bounds for max-margin learning with LP relaxed inference and addresses the second issue by proposing a new paradigm that attempts to penalize “timeconsuming” hypotheses. Our analysis relies on a geometric characterization of the outer polyhedra associated with the LP relaxation. We then apply these techniques to the problem of dependency parsing, for which a concise LP formulation is provided that handles non-local output features. A significant improvement is shown over arc-factored models. 1.
Local search with very large-scale neighborhoods for optimal permutations in machine translation
- In Proc. of the Workshop on Computationally Hard Problems and Joint Inference
, 2006
"... We introduce a novel decoding procedure for statistical machine translation and other ordering tasks based on a family of Very Large-Scale Neighborhoods, some of which have previously been applied to other NP-hard permutation problems. We significantly generalize these problems by simultaneously con ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We introduce a novel decoding procedure for statistical machine translation and other ordering tasks based on a family of Very Large-Scale Neighborhoods, some of which have previously been applied to other NP-hard permutation problems. We significantly generalize these problems by simultaneously considering three distinct sets of ordering costs. We discuss how these costs might apply to MT, and some possibilities for training them. We show how to search and sample from exponentially large neighborhoods using efficient dynamic programming algorithms that resemble statistical parsing. We also incorporate techniques from statistical parsing to improve the runtime of our search. Finally, we report results of preliminary experiments indicating that the approach holds promise. 1
Structured Output Learning with Indirect Supervision
- PROC. 27TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING
, 2010
"... We present a novel approach for structure prediction that addresses the difficulty of obtaining labeled structures for training. We observe that structured output problems often have a companion learning problem of determining whether a given input possesses a good structure. For example, the compan ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
We present a novel approach for structure prediction that addresses the difficulty of obtaining labeled structures for training. We observe that structured output problems often have a companion learning problem of determining whether a given input possesses a good structure. For example, the companion problem for the part-ofspeech (POS) tagging task asks whether a given sequence of words has a corresponding sequence of POS tags that is “legitimate”. While obtaining direct supervision for structures is difficult and expensive, it is often very easy to obtain indirect supervision from the companion binary decision problem. In this paper, we develop a large margin framework that jointly learns from both direct and indirect forms of supervision. Our experiments exhibit the significant contribution of the easy-toget indirect binary supervision on three important NLP structure learning problems. 1.
A Generic Sentence Trimmer with CRFs
"... The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields (CRFs), to address improving the grammaticality of compression while retaining its relevance. Experiments found that the present approach outperfo ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields (CRFs), to address improving the grammaticality of compression while retaining its relevance. Experiments found that the present approach outperforms in grammaticality and in relevance a dependency-centric approach (Oguro et al., 2000; Morooka et al., 2004; Yamagata et al., 2006; Fukutomi et al., 2007) − the only line of work in prior literature (on Japanese compression) we are aware of that allows replication and permits a direct comparison.

