Head-Driven Statistical Models for Natural Language Parsing
2003
Cited by 780 (13 self)
This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
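To make the idea of a head-centered, top-down derivation concrete, here is a rough sketch, in our own notation rather than the article's, of how expanding a single rule could be scored. P is the parent nonterminal, h its head word, and H the head child, which is generated first. Left modifiers L_i (with head words l_i) and right modifiers R_j (with head words r_j) are then generated outward from the head until a STOP symbol appears. The article's models condition on further context (for example subcategorization, as the abstract notes), so treat this only as an illustration of the general decomposition.

```latex
P(\text{rule} \mid P, h) = P_h(H \mid P, h)
  \times \prod_{i=1}^{n+1} P_l\big(L_i(l_i) \mid P, H, h\big)
  \times \prod_{j=1}^{m+1} P_r\big(R_j(r_j) \mid P, H, h\big),
\qquad L_{n+1} = R_{m+1} = \mathrm{STOP}
```

Under independence assumptions of this kind, the probability of a whole tree is the product of such terms over the rules in its derivation.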
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
2002
Cited by 340 (10 self)
We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.
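The training loop the abstract describes (Viterbi decoding of each training example followed by a simple additive update) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The feature set (word/tag emissions and tag-bigram transitions), the `<s>` start tag, the toy data, and all function names are hypothetical, and any parameter-averaging refinements are omitted.

```python
from collections import defaultdict

START = "<s>"  # hypothetical boundary tag used for the first transition


def features(words, tags):
    """Count emission (word, tag) and transition (prev_tag, tag) features."""
    counts = defaultdict(int)
    prev = START
    for w, t in zip(words, tags):
        counts[("emit", w, t)] += 1
        counts[("trans", prev, t)] += 1
        prev = t
    return counts


def viterbi(words, tagset, weights):
    """Return the highest-scoring tag sequence under the current weights."""
    best = {START: (0.0, [])}  # tag -> (score, best path ending in that tag)
    for w in words:
        new_best = {}
        for t in tagset:
            emit = weights.get(("emit", w, t), 0.0)
            score, path = max(
                (s + weights.get(("trans", p, t), 0.0) + emit, path_p)
                for p, (s, path_p) in best.items()
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values())[1]


def train(data, tagset, epochs=5):
    """Perceptron training: decode, then add gold and subtract predicted features."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, weights)
            if pred != gold:
                for f, c in features(words, gold).items():
                    weights[f] += c
                for f, c in features(words, pred).items():
                    weights[f] -= c
    return weights


# Toy usage with made-up sentences and a three-tag set.
TAGSET = ["D", "N", "V"]
DATA = [
    (["the", "dog", "barks"], ["D", "N", "V"]),
    (["the", "cat", "sleeps"], ["D", "N", "V"]),
    (["a", "dog", "sleeps"], ["D", "N", "V"]),
]
w = train(DATA, TAGSET)
print(viterbi(["the", "dog", "sleeps"], TAGSET, w))  # ['D', 'N', 'V']
```

Only features on which the gold and predicted sequences disagree change in an update, because shared features are added and subtracted in equal amounts.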
Comparative Experiments on Learning Information Extractors for Proteins and their Interactions
2004
Cited by 55 (7 self)
Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction efforts have been frustrated by the lack of conventions for describing human genes and proteins. We have developed and evaluated a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting information on interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and maximum entropy are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions with higher precision than manually-developed rules.
Collective Information Extraction with Relational Markov Networks
2004
Cited by 52 (4 self)
Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks (a generalization of CRFs), which can represent arbitrary dependencies between extractions. This allows for "collective information extraction" that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.
An Alternate Objective Function for Markovian Fields
2002
Cited by 26 (0 self)
In labelling or prediction tasks, a trained model's test performance is often based on the quality of its single-time marginal distributions over labels rather than its joint distribution over label sequences. We propose using a new cost function for discriminative learning that more accurately reflects such test time conditions. We present an efficient method to compute the gradient of this cost for Maximum Entropy Markov Models, Conditional Random Fields, and for an extension of these models involving hidden states. Our experimental results show that the new cost can give significant improvements and that it provides a novel and effective way of dealing with the 'label-bias' problem.
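As an illustration, assuming the proposed cost is the sum over positions of the log marginal probability of each correct label (our reading of "single-time marginal distributions"), the sketch below computes those marginals for a linear-chain model with forward-backward in log space and contrasts that objective with the usual joint log-likelihood. The log-potential inputs and function names are hypothetical, and the gradient computation the abstract mentions is not shown.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_backward(unary, trans):
    """Log-space forward (alpha) and backward (beta) tables.

    unary[t][k]: log-potential of label k at position t (hypothetical input)
    trans[j][k]: log-potential of label j followed by label k
    """
    T, K = len(unary), len(unary[0])
    alpha = [list(unary[0])]
    for t in range(1, T):
        alpha.append([
            unary[t][k] + logsumexp([alpha[t - 1][j] + trans[j][k] for j in range(K)])
            for k in range(K)
        ])
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            logsumexp([trans[k][j] + unary[t + 1][j] + beta[t + 1][j] for j in range(K)])
            for k in range(K)
        ]
    return alpha, beta


def marginals(unary, trans):
    """Per-position label marginals p(y_t = k | x)."""
    alpha, beta = forward_backward(unary, trans)
    log_z = logsumexp(alpha[-1])
    return [[math.exp(a + b - log_z) for a, b in zip(alpha[t], beta[t])]
            for t in range(len(unary))]


def joint_log_likelihood(unary, trans, labels):
    """Standard objective: log p(y_1..y_T | x) for the whole sequence."""
    alpha, _ = forward_backward(unary, trans)
    score = unary[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + unary[t][labels[t]]
    return score - logsumexp(alpha[-1])


def marginal_log_likelihood(unary, trans, labels):
    """Per-position objective: sum over t of log p(y_t | x)."""
    probs = marginals(unary, trans)
    return sum(math.log(probs[t][y]) for t, y in enumerate(labels))
```

When all transition potentials are zero, the labels are independent given the input and the two objectives coincide; they diverge once transitions couple neighboring labels.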
Discriminative Learning for Label Sequences via Boosting
Advances in Neural Information Processing Systems 15, 2002
Cited by 25 (6 self)
This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function.
Active Learning of Partially Hidden Markov Models
In Proceedings of the ECML/PKDD Workshop on Instance Selection, 2001
Cited by 12 (2 self)
We consider the task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled observation sequences are available for training. This setting is motivated by the information extraction problem, where only a few tokens in the training documents are given a semantic tag while most tokens are unlabeled. We first describe the partially hidden Markov model together with an algorithm for learning HMMs from partially labeled data. We then present an active learning algorithm that selects "difficult" unlabeled tokens and asks the user to label them. We study empirically how much active learning reduces the required labeling effort, or how much it increases the quality of the learned model achievable with a given amount of user effort.
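The abstract does not say how "difficult" tokens are chosen. As one hypothetical criterion, the sketch below uses margin-based uncertainty sampling: given the current model's per-token label distributions (assumed to come from something like forward-backward over the partially labeled sequence), it queries the unlabeled tokens whose two most probable labels are closest in probability. The function name and inputs are illustrative only.

```python
def select_difficult_tokens(token_probs, labeled, k):
    """Pick k unlabeled tokens whose top-two label probabilities are closest.

    token_probs[i]: the model's label distribution for token i (at least 2 labels)
    labeled: set of token indices the user has already tagged
    """
    scored = []
    for i, probs in enumerate(token_probs):
        if i in labeled:
            continue  # never re-query a token the user already labeled
        best, second = sorted(probs, reverse=True)[:2]
        scored.append((best - second, i))  # small margin = model is unsure
    scored.sort()
    return [i for _, i in scored[:k]]


# Toy usage with made-up distributions over two labels.
probs = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5], [0.7, 0.3]]
print(select_difficult_tokens(probs, labeled={2}, k=2))  # [1, 3]
```

Other difficulty measures (for example label entropy) would slot into the same loop by changing the score appended for each token.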
Learning to Extract Proteins and their Interactions from Medline Abstracts
In ICML-2003 Workshop on Machine Learning in Bioinformatics, 2003
Cited by 10 (0 self)
We present results from a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions with higher precision than manually-developed rules.
On Improving the Efficiency of the Iterative Proportional Fitting Procedure
2003
Cited by 10 (1 self)
Iterative proportional fitting (IPF) on junction trees is an important tool for learning in graphical models. We identify the propagation and IPF updates on the junction tree as fixed point equations of a single constrained entropy maximization problem. This allows a more efficient...
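For readers unfamiliar with IPF, the sketch below shows the classic procedure on a two-way contingency table rather than on a junction tree: it alternately rescales rows and columns so the table matches given target marginals. It does not implement the paper's more efficient junction-tree variant. The function name, tolerance, and example numbers are hypothetical, and the row and column targets are assumed to have the same total.

```python
def ipf(table, row_targets, col_targets, max_iters=1000, tol=1e-10):
    """Rescale a nonnegative 2-D table to match target row and column sums."""
    t = [[float(x) for x in row] for row in table]
    for _ in range(max_iters):
        # Step 1: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(t[i])
            if s > 0:
                t[i] = [x * target / s for x in t[i]]
        # Step 2: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in t)
            if s > 0:
                for row in t:
                    row[j] *= target / s
        # Columns now match; stop once the rows still match as well.
        if all(abs(sum(row) - target) < tol for row, target in zip(t, row_targets)):
            break
    return t


fitted = ipf([[1, 1], [1, 1]], row_targets=[30, 70], col_targets=[40, 60])
print(fitted)  # approximately [[12.0, 18.0], [28.0, 42.0]]
```

Starting from a uniform table, the result is the independence fit (row total times column total divided by the grand total), reached here after a single pass.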
Extracting Gene and Protein Names from Biomedical Abstracts
Unpublished Technical Note, 2002
Cited by 9 (0 self)
... abstracts were obtained. Tagging was carried out using the texttagger.pl software downloaded from http://www-2.cs.cmu.edu/~kseymore/general_tagger.pl. This program accepts a directory of files to be tagged, lets a user tag them through a graphical user interface driven by a file of possible labels, and writes the tagged files into an output directory. An example of a tagged abstract is shown in Figure 1.

