Results 1 - 10 of 17
Piecewise training of undirected models
In Proc. of UAI, 2005
Cited by 55 (5 self)
For many large undirected models that arise in real-world applications, exact maximum-likelihood training is intractable, because it requires computing marginal distributions of the model. Conditional training is even more difficult, because the partition function depends not only on the parameters, but also on the observed input, requiring repeated inference over each training example. An appealing idea for such models is to independently train a local undirected classifier over each clique, afterwards combining the learned weights into a single global model. In this paper, we show that this piecewise method can be justified as minimizing a new family of upper bounds on the log partition function. On three natural-language data sets, piecewise training is more accurate than pseudolikelihood, and often performs comparably to global training using belief propagation.
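The upper bound mentioned in this abstract can be checked numerically on a toy model. The following is a minimal sketch, assuming the simplest bound of this kind: the sum of per-clique log partition functions is never smaller than the global log partition function. The three-variable chain, its weights, and the function names are hypothetical and are not taken from the paper, whose family of bounds may be more general.

```python
import itertools
import math

# Toy chain y0 - y1 - y2 over binary variables with two pairwise cliques.
# The log-potentials are made-up illustration values (an assumption).
theta = {
    (0, 1): [[0.5, -0.2], [0.1, 0.8]],  # clique (y0, y1)
    (1, 2): [[-0.3, 0.4], [0.9, 0.0]],  # clique (y1, y2)
}


def log_partition(theta, n_vars=3):
    """Exact log Z: sum over every joint assignment of the product of factors."""
    total = 0.0
    for y in itertools.product([0, 1], repeat=n_vars):
        score = sum(table[y[i]][y[j]] for (i, j), table in theta.items())
        total += math.exp(score)
    return math.log(total)


def piecewise_log_partition(theta):
    """Sum of local log partition functions, one per clique taken in isolation."""
    return sum(
        math.log(sum(math.exp(v) for row in table for v in row))
        for table in theta.values()
    )


exact = log_partition(theta)
bound = piecewise_log_partition(theta)
```

The inequality holds because the product of the local sums also counts assignments in which the two cliques disagree on y1, and every such term is positive.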
Named Entity Recognition using an HMM-based Chunk Tagger
2002
Cited by 46 (4 self)
This paper proposes an HMM-based chunk tagger, from which a named entity recognition system is built to combine four kinds of internal and external evidence: 1) simple internal features such as capitalization and digitalization; 2) internal semantic features of important triggers; 3) internal gazetteer features; 4) external macro context features.
Text Chunking based on a Generalization of Winnow
Journal of Machine Learning Research, 2001
Cited by 34 (0 self)
This paper describes a text chunking system based on a generalization of the Winnow algorithm.
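The abstract gives no detail on the generalization it uses. For orientation only, here is a minimal sketch of the classic Winnow update (multiplicative promotion and demotion of the weights of active features) that such a system builds on. It is not the paper's generalized algorithm; the threshold choice, `alpha`, and the toy disjunction data are assumptions made for illustration.

```python
import itertools


def predict(weights, threshold, x):
    """Predict 1 when the weighted sum of active features reaches the threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0


def train_winnow(examples, n_features, alpha=2.0, epochs=10):
    """Classic Winnow: multiply or divide the weights of active features on mistakes."""
    weights = [1.0] * n_features
    threshold = float(n_features)  # a common choice; an assumption here
    for _ in range(epochs):
        for x, label in examples:
            if predict(weights, threshold, x) == label:
                continue
            # Promote on a missed positive, demote on a false positive.
            factor = alpha if label == 1 else 1.0 / alpha
            for i, xi in enumerate(x):
                if xi:
                    weights[i] *= factor
    return weights, threshold


# Hypothetical target concept: x0 OR x2 over four binary features.
examples = [
    (x, 1 if (x[0] or x[2]) else 0)
    for x in itertools.product([0, 1], repeat=4)
]
weights, threshold = train_winnow(examples, n_features=4)
```

In a chunking setting the binary features would encode properties of the tokens around each position, with one such classifier per chunk tag; that mapping is likewise an assumption, not something the abstract states.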
Shallow Parsing Using Specialized HMMs
Journal of Machine Learning Research, 2002
Cited by 26 (5 self)
We present a unified technique to solve different shallow parsing tasks as a tagging problem using a Hidden Markov Model-based approach (HMM). This technique consists of the incorporation of the relevant information for each task into the models. To do this, the training corpus is transformed to take into account this information. In this way, no change is necessary for either the training or tagging process, so it allows for the use of a standard HMM approach. Taking into account this information, we construct a Specialized HMM which gives more complete contextual models. We have tested our system on chunking and clause identification tasks using different specialization criteria. The results obtained are in line with the results reported for most of the relevant state-of-the-art approaches.
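The corpus transformation described in this abstract might look roughly like the following. This is a minimal sketch under stated assumptions: the input symbol is taken to be the part-of-speech tag, the output tag is enriched with that same tag, and the names `specialize`, `despecialize`, and the `|` separator are hypothetical. The paper's actual specialization criteria are not given in the abstract.

```python
def specialize(sentence, sep="|"):
    """Map (word, pos, chunk) tokens to (input_symbol, output_tag) pairs.

    The output tag carries the POS tag as extra context, so an unmodified
    HMM trainer sees a richer tag set without any change to its code.
    """
    return [(pos, f"{pos}{sep}{chunk}") for _word, pos, chunk in sentence]


def despecialize(tag, sep="|"):
    """Recover the plain chunk tag from a specialized output tag."""
    return tag.rsplit(sep, 1)[1]


# Hypothetical two-token training sentence in (word, POS, chunk) form.
sentence = [("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP")]
transformed = specialize(sentence)
recovered = [despecialize(tag) for _symbol, tag in transformed]
```

Because the change lives entirely in the data, a standard tagger can be trained on `transformed` and its predictions mapped back with `despecialize`.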
Memory-Based Shallow Parsing
Journal of Machine Learning Research, 2002
Cited by 17 (0 self)
We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification while its application to recognizing embedded structures leaves some room for improvement.
Shallow parsing using noisy and non-stationary training material
Journal of Machine Learning Research, 2002
Cited by 5 (0 self)
Shallow parsers are usually assumed to be trained on noise-free material, drawn from the same distribution as the testing material. However, when the training set is either noisy or drawn from a different distribution, performance may be degraded. Using the parsed Wall Street Journal, we investigate the performance of four shallow parsers (maximum entropy, memory-based learning, N-grams and ensemble learning) trained using various types of artificially noisy material. Our first set of results shows that shallow parsers are surprisingly robust to synthetic noise, with performance gradually decreasing as the rate of noise increases. Further results show that no single shallow parser performs best in all noise situations. Final results show that simple, parser-specific extensions can improve noise-tolerance. Our second set of results addresses the question of whether naturally occurring disfluencies undermine performance more than a change in distribution does. Results using the parsed Switchboard corpus suggest that, although naturally occurring disfluencies might harm performance, differences in distribution between the training set and the testing set are more significant.
Transductive HMM based Chinese text chunking
IEEE NLP-KE, 2003
Cited by 4 (0 self)
In this paper, we present a novel methodology to enhance Chinese text chunking with the aid of transductive Hidden Markov Models (transductive HMMs, henceforth). We consider chunking as a special tagging problem and attempt to utilize, via a number of transformation functions, as much relevant contextual information as possible for model training. These functions enable the models to make use of contextual information to a greater extent and keep us away from costly changes of the original training and tagging process. Each of them results in an individual model with certain pros and cons. Through a number of experiments, we succeed in integrating the best two models into a significantly better one. We carry out the chunking experiments on the HIT Chinese Treebank corpus. Experimental results show that it is an effective approach, achieving an F score of
A (Acronyms)
2004
Cited by 3 (0 self)
Weighted Probabilistic Sum Model based on Decision Tree Decomposition for Text Chunking
2001
A Statistical Approach to Extract Chinese Chunk Candidates From Large Corpora
2003
Cited by 1 (0 self)
The extraction of chunk candidates from real corpora is one of the fundamental tasks in building an example-based machine translation model. This paper presents a statistical approach to extract Chinese chunk candidates from large monolingual corpora. The first step is to extract large N-grams (up to 20-gram) from a raw corpus. Then two newly proposed Fast Statistical Substring Reduction (FSSR) algorithms can be applied to the initial N-gram set to remove some unnecessary N-grams using their frequency information. The two algorithms are efficient (both have a time complexity of O(n)) and can effectively reduce the size of the N-gram set by up to 50%. Finally, mutual information is used to obtain chunk candidates from the reduced N-gram set.
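The reduction step in this abstract can be pictured with a naive sketch. It assumes one plausible criterion: an N-gram is unnecessary when a longer N-gram that contains it occurs with exactly the same frequency. That criterion, the function names, and the toy counts are assumptions for illustration. The quadratic scan below is not the O(n) FSSR algorithm the paper proposes.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def reduce_substrings(freq):
    """Drop every N-gram covered by a longer N-gram of identical frequency."""
    kept = {}
    for gram, count in freq.items():
        redundant = any(
            len(other) > len(gram)
            and other_count == count
            and contains(other, gram)
            for other, other_count in freq.items()
        )
        if not redundant:
            kept[gram] = count
    return kept


# Hypothetical N-gram counts; tokens are shown as Latin letters for brevity.
freq = {
    ("a", "b"): 5,
    ("a", "b", "c"): 5,
    ("b", "c"): 7,
    ("c", "d"): 2,
}
reduced = reduce_substrings(freq)
```

Here `("a", "b")` is dropped because it never occurs outside `("a", "b", "c")`, while `("b", "c")` is kept because its higher count shows it also appears in other contexts.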

