Results 1 - 10 of 94
Incorporating non-local information into information extraction systems by gibbs sampling
In ACL, 2005
Cited by 192 (15 self)
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
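The decoding idea in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `anneal_gibbs`, the toy local scores, the linear cooling schedule, and the simple label-consistency bonus that rewards identical tokens for sharing a label. Each position is resampled from its conditional distribution given all other labels, with scores divided by a temperature that is lowered over the sweeps.

```python
import math
import random


def anneal_gibbs(tokens, local, labels, bonus=2.0, sweeps=200,
                 t_start=2.0, t_end=0.05, seed=0):
    """Annealed Gibbs decoding for a toy sequence model (illustrative only).

    local[i][lab] is a hypothetical local score for label `lab` at position i.
    `bonus` is added for every other occurrence of the same token that
    currently carries the same label (a non-local consistency term).
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Start from the purely local decision at each position.
    y = [max(labels, key=lambda lab: local[i][lab]) for i in range(n)]
    for s in range(sweeps):
        # Linear cooling from t_start down to t_end (an assumed schedule).
        temp = t_start + (t_end - t_start) * s / max(1, sweeps - 1)
        for i in range(n):
            scores = []
            for lab in labels:
                agree = sum(1 for j in range(n)
                            if j != i and tokens[j] == tokens[i] and y[j] == lab)
                scores.append((local[i][lab] + bonus * agree) / temp)
            # Sample position i from its tempered conditional distribution.
            top = max(scores)
            weights = [math.exp(v - top) for v in scores]
            y[i] = rng.choices(labels, weights=weights)[0]
    return y


tokens = ["Paris", "visited", "Paris"]
labels = ["O", "LOC"]
# Hypothetical local scores: the second "Paris" weakly prefers "O" in isolation.
local = [
    {"O": 0.0, "LOC": 3.0},
    {"O": 2.0, "LOC": 0.0},
    {"O": 0.5, "LOC": 0.0},
]
decoded = anneal_gibbs(tokens, local, labels)
print(decoded)
```

In this toy setup the second "Paris" would be labeled "O" by its local score alone; the consistency term is what lets the sampler pull it toward the label of the first occurrence as the temperature drops.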
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
Proceedings of CoNLL-2003, 2003
Wide-coverage efficient statistical parsing with CCG and log-linear models
Computational Linguistics, 2007
Cited by 87 (20 self)
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Named Entity Recognition through Classifier Combination
In Proceedings of CoNLL-2003, 2003
Cited by 61 (4 self)
This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classifiers (robust linear classifier, maximum entropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined system attains a performance of 91.6F on the English development data; integrating name, location and person gazetteers, and named entity systems trained on additional, more general, data reduces the F-measure error by a factor of 15 to 21% on the English data.
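Several abstracts in this listing report gains as a relative reduction in F-measure error. As a small worked sketch, assuming error is defined as 100 minus the F-score, the calculation looks like the following. The function name and the example scores are hypothetical and are not results from any of the listed papers.

```python
def relative_error_reduction(f1_baseline, f1_new):
    """Fraction of the baseline F-measure error removed by the new system.

    Assumes scores are on a 0-100 scale and error = 100 - F1.
    """
    err_base = 100.0 - f1_baseline
    err_new = 100.0 - f1_new
    return (err_base - err_new) / err_base


# Hypothetical scores: moving from 90.0 F1 to 91.0 F1 removes
# one tenth of the remaining error.
reduction = relative_error_reduction(90.0, 91.0)
print(f"{reduction:.1%}")
```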
Named Entity Recognition using an HMM-based Chunk Tagger
2002
Cited by 46 (4 self)
This paper proposes an HMM-based chunk tagger, from which a named entity recognition system is built to combine four internal and external evidences: 1) simple internal feature such as capitalization and digitalization; 2) internal semantic feature of important triggers; 3) internal gazetteer feature; 4) external macro context feature.
Efficient Support Vector Classifiers for Named Entity Recognition
In Proceedings of the 19th International Conference on Computational Linguistics (COLING’02), 2002
Cited by 27 (1 self)
proper nouns and numerical information are extracted from documents and are classified into categories such as person, organization, and date. It is a key technology of Information Extraction and Open-Domain Question Answering. First, we show that an NE recognizer based on Support Vector Machines (SVMs) gives better scores than conventional systems. However, off-the-shelf SVM classifiers are too inefficient for this task. Therefore, we present a method that makes the system substantially faster. This approach can also be applied to other similar tasks such as chunking and part-of-speech tagging. We also present an SVM-based feature selection method and an efficient training method.
An effective two-stage model for exploiting non-local dependencies in named entity recognition
In ACL-COLING’06: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 2006
Cited by 24 (0 self)
This paper shows that a simple two-stage approach to handle non-local dependencies in Named Entity Recognition (NER) can outperform existing approaches that handle non-local dependencies, while being much more computationally efficient. NER systems typically use sequence models for tractable inference, but this makes them unable to capture the long distance structure present in text. We use a Conditional Random Field (CRF) based NER system using local features to make predictions and then train another CRF which uses both local information and features extracted from the output of the first CRF. Using features capturing non-local dependencies from the same document, our approach yields a 12.6% relative error reduction on the F1 score, over state-of-the-art NER systems using local-information alone, when compared to the 9.3% relative error reduction offered by the best systems that exploit non-local information. Our approach also makes it easy to incorporate non-local information from other documents in the test corpus, and this gives us a 13.3% error reduction over NER systems using local-information alone. Additionally, our running time for inference is just the inference time of two sequential CRFs, which is much less than that of other more complicated approaches that directly model the dependencies and do approximate inference.
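The second-stage input described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's feature set: the function name `second_stage_features`, the feature keys, the lowercasing of tokens, and the choice of a document-level majority label as the non-local feature are all hypothetical. The CRFs themselves are omitted; only the step that turns first-stage output into extra features is shown.

```python
from collections import Counter, defaultdict


def second_stage_features(tokens, first_stage_labels):
    """Build per-token feature dicts for a second-stage tagger (illustrative).

    Each token receives its own first-stage label plus the label most often
    assigned by the first stage to the same (lowercased) token in the document.
    """
    counts = defaultdict(Counter)
    for tok, lab in zip(tokens, first_stage_labels):
        counts[tok.lower()][lab] += 1
    features = []
    for tok, lab in zip(tokens, first_stage_labels):
        majority = counts[tok.lower()].most_common(1)[0][0]
        features.append({
            "word": tok,               # local information
            "first_stage": lab,        # first-stage prediction for this token
            "doc_majority": majority,  # non-local, document-level feature
        })
    return features


# Hypothetical first-stage output in which one occurrence of "Paris" was missed.
doc_tokens = ["Paris", "is", "nice", ".", "paris", "Paris"]
stage_one = ["LOC", "O", "O", "O", "O", "LOC"]
feats = second_stage_features(doc_tokens, stage_one)
print([f["doc_majority"] for f in feats])
```

A second-stage model trained on such features can see that the missed occurrence disagrees with the document-level majority, which is the kind of non-local evidence the abstract refers to.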
Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain
In: Proceedings of NLP in Biomedicine, ACL, 2003
Cited by 23 (4 self)
In this paper, we explore how to adapt a general Hidden Markov Model-based named entity recognizer effectively to biomedical domain. We integrate various features, including simple deterministic features, morphological features, POS features and semantic trigger features, to capture various evidences especially for biomedical named entity and evaluate their contributions. We also present a simple algorithm to solve the abbreviation problem and a rule-based method to deal with the cascaded phenomena in biomedical domain. Our experiments on GENIA V3.0 and GENIA V1.1 achieve the 66.1 and 62.5 F-measure respectively, which outperform the previous best published results by 8.1 F-measure when using the same training and testing data.
Named entity recognition: a maximum entropy approach using global information
In Proceedings of COLING02, 2002
Cited by 21 (1 self)
This paper presents a maximum entropy-based named entity recognizer (NER). It differs from previous machine learning-based NERs in that it uses information from the whole document to classify each word, with just one classifier. Previous work that involves the gathering of information from the whole document often uses a secondary classifier, which corrects the mistakes of a primary sentence-based classifier. In this paper, we show that the maximum entropy framework is able to make use of global information directly, and achieves performance that is comparable to the best previous machine learning-based NERs on MUC-6 and MUC-7 test data.
A Shallow Text Processing Core Engine
Computational Intelligence, 2002
Cited by 20 (11 self)
We present sppc, a high-performance system for intelligent extraction of structured data from free text documents. sppc consists of a set of domain-adaptive shallow core components which are realized by means of cascaded weighted finite state machines and generic dynamic tries. The system has been fully implemented for German which includes morphological and on-line compound analysis, efficient POS-filtering, high performance named entity recognition and chunk parsing based on a novel divide-and-conquer strategy. The whole approach proved to be very useful for processing of free word order languages like German. sppc has a good performance (more than 6000 words per second on standard PC environments) and achieves high linguistic coverage. Especially for the divide-and-conquer parsing strategy we obtained an f-measure of 87.14% on unseen data. Key words: natural language processing, shallow free text processing, German language, finite-state technology, information extract...

