Named Entity Recognition through Classifier Combination
- In Proceedings of CoNLL-2003, 2003
Cited by 61 (4 self)
This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classifiers (robust linear classifier, maximum entropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined system attains a performance of 91.6F on the English development data; integrating name, location and person gazetteers, and named entity systems trained on additional, more general, data reduces the F-measure error by a factor of 15 to 21% on the English data.
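The abstract does not say how the four classifiers' outputs are merged, so the following is only a minimal sketch of one common scheme, per-token weighted voting. The function name `combine_by_vote`, the optional `weights` argument, and the example tags are hypothetical and do not come from the paper.

```python
from collections import Counter


def combine_by_vote(predictions, weights=None):
    """Combine per-token tag sequences from several classifiers by weighted vote.

    predictions: list of tag sequences, one per classifier, all the same length.
    weights: optional per-classifier vote weights (assumed uniform if omitted).
    """
    if not predictions:
        raise ValueError("need at least one classifier output")
    if weights is None:
        weights = [1.0] * len(predictions)
    n_tokens = len(predictions[0])
    if any(len(tags) != n_tokens for tags in predictions):
        raise ValueError("all tag sequences must have the same length")

    combined = []
    for i in range(n_tokens):
        votes = Counter()
        for tags, weight in zip(predictions, weights):
            votes[tags[i]] += weight
        # On a tie, max() keeps the tag that was voted for first,
        # i.e. the one proposed by the earliest classifier in the list.
        combined.append(max(votes, key=votes.get))
    return combined


# Hypothetical outputs of four classifiers on a three-token sentence.
outputs = [
    ["B-PER", "O", "B-LOC"],
    ["B-PER", "O", "O"],
    ["B-ORG", "O", "B-LOC"],
    ["B-PER", "B-MISC", "B-LOC"],
]
print(combine_by_vote(outputs))
```

Raising the weight of a classifier that is known to be more reliable lets it outvote the others, which is one simple way to model combination "under different conditions".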
Memory-Based Shallow Parsing
- Journal of Machine Learning Research, 2002
Cited by 17 (0 self)
We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification while its application towards recognizing embedded structures leaves some room for improvement.
Learning Sentence-internal Temporal Relations
- In Journal of AI Research, 2006
Cited by 13 (0 self)
In this paper we propose a data-intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like after, which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated with rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions and data requirements. We evaluate performance against gold standard corpora and also against human subjects.
A Machine Learning Approach to Modeling Scope Preferences
- Computational Linguistics, 2003
Cited by 6 (1 self)
This article describes a corpus-based investigation of quantifier scope preferences. Following recent work on multimodular grammar frameworks in theoretical linguistics and a long history of combining multiple information sources in natural language processing, scope is treated as a distinct module of grammar from syntax. This module incorporates multiple sources of evidence regarding the most likely scope reading for a sentence and is entirely data-driven. The experiments discussed in this article evaluate the performance of our models in predicting the most likely scope reading for a particular sentence, using Penn Treebank data both with and without syntactic annotation. We wish to focus attention on the issue of determining scope preferences, which has largely been ignored in theoretical linguistics, and to explore different models of the interaction between syntax and quantifier scope.
Blueprint for a High Performance NLP Infrastructure
- 2003
Cited by 6 (0 self)
Natural Language Processing (NLP) system developers face a number of new challenges. Interest is increasing for real-world systems that use NLP tools and techniques. The quantity of text now available for training and processing is increasing dramatically. Also, the range of languages and tasks being researched continues to grow rapidly. Thus it is an ideal time to consider the development of new experimental frameworks. We describe the requirements, initial design and exploratory implementation of a high performance NLP infrastructure.
Feature-rich memory-based classification for shallow NLP and information extraction
- In Text Mining. Theoretical aspects and applications. Springer LNCS series, 2003
Cited by 5 (2 self)
Memory-Based Learning (MBL) is based on the storage of all available training data, and similarity-based reasoning for handling new cases. By interpreting tasks such as POS tagging and shallow parsing as classification tasks, the advantages of MBL (implicit smoothing of sparse data, automatic integration and relevance weighting of information sources, handling exceptional data) contribute to state-of-the-art accuracy. However, Hidden Markov Models (HMM) typically achieve higher accuracy than MBL (and other Machine Learning approaches) for tasks such as POS tagging and chunking. In this paper, we investigate how the advantages of MBL, such as its potential to integrate various sources of information, come into play when we compare our approach to HMMs on two Information Extraction (IE) datasets: the well-known Seminar Announcement data set and a new German Curriculum Vitae data set.
1 Memory-Based Language Processing
Memory-Based Learning (MBL) is a supervised classification-based learning method. A vector of feature values (an instance) is associated with a class by a
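The description above (store all training instances, classify new cases by similarity) can be illustrated with a minimal nearest-neighbour sketch. It assumes an unweighted overlap distance over symbolic features and a plain majority vote among the k nearest stored instances; the relevance weighting of features mentioned in the abstract is not modelled. The class name `MemoryBasedClassifier` and the toy POS-tagging data are hypothetical.

```python
from collections import Counter


class MemoryBasedClassifier:
    """Toy memory-based learner: keeps every training instance, predicts by k-NN."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (feature_vector, class_label) pairs

    def fit(self, instances, labels):
        # "Training" is only storage: no abstraction over the data is built.
        self.memory = list(zip(instances, labels))
        return self

    @staticmethod
    def overlap_distance(a, b):
        # Number of feature positions whose symbolic values differ.
        return sum(1 for x, y in zip(a, b) if x != y)

    def predict(self, instance):
        if not self.memory:
            raise ValueError("classifier has no stored instances")
        # sorted() is stable, so equally distant instances keep storage order.
        ranked = sorted(
            self.memory,
            key=lambda item: self.overlap_distance(item[0], instance),
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical instances: (previous word, focus word, next word) -> POS tag.
train_x = [
    ("the", "dog", "barks"),
    ("a", "cat", "sleeps"),
    ("dog", "barks", "loudly"),
    ("cat", "sleeps", "quietly"),
]
train_y = ["NN", "NN", "VBZ", "VBZ"]

clf = MemoryBasedClassifier(k=1).fit(train_x, train_y)
print(clf.predict(("the", "cat", "barks")))
```

Because nothing is discarded at training time, rare or exceptional instances stay available as neighbours, which is the "handling exceptional data" property the abstract refers to.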
How to get a Chinese Name(Entity): Segmentation and Combination Issues
- In Proceedings of EMNLP’03, 2003
Cited by 4 (3 self)
When building a Chinese named entity recognition system, one must deal with certain language-specific issues such as whether the model should be based on characters or words. While there is no unique answer to this question, we discuss in detail advantages and disadvantages of each model, identify problems in segmentation and suggest possible solutions, presenting our observations, analysis, and experimental results. The second topic of this paper is classifier combination.
Combining classifiers for spoken language understanding
- In Proc. ASRU, 2003
Cited by 4 (2 self)
We are interested in the problem of understanding spontaneous speech in the context of human-machine dialogs. Utterance classification is a key component of the understanding process to determine the intent of the user. This paper presents methods for combining different statistical classifiers for spoken language understanding. We propose three combination methods. The first one combines the scores assigned to the call-types by individual classifiers using a voting mechanism. The second method is a cascaded approach. The third method employs a top level learner to decide on the final call-type. We have evaluated these combination methods over three large spoken dialog databases collected using the AT&T natural spoken dialog system for customer care applications. The results indicate that it is possible to significantly reduce the error rate of the understanding module using these combination methods.
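As a rough illustration of the first two combination methods, the sketch below sums per-call-type scores across classifiers and, separately, runs a simple cascade. The abstract gives no details of the voting rule or the cascade criterion, so the score summation, the confidence `threshold`, the fallback to the last classifier, and the call-type names are all assumptions. The third method (a top-level learner) is not shown.

```python
def combine_scores(score_dicts):
    """Score voting: sum each call-type's scores over all classifiers."""
    if not score_dicts:
        raise ValueError("need at least one classifier output")
    totals = {}
    for scores in score_dicts:
        for call_type, score in scores.items():
            totals[call_type] = totals.get(call_type, 0.0) + score
    return max(totals, key=totals.get)


def cascade(score_dicts, threshold=0.8):
    """Cascade: accept the first classifier whose top score reaches the threshold.

    If no classifier is confident enough, fall back to the top choice
    of the last classifier in the cascade.
    """
    if not score_dicts:
        raise ValueError("need at least one classifier output")
    best = None
    for scores in score_dicts:
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best
    return best


# Hypothetical call-type scores from three classifiers for one utterance.
clf_a = {"billing": 0.60, "cancel": 0.40}
clf_b = {"billing": 0.30, "cancel": 0.70}
clf_c = {"billing": 0.55, "cancel": 0.45}

print(combine_scores([clf_a, clf_b, clf_c]))
print(cascade([clf_a, clf_b, clf_c], threshold=0.65))
```

In the cascade, the order of the classifiers matters: an earlier classifier that clears the threshold prevents later ones from being consulted at all.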
Detecting Errors in Corpora Using Support Vector Machines
- In COLING-2002, 2002
Cited by 3 (0 self)
While corpus-based research relies on human-annotated corpora, it is often said that a non-negligible number of errors remain even in frequently used corpora such as the Penn Treebank. Detection of errors in annotated corpora is important for corpus-based natural language processing. In this paper, we propose a method to detect errors in corpora using support vector machines (SVMs). This method is based on the idea of extracting exceptional elements that violate consistency. We propose a method of using SVMs to assign a weight to each element and to find errors in a POS-tagged corpus. We apply the method to English and Japanese POS-tagged corpora and achieve high precision in detecting errors.

