Results 1 - 10 of 391
Improving Machine Learning Approaches to Coreference Resolution (2002)
"... We present a noun phrase coreference system that extends the work of Soon et al. (2001) and, to our knowledge, produces the best results to date on the MUC6 and MUC-7 coreference resolution data sets --- F-measures of 70.4 and 63.4, respectively. ..."
Abstract
-
Cited by 333 (24 self)
- Add to MetaCart
We present a noun phrase coreference system that extends the work of Soon et al. (2001) and, to our knowledge, produces the best results to date on the MUC-6 and MUC-7 coreference resolution data sets: F-measures of 70.4 and 63.4, respectively.
A Machine Learning Approach to Coreference Resolution of Noun Phrases (2001)
"... this paper, we present a learning approach to coreference resolution of noun phrases in unrestricted text. The approach learns from a small, annotated corpus and the task includes resolving not just a certain type of noun phrase (e.g., pronouns) but rather general noun phrases. It also does not rest ..."
Abstract
-
Cited by 270 (3 self)
- Add to MetaCart
In this paper, we present a learning approach to coreference resolution of noun phrases in unrestricted text. The approach learns from a small, annotated corpus, and the task includes resolving not just a certain type of noun phrase (e.g., pronouns) but rather general noun phrases. It also does not restrict the entity types of the noun phrases; that is, coreference is assigned whether they are of "organization," "person," or other types. We evaluate our approach on common data sets (namely, the MUC-6 and MUC-7 coreference corpora) and obtain encouraging results, indicating that on the general noun phrase coreference task, the learning approach holds promise and achieves accuracy comparable to that of nonlearning approaches. Our system is the first learning-based system that offers performance comparable to that of state-of-the-art nonlearning systems on these data sets.
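The setup described here is commonly realized as a mention-pair model: every (candidate antecedent, anaphor) pair in an annotated document becomes a training instance with simple surface features, and a standard classifier decides coreference; at test time each anaphor is linked to the closest preceding mention the classifier accepts. The sketch below illustrates that general idea only; the Mention fields, the feature set, and the use of scikit-learn's decision tree are assumptions made for illustration, not the authors' exact system.

```python
# Minimal mention-pair coreference sketch (illustrative; not the paper's exact system).
# Mentions are assumed to be listed in document order.
from dataclasses import dataclass
from itertools import combinations

from sklearn.tree import DecisionTreeClassifier

@dataclass
class Mention:
    text: str          # surface string of the noun phrase
    sentence: int      # index of the sentence it occurs in
    is_pronoun: bool
    gender: str        # "m", "f", "n", or "unknown"
    number: str        # "sg" or "pl"
    entity_id: int     # gold coreference chain id (training only)

def pair_features(antecedent: Mention, anaphor: Mention) -> list[float]:
    """Surface features for one (antecedent, anaphor) pair."""
    return [
        anaphor.sentence - antecedent.sentence,                  # sentence distance
        float(antecedent.text.lower() == anaphor.text.lower()),  # string match
        float(anaphor.is_pronoun),
        float(antecedent.gender == anaphor.gender),
        float(antecedent.number == anaphor.number),
    ]

def train(documents: list[list[Mention]]) -> DecisionTreeClassifier:
    """Label every mention pair by whether the gold chains agree, then fit a tree."""
    X, y = [], []
    for mentions in documents:
        for antecedent, anaphor in combinations(mentions, 2):
            X.append(pair_features(antecedent, anaphor))
            y.append(int(antecedent.entity_id == anaphor.entity_id))
    return DecisionTreeClassifier(max_depth=5).fit(X, y)
```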
Robust Pronoun Resolution With Limited Knowledge (1998)
"... Most traditional approaches to anaphora resolution rely heavily on linguistic and domain knowledge. One of the disadvantages of developing a knowledgebased system, however, is that it is a very labourintensive and time-consuming task. This paper presents a robust, knowledge-poor approach to resolvin ..."
Abstract
-
Cited by 185 (7 self)
- Add to MetaCart
Most traditional approaches to anaphora resolution rely heavily on linguistic and domain knowledge. One of the disadvantages of developing a knowledge-based system, however, is that it is a very labour-intensive and time-consuming task. This paper presents a robust, knowledge-poor approach to resolving pronouns in technical manuals, which operates on texts pre-processed by a part-of-speech tagger. Input is checked for agreement and against a number of antecedent indicators. Candidates are assigned scores by each indicator, and the candidate with the highest score is returned as the antecedent. Evaluation reports a success rate of 89.7%, which is better than the success rates of the approaches selected for comparison and tested on the same data. In addition, preliminary experiments show that the approach can be successfully adapted for other languages with minimum modifications.
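The procedure summarized above, an agreement filter followed by antecedent indicators whose scores are summed so that the top-scoring candidate wins, maps onto a very small scoring loop. In this sketch the indicator names and weights are placeholders chosen for illustration, not the indicators or values published in the paper.

```python
# Knowledge-poor pronoun resolution sketch: agreement filter plus summed indicator scores.
# Indicator names and weights below are illustrative placeholders, not the paper's values.

def agrees(pronoun: dict, candidate: dict) -> bool:
    """Number and gender agreement check between the pronoun and a candidate NP."""
    return (pronoun["number"] == candidate["number"]
            and candidate["gender"] in (pronoun["gender"], "unknown"))

INDICATORS = {
    "recency":    lambda p, c: 2 if c["sentence"] == p["sentence"] else 1,
    "repetition": lambda p, c: 2 if c["mention_count"] > 1 else 0,
    "first_np":   lambda p, c: 1 if c["is_first_np_in_sentence"] else 0,
    "in_heading": lambda p, c: 1 if c["in_section_heading"] else 0,
}

def resolve(pronoun: dict, candidates: list[dict]) -> dict | None:
    """Return the agreeing candidate with the highest summed indicator score."""
    viable = [c for c in candidates if agrees(pronoun, c)]
    if not viable:
        return None
    return max(viable, key=lambda c: sum(f(pronoun, c) for f in INDICATORS.values()))
```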
Anaphora for everyone: Pronominal anaphora resolution without a parser (1996)
In Proceedings of COLING-96 (16th International Conference on Computational Linguistics)
"... We present an algorithm for anaphora resolution which is a modified and extended version of that developed by (Lappin and Leass, 1994). In contrast to that work, our algorithm does not require in-depth, full, syntactic parsing of text. Instead, with minimal compromise in output quality, the modifica ..."
Abstract
-
Cited by 164 (11 self)
- Add to MetaCart
(Show Context)
We present an algorithm for anaphora resolution which is a modified and extended version of that developed by Lappin and Leass (1994). In contrast to that work, our algorithm does not require in-depth, full, syntactic parsing of text. Instead, with minimal compromise in output quality, the modifications enable the resolution process to work from the output of a part-of-speech tagger, enriched only with annotations of the grammatical function of lexical items in the input text stream. Evaluation of the results of our implementation demonstrates that accurate anaphora resolution can be realized within natural language processing frameworks which do not (or cannot) employ robust and reliable parsing components.
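Since the algorithm works from part-of-speech tags plus grammatical-function labels rather than parse trees, its core can be pictured as a salience score keyed on those labels and decayed with sentence distance, with the pronoun linked to the most salient compatible candidate. The function weights and decay factor below are illustrative stand-ins, not the published parameter values.

```python
# Parser-free salience sketch: grammatical-function weights with per-sentence decay.
# Weights and the decay factor are illustrative, not the paper's actual parameters.

FUNCTION_WEIGHTS = {"subject": 80, "direct_object": 50, "indirect_object": 40, "other": 10}

def salience(candidate: dict, current_sentence: int) -> float:
    """Grammatical-function weight, halved once for each intervening sentence."""
    base = FUNCTION_WEIGHTS.get(candidate["gram_function"], FUNCTION_WEIGHTS["other"])
    distance = current_sentence - candidate["sentence"]
    return base * (0.5 ** distance)

def resolve_pronoun(pronoun: dict, candidates: list[dict]) -> dict | None:
    """Pick the most salient candidate that agrees in number and gender."""
    compatible = [c for c in candidates
                  if c["number"] == pronoun["number"]
                  and c["gender"] in (pronoun["gender"], "unknown")]
    if not compatible:
        return None
    return max(compatible, key=lambda c: salience(c, pronoun["sentence"]))
```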
A Statistical Approach to Anaphora Resolution (1998)
In Proceedings of the Sixth Workshop on Very Large Corpora
"... This paper presents an algorithm for identifying pronominal anaphora and two experiments based upon this algorithm. We incorporate multiple anaphora resolution factors into a statistical framework -- specifically the distance between the pronoun and the proposed antecedent, gender/number/animaticity ..."
Abstract
-
Cited by 161 (4 self)
- Add to MetaCart
This paper presents an algorithm for identifying pronominal anaphora and two experiments based upon this algorithm. We incorporate multiple anaphora resolution factors into a statistical framework, specifically the distance between the pronoun and the proposed antecedent, gender/number/animaticity of the proposed antecedent, governing head information, and noun phrase repetition. We combine them into a single probability that enables us to identify the referent. Our first experiment shows the relative contribution of each source of information and demonstrates a success rate of 82.9% for all sources combined. The second experiment investigates a method for unsupervised learning of gender/number/animaticity information. We present some experiments illustrating the accuracy of the method and note that with this information added, our pronoun resolution method achieves 84.2% accuracy.
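The abstract describes folding several knowledge sources (distance, gender/number/animaticity, governing head, and repetition) into one probability per candidate antecedent. One natural reading of that is to multiply independently estimated factor probabilities and normalize over the candidates; the sketch below does exactly that, and the probability tables and field names are hypothetical placeholders rather than the paper's estimated model.

```python
# Combining independent anaphora factors into a normalized probability over candidates.
# The probability tables and feature keys here are hypothetical placeholders.
import math

def combined_log_prob(candidate: dict, pronoun: dict, tables: dict) -> float:
    """Sum of per-factor log probabilities (i.e., a product of factor probabilities)."""
    return (math.log(tables["distance"][candidate["sentence_distance"]])
            + math.log(tables["gender"][(candidate["head"], pronoun["gender"])])
            + math.log(tables["governing_head"][(candidate["governor"], candidate["head"])])
            + math.log(tables["repetition"][candidate["mention_count"]]))

def antecedent_distribution(candidates: list[dict], pronoun: dict, tables: dict) -> list[float]:
    """Normalize the combined scores into a probability distribution over candidates."""
    logs = [combined_log_prob(c, pronoun, tables) for c in candidates]
    peak = max(logs)
    weights = [math.exp(l - peak) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]
```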
Mining the Biomedical Literature in the Genomic Era: An Overview (2003)
JOURNAL OF COMPUTATIONAL BIOLOGY
"... The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last f ..."
Abstract
-
Cited by 132 (5 self)
- Add to MetaCart
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature and find the nuggets of information most relevant and useful for specific analysis tasks. This paper ...
Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts (1998)
"... s Takeshi Sekimizu 1 Hyun S. Park 1 3 Juniichi Tsujii 1 2 sekimizu@is.s.u-tokyo.ac.jp hsp20@is.s.u-tokyo.ac.jp tsujii@is.s.u-tokyo.ac.jp 1 Department of Information Science, University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan. 2 Department of Language Engineering, UMIST, PO B ..."
Abstract
-
Cited by 132 (10 self)
- Add to MetaCart
Takeshi Sekimizu, Hyun S. Park, and Jun-ichi Tsujii (University of Tokyo, UMIST, and Sungshin Women's University). We have selected the most frequently seen verbs from raw text retrieved from 1-million-word Medline abstracts, and we were able to identify (or bracket) the noun phrases contained in the corpus with a precision rate of 90%. Then, based on the noun-phrase-bracketed corpus, we tried to find the subject and object terms for some frequently seen verbs in the domain. The precision rate of finding the right subject and object for each verb was about 72.9%. This task was only possible because we were able to linguistically ana...
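Given text in which noun phrases have already been bracketed, the subject/object step described above can be approximated with a very shallow pattern: for each occurrence of a target verb, take the nearest NP to its left as the subject candidate and the nearest NP to its right as the object candidate. The toy token format and verb list below are assumptions made for illustration, not the authors' actual representation or procedure.

```python
# Shallow subject/object guesser over NP-bracketed text (illustrative approximation).
# Tokens are ("NP", text) or ("V", verb) tuples; this format is an assumption.

TARGET_VERBS = {"activate", "inhibit", "bind", "interact", "regulate"}

def extract_interactions(tokens: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (subject NP, verb, object NP) triples around each target verb."""
    triples = []
    for i, (kind, text) in enumerate(tokens):
        if kind != "V" or text not in TARGET_VERBS:
            continue
        subj = next((t for k, t in reversed(tokens[:i]) if k == "NP"), None)
        obj = next((t for k, t in tokens[i + 1:] if k == "NP"), None)
        if subj and obj:
            triples.append((subj, text, obj))
    return triples

# Example:
# extract_interactions([("NP", "the Tat protein"), ("V", "activate"),
#                       ("NP", "transcription of the viral genome")])
# -> [("the Tat protein", "activate", "transcription of the viral genome")]
```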
Noun Phrase Coreference as Clustering (1999)
"... This paper introduces a new, unsupervised algorithm for noun phrase coreference resolution. It differs from existing methods in that it views coreference resolution as a clustering task. In an eval- uation on the MUC-6 coreference resolution cor- pus, the algorithm achieves an F-measure of 53.6% pla ..."
Abstract
-
Cited by 101 (4 self)
- Add to MetaCart
This paper introduces a new, unsupervised algorithm for noun phrase coreference resolution. It differs from existing methods in that it views coreference resolution as a clustering task. In an evaluation on the MUC-6 coreference resolution corpus, the algorithm achieves an F-measure of 53.6%, placing it firmly between the worst (40%) and best (65%) systems in the MUC-6 evaluation. More importantly, the clustering approach outperforms the only MUC-6 system to treat coreference resolution as a learning problem. The clustering algorithm appears to provide a flexible mechanism for coordinating the application of context-independent and context-dependent constraints and preferences for accurate partitioning of noun phrases into coreference equivalence classes.
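The clustering view can be sketched as a greedy pass over the noun phrases: measure a distance between each NP and the existing clusters, and attach the NP to the closest cluster within a radius unless a hard constraint (such as number disagreement) forbids it. The distance function and radius below are placeholders; the paper defines its own context-independent and context-dependent terms and constraints.

```python
# Greedy coreference-as-clustering sketch with a placeholder distance function.

INCOMPATIBLE = float("inf")  # hard-constraint violations block a merge entirely

def np_distance(a: dict, b: dict) -> float:
    """Toy distance: incompatible on number clash, otherwise based on word overlap."""
    if a["number"] != b["number"]:
        return INCOMPATIBLE
    shared = set(a["text"].lower().split()) & set(b["text"].lower().split())
    return 1.0 - len(shared) / max(len(a["text"].split()), len(b["text"].split()))

def cluster(nps: list[dict], radius: float = 0.6) -> list[list[dict]]:
    """Attach each NP to the closest compatible cluster, else start a new one."""
    clusters: list[list[dict]] = []
    for np in nps:
        best, best_dist = None, radius
        for c in clusters:
            dist = max(np_distance(np, member) for member in c)  # complete-link style
            if dist < best_dist:
                best, best_dist = c, dist
        if best is not None:
            best.append(np)
        else:
            clusters.append([np])
    return clusters
```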
Never Look Back: An Alternative to Centering (1998)
"... I propose a model for determining the hearer's attentional state which depends solely on a list of salient discourse entities (S-list). The ordering among the elements of the S-list covers also the function of the backward-looking center in the cen- tering model. The ranking criteria for the S- ..."
Abstract
-
Cited by 89 (9 self)
- Add to MetaCart
I propose a model for determining the hearer's attentional state which depends solely on a list of salient discourse entities (S-list). The ordering among the elements of the S-list also covers the function of the backward-looking center in the centering model. The ranking criteria for the S-list are based on the distinction between hearer-old and hearer-new discourse entities and incorporate preferences for inter- and intra-sentential anaphora. The model is the basis for an algorithm which operates incrementally, word by word.
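The S-list can be pictured as a single ranked list that is updated as each discourse entity is read and consulted for every anaphor. The two-way split into hearer-old and hearer-new entities and the position-based tie-breaking below follow the description above; the data layout and agreement test are illustrative assumptions.

```python
# Incremental S-list sketch: hearer-old entities outrank hearer-new ones,
# with ties broken by textual position. The data layout is an illustrative assumption.

def rank_key(entity: dict) -> tuple[int, int]:
    """Lower sorts first: hearer-old before hearer-new, then earlier text position."""
    return (0 if entity["hearer_old"] else 1, entity["position"])

class SList:
    def __init__(self) -> None:
        self.entities: list[dict] = []

    def insert(self, entity: dict) -> None:
        """Add a discourse entity as soon as it is read, keeping the list ranked."""
        self.entities.append(entity)
        self.entities.sort(key=rank_key)

    def resolve(self, anaphor: dict) -> dict | None:
        """Return the highest-ranked entity that agrees with the anaphor."""
        for entity in self.entities:
            if (entity["number"] == anaphor["number"]
                    and entity["gender"] in (anaphor["gender"], "unknown")):
                return entity
        return None
```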
Functional Centering -- Grounding Referential Coherence in Information Structure (1999)
COMPUTATIONAL LINGUISTICS
"... this paper gives a comprehensive picture of a complex, yet not explicitly spelled-out theory of discourse coherence, the centering model (Grosz, Joshi, and Weinstein, 1983, 1995) marked a major step in clarifying the relationship between attentional states and (local) discourse segment structure. Mo ..."
Abstract
-
Cited by 76 (2 self)
- Add to MetaCart
While no single paper gives a comprehensive picture of this complex, yet not explicitly spelled-out, theory of discourse coherence, the centering model (Grosz, Joshi, and Weinstein, 1983, 1995) marked a major step in clarifying the relationship between attentional states and (local) discourse segment structure. More precisely, the centering model accounts for the interactions between local coherence and preferential choices of referring expressions. It relates differences in coherence (in part) to varying demands on inferences as required by different types of referring expressions, given a particular attentional state of the hearer in a discourse setting (Grosz, Joshi, and Weinstein 1995, 204-205). The claim is made then that the lower the inference load put on the hearer, the more coherent the underlying discourse appears. The centering model as formulated by Grosz, Joshi, and Weinstein (1995) refines the structure of the "centers" of discourse, which are conceived as the representational device for the attentional state at the local level of discourse. They distinguish two basic types of centers, which can be assigned to each utterance Ui: a single backward-looking center, Cb(Ui), and a partially ordered set of discourse entities, the forward-looking centers, Cf(Ui). The ordering on Cf is relevant for determining the Cb. It can be viewed as a salience ranking that reflects the assumption that the higher the ranking of a discourse entity in Cf, the more likely it is to be mentioned again in the immediately following utterance. Thus, given an adequate ordering of the discourse entities in Cf, the costs of the computations necessary to establish local coherence are minimized.
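The formal core restated here, a backward-looking center Cb(Ui) and a ranked list of forward-looking centers Cf(Ui) for each utterance, with Cb(Ui) determined as the highest-ranked element of Cf(Ui-1) that is realized in Ui, fits in a few lines of code. The ranking itself is passed in as a function, since the ranking criteria (grammatical roles, information structure, and so on) are exactly what varies between centering variants; the sketch is a neutral illustration, not this paper's functional ranking.

```python
# Centering bookkeeping sketch: Cf as a ranked entity list, Cb from the previous Cf.
# The ranking function is a parameter, since ranking criteria differ between variants.
from typing import Callable, Optional

def forward_centers(utterance_entities: list[str],
                    rank: Callable[[str], float]) -> list[str]:
    """Cf(Ui): the utterance's discourse entities ordered by the given ranking."""
    return sorted(utterance_entities, key=rank)

def backward_center(cf_previous: list[str],
                    current_entities: set[str]) -> Optional[str]:
    """Cb(Ui): the highest-ranked element of Cf(Ui-1) that is realized in Ui, if any."""
    for entity in cf_previous:
        if entity in current_entities:
            return entity
    return None
```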