Functional Analysis and Semi-Groups
- Amer. Math. Soc. Colloq. Publ., 1957
Cited by 70 (1 self)
In this paper we describe a Cross Document Summarizer XDoX designed specifically to summarize large document sets (50-500 documents and more). Such sets of documents are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set (at the granularity level that is regulated by the user) and composing an extraction summary, which reflects these main themes. In the current version, XDoX is not optimized to produce a summary based on a few unrelated documents; indeed, such summaries are best obtained simply by concatenating summaries of individual documents. We show examples of summaries obtained in our tests as well as from our participation in the first Document
Distinguishing Word Senses in Untagged Text
- In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing
Cited by 59 (15 self)
This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text.
Wikify!: linking documents to encyclopedic knowledge
- In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, 2007
Cited by 57 (3 self)
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations. ... providing the users a quick way of accessing additional information. Wikipedia contributors perform these annotations by hand following a Wikipedia “manual of style,” which gives guidelines concerning the selection of important concepts in a text, as well as the assignment of links to appropriate related articles. For instance, Figure 1 shows an example of a Wikipedia page, including the definition for one of the meanings of the word “plant.”
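The abstract does not say how the important concepts are chosen. The code below is only a rough sketch. It assumes a precomputed, hypothetical table that gives each phrase a target article and the fraction of its occurrences that appear as a link. It then links phrases greedily by longest match, keeping those at or above a threshold. `LINK_TABLE`, `wikify` and all numbers are illustrative and do not come from the authors' system.

```python
# Hypothetical link table: phrase -> (Wikipedia article title,
# fraction of the phrase's occurrences that appear as a link).
# All values are invented for illustration.
LINK_TABLE = {
    "word sense disambiguation": ("Word-sense disambiguation", 0.80),
    "wikipedia": ("Wikipedia", 0.60),
    "plant": ("Plant", 0.05),
}


def wikify(text: str, threshold: float = 0.1, max_len: int = 3) -> list[tuple[str, str]]:
    """Greedy longest-match linking of phrases whose link fraction meets
    `threshold`. Returns (phrase, article title) pairs in text order."""
    # Crude tokenization: lowercase, drop commas and periods, split on whitespace.
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            entry = LINK_TABLE.get(phrase)
            if entry is not None and entry[1] >= threshold:
                links.append((phrase, entry[0]))
                i += n  # skip past the linked phrase
                break
        else:
            i += 1  # no phrase starts here; move on one token
    return links
```

For example, `wikify("Wikipedia helps word sense disambiguation of plant.")` links "wikipedia" and "word sense disambiguation". "plant" falls below the default threshold and is not linked.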
A statistical model for multilingual entity detection and tracking
- In NAACL/HLT, 2004
Cited by 53 (11 self)
Entity detection and tracking is a relatively new addition to the repertoire of natural language tasks. In this paper, we present a statistical language-independent framework for identifying and tracking named, nominal and pronominal references to entities within unrestricted text documents, and chaining them into clusters corresponding to each logical entity present in the text. Both the mention detection model and the novel entity tracking model can use arbitrary feature types, which lets them integrate a wide array of lexical, syntactic and semantic features. In addition, the mention detection model crucially uses feature streams derived from different named entity classifiers. The proposed framework is evaluated in several experiments on Arabic, Chinese and English texts; a system based on the approach described here and submitted to the latest Automatic Content Extraction (ACE) evaluation achieved top-tier results in all three evaluation languages.
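This is a minimal sketch of the chaining step only. It assumes mentions have already been detected and typed. A hand-written `link_score` (hypothetical) stands in for the trained entity tracking model the abstract describes. Each mention joins the best-scoring existing entity above a threshold, or else starts a new entity.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    text: str
    kind: str   # "NAM" (named), "NOM" (nominal) or "PRO" (pronominal)
    etype: str  # entity type, e.g. "PER", "ORG"


@dataclass
class Entity:
    mentions: list = field(default_factory=list)


def link_score(m: Mention, e: Entity) -> float:
    """Hypothetical hand-written score standing in for a trained model."""
    last = e.mentions[-1]
    if m.etype != last.etype:
        return 0.0                     # type disagreement: never link
    if m.kind == "PRO":
        return 0.6                     # pronouns: type agreement only
    names = {x.text.lower() for x in e.mentions}
    if m.text.lower() in names:
        return 1.0                     # exact string repeat
    # Partial string overlap, e.g. "Smith" vs "John Smith".
    if any(m.text.lower() in n or n in m.text.lower() for n in names):
        return 0.8
    return 0.0


def track(mentions, threshold=0.5):
    """Chain mentions (in text order) into entities."""
    entities: list[Entity] = []
    for m in mentions:
        # Scan newest entities first; only a strictly higher score replaces
        # the current best, so ties go to the most recently created entity.
        best, best_score = None, threshold
        for e in reversed(entities):
            s = link_score(m, e)
            if s > best_score:
                best, best_score = e, s
        if best is None:
            best = Entity()
            entities.append(best)
        best.mentions.append(m)
    return entities
```

With mentions "John Smith" (PER), "IBM" (ORG), "Smith" (PER) and "he" (PER), this yields two entities: {John Smith, Smith, he} and {IBM}.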
An Automatic Method for Generating Sense Tagged Corpora
- In Proceedings of AAAI-99, 1999
Cited by 49 (6 self)
The unavailability of very large corpora with semantically disambiguated words is a major limitation in text processing research. For example, statistical methods for word sense disambiguation of free text are known to achieve high accuracy when large corpora are available to develop context rules and to train and test them. This paper presents a novel approach to automatically generate arbitrarily large corpora for word senses. The method is based on (1) the information provided in WordNet, used to formulate queries consisting of synonyms or definitions of word senses, and (2) the information gathered from the Internet using existing search engines. The method was tested on 120 word senses and a precision of 91% was observed.
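The abstract says that queries are built from WordNet synonyms or definitions of a word sense. The sketch below assumes one simple policy. If the sense has unambiguous (single-sense) synonyms, it uses those. Otherwise it requires the target word together with the definition. The toy `POLYSEMY` and `SENSES` tables are hypothetical stand-ins for WordNet, and the query syntax is generic rather than that of any particular search engine.

```python
# Hypothetical toy lexicon standing in for WordNet.
# word -> number of senses the word has (its polysemy count)
POLYSEMY = {"interest": 7, "involvement": 2, "pastime": 1, "pursuit": 3}

# sense id -> (synonyms, short definition phrase); illustrative values only
SENSES = {
    "interest#1": (["involvement"], "a sense of concern with something"),
    "interest#3": (["pastime", "pursuit"], "a diversion that occupies one's time"),
}


def build_query(word: str, sense_id: str) -> str:
    """Build a search query intended to retrieve text using `word` in `sense_id`."""
    synonyms, gloss = SENSES[sense_id]
    # Keep only synonyms with exactly one sense: a hit on them likely carries this sense.
    mono = [s for s in synonyms if POLYSEMY.get(s) == 1]
    if mono:
        return " OR ".join(f'"{s}"' for s in mono)
    # No unambiguous synonym: fall back to the definition, required alongside the word.
    return f'"{word}" AND "{gloss}"'
```

Here `build_query("interest", "interest#3")` gives `"pastime"`. Sense 1 has no single-sense synonym, so it falls back to the definition query.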
The Senseval-3 English lexical sample task
- In Proceedings of Senseval-3: The Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, 2004
Cited by 47 (1 self)
This paper presents the task definition, resources, participating systems, and comparative results for the English lexical sample task, which was organized as part of the SENSEVAL-3 evaluation exercise. The task drew the participation of 27 teams from around the world, with a total of 47 systems.
High Performance Question/Answering
- 2001
Cited by 43 (2 self)
In this paper we present the features of a Question/Answering (Q/A) system that had unparalleled performance in the TREC-9 evaluations. We explain the accuracy of our system through the unique characteristics of its architecture: (1) usage of a wide-coverage answer type taxonomy; (2) repeated passage retrieval; (3) lexico-semantic feedback loops; (4) extraction of the answers based on machine learning techniques; and (5) answer caching. Experimental results show the effects of each feature on the overall performance of the Q/A system and lead to general conclusions about Q/A from large text collections.
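The abstract lists answer caching as a feature but does not describe it. One plausible reading, sketched here purely as an assumption, is to memoize answers keyed on a normalized form of the question. Repeated or trivially reworded questions then skip the full pipeline. `CachingQA` and `normalize` are hypothetical names.

```python
import re
from typing import Callable


def normalize(question: str) -> str:
    """Lowercase and keep only alphanumeric runs, so trivial variants share a key."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


class CachingQA:
    """Wrap an expensive question-answering function with an answer cache."""

    def __init__(self, answer_fn: Callable[[str], str]):
        self._answer_fn = answer_fn        # the full (expensive) Q/A pipeline
        self._cache: dict[str, str] = {}
        self.misses = 0                    # how many times the pipeline actually ran

    def ask(self, question: str) -> str:
        key = normalize(question)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._answer_fn(question)
        return self._cache[key]
```

Asking "What is the capital of France?" and then "what is the capital of france" runs the underlying pipeline only once. A real system would also need an eviction policy and a rule for when the document collection changes.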
Word sense and subjectivity
- In Proc. ACL-06, 2006
Cited by 26 (9 self)
Subjectivity and meaning are both important properties of language. This paper explores their interaction, and brings empirical evidence in support of the hypotheses that (1) subjectivity is a property that can be associated with word senses, and (2) word sense disambiguation can directly benefit from subjectivity annotations.
SiteQ: Engineering High Performance QA system Using Lexico-Semantic Pattern Matching and Shallow NLP
- In Proceedings of the Tenth Text REtrieval Conference (TREC), 2001
Cited by 23 (0 self)
... this paper.
2. QA track: Systems and Experiences
In TREC-10, the QA track consisted of three separate tasks: the main task, the list task and the context task. We participated only in the main task. The main task is similar to the task in previous QA tracks (TREC-8, TREC-9). NIST provided 500 questions that seek short, fact-based answers. Some questions may not have a known answer in the document collection. In that case, the response string "NIL" is judged correct. This differs from the previous QA tracks and makes the task somewhat more difficult. The answer string should contain no more than 50 bytes; 250-byte runs were abandoned this year. Participants must return at least one and no more than five responses per question ranked by preferences
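The main-task constraints (one to five ranked responses per question, answer strings of at most 50 bytes, "NIL" allowed as an answer) can be checked mechanically. This sketch represents a run as a hypothetical dict from question id to ranked answer strings, not the official submission file format.

```python
MAX_BYTES = 50      # maximum answer-string length in bytes
MAX_RESPONSES = 5   # maximum ranked responses per question


def validate_run(run: dict[int, list[str]]) -> list[str]:
    """Return rule violations for a run mapping question id -> ranked answers."""
    problems = []
    for qid, answers in sorted(run.items()):
        if not 1 <= len(answers) <= MAX_RESPONSES:
            problems.append(f"q{qid}: {len(answers)} responses (need 1-{MAX_RESPONSES})")
        for rank, ans in enumerate(answers, start=1):
            # The limit is in bytes, so measure the encoded string, not characters.
            if len(ans.encode("utf-8")) > MAX_BYTES:
                problems.append(f"q{qid} rank {rank}: answer exceeds {MAX_BYTES} bytes")
    return problems
```

"NIL" is an ordinary short string here, so it passes like any other answer. Judging whether it is correct happens later, against the answer key.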
Unsupervised Large-Vocabulary Word Sense Disambiguation with Graph-Based Algorithms for Sequence Data Labeling
- In HLT/EMNLP 2005
Cited by 20 (0 self)
This paper introduces a graph-based algorithm for sequence data labeling, using random walks on graphs encoding label dependencies. The algorithm is illustrated and tested in the context of an unsupervised word sense disambiguation problem, and shown to significantly outperform the accuracy achieved through individual label assignment, as measured on standard sense-annotated data sets.
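The core idea can be sketched with weighted PageRank, one common random-walk ranking: score candidate labels by a random walk over a graph of label dependencies, then keep the best label at each position. The `weight` function, the `window` size and the damping factor are assumptions supplied by the caller. The abstract does not specify how dependencies are weighted.

```python
from itertools import combinations


def label_sequence(candidates, weight, window=2, damping=0.85, iters=50):
    """candidates: one list of candidate labels per sequence position.
    weight(a, b): non-negative dependency score between two labels (assumed given).
    Returns the highest-scoring label at each position."""
    nodes = [(i, lab) for i, labs in enumerate(candidates) for lab in labs]
    # Undirected weighted edges between labels at different positions
    # that are at most `window` positions apart.
    adj = {n: {} for n in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] != v[0] and abs(u[0] - v[0]) <= window:
            w = weight(u[1], v[1])
            if w > 0:
                adj[u][v] = adj[v][u] = w
    out_sum = {n: sum(adj[n].values()) for n in nodes}
    score = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Weighted PageRank update: each neighbour passes on its score
        # in proportion to the weight of the connecting edge.
        score = {
            n: (1 - damping) + damping * sum(
                adj[m][n] / out_sum[m] * score[m] for m in adj[n]
            )
            for n in nodes
        }
    return [max(labs, key=lambda lab: score[(i, lab)]) for i, labs in enumerate(candidates)]
```

For instance, take the candidates `[["bank/river", "bank/finance"], ["water/liquid"]]` and a `weight` that favours the pair "bank/river"/"water/liquid". The walk then selects "bank/river" for the first position, because that label collects more score from its neighbour.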

