Discovering Word Senses from Text
- In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining
, 2002
Cited by 159 (10 self)
Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove their overlapping features from the element. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Clustering.
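The assignment step described in this abstract (pick the most similar cluster, then strip the overlapping features from the word before looking for the next sense) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity, the `threshold` value, the function names, and the toy feature vectors are all assumptions made for illustration, and committee discovery itself is skipped by supplying the centroids directly.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def assign_senses(word_vec, centroids, threshold=0.3):
    """Assign a word to clusters one at a time (hypothetical helper).

    After each assignment, features shared with the chosen centroid are
    removed from the word's vector, so later rounds can only match
    clusters supported by the remaining features.
    """
    remaining = dict(word_vec)
    senses = []
    while remaining:
        candidates = [c for c in centroids if c not in senses]
        if not candidates:
            break
        best = max(candidates, key=lambda c: cosine(remaining, centroids[c]))
        if cosine(remaining, centroids[best]) < threshold:
            break  # nothing left that is similar enough
        senses.append(best)
        # Drop the features that overlap with the assigned cluster.
        remaining = {f: w for f, w in remaining.items()
                     if f not in centroids[best]}
    return senses

# Toy data (assumed): "produce" nearly duplicates the "fruit" sense.
centroids = {
    "fruit": {"eat": 1, "juice": 1, "ripe": 1},
    "company": {"ceo": 1, "stock": 1, "launch": 1},
    "produce": {"eat": 1, "ripe": 1, "fresh": 1},
}
apple = {"eat": 3, "juice": 2, "ripe": 2, "stock": 1, "launch": 1}

print(assign_senses(apple, centroids))
```

In this toy run the near-duplicate "produce" cluster is never assigned, because its supporting features are removed along with the "fruit" assignment, while the rarer "company" sense only clears the threshold once the dominant features are gone.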
Espresso: Leveraging generic patterns for automatically harvesting semantic relations
, 2006
Cited by 80 (1 self)
In this paper, we present Espresso, a weakly-supervised, general-purpose, and accurate algorithm for harvesting semantic relations. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and instance reliability enabling the filtering algorithm. We present an empirical comparison of Espresso with various state-of-the-art systems, on corpora of different sizes and genres, on extracting various general and specific relations. Experimental results show that our exploitation of generic patterns substantially increases system recall with small effect on overall precision.
Performance Issues and Error Analysis in an Open-Domain Question Answering System
- ACM Trans. Inf. Syst
, 2002
Cited by 59 (2 self)
This paper presents an in-depth analysis of a state-of-the-art Question Answering system. Several scenarios are examined: (1) the performance of each module in a serial baseline system, (2) the impact of feedback and the insertion of a logic prover, and (3) the impact of various lexical resources. The main conclusion is that the overall performance depends on the depth of natural language processing resources and the tools used for answer finding.
Concept Discovery from Text
- In Proceedings of Conference on Computational Linguistics
, 2002
Cited by 47 (1 self)
... WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers concepts from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning elements to their most similar cluster. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and classes extracted from WordNet (the answer key). Our experiments show that CBC outperforms several well-known clustering algorithms in cluster quality.
AquaLog: An Ontology-portable Question Answering System for the Semantic Web
- In Proceedings of ESWC
, 2005
Cited by 45 (23 self)
As semantic markup becomes ubiquitous, it will become important to be able to ask queries and obtain answers using natural language (NL) expressions, rather than the keyword-based retrieval mechanisms used by current search engines. AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input and returns answers drawn from the available semantic markup. We say that AquaLog is portable because the configuration time required to customize the system for a particular ontology is negligible. AquaLog combines several powerful techniques in a novel way to make sense of NL queries and to map them to semantic markup. Moreover, it also includes a learning component, which ensures that the performance of the system improves over time, in response to the particular community jargon used by the end users. In this paper we describe the current version of the system, in particular discussing its portability, its reasoning capabilities, and its learning mechanism.
A Noisy-Channel Approach to Question Answering
, 2003
Cited by 42 (3 self)
We introduce a probabilistic noisy-channel model for question answering and we show how it can be exploited in the context of an end-to-end QA system. Our noisy-channel system outperforms a state-of-the-art rule-based QA system that uses similar resources. We also show that the model we propose is flexible enough to accommodate within one mathematical framework many QA-specific resources and techniques, which range from the exploitation of WordNet, structured, and semi-structured databases to reasoning and paraphrasing.
AquaLog: An ontology-driven Question Answering system as an interface to the Semantic Web
Cited by 40 (20 self)
The semantic web vision is one in which rich, ontology-based semantic markup will become widely available. The availability of semantic markup on the web opens the way to novel, sophisticated forms of question answering. AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs). We say that AquaLog is portable because the configuration time required to customize the system for a particular ontology is negligible. AquaLog presents an elegant solution in which different strategies are combined in a novel way. It makes use of the GATE NLP platform, string metric algorithms, WordNet and a novel ontology-based relation similarity service to make sense of user queries with respect to the target KB. Moreover, it also includes a learning component, which ensures that the performance of the system improves over time, in response to the particular community jargon used by end users.
BalkaNet: Aims, Methods, Results and Perspectives. A General Overview
- In: D. Tufiş (ed): Special Issue on BalkaNet. Romanian Journal on Science and Technology of Information
Cited by 32 (14 self)
BalkaNet is an EC-funded project (IST-2000-29388) that started in September 2001 and will end in August 2004. It aims at developing aligned wordnets for the following Balkan languages: Bulgarian, Greek, Romanian, Serbian, and Turkish, and at extending the Czech wordnet previously developed in the EuroWordNet project. The BalkaNet project has so far delivered many useful results in the fields of both Computational Lexicography and Natural Language Processing. However, most of these results have been only partially disseminated in different conferences and journals. This is the first attempt to provide an overall description of the findings, methodologies and results of the project as well as a detailed account of each monolingual wordnet. The paper also presents the freeware multilingual tools designed for the development, maintenance and efficient exploitation of the aligned BalkaNet wordnets. A preliminary approach to BalkaNet's application to indexing Web documents and Information Retrieval is described, following the consideration that semantic networks are valuable in the context of real-world systems and user communities. Last but not least, a rather thorough analysis of wordnet applications over recent years is intended to highlight the hottest themes for further developments based on wordnets. The ultimate objective of this contribution is to spread the knowledge and experience that we have acquired, to the benefit of the research and industrial communities. We also hope that our shared experience will be helpful for other wordnet-builders.
Fine-Grained Proper Noun Ontologies for Question Answering
, 2002
Cited by 23 (0 self)
The WordNet lexical ontology, which is primarily composed of common nouns, has been widely used in retrieval tasks.
Clustering by Committee
, 2003
Cited by 20 (0 self)
... children, the narratives that capture our thoughts, and the stories that shape our world. In this work, we present some recent advances in automatically acquiring knowledge from text. We propose a general-purpose clustering algorithm called CBC (Clustering By Committee) from which we will organize documents according to topics as well as discover concepts and word senses. We will explore the value of these systems by experimenting with two novel evaluation methodologies that attempt to define what a word sense is and define the quality of a particular clustering.

