Results 1 - 10
of
150
Discovery of Inference Rules for Question Answering
- Natural Language Engineering
, 2001
"... One of the main challenges in question-answering is the potential mismatch between the expressions in questions and the expressions in texts. While humans appear to use inference rules such as “X writes Y ” implies “X is the author of Y ” in answering questions, such rules are generally unavailable ..."
Abstract
-
Cited by 190 (3 self)
- Add to MetaCart
One of the main challenges in question-answering is the potential mismatch between the expressions in questions and the expressions in texts. While humans appear to use inference rules such as “X writes Y ” implies “X is the author of Y ” in answering questions, such rules are generally unavailable to question-answering systems due to the inherent difficulty in constructing them. In this paper, we present an unsupervised algorithm for discovering inference rules from text. Our algorithm is based on an extended version of Harris ’ Distributional Hypothesis, which states that words that occurred in the same contexts tend to be similar. Instead of using this hypothesis on words, we apply it to paths in the dependency trees of a parsed corpus. Essentially, if two paths tend to link the same set of words, we hypothesize that their meanings are similar. We use examples to show that our system discovers many inference rules easily missed by humans. 1
Introduction to the special issue on word sense disambiguation
- Computational Linguistics J
, 1998
"... ..."
Learning words from sights and sounds: a computational model
, 2002
"... This paper presents an implemented computational model of word acquisition which learns directly from raw multimodal sensory input. Set in an information theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent cross-modal structure. The model has been imple ..."
Abstract
-
Cited by 182 (29 self)
- Add to MetaCart
This paper presents an implemented computational model of word acquisition which learns directly from raw multimodal sensory input. Set in an information theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent cross-modal structure. The model has been implemented in a system using novel speech processing, computer vision, and machine learning algorithms. In evaluations the model successfully performed speech segmentation, word discovery and visual categorization from spontaneous infant-directed speech paired with video images of single objects. These results demonstrate the possibility of using state-of-the-art techniques from sensory pattern recognition and machine learning to implement cognitive models which can process raw sensor data without the need for human transcription or labeling.
Discovering Word Senses from Text
- In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining
, 2002
"... Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text ..."
Abstract
-
Cited by 159 (10 self)
- Add to MetaCart
Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove their overlapping features from the element. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses. Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval---Clustering.
DIRT - Discovery of Inference Rules from Text
- In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining
, 2001
"... In this paper, we propose an unsupervised method for discovering inference rules from text, such as "X is author of Y X wrote Y", "X solved Y X found a solution to Y", and "X caused Y Y is triggered by X". Inference rules are extremely important in many fields such as natural language processing, in ..."
Abstract
-
Cited by 110 (2 self)
- Add to MetaCart
In this paper, we propose an unsupervised method for discovering inference rules from text, such as "X is author of Y X wrote Y", "X solved Y X found a solution to Y", and "X caused Y Y is triggered by X". Inference rules are extremely important in many fields such as natural language processing, information retrieval, and artificial intelligence in general. Our algorithm is based on an extended version of Harris's Distributional Hypothesis, which states that words that occurred in the same contexts tend to be similar. Instead of using this hypothesis on words, we apply it to paths in the dependency trees of a parsed corpus.
An efficient, probabilistically sound algorithm for segmentation and word discovery
- MACHINE LEARNING
, 1999
"... This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstract ..."
Abstract
-
Cited by 103 (2 self)
- Add to MetaCart
This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word-order, and word frequency can be replaced in a modular fashion. The model yields a language-independent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
Word sense disambiguation: The state of the art
- Computational Linguistics
, 1998
"... The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950's. Sense disambiguation is an “intermediate task ” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or ano ..."
Abstract
-
Cited by 92 (3 self)
- Add to MetaCart
The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950's. Sense disambiguation is an “intermediate task ” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is
Distributional Information: A Powerful Cue for Acquiring Syntactic Categories
- Cognitive Science
, 1998
"... Many theorists have dismissed a priori the idea that distributional information could play a significant role in syntactic category acquisition. We demonstrate empirically that such information provides a powerful cue to syntactic category membership, which can be exploited by a variety of simple, p ..."
Abstract
-
Cited by 86 (2 self)
- Add to MetaCart
Many theorists have dismissed a priori the idea that distributional information could play a significant role in syntactic category acquisition. We demonstrate empirically that such information provides a powerful cue to syntactic category membership, which can be exploited by a variety of simple, psychologically plausible mechanisms. We present a range of results using a large corpus of child-directed speech and explore their psychological implications. While our results show that a considerable amount of information concerning the syntac-tic categories can be obtained from distributional information alone, we stress that many other sources of information may also be potential contributors to the identification of syntactic classes. I.
Distributional Regularity and Phonotactic Constraints are Useful for Segmentation
- Cognition
, 1996
"... In order to acquire a lexicon, young children must segment speech into words, even though most words are unfamiliar to them. This is a non-trivial task because speech lacks any acoustic analog of the blank spaces between printed words. Two sources of information that might be useful for this task ar ..."
Abstract
-
Cited by 81 (1 self)
- Add to MetaCart
In order to acquire a lexicon, young children must segment speech into words, even though most words are unfamiliar to them. This is a non-trivial task because speech lacks any acoustic analog of the blank spaces between printed words. Two sources of information that might be useful for this task are distributional regularity and phonotactic constraints. Informally, distributional regularity refers to the intuition that sound sequences that occur frequently and in a variety of contexts are better candidates for the lexicon than those that occur rarely or in few contexts. We express that intuition formally by a class of functions called DR functions. We then put forth three hypotheses: First, that children segment using DR functions. Second, that they exploit phonotactic constraints on the possible pronunciations of words in their language. Specifically, they exploit both the requirement that every word must have a vowel and the constraints that languages impose constraints on word-ini...
Contextual dependencies in unsupervised word segmentation
- In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
, 2006
"... Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies ..."
Abstract
-
Cited by 43 (12 self)
- Add to MetaCart
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on suboptimal search procedures. 1

