Results 1 - 10
of
31
From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax
-
, 1993
"... ... This paper describes an approach based on two principles. First, rely on local morpho-syntactic cues to structure rather than trying to parse entire sentences. Second, treat these cues as probabilistic rather than absolute indicators of syntactic structure. Apply inferential statistics to the da ..."
Abstract
-
Cited by 123 (4 self)
- Add to MetaCart
... This paper describes an approach based on two principles. First, rely on local morpho-syntactic cues to structure rather than trying to parse entire sentences. Second, treat these cues as probabilistic rather than absolute indicators of syntactic structure. Apply inferential statistics to the data collected using the cues, rather than drawing a categorical conclusion from a single occurrence of a cue. The effectiveness of this approach for inferring the syntactic frames of verbs is supported by experiments on an English corpus using a program called Lerner. Lerner starts out with no knowledge of content words--it bootstraps from determiners, auxiliaries, modals, prepositions, pronouns, complementizers, coordinating conjunctions, and punctuation.
Distributional Part-of-Speech Tagging
- In Proc. of 7th Conference of the European Chapter of the Association for Computational Linguistics
, 1995
"... This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus. ..."
Abstract
-
Cited by 75 (6 self)
- Add to MetaCart
This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.
Designing Statistical Language Learners: Experiments on Noun Compounds
, 1995
"... Statistical language learning research takes the view that many traditional natural language processing tasks can be solved by training probabilistic models of language on a sufficient volume of training data. The design of statistical language learners therefore involves answering two questions: (i ..."
Abstract
-
Cited by 65 (0 self)
- Add to MetaCart
Statistical language learning research takes the view that many traditional natural language processing tasks can be solved by training probabilistic models of language on a sufficient volume of training data. The design of statistical language learners therefore involves answering two questions: (i) Which of the multitude of possible language models will most accurately reflect the properties necessary to a given task? (ii) What will constitute a sufficient volume of training data? Regarding the first question, though a variety of successful models have been discovered, the space of possible designs remains largely unexplored. Regarding the second, exploration of the design space has so far proceeded without an adequate answer. The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: it identifies a new class of designs by providing a novel theory of statistical natural language processing, and it presents the foundations for a predictive theory of data requirements to assist in future design explorations. The first of these contributions is called the meaning distributions theory. This theory
Part-Of-Speech Induction From Scratch
- In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics
, 1993
"... This paper presents a method for inducing the parts of speech of a language and part-of-speech labels for individual words from a large text corpus. Vector representations for the part-of-speech of a word are formed from entries of its near lexical neighbors. A dimensionality reduction creates a spa ..."
Abstract
-
Cited by 43 (2 self)
- Add to MetaCart
This paper presents a method for inducing the parts of speech of a language and part-of-speech labels for individual words from a large text corpus. Vector representations for the part-of-speech of a word are formed from entries of its near lexical neighbors. A dimensionality reduction creates a space representing the syntactic categories of unambiguous words. A neural net trained on these spatial representations classifies individual contexts of occurrence of ambiguous words. The method classifies both ambiguous and unambiguous words correctly with high accuracy.
Tuning a Corpus Analysis Approach for Automatic Query Expansion
, 1997
"... Searching online text collections can be both rewarding and frustrating. While valuable information can be found, typically many irrelevant documents are also retrieved and many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrie ..."
Abstract
-
Cited by 38 (2 self)
- Add to MetaCart
Searching online text collections can be both rewarding and frustrating. While valuable information can be found, typically many irrelevant documents are also retrieved and many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performance, but finding and using related words is an open problem. This research uses corpus analysis techniques to automatically discover similar words directly from the contents of the untagged databases. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. This work has been extended to multi-database collections where each sub-database has a collection-specific similarity matrix associated with it. If the best matrix is selected, substantial search improvements are possible. However, automati...
Combining Distributional and Morphological Information for Part of Speech Induction
- In Proceedings of the tenth Annual Meeting of the European Association for Computational Linguistics EACL 2003
, 2003
"... In this paper we discuss algorithms for clustering words into classes from unlabelled text using unsupervised algorithms, based on distributional and morphological information. We show how the use of morphological information can improve the performance on rare words, and that this is robust ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
In this paper we discuss algorithms for clustering words into classes from unlabelled text using unsupervised algorithms, based on distributional and morphological information. We show how the use of morphological information can improve the performance on rare words, and that this is robust across a wide range of languages.
Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies
- COMPUTATIONAL LINGUISTICS
, 2003
"... ..."
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
"... This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likel ..."
Abstract
-
Cited by 20 (7 self)
- Add to MetaCart
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
Experiments in Automatic Word Class and Word Sense Identification for Information Retrieval
"... Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and incr ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and increased computational capabilities make new techniques based on corpus linguistics feasible. Corpus-based analysis is especially needed for corpora from specialized fields for which no electronic dictionaries or thesauri exist. The methods described here use a combination of mutual information and word context to establish word similarities. Then, unsupervised classification is done using clustering in the word space, identifying word classes without pretagging. We also describe an extension of the method to handle the difficult problems of disambiguation and of determining part-of-speech and semantic information for low-frequency words. The method is powerful enough to produce high-quality results on a small corpus of 200,000 words from abstracts in a field of molecular biology.
A Genetic Algorithm for the Induction of Natural Language Grammars
- Proc IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing
, 1995
"... Strict pattern-based methods of grammar induction are often frustrated by the apparently inexhaustible variety of novel word combinations in large corpora. Statistical methods offer a possible solution by allowing frequent well-formed expressions to overwhelm the infrequent ungrammatical ones. They ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
Strict pattern-based methods of grammar induction are often frustrated by the apparently inexhaustible variety of novel word combinations in large corpora. Statistical methods offer a possible solution by allowing frequent well-formed expressions to overwhelm the infrequent ungrammatical ones. They also have the desirable property of being able to construct robust grammars from positive instances alone. Unfortunately, the "zero-frequency" problem entails assigning a small probability to all possible word patterns, thus ungrammatical n-grams become as probable as unseen grammatical ones. Further, such grammars are unable to take advantage of inherent lexical properties that should allow infrequent words to inherit the syntactic properties of the class to which they belong. This paper describes a genetic algorithm (GA) that adapts a population of hypothesis grammars towards a more effective model of language structure. The GA is statistically sensitive in that the utility of frequent pat...

