Results 1 - 10 of 11
Corpus-based induction of syntactic structure: Models of dependency and constituency
- In Proceedings of the 42nd Annual Meeting of the ACL, 2004
Cited by 128 (8 self)
We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We also demonstrate that the combined model works and is robust cross-linguistically, being able to exploit either attachment or distributional regularities that are salient in the data.
Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Cited by 16 (7 self)
We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results
Bridging Computational, Formal and Psycholinguistic Approaches to Language
- In Proc. of the 26th Conference of the Cognitive Science Society, 2004
Cited by 5 (4 self)
We compare our model of unsupervised learning of linguistic structures, ADIOS [1, 2, 3], to some recent work in computational linguistics and in grammar theory. Our approach resembles the Construction Grammar in its general philosophy (e.g., in its reliance on structural generalizations rather than on syntax projected by the lexicon, as in the current generative theories), and the Tree Adjoining Grammar in its computational characteristics (e.g., in its apparent affinity with Mildly Context Sensitive Languages). The representations learned by our algorithm are truly emergent from the (unannotated) corpus data, whereas those found in published works on cognitive and construction grammars and on TAGs are hand-tailored. Thus, our results complement and extend both the computational and the more linguistically oriented research into language acquisition.
Motif extraction and protein classification
- In Proc. Computational Systems Bioinformatics (CSB), 80–85, 2005
Cited by 4 (0 self)
We introduce an unsupervised method for extracting meaningful motifs from biological sequence data. This de novo motif extraction (MEX) algorithm is data driven, finding motifs that are not necessarily over-represented in the entire corpus, yet display over-representation within local contexts. We apply our method to the problem of deriving functional classification of proteins from their sequence information. Applying MEX to amino-acid sequences of a group of enzymes, we obtain a set of motifs that serves as the basis for description of these proteins. This motif-space, derived from sequence data only, is then used as a basis for functional classification by an SVM classifier. Using the set of the oxidoreductase super-family, with about 7000 enzymes, we show that classification based on MEX motifs surpasses that of two other SVM-based methods: SVMProt, which relies on physical and chemical properties of the protein sequence of amino-acids, and SVM applied to a Smith-Waterman distance matrix. This demonstrates the effectiveness of our MEX algorithm, and the feasibility of sequence-to-function classification.
Keywords: motif extraction, enzyme classification
Unsupervised grammar induction in a framework of information compression by multiple alignment, unification and search, in: C. de la
- Proceedings of the Workshop and Tutorial on Learning Context-Free Grammars, 2003
Cited by 3 (3 self)
This paper describes a novel approach to grammar induction that has been developed within a framework designed to integrate learning with other aspects of computing, AI, mathematics and logic. This framework, called information compression by multiple alignment, unification and search (ICMAUS), is founded on principles of Minimum Length Encoding pioneered by Solomonoff and others. Most of the paper describes SP70, a computer model of the ICMAUS framework that incorporates processes for unsupervised learning of grammars. An example is presented to show how the model can infer a plausible grammar from appropriate input. Limitations of the current model and how they may be overcome are briefly discussed.
Rich Syntax from a Raw Corpus: Unsupervised Does It
Cited by 2 (0 self)
We compare our model of unsupervised learning of linguistic structures, ADIOS [1], to some recent work in computational linguistics and in grammar theory. Our approach resembles the Construction Grammar in its general philosophy (e.g., in its reliance on structural generalizations rather than on syntax projected by the lexicon, as in the current generative theories), and the Tree Adjoining Grammar in its computational characteristics (e.g., in its apparent affinity with Mildly Context Sensitive Languages). The representations learned by our algorithm are truly emergent from the (unannotated) corpus data, whereas those found in published works on cognitive and construction grammars and on TAGs are hand-tailored. Thus, our results complement and extend both the computational and the more linguistically oriented research into language acquisition.
A Deterministic Dynamic Associative Memory (DDAM) Model for Concept Space Representation
- 2006
A New Vision of Language
d processing (Solan et al., 2003). The new computational model gives up the logicism of generative grammar in favor of information-theoretic learning of distributed construction patterns. The structure and the meaning of a sentence (which can be thought of as the proverbial elephant groped by the blind men) are thus represented by the chorus of responses of construction detectors, which can be further processed by methods that are being worked out for another cognitive domain with similar computational needs: vision (Edelman and Intrator, 2003). Recent empirical findings indicate that (1) langue, and not merely parole, is imperfect (Chipere, 2001), just as the other faculties of the mind are, (2) people can handle only very shallow true recursion or center embedding (MacDonald and Christiansen, 2002), and (3) language is more formulaic than creative (Wray, 2002). The newly emerging computational work shows also that (4) linguistic knowledge can be learned from scratch, and (5) relianc
Some Tests of an Unsupervised Model of Language Acquisition
- 2004
We outline an unsupervised language acquisition algorithm and offer some psycholinguistic support for a model based on it. Our approach resembles the Construction Grammar in its general philosophy, and the Tree Adjoining Grammar in its computational characteristics. The model is trained on a corpus of transcribed child-directed speech (CHILDES). The model's ability to process novel inputs makes it capable of taking various standard tests of English that rely on forced-choice judgment and on magnitude estimation of linguistic acceptability. We report encouraging results from several such tests, and discuss the limitations revealed by other tests in our present method of dealing with novel stimuli.

