Unsupervised grammar inference systems for natural language (2002)

by A Roberts, E Atwell
Results 1 - 6 of 6

Unsupervised Learning of Natural Languages

by Zach Solan , 2006
Cited by 51 (9 self)
No abstract available.

Bridging Computational, Formal and Psycholinguistic Approaches to Language

by Shimon Edelman, Zach Solan, David Horn, Eytan Ruppin - In Proc. of the 26th Conference of the Cognitive Science Society, 2004
Cited by 5 (4 self)
We compare our model of unsupervised learning of linguistic structures, ADIOS [1, 2, 3], to some recent work in computational linguistics and in grammar theory. Our approach resembles the Construction Grammar in its general philosophy (e.g., in its reliance on structural generalizations rather than on syntax projected by the lexicon, as in the current generative theories), and the Tree Adjoining Grammar in its computational characteristics (e.g., in its apparent affinity with Mildly Context Sensitive Languages). The representations learned by our algorithm are truly emergent from the (unannotated) corpus data, whereas those found in published works on cognitive and construction grammars and on TAGs are hand-tailored. Thus, our results complement and extend both the computational and the more linguistically oriented research into language acquisition.

Learning Syntactic Constructions from Raw Corpora

by Shimon Edelman, Zach Solan, David Horn, Eytan Ruppin - 29th Boston University Conference on Language Development, 2005
Cited by 2 (1 self)
... a lexicon populated by units of various sizes, as envisaged by (Langacker, 1987). Constructions may be specified completely, as in the case of simple morphemes or idioms such as "take it to the bank", or partially, as in the expression "what’s X doing Y?", where X and Y are slots that admit fillers of particular types (Kay and Fillmore, 1999). Constructions offer an intriguing alternative to traditional rule-based syntax by hinting at the extent to which the complexity of language can stem from a rich repertoire of stored, more or less entrenched (Harris, 1998) representations that address both syntactic and semantic issues, and encompass, in addition to general rules, “totally idiosyncratic forms and patterns of all intermediate degrees of generality” (Langacker, 1987, p.46). Because constructions are by their very nature language-specific, the question of acquisition in Construction Grammar is especially poignant. We address this issue by offering an unsupervised algorithm that learns constructions from raw corpora.

A Cognitive-based Unsupervised Algorithm to Learn Syntactic Structure and Categories (under review)

by Donald W. Kijek, Christian C. Wagner, Ph.D.
The empirical acquisition of linguistic knowledge by means of a computer is fundamental to an effective NLP system. Although research in this field has spanned more than 40 years, significant improvements in the unsupervised acquisition process continue to elude us due to the focus on purely computational methods. We describe the integration of computational and cognitive disciplines to develop a learning algorithm that will compete against the human language learning process. The learning algorithm was applied to test corpora, acquiring syntactic structural and categorical data. The research results indicate that the unsupervised acquisition of linguistic knowledge can be achieved by using an algorithm largely influenced by observation of the human language learning process.

Supporting online material

by Zach Solan, David Horn, Shimon Edelman, 2004
Consider a corpus of $m$ sentences (sequences) of variable length, each expressed in terms of a lexicon of finite size $N$. The sentences in the corpus correspond to $m$ different paths in a pseudograph (a non-simple graph in which both loops and multiple edges are permitted) whose vertices are the unique lexicon entries, augmented by two special symbols, begin and end. Each of the $N$ nodes has a number of incoming paths that is equal to the number of outgoing paths. Figure S1 illustrates the type of structure that we seek, namely, the bundling of paths, signifying a relatively high probability associated with a sub-structure that can be identified as a pattern. To extract it from the data, two probability functions are defined over the graph for any given search path $S = (e_1 \to e_2 \to \dots \to e_k) = (e_1; e_k)$.[1] The first one, $P_R(e_i; e_j)$, is the right-moving ratio of fan-through flux of paths at $e_j$ to fan-in flux of paths at $e_{j-1}$, starting at $e_i$ and moving along the sub-path $e_i \to e_{i+1} \to e_{i+2} \to \dots \to e_{j-1}$:

$$P_R(e_i; e_j) = p(e_j \mid e_i e_{i+1} e_{i+2} \dots e_{j-1}) = \frac{l(e_i; e_j)}{l(e_i; e_{j-1})}$$

where $l(e_i; e_j)$ is the number of occurrences of sub-paths $(e_i; e_j)$ in the graph. Proceeding in the opposite direction, from the right end of the path to the left, we define the left-going probability ...

[1] In general the notation $(e_i; e_j)$, $j > i$, corresponds to a rightward sub-path of $S$, starting with $e_i$ and ending with $e_j$. A leftward sub-path of $S$, starting with $e_j$ and ending with $e_i$, is denoted by $(e_j; e_i)$, $i < j$.
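The right-moving probability can be computed directly from sub-path counts. This is a minimal sketch under stated assumptions, not the authors' implementation: each sentence is a list of tokens, the special symbols begin and end are spelled with the hypothetical markers `<begin>` and `<end>`, and $l(e_i; e_j)$ is taken to be the number of contiguous occurrences of the sub-path across all sentence paths. Only $P_R$ is covered, since the excerpt breaks off before the left-going probability is defined.

```python
from typing import List, Sequence

# Hypothetical spellings of the two special symbols, begin and end.
BEGIN, END = "<begin>", "<end>"


def to_paths(corpus: Sequence[Sequence[str]]) -> List[List[str]]:
    """Turn each sentence into a path from BEGIN to END through lexicon vertices."""
    return [[BEGIN, *sentence, END] for sentence in corpus]


def count_subpath(paths: Sequence[Sequence[str]], subpath: Sequence[str]) -> int:
    """l(e_i; e_j): number of contiguous occurrences of `subpath` over all paths."""
    k = len(subpath)
    target = list(subpath)
    return sum(
        1
        for path in paths
        for start in range(len(path) - k + 1)
        if list(path[start:start + k]) == target
    )


def p_right(paths: Sequence[Sequence[str]], search_path: Sequence[str],
            i: int, j: int) -> float:
    """P_R(e_i; e_j) = l(e_i; e_j) / l(e_i; e_{j-1}), with 1-based indices i < j."""
    if not 1 <= i < j <= len(search_path):
        raise ValueError("need 1 <= i < j <= k")
    numerator = count_subpath(paths, search_path[i - 1:j])        # e_i .. e_j
    denominator = count_subpath(paths, search_path[i - 1:j - 1])  # e_i .. e_{j-1}
    # Assumed convention: an unseen prefix gives probability 0.
    return numerator / denominator if denominator else 0.0
```

For the toy corpus "the cat is eating", "the dog is eating", "the cat is sleeping" and $S = (\text{the} \to \text{cat} \to \text{is} \to \text{eating})$, this gives $P_R(e_1; e_2) = 2/3$, $P_R(e_1; e_3) = 1$ and $P_R(e_1; e_4) = 1/2$: the paths stay bundled through "the cat is" and then diverge.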

Some Tests of an Unsupervised Model of Language Acquisition

by Bo Pedersen, Shimon Edelman, Zach Solan, David Horn, 2004
We outline an unsupervised language acquisition algorithm and offer some psycholinguistic support for a model based on it. Our approach resembles the Construction Grammar in its general philosophy, and the Tree Adjoining Grammar in its computational characteristics. The model is trained on a corpus of transcribed child-directed speech (CHILDES). The model's ability to process novel inputs makes it capable of taking various standard tests of English that rely on forced-choice judgment and on magnitude estimation of linguistic acceptability. We report encouraging results from several such tests, and discuss the limitations revealed by other tests in our present method of dealing with novel stimuli.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University