Corpus-based induction of syntactic structure: Models of dependency and constituency
 In Proceedings of the 42nd Annual Meeting of the ACL
, 2004
Cited by 170 (8 self)
We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We also demonstrate that the combined model works and is robust cross-linguistically, being able to exploit either attachment or distributional regularities that are salient in the data.
A Generative Constituent-Context Model for Improved Grammar Induction
, 2002
Cited by 89 (3 self)
We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts.
Unsupervised induction of stochastic context-free grammars using distributional clustering
Cited by 48 (2 self)
An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated that it selects linguistically plausible constituents. This is incorporated in a Minimum Description Length algorithm. The evaluation of unsupervised models is discussed, and results are presented when the algorithm has been trained on 12 million words of the British National Corpus.
Unsupervised Language Acquisition: Theory and Practice
, 2001
Cited by 40 (0 self)
In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem, which purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, which can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with non-concatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.
ABL: Alignment-Based Learning
, 2000
Cited by 36 (1 self)
This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into parts that are the same in both sentences and parts that are different, this information is used to find parts that are interchangeable. These parts are taken as possible constituents of the same type. After this alignment learning step, the selection learning step selects the most probable constituents from all possible constituents. This method was used to bootstrap structure on the ATIS corpus (Marcus et al., 1993) and on the OVIS corpus (Bonnema et al., 1997). While the results are encouraging (we obtained up to 89.25 % non-crossing brackets precision), this paper will point out some of the shortcomings of our approach and will suggest possible solutions.
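The alignment learning step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it substitutes Python's standard `difflib.SequenceMatcher` for the edit-distance alignment of the paper, the function name `align_hypotheses` is hypothetical, and the example sentences are invented for illustration. It shows only how the differing parts of two sentences that share words become candidate constituents; the selection learning step is not covered.

```python
from difflib import SequenceMatcher


def align_hypotheses(sent_a, sent_b):
    """Return pairs of differing word spans from two sentences that share words.

    Assumption: difflib's matching-block alignment stands in for the
    edit-distance alignment used in the paper.
    """
    a, b = sent_a.split(), sent_b.split()
    opcodes = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()

    # The method only applies to sentence pairs with at least one shared word.
    if not any(tag == "equal" for tag, *_ in opcodes):
        return []

    # Every non-equal region is a pair of interchangeable parts, that is,
    # a pair of hypothesised constituents of the same type.
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in opcodes
        if tag != "equal"
    ]


if __name__ == "__main__":
    print(align_hypotheses("show me flights from Boston",
                           "show me fares from Boston"))
```

For the two example sentences, the only differing region is the pair `flights` / `fares`, so those two words are proposed as constituents of the same type.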
The Unsupervised Acquisition of a Lexicon from Continuous Speech
 MIT Artificial Intelligence Lab
, 1995
Cited by 36 (2 self)
We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
Cited by 27 (8 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
Variational Bayesian grammar induction for natural language
 In International Colloquium on Grammatical Inference
, 2006
Cited by 24 (0 self)
This paper presents a new grammar induction algorithm for probabilistic context-free grammars (PCFGs). There is an approach to PCFG induction that is based on parameter estimation. Following this approach, we apply variational Bayes to PCFGs. Variational Bayes (VB) is an approximation of Bayesian learning. It has been empirically shown that VB is less likely to cause overfitting. Moreover, the free energy of VB has been successfully used in model selection. Our algorithm can be seen as a generalization of PCFG induction algorithms proposed before. In the experiments, we empirically show that induced grammars achieve better parsing results than those of other PCFG induction algorithms. Based on the better parsing results, we give examples of recursive grammatical structures found by the proposed algorithm.
Grammar Model-based Program Evolution
 In Proceedings of the 2004 IEEE Congress on Evolutionary Computation
, 2004
Cited by 22 (1 self)
In Evolutionary Computation, genetic operators, such as mutation and crossover, are employed to perturb individuals to generate the next population. However, these fixed, problem-independent genetic operators may destroy the sub-solutions, usually called building blocks, instead of discovering and preserving them. One way to overcome this problem is to build a model based on the good individuals, and sample this model to obtain the next population. There is a wide range of such work in Genetic Algorithms