Unsupervised Language Acquisition: Theory and Practice
2001
Cited by 32 (0 self)
In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem that purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, which can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with nonconcatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.
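The abstract does not describe the thesis's own category-induction algorithm. As a rough illustration of what "distributional information" means, the Python sketch here (our assumption, not the thesis's method) represents each word by counts of its immediate left and right neighbours and compares words by cosine similarity, so that words sharing contexts come out as similar. The names `context_vectors`, `cosine` and `nearest` are hypothetical, and the sketch makes no attempt to handle the ambiguous and rare words the thesis addresses.

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch (assumption): words are compared by their immediate
# left/right neighbours; this is not the thesis's own induction algorithm.

def context_vectors(sentences):
    """Map each word to counts of its left ("L=") and right ("R=") neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 1):
            vectors[padded[i]]["L=" + padded[i - 1]] += 1
            vectors[padded[i]]["R=" + padded[i + 1]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest(word, vectors):
    """Return the other word whose contexts are most similar to those of `word`."""
    target = vectors[word]
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(target, vectors[w]))
```

On a toy corpus such as `the cat sat / the dog sat / a cat ran / a dog ran`, `nearest("cat", ...)` picks `"dog"` and `nearest("the", ...)` picks `"a"`, because each pair shares all of its contexts. A real induction system would cluster the whole vocabulary rather than look up one neighbour at a time.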
Using Single Layer Networks for Discrete, Sequential Data: an Example from Natural Language Processing
Neural Computing Applications, 1997
Cited by 11 (10 self)
Natural Language Processing (NLP) is concerned with processing ordinary, unrestricted text. This work takes a new approach to a traditional NLP task, using neural computing methods. A parser which has been successfully implemented is described. It is a hybrid system, in which neural processors operate within a rule-based framework. The neural processing components belong to the class of Generalized Single Layer Networks (GSLN). In general, supervised, feed-forward networks need more than one layer to process data. However, in some cases data can be pre-processed with a non-linear transformation and then presented in a linearly separable form for subsequent processing by a single layer net. Such networks offer advantages of functional transparency and operational speed. For our parser, the initial stage of processing maps linguistic data onto a higher-order representation, which can then be analysed by a single layer network. This transformation is supported by information-theoretic analysis. Three different algorithms for the neural component were investigated. Single layer nets can be trained by finding weight adjustments based on (a) factors proportional to the input, as in the Perceptron, (b) factors proportional to the existing weights, and (c) an error minimization method. In our experiments generalization ability varies little; method (b) is used for a prototype parser. The prototype is available via telnet.
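The abstract's central point is that a non-linear pre-processing step can make data linearly separable, so that a single layer net suffices. The Python sketch here illustrates that idea on a toy problem under our own assumptions: the higher-order representation is simply the input plus all pairwise products (the parser's actual transformation of linguistic data is not given in the abstract), and training uses the Perceptron rule, i.e. method (a), with weight changes proportional to the input. The function names are hypothetical. XOR is the classic case that a single layer perceptron cannot learn from the raw inputs but can learn once the product feature x1*x2 is added.

```python
from itertools import combinations

# Illustrative sketch (assumption): the "higher-order representation" is taken to be
# the inputs plus all pairwise products; the parser's real transformation may differ.

def higher_order(x):
    """Expand an input vector with every pairwise product of its components."""
    return list(x) + [a * b for a, b in combinations(x, 2)]

def predict(w, b, x):
    """Single layer net: threshold the weighted sum of the expanded input."""
    phi = higher_order(x)
    return 1 if sum(wi * xi for wi, xi in zip(w, phi)) + b > 0 else 0

def train_perceptron(samples, epochs=50, lr=1.0):
    """Perceptron rule: on each error, change weights in proportion to the input."""
    w = [0.0] * len(higher_order(samples[0][0]))
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = lr * (target - predict(w, b, x))
            if delta:
                errors += 1
                w = [wi + delta * xi for wi, xi in zip(w, higher_order(x))]
                b += delta
        if errors == 0:  # every sample classified correctly
            break
    return w, b
```

With the XOR samples `[((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]`, `train_perceptron` converges after a few epochs and `predict` then reproduces all four targets. Methods (b) and (c) are not shown.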
Lexical Disambiguation Using Constraint Handling In Prolog (CHIP)
1993
Cited by 7 (3 self)
This paper describes experiments with an algorithm for lexical sense disambiguation, that is, predicting which of many possible senses of a word is intended in a given sentence. The definitions of senses of a given word are those used in LDOCE, the Longman Dictionary of Contemporary English [Procter et al., 1978]. The algorithm first assigns a set of meanings or senses drawn from LDOCE to each word in the given sentence, and then chooses the combination of word senses (one for each word in the sentence) yielding the maximum semantic overlap. The metric of semantic overlap is based on the fact that LDOCE sense definitions are made in terms of the Longman Defining Vocabulary, effectively a (large) set of semantic primitives. Since the problem of finding the word-sense chain with maximum overlap can be viewed as a specialised example of the class of constraint-based optimisation problems for which Constraint Handling In Prolog (CHIP) was designed, we have chosen to implement our algorithm in CHIP.
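The search the abstract describes, choosing one sense per word so that the total definition overlap is maximal, can be sketched by brute force in Python. This is an illustration only: it enumerates every sense combination instead of using CHIP's constraint handling, scores overlap as the number of shared defining words summed over all pairs of chosen senses (our assumption about the metric), and uses invented toy definitions rather than real LDOCE entries. The names `overlap` and `disambiguate` are hypothetical.

```python
from itertools import product

# Illustrative sketch (assumption): exhaustive search with a pairwise shared-word
# score; the paper instead solves this as a constraint problem in CHIP.

def overlap(definitions):
    """Sum, over every pair of chosen senses, the number of shared defining words."""
    total = 0
    for i in range(len(definitions)):
        for j in range(i + 1, len(definitions)):
            total += len(definitions[i] & definitions[j])
    return total

def disambiguate(sentence_senses):
    """sentence_senses: one dict per word, mapping sense id -> set of defining words.
    Returns the list of sense ids (one per word) with maximal total overlap."""
    choices = [list(senses.items()) for senses in sentence_senses]
    best = max(product(*choices),
               key=lambda combo: overlap([defs for _, defs in combo]))
    return [sense_id for sense_id, _ in best]
```

For toy senses of "bank", "interest" and "cash", the money-related senses share the defining word `money` pairwise, so `disambiguate` picks `["bank_money", "interest_money", "cash_money"]`. Because the number of combinations is the product of each word's sense count, exhaustive search grows quickly with sentence length, which is where a constraint solver such as CHIP can help.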
A fast partial parse of natural language sentences using a connectionist method
In 7th Conference of the European Chapter of the Association for Computational Linguistics, 1995
The Representation of Natural Language to Enable Neural Networks to Detect Syntactic Features
In Proc. of IEE Colloquium on Grammatical Inference, 1994
A Unified MultiCorpus for Training Syntactic Constraint Models
Leeds University, School of Computer Studies, 1994
Cited by 5 (4 self)
... PoW, Nijmegen, UPenn, BNC, etc.) are used as training data for statistical syntactic constraint models to improve recognition accuracy in speech and handwriting recognisers. However, linguists developing these linguistic resources have used quite different word-tagging and parse-tree labelling schemes in each of these annotated corpora. This restricts the accessibility of each corpus, making it impossible for speech and handwriting researchers to collate them into a single very large training set. This is particularly problematic as there is evidence that any one of these parsed corpora on its own is too small for a general statistical model of higher-level syntactic structure, but the combined size of all the above annotated corpora should deliver a much more reliable model. We are developing a set of mapping algorithms to map between the main tagsets and phrase structure grammar schemes used in the above corpora. We will develop a Multi-tagged Corpus and a MultiTreebank, a single text-set annotated with all the above tagging and parsing schemes. The text-set is the Spoken English Corpus (SEC); this is already annotated with two syntax schemes, and we plan to have added at least one more by the AISB Workshop. However, the main deliverable to the speech and handwriting research community is not the SEC-based MultiTreebank, but the mapping suite used to produce it; this can be used to combine currently incompatible syntactic training sets into a large unified multicorpus. Our development of the mapping algorithms aims to distinguish notational from substantive differences in the annotation schemes, and we will be able to evaluate tagging schemes in terms of how well they fit standard statistical language models such as n-pos (Markov) models.
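To make the abstract's distinction between notational and substantive differences concrete, the Python sketch here inverts a tag mapping and flags target tags reached from more than one source tag. The tag pairs are a small, hand-picked fragment of BNC C5 to Penn Treebank correspondences chosen for illustration; they are not the project's mapping suite, and treating "exactly one source tag" as "notational" is our simplifying assumption. The function names are hypothetical.

```python
# Illustrative sketch (assumption): a hand-picked fragment of a BNC C5 -> Penn Treebank
# tag mapping, not the project's actual mapping suite.
C5_TO_PENN = {
    "NN1": "NN",   # singular common noun
    "NN2": "NNS",  # plural common noun
    "CJC": "CC",   # coordinating conjunction
    "PRP": "IN",   # preposition
    "PRF": "IN",   # the preposition "of"
    "CJS": "IN",   # subordinating conjunction
}

def invert(mapping):
    """Group source tags by the target tag they map to."""
    inverse = {}
    for source, target in mapping.items():
        inverse.setdefault(target, set()).add(source)
    return inverse

def classify(mapping):
    """Call a target tag 'notational' if exactly one source tag maps to it (a pure
    renaming) and 'substantive' if several do (mapping back needs extra context)."""
    return {target: ("notational" if len(sources) == 1 else "substantive")
            for target, sources in invert(mapping).items()}
```

Here `NN1` to `NN` is a pure renaming, while Penn `IN` collapses the C5 distinction between prepositions (`PRP`, `PRF`) and subordinating conjunctions (`CJS`), so mapping Penn-tagged text back to C5 would need context.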
Design and implementation of the AGTS Probabilistic Tagger
1998
Cited by 2 (0 self)
Introduction
The last ten years have seen the development of several probabilistic taggers, such as the PARTS program (Church 1988), the De Rose tagger (De Rose 1991), the Brill tagger (Brill and Marcus 1992) and the Xerox tagger (Cutting et al. 1993). The AGTS (Automatic Grammatical Tagging System) Tagger was a key project funded by the China Social Science Academy. It was undertaken at the Laboratory for Computational Linguistics of the Institute for Natural Language Processing of Jiao Tong University in Shanghai, from 1987 to 1990. The basic techniques of the AGTS tagger are very similar to those of the CLAWS tagger, following the principles of Constituent Likelihood Grammar (Atwell 1987). One of the purposes of the AGTS project is to tag the JDEST (Jiao Da English for Science and Technology) Corpus, which consists of one million words. The JDEST Corpus was built up by Jiao Tong University in Shanghai in 1985. The JDEST Corpus collected English texts covering ten subject areas in sci...
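The abstract says AGTS follows the same principles as CLAWS but does not give its algorithm. As a generic illustration only, the Python sketch here decodes the most probable tag sequence under a bigram hidden Markov model with the Viterbi algorithm, working in log probabilities. The tag set, probabilities and function names are hypothetical toy values, and the floor used for unseen events is an assumption rather than a smoothing method taken from AGTS or CLAWS.

```python
import math

# Illustrative sketch (assumption): a generic bigram HMM tagger, not AGTS or CLAWS.
FLOOR = 1e-12  # stands in for zero probabilities so that log() is defined

def log_p(p):
    return math.log(p if p > 0 else FLOOR)

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under a bigram HMM given as nested dicts."""
    # best[i][t] = (log-prob of best path ending in tag t at word i, previous tag)
    best = [{t: (log_p(start_p.get(t, 0)) + log_p(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + log_p(trans_p[p].get(t, 0))) for p in tags),
                key=lambda item: item[1])
            column[t] = (score + log_p(emit_p[t].get(words[i], 0)), prev)
        best.append(column)
    # Trace back from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

With a toy model in which "can" may be a noun or a verb, the sentence `["the", "can", "rusts"]` is tagged `["DET", "NOUN", "VERB"]`: the strong determiner-to-noun transition outweighs reading "can" as a verb.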
A Probabilistic Chunker
In: Proceedings of ROCLING VI, 1993
Cited by 2 (2 self)
This paper proposes a probabilistic partial parser, which we call a chunker. The chunker partitions the input sentence into segments. This idea is motivated by the fact that when we read a sentence, we read it chunk by chunk. We train the chunker on the Susanne Corpus, a modified but reduced version of the Brown Corpus, using an underlying bigram language model. The chunker is evaluated with both an outside test and an inside test. The preliminary results show that the chunker achieves more than 98% chunk correct rate and 94% sentence correct rate in the outside test, and 99% chunk correct rate and 97% sentence correct rate in the inside test. The simple but effective chunker design has been shown to be promising and can be extended to complete parsing and many applications.
1. Introduction
A probabilistic approach to natural language processing is not new [1]. Recently, many parsers based on this line have been proposed [2-9]. Garside and Leech [2] apply the constituent-likelihood grammar of Atwell [10] to probabilist...
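The abstract reports a chunk correct rate and a sentence correct rate without defining them. One plausible reading, assumed here, is that a chunk counts as correct when its exact span also appears in the gold standard, and a sentence counts as correct only when its chunking matches the gold standard exactly. The Python sketch computes both figures from chunk start positions; the data format and function names are hypothetical.

```python
# Illustrative sketch (assumption): exact-span matching against gold chunks; the
# paper's own definitions of the two rates are not given in the abstract.

def chunk_spans(starts, n_words):
    """Turn chunk start indices into a set of (start, end) spans over n_words."""
    begins = sorted(set(starts) | {0})
    ends = begins[1:] + [n_words]
    return set(zip(begins, ends))

def evaluate(gold, predicted):
    """gold, predicted: lists of (n_words, chunk_start_indices), one per sentence.
    Returns (chunk correct rate, sentence correct rate)."""
    correct_chunks = total_chunks = correct_sentences = 0
    for (n_words, gold_starts), (_, pred_starts) in zip(gold, predicted):
        gold_spans = chunk_spans(gold_starts, n_words)
        pred_spans = chunk_spans(pred_starts, n_words)
        correct_chunks += len(gold_spans & pred_spans)
        total_chunks += len(gold_spans)
        correct_sentences += int(gold_spans == pred_spans)
    return correct_chunks / total_chunks, correct_sentences / len(gold)
```

For two sentences where the first is chunked perfectly (3 gold chunks) and the second recovers 1 of its 2 gold chunks, `evaluate` returns `(0.8, 0.5)`.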
Proposal for a Mutual-Information Based Language Model
We propose a probabilistic language model that is intended to overcome some of the limitations of the well-known n-gram models, namely the strong dependence of the parameter values of the model on the discourse domain and the constant size of word context taken into account. The new model is based on the mutual information (MI) measurement for the correlation of events and derives a hierarchy of categories from unlabelled training text. It has close analogies to the bi-gram model and is therefore explained by comparing it with this model.
1 Introduction
Language models (LMs) are used to capture regularities in languages and in this way to provide information about the possibility or likelihood of certain language constructs. For large-vocabulary speech and handwriting recognition, the acoustic or graphemic evidence gained by the input device may not be sufficient to decide on the word spoken or written with a reasonable amount of certainty. Such devices therefore usually output a set ...
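The abstract builds its model on a mutual information measure between events. As a minimal illustration of that measure only, not of the proposed category hierarchy, the Python sketch estimates pointwise mutual information, log2 P(x, y) / (P(x) P(y)), for adjacent word pairs by relative frequency. It uses the same adjacent-pair counts as a bi-gram model would. The function name and the estimation details are our assumptions.

```python
import math
from collections import Counter

# Illustrative sketch (assumption): plain pointwise MI over adjacent word pairs,
# not the hierarchical category model proposed in the paper.

def adjacent_pmi(tokens):
    """Pointwise mutual information, in bits, for each adjacent word pair:
    log2 P(x, y) / (P(x) P(y)), with probabilities estimated by relative frequency."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_words, n_pairs = len(tokens), len(tokens) - 1
    return {
        (x, y): math.log2((count / n_pairs) /
                          ((unigrams[x] / n_words) * (unigrams[y] / n_words)))
        for (x, y), count in bigrams.items()
    }
```

On the token sequence `new york new york is big`, the pair (`new`, `york`) scores log2(3.6), about 1.85 bits, higher than the reversed pair (`york`, `new`) at log2(1.8). Pointwise MI is known to overrate rare pairs, so practical models often smooth or threshold it.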
A Fast Partial Parse of Natural Language Sentences Using a Connectionist Method
1995
The pattern matching capabilities of neural networks can be used to locate syntactic constituents of natural language. This paper describes a fully automated hybrid system, using neural nets operating within a grammatical framework. It addresses the representation of language for connectionist processing, and describes methods of constraining the problem size. The function of the network is briefly explained, and results are given.
1 Introduction
The pattern matching capabilities of neural networks can be used to detect syntactic constituents of natural language. This approach bears comparison with probabilistic systems, but has the advantage that negative as well as positive information can be modelled. Also, most computation is done in advance, when the nets are trained, so the run-time computational load is low. In this work neural networks are used as part of a fully automated system that finds a partial parse of declarative sentences. The connectionist processors operat...

