The Generative Lexicon
Computational Linguistics, 1991
Cited by 727 (23 self)
In this paper, I will discuss four major topics relating to current research in lexical semantics: methodology, descriptive coverage, adequacy of the representation, and the computational usefulness of representations. In addressing these issues, I will discuss what I think are some of the central problems facing the lexical semantics community, and suggest ways of best approaching these issues. Then, I will provide a method for the decomposition of lexical categories and outline a theory of lexical semantics embodying a notion of cocompositionality and type coercion, as well as several levels of semantic description, where the semantic load is spread more evenly throughout the lexicon. I argue that lexical decomposition is possible if it is performed generatively. Rather than assuming a fixed set of primitives, I will assume a fixed number of generative devices that can be seen as constructing semantic expressions. I develop a theory of Qualia Structure, a representation language for lexical items, which renders much lexical ambiguity in the lexicon unnecessary, while still explaining the systematic polysemy that words carry. Finally, I discuss how individual lexical structures can be integrated into the larger lexical knowledge base through a theory of lexical inheritance. This provides us with the necessary principles of global organization for the lexicon, enabling us to fully integrate our natural language lexicon into a conceptual whole.
Unsupervised Learning of the Morphology of a Natural Language
Computational Linguistics, 2001
Cited by 201 (9 self)
This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our primary tool to determine whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar matches well the analysis that would be developed by a human morphologist. In the final section, we discuss the relationship of this style of MDL grammatical analysis to the notion of evaluation metric in early generative grammar.
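The role MDL plays in the abstract above (deciding whether a proposed change to the grammar is adopted) can be illustrated with a toy two-part code. This is only a minimal sketch under stated assumptions, not the paper's actual heuristics or encoding: the uniform per-letter cost, the pooled morpheme distribution, and the names `description_length` and `adopt_if_shorter` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Assumption: every letter costs the same number of bits (uniform over 26 letters).
BITS_PER_LETTER = math.log2(26)

def model_cost(analyses):
    """Bits to spell out each distinct morpheme once, plus one end marker each."""
    morphemes = {m for analysis in analyses for m in analysis}
    return sum((len(m) + 1) * BITS_PER_LETTER for m in morphemes)

def data_cost(analyses):
    """Bits to encode the corpus as morpheme pointers, each costing -log2(p)."""
    counts = Counter(m for analysis in analyses for m in analysis)
    total = sum(counts.values())
    return -sum(math.log2(counts[m] / total)
                for analysis in analyses for m in analysis)

def description_length(analyses):
    """Two-part description length: model cost plus data cost given the model."""
    return model_cost(analyses) + data_cost(analyses)

def adopt_if_shorter(current, proposed):
    """Keep the proposed analysis only if it shortens the total description."""
    if description_length(proposed) < description_length(current):
        return proposed
    return current

stems = ["walk", "jump", "talk"]
suffixes = ["", "s", "ed", "ing"]

# Baseline: every word form is stored whole, as a single unanalysed unit.
unsegmented = [(stem + suffix,) for stem in stems for suffix in suffixes]
# Proposal: every word form is split into a stem and a (possibly empty) suffix.
segmented = [(stem, suffix) for stem in stems for suffix in suffixes]

chosen = adopt_if_shorter(unsegmented, segmented)
```

On this toy corpus the segmented analysis is chosen, because sharing three stems and four suffixes costs fewer bits than listing twelve whole word forms.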
Unsupervised Language Acquisition: Theory and Practice
2001
Cited by 32 (0 self)
In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem, that purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, that can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with nonconcatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.
The Unsupervised Acquisition of a Lexicon from Continuous Speech
MIT Artificial Intelligence Lab, 1995
Cited by 31 (2 self)
We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.
Emergence of Net-grammar in Communicating Agents
BioSystems, 1996
Cited by 29 (4 self)
Evolution of symbolic language and grammar is studied in a network model. Language is expressed by words, i.e. strings of symbols, which are generated by agents with their own symbolic grammar system. Agents communicate with each other by deriving and accepting words in terms of their own grammar. They are ranked according to their communicative effectiveness: an agent which can derive less frequent and less acceptable words and accept words in less computational time will have higher scores. They can evolve by mutational processes, which change rewriting rules in their symbolic grammars. Complexity and diversity of words increase in the course of time. The emergence of modules and loop structure enhances the evolution. On the other hand, ensemble structure leads to a net-grammar, restricting individual grammars and their evolution.
Key words: Net-grammar; Algorithmic evolution; Module-type evolution; Evolution of language; Symbolic grammar systems
Unify and Merge in Fluid Construction Grammar
Emergence and Evolution of Linguistic Communication, Lecture Notes in Computer Science, 2006
Cited by 20 (2 self)
Research into the evolution of grammar requires that we employ formalisms and processing mechanisms that are powerful enough to handle features found in human natural languages. But the formalism needs to have some additional properties compared to those used in other linguistics research that are specifically relevant for handling the emergence and progressive co-ordination of grammars in a population of agents. This document introduces Fluid Construction Grammar, a formalism with associated parsing, production, and learning processes designed for language evolution research. The present paper focuses on a formal definition of the unification and merging algorithms used in Fluid Construction Grammar. The complexity and soundness of the algorithms and their relation to unification in logic programming and other unification-based grammar formalisms are discussed.
Spatial random tree grammars for modeling hierarchal structure in images with . . .
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 17 (2 self)
We present a novel probabilistic model for the hierarchical structure of an image and its regions. We call this model spatial random tree grammars (SRTGs). We develop algorithms for the exact computation of likelihood and maximum a posteriori (MAP) estimates and the exact expectation-maximization (EM) updates for model-parameter estimation. We collectively call these algorithms the center-surround algorithm. We use the center-surround algorithm to automatically estimate the maximum likelihood (ML) parameters of SRTGs and classify images based on their likelihood and based on the MAP estimate of the associated hierarchical structure. We apply our method to the task of classifying natural images and demonstrate that the addition of hierarchical structure significantly improves upon the performance of a baseline model that lacks such structure.
The Iterative Learning of Phonological Constraints
Computational Linguistics, 1991
Cited by 14 (0 self)
This paper presents a simplicity measure for violable phonological constraints based on the minimum message length method. This measure captures the intuitive desiderata of conciseness, accuracy and precision. A family of constraints can be specified by parameterising a specific constraint, and so forming a template. The combination of this measure with a search algorithm is a powerful learning method for finding the best constraint matching a template and fitting a corpus. This method may be applied iteratively, using the same template, to learn a number of different constraints. Five applications of an implementation show some of the successes of this learning method: from learning consonant cluster constraints to vowel harmony.
Syntax in Language Production: An Approach Using Tree-Adjoining Grammars
1999
Cited by 12 (0 self)
this paper states that different levels of processing can work on different pieces of an utterance at the same time. Thus, the phonological encoder can work on the early part of the clause while the syntactic encoder works on filling out what remains. As a result, once the syntactic representation for the sentence is done, its corresponding phonological representation is likely close to complete as well.

