Results 1 - 10 of 19
Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach
- In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 1993
Cited by 120 (8 self)
In this paper we describe a new technique for parsing free text: a transformational grammar is automatically learned that is capable of accurately parsing text into binary-branching syntactic trees with nonterminals unlabelled. The algorithm works by beginning in a very naive state of knowledge about phrase structure. By repeatedly comparing the results of bracketing in the current state to proper bracketing provided in the training corpus, the system learns a set of simple structural transformations that can be applied to reduce error. After describing the algorithm, we present results and compare these results to other recent results in automatic grammar induction.
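The comparison step described in this abstract (bracketing in the current state versus the proper bracketing from the training corpus) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the naive starting state is a right-branching binary tree and that error is counted as the number of predicted spans absent from the gold bracketing. The function names and the example sentence are hypothetical, and the learning of transformations itself is not shown.

```python
def right_branching_spans(n):
    """Spans (start, end) of a purely right-branching binary tree over n words.

    Assumption: this stands in for the "very naive" initial bracketing.
    """
    return {(i, n) for i in range(n - 1)}


def bracketing_errors(predicted, gold):
    """Number of predicted spans that do not appear in the gold bracketing."""
    return len(predicted - gold)


words = ["the", "dog", "barked", "loudly"]
# Hypothetical gold bracketing: ((the dog) (barked loudly))
gold = {(0, 4), (0, 2), (2, 4)}
naive = right_branching_spans(len(words))
print(bracketing_errors(naive, gold))
```

A learner in this style would try candidate structural changes and keep the ones that lower this error count on the training corpus.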
Figures of Merit for Best-First Probabilistic Chart Parsing
- Computational Linguistics, 1996
Cited by 65 (3 self)
Best-first parsing methods for natural language try to parse efficiently by considering the most likely constituents first. Some figure of merit is needed by which to compare the likelihood of constituents, and the choice of this figure has a substantial impact on the efficiency of the parser. While several parsers described in the literature have used such techniques, there is no published data on their efficacy, much less attempts to judge their relative merits. We propose and evaluate several figures of merit for best-first parsing.
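To make the idea of considering the most likely constituents first concrete, here is a small sketch of an agenda ordered by a figure of merit. The scoring function shown (log probability divided by span length) is an assumption chosen for illustration, not necessarily one of the figures the paper evaluates, and the candidate constituents and their probabilities are invented.

```python
import heapq
import math


def best_first_order(constituents, figure_of_merit):
    """Yield constituents from highest to lowest figure of merit."""
    # heapq is a min-heap, so scores are negated; the index breaks ties.
    heap = [(-figure_of_merit(c), i, c) for i, c in enumerate(constituents)]
    heapq.heapify(heap)
    while heap:
        _, _, c = heapq.heappop(heap)
        yield c


def per_word_log_prob(c):
    """Hypothetical figure of merit: log probability per covered word."""
    label, start, end, prob = c
    return math.log(prob) / (end - start)


# (label, start, end, probability); values are made up for the example.
candidates = [("NP", 0, 2, 0.04), ("VP", 2, 3, 0.1), ("S", 0, 3, 0.002)]
order = [c[0] for c in best_first_order(candidates, per_word_log_prob)]
print(order)
```

A full chart parser would push newly built constituents onto the same agenda as it runs; this sketch only shows how the choice of scoring function determines the order in which candidates are popped.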
Context-Sensitive Statistics for Improved Grammatical Language Models
- In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994
Cited by 40 (4 self)
We develop a language model using probabilistic context-free grammars (PCFGs) that is "pseudo context-sensitive" in that the probability that a non-terminal N expands using a rule r depends on N's parent. We derive the equations for estimating the necessary probabilities using a variant of the inside-outside algorithm. We give experimental results showing that, beginning with a high-performance PCFG, one can develop a pseudo PCSG that yields significant performance gains. Analysis shows that the benefits from the context-sensitive statistics are localized, suggesting that we can use them to extend the original PCFG. Experimental results confirm that this is both feasible and the resulting grammar retains the performance gains. This implies that our scheme may be useful as a novel method for PCFG induction. 1 Introduction Like its non-stochastic brethren, probabilistic parsing has been based upon context-free grammars (CFGs), and for similar reasons: CFGs support a simple and efficien...
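The notion of a rule probability that depends on the parent of the expanding non-terminal can be sketched as follows. Note the substitution: the abstract estimates these probabilities with a variant of the inside-outside algorithm, whereas this sketch uses plain relative-frequency counting over hypothetical, already-parsed observations, only to show what is being conditioned on. All names and data are illustrative assumptions.

```python
from collections import Counter


def parent_conditioned_probs(observations):
    """Estimate P(rule | nonterminal, parent) by relative frequency.

    Each observation is a (nonterminal, parent, rule) triple.
    """
    rule_counts = Counter(observations)
    context_counts = Counter((nt, parent) for nt, parent, _ in observations)
    return {
        (nt, parent, rule): count / context_counts[(nt, parent)]
        for (nt, parent, rule), count in rule_counts.items()
    }


# Hypothetical observations: NP expansions under S and under VP.
observations = [
    ("NP", "S", "NP -> PRP"),
    ("NP", "S", "NP -> PRP"),
    ("NP", "S", "NP -> DT NN"),
    ("NP", "VP", "NP -> DT NN"),
]
probs = parent_conditioned_probs(observations)
print(probs[("NP", "S", "NP -> PRP")])
```

An ordinary PCFG would pool all four NP expansions into one distribution; conditioning on the parent keeps the distribution under S separate from the one under VP.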
Discovery of Linguistic Relations Using Lexical Attraction
- 1998
Cited by 28 (2 self)
This work has been motivated by two long term goals: to understand how humans learn language and to build programs that can understand language. Using a representation that makes the relevant features explicit is a prerequisite for successful learning and understanding. Therefore, I chose to represent relations between individual words explicitly in my model. Lexical attraction is defined as the likelihood of such relations. I introduce a new class of probabilistic language models named lexical attraction models which can represent long distance relations between words and I formalize this new class of models using information theory. Within the
Learning Restricted Probabilistic Link Grammars
- 1995
Cited by 10 (0 self)
We describe a language model employing a new headed-disjuncts formulation of Lafferty et al.'s (1992) probabilistic link grammar, together with (1) an EM training method for estimating the probabilities, and (2) a procedure for learning some simple lexicalized grammar structures. The model in its simplest form is a generalization of n-gram models, but in its general form possesses context-free expressiveness. Unlike the original experiments on probabilistic link grammars, we assume that no hand-coded grammar is initially available (as with n-gram models). We employ untyped links to concentrate the learning on lexical dependencies, and our formulation uses the lexical identities of heads to influence the structure of the parse graph. After learning, the language model consists of grammatical rules in the form of a set of simple disjuncts for each word, plus several sets of probability parameters. The formulation extends cleanly toward learning more powerful context-free grammars. Several issues relating to generalization bias, linguistic constraints, and parameter smoothing are considered. Preliminary experimental results on small artificial corpora are supportive of our approach.
Linguistic Structure as Composition and Perturbation
- In Meeting of the Association for Computational Linguistics, 1996
Cited by 7 (0 self)
This paper discusses the problem of learning language from unprocessed text and speech signals, concentrating on the problem of learning a lexicon. In particular, it argues for a representation of language in which linguistic parameters like words are built by perturbing a composition of existing parameters. The power of the representation is demonstrated by several examples in text segmentation and compression, acquisition of a lexicon from raw speech, and the acquisition of mappings between text and artificial representations of meaning.
Rapid Grammar Development and Parsing: Constraint Dependency Grammars with Abstract Role Values
- 2000
Cited by 6 (1 self)
A thesis submitted to the Faculty of Purdue University by Christopher M. White in partial fulfillment of the requirements for the degree of Doctor of Philosophy, May 2000. To my loving wife Margit.
Two Principles and Six Techniques for Rapid MT Development
- Proc. of AMTA-96, 1996
Cited by 6 (5 self)
In this paper we describe a range of techniques used at NMSU CRL for accelerating the development of MT systems. These techniques enable semi-automatic development of a number of components of a multilingual MT system, thereby enabling rapid deployment of MT capabilities in a new language. First, we describe the core multi-engine, multilingual architecture that enables the different techniques to be rapidly integrated to build an MT system. We show how off-the-shelf components were used in this architecture for fast development. Then we illustrate a set of techniques for semi-automatic acquisition of static resources: (a) automatic induction of grammars, (b) corpus-based acquisition of bilingual glossaries, and automatic acquisition of semantic lexicons through (c) lexical rules and (d) reversal of analysis lexicons to generation lexicons. Finally we describe an automatic testing environment that enables rapid validation of automatically acquired resources. 1 Rapid Development Techniques Static knowledge sources — grammars, lexicons, world knowledge bases — are the most time-consuming concerns in any rule-based machine translation system. It is, therefore, imperative to find ways of speeding up the creation and updating of high-quality, useful static knowledge sources. It is equally imperative to

