## Head Automata and Bilingual Tiling: Translation with Minimal Representations (1996)

Citations: 44 (3 self)

### BibTeX

@MISC{Alshawi96headautomata,

author = {Hiyan Alshawi},

title = {Head Automata and Bilingual Tiling: Translation with Minimal Representations},

year = {1996}

}

### Abstract

We present a language model consisting of a collection of costed bidirectional finite state automata associated with the head words of phrases. The model is suitable for incremental application of lexical associations in a dynamic programming search for optimal dependency tree derivations. We also

### Citations

1276 | The Mathematics of Statistical Machine Translation: Parameter Estimation - Brown, Pietra, et al. - 1993 |

862 | Accurate Methods for the Statistics of Surprise and Coincidence - Dunning - 1993 |

Citation context: ... $\ln(n^{+}(c)) - \ln(n^{+}(e|c))$. Discriminative model: The costs in this model are likelihood ratios comparing positive and negative solutions, for example correct and incorrect translations. (See Dunning 1993 on the application of likelihood ratios in computational linguistics.) Let $n^{-}(e|c)$ be the count for choice $(e|c)$ leading to negative solutions. The cost function for the discriminative model i...

636 | A statistical approach to machine translation - Brown, Cocke, et al. - 1990 |

472 | Generalized phrase structure grammar - Gazdar, Klein, et al. - 1985 |

Citation context: ...sitions will yield empty sequences, corresponding to a leaf node of the dependency tree. From a linguistic perspective, head automata allow for a compact, graded, notion of lexical subcategorization (Gazdar et al. 1985) and the linear order of a head and its dependent phrases. Lexical parameters can control the saturation of a lexical item (for example a verb that is both transitive and intransitive) by starting th...

236 | The Core Language Engine - Alshawi - 1992 |

Citation context: ... natural language words, although this is not a necessary property of our models. Information coded explicitly in sentence representations by word senses and feature constraints in our previous work (Alshawi 1992) is implicit in the models used to derive the dependency trees and translations. In particular, dependency parameters and context-dependent transfer parameters give rise to an implicit, graded notion...

232 | The wake-sleep algorithm for unsupervised neural networks - Hinton, Dayan, et al. - 1995 |

Citation context: ... In order to train the model parameters without a manually translated corpus, we use a "reflexive" training method (similar in spirit to the "wake-sleep" algorithm, Hinton et al. 1995). In this method, our search process translates a source sentence $s$ to $t_s$ in the target language and then translates $t_s$ back to a source language sentence $s'$. The original sentence $s$ can then act...

227 | X-bar syntax: a study of phrase structure - Jackendoff - 1977 |

Citation context: ...ansitive and intransitive) by starting the same automaton in different states. Head automata can also be used to code a grammar in which states of an automaton for word w corresponds to X-bar levels (Jackendoff 1977) for phrases headed by w. Head automata are formally more powerful than finite state automata that accept regular languages in the following sense. Each head automaton defines a formal language with ...

224 | English Word Grammar - Hudson - 1990 |

Citation context: ...scribe the model in terms of the familiar paradigm of a generative statistical model, presenting the parameters as conditional probabilities. This gives us a stochastic version of dependency grammar (Hudson 1984). Each derivation in the generative statistical model produces an ordered dependency tree, that is, a tree in which nodes dominate ordered sequences of left and right subtrees and in which the nodes ...

141 | Prepositional phrase attachment through a backed-off model - Collins, Brooks - 1995 |

123 | Stochastic lexicalized tree-adjoining grammars - Schabes - 1992 |

Citation context: ...ram models (Jelinek et al. 1992) and the structural properties of statistical context free grammars (Booth 1969) without the computational overhead of statistical lexicalized tree-adjoining grammars (Schabes 1992, Resnik 1992). The quantitative dependency model described here grew out of the model presented in Alshawi 1996a. An alternative model based on transducer versions of the automata is described in Als...

108 | Machine translation divergences: a formal description and proposed solution - Dorr - 1994 |

Citation context: ...s, that is, it treats the dependents of a word as an unordered bag. The model is general enough to cover the common translation problems discussed in the literature (e.g. Lindop and Tsujii 1991 and Dorr 1994) including many-to-many word mapping, argument switching, and head switching. A transfer model consists of a bilingual lexicon and a transfer parameter table. The model uses dependency tree fragments...

88 | Probabilistic tree-adjoining grammar as a framework for statistical natural language processing - Resnik - 1992 |

Citation context: ...linek et al. 1992) and the structural properties of statistical context free grammars (Booth 1969) without the computational overhead of statistical lexicalized tree-adjoining grammars (Schabes 1992, Resnik 1992). The quantitative dependency model described here grew out of the model presented in Alshawi 1996a. An alternative model based on transducer versions of the automata is described in Alshawi 1996b. F...

83 | Grammatical Trigrams: A Probabilistic Model of Link Grammar - Lafferty, Sleator, et al. - 1992 |

63 | Principles of lexical language modeling for speech recognition - Jelinek, Mercer, et al. - 1991 |

Citation context: ...tic association costs, providing a practical solution to the problem of combinatoric disambiguation (Church and Patil 1982). The model is intended to combine the lexical sensitivity of N-gram models (Jelinek et al. 1992) and the structural properties of statistical context free grammars (Booth 1969) without the computational overhead of statistical lexicalized tree-adjoining grammars (Schabes 1992, Resnik 1992). The...

58 | Recognition and parsing of context-free languages - Younger - 1967 |

Citation context: ...l time in the length of the string. In our experimental system we use a more general version of the algorithm to allow input in the form of word lattices. The algorithm is a bottom-up tabular parser (Younger 1967, Earley 1970) in which constituents are constructed "head-outwards" (Kay 1989, Satta and Stock 1989). Since we are analyzing bottom-up with generative model automata, the algorithm 'runs' the automata b...

50 | Memory and Context for Language Interpretation - Alshawi - 1987 |

Citation context: ...nts (such as our head automata) for capturing grammatical constraints, or the identity of other words in a phrase for capturing sense distinctions. For larger scale context, we have argued elsewhere (Alshawi 1987) that memory activation patterns resulting from the process of carrying out an understanding task can act as global context without explicit representations of discourse. Under this view, the challen...

50 | Head-driven parsing - Kay - 1990 |

Citation context: ...l version of the algorithm to allow input in the form of word lattices. The algorithm is a bottom-up tabular parser (Younger 1967, Earley 1970) in which constituents are constructed "head-outwards" (Kay 1989, Satta and Stock 1989). Since we are analyzing bottom-up with generative model automata, the algorithm 'runs' the automata backwards. Edges in the parsing lattice (or "chart") are tuples representing p...

45 | Shake-and-Bake Translation - Whitelock - 1992 |

Citation context: ...get dependency graph with entries from the bilingual lexicon. Dynamic programming is again used to make exhaustive search tractable, avoiding the combinatoric explosion of shake-and-bake translation (Whitelock 1992, Brew 1992). In Section 5 we present a general framework for associating costs with the solutions of search processes, pointing out some benefits of cost functions other than log likelihood, includin...

42 | Probabilistic Representation of Formal Languages - Booth - 1969 |

Citation context: ...mbiguation (Church and Patil 1982). The model is intended to combine the lexical sensitivity of N-gram models (Jelinek et al. 1992) and the structural properties of statistical context free grammars (Booth 1969) without the computational overhead of statistical lexicalized tree-adjoining grammars (Schabes 1992, Resnik 1992). The quantitative dependency model described here grew out of the model presented in...

37 | Letting the cat out of the bag: generation for Shake-and-Bake MT - Brew - 1992 |

Citation context: ...raph with entries from the bilingual lexicon. Dynamic programming is again used to make exhaustive search tractable, avoiding the combinatoric explosion of shake-and-bake translation (Whitelock 1992, Brew 1992). In Section 5 we present a general framework for associating costs with the solutions of search processes, pointing out some benefits of cost functions other than log likelihood, including an error-...

28 | A Statistical Approach to Machine Translation, Computational Linguistics - Brown, Cocke, et al. - 1990 |

Citation context: ...f joining the target fragments in a consistent fashion. The node mapping function f for the entire tree thus has a different role from the alignment function in the IBM statistical translation model (Brown et al. 1990, 1993); the role of the latter includes the linear ordering of words in the target string. In our approach, target word order is handled exclusively by the target monolingual model. 4.3 Transfer Algo...

22 | Head automata for speech translation - Alshawi - 1996 |

Citation context: ... without the computational overhead of statistical lexicalized tree-adjoining grammars (Schabes 1992, Resnik 1992). The quantitative dependency model described here grew out of the model presented in Alshawi 1996a. An alternative model based on transducer versions of the automata is described in Alshawi 1996b. For translation, we use a model for mapping dependency graphs written by the source language head au...

19 | Multi-site data collection and evaluation in spoken language understanding - Hirschman - 1993 |

Citation context: ... s. 6 Experimental System. We have built an experimental translation system using the monolingual and translation models described in this paper. The system translates sentences in the ATIS domain (Hirschman et al. 1993) between English and Mandarin Chinese. The translator is in fact a subsystem of a speech translation prototype, though the experiments we describe here are for transcribed spoken utterances. (We info...

18 | An efficient context-free parsing algorithm - Earley - 1970 |

Citation context: ...ngth of the string. In our experimental system we use a more general version of the algorithm to allow input in the form of word lattices. The algorithm is a bottom-up tabular parser (Younger 1967, Earley 1970) in which constituents are constructed "head-outwards" (Kay 1989, Satta and Stock 1989). Since we are analyzing bottom-up with generative model automata, the algorithm 'runs' the automata backwards. Ed...

10 | Coping With Syntactic Ambiguity or How to Put the - Church, Patil - 1982 |

Citation context: ... These head automata are applied by an algorithm with admissible incremental pruning based on semantic association costs, providing a practical solution to the problem of combinatoric disambiguation (Church and Patil 1982). The model is intended to combine the lexical sensitivity of N-gram models (Jelinek et al. 1992) and the structural properties of statistical context free grammars (Booth 1969) without the computati...

9 | Swedish-English QLF Translation - Alshawi, Carter, et al. - 1992 |

Citation context: ...ng step of the overall search control. This way we can keep the benefits of monolingual/bilingual modularity (Isabelle and Macklovitch 1986) without the computational overhead of transfer-and-filter (Alshawi et al. 1992). It is possible to apply the subtree search directly to the whole graph starting with the initial runtime entries from lexical matching. However, this would result in an exponential search, specific...

9 | Complex Transfer in MT: A Survey of Examples - Lindop, Tsujii - 1992 |

Citation context: ...h unordered dependency trees, that is, it treats the dependents of a word as an unordered bag. The model is general enough to cover the common translation problems discussed in the literature (e.g. Lindop and Tsujii 1991 and Dorr 1994) including many-to-many word mapping, argument switching, and head switching. A transfer model consists of a bilingual lexicon and a transfer parameter table. The model uses dependency ...

8 | Qualitative and Quantitative Models of Speech Translation. In The Balancing Act: Combining Symbolic and Statistical Approaches to Language, edited by - Alshawi |

8 | Attachment and Transfer of Prepositional Phrases with Constraint Propagation - Chen, Chen - 1992 |

Citation context: ...ences were hand-tagged for prepositional attachment points. (Prepositional phrase attachment is a major cause of ambiguity in the ATIS corpus, and moreover can affect English-Chinese translation, see Chen and Chen 1992.) The attachment information was used to generate additional negative and positive counts for dependency choices. The unsupervised training set consisted of approximately 13,000 sentences; it was use...

5 | Underspecified First Order Logics - Alshawi - 1995 |

Citation context: ...expressions of a formalism for coding meaning independently of context or intended use. There is now greater understanding of the formal semantics of under-specified and ambiguous representations. In Alshawi 1995, we provide a denotational semantics for a simple under-specified language and argue for extending this treatment to a formal semantics of natural language strings as expressions of an under-specifie...

5 | A Statistical Approach to Machine Translation - Brown, John, et al. - 1990 |

Citation context: ...f joining the target fragments in a consistent fashion. The node mapping function f for the entire tree thus has a different role from the alignment function in the IBM statistical translation model (Brown et al. 1990, 1993); the role of the latter includes the linear ordering of words in the target string. In our approach, target word order is handled exclusively by the target monolingual model. 4.3 Transfer Al...

3 | Head-Driven Bidirectional Parsing - Satta, Stock - 1989 |

Citation context: ...of the algorithm to allow input in the form of word lattices. The algorithm is a bottom-up tabular parser (Younger 1967, Earley 1970) in which constituents are constructed "head-outwards" (Kay 1989, Satta and Stock 1989). Since we are analyzing bottom-up with generative model automata, the algorithm 'runs' the automata backwards. Edges in the parsing lattice (or "chart") are tuples representing partial or complete ph...

2 | Transfer and - Isabelle, Macklovitch - 1986 |

Citation context: ...pendency costs $c_{G'}$ during the search, so these costs are taken into account in the pruning step of the overall search control. This way we can keep the benefits of monolingual/bilingual modularity (Isabelle and Macklovitch 1986) without the computational overhead of transfer-and-filter (Alshawi et al. 1992). It is possible to apply the subtree search directly to the whole graph starting with the initial runtime entries from...

2 | Attachment and Transfer - Chen, Chen - 1992 |

Citation context: ...ences were hand-tagged for prepositional attachment points. (Prepositional phrase attachment is a major cause of ambiguity in the ATIS corpus, and moreover can affect English-Chinese translation, see Chen and Chen 1992.) The attachment information was used to generate additional negative and positive counts for dependency choices. The unsupervised training set consisted of approximately 13,000 sentences; it was use...

1 | Probabilistic Representation of Formal Languages - Booth - 1969 |