Weighted Extended Tree Transducers
, 2010
Abstract
The first systematic treatment of weighted extended tree transducers (wxtt) over countably complete semirings is provided. It is proved that the extension in the left-hand sides of a wxtt can be simulated by the inverse of a linear and nondeleting tree homomorphism. In addition, a characterization of weighted tree transformations computed by bottom-up wxtt in terms of bimorphisms is provided. Backward and forward application of wxtt to recognizable weighted tree languages are considered. It is shown that the backward application of a linear wxtt preserves recognizability and that the domain of an arbitrary bottom-up wxtt is recognizable. Examples demonstrate that neither backward nor forward application of arbitrary wxtt preserves recognizability. Finally, a Hasse diagram relates most of the important subclasses of weighted tree transformations computed by wxtt.
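To make the notion of a linear and nondeleting tree homomorphism in this abstract concrete, the following is a minimal unweighted sketch. It is not taken from the paper: the tree encoding (a `(label, children)` pair), the use of 0-based integers as variables, and all function names are assumptions chosen for illustration. A homomorphism maps every symbol of rank k to a pattern over variables x_0, ..., x_{k-1}; it is linear if no variable occurs more than once in a pattern and nondeleting if every variable occurs at least once.

```python
from collections import Counter

# Assumed encoding: a tree is (label, [children]); inside a pattern,
# the integer i stands for the variable x_i (0-based).

def substitute(pattern, images):
    """Replace each variable i in the pattern by images[i]."""
    if isinstance(pattern, int):
        return images[pattern]
    label, children = pattern
    return (label, [substitute(child, images) for child in children])

def apply_hom(h, tree):
    """Apply the tree homomorphism h bottom-up to the tree.

    h maps (label, rank) to a pattern whose variables are 0..rank-1.
    """
    label, children = tree
    images = [apply_hom(h, child) for child in children]
    return substitute(h[(label, len(children))], images)

def variable_counts(pattern, counts=None):
    """Count how often each variable occurs in the pattern."""
    counts = Counter() if counts is None else counts
    if isinstance(pattern, int):
        counts[pattern] += 1
    else:
        for child in pattern[1]:
            variable_counts(child, counts)
    return counts

def is_linear_nondeleting(h):
    """True iff every variable occurs exactly once in each pattern."""
    for (_, rank), pattern in h.items():
        counts = variable_counts(pattern)
        if any(counts[i] != 1 for i in range(rank)):
            return False
    return True

# Hypothetical example: swap the two arguments of f and wrap the first in h.
swap = {("f", 2): ("g", [1, ("h", [0])]), ("a", 0): ("b", [])}
# Hypothetical example of a copying (hence non-linear) homomorphism.
copy = {("f", 1): ("g", [0, 0]), ("a", 0): ("a", [])}
```

Here `swap` is linear and nondeleting, whereas `copy` duplicates its only variable and is therefore not linear.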
An Alternative to Synchronous Tree Substitution Grammars
, 2010
Abstract
Synchronous tree substitution grammars (stsg) are a (formal) tree transformation model that is used in the area of syntax-based machine translation. A competitor that is at least as expressive as stsg is proposed and compared to stsg. The competitor is the extended multi bottom-up tree transducer (mbot), which is the bottom-up analogue with the additional feature that states have non-unary ranks. Unweighted mbot have already been investigated with respect to their basic properties, but the particular properties of the constructions that are required in the machine translation task are largely unknown. stsg and mbot are compared with respect to binarization, regular restriction, and application. Particular attention is paid to the complexity of the constructions.
How to train your multi bottom-up tree transducer
Abstract
The local multi bottom-up tree transducer is introduced and related to the (non-contiguous) synchronous tree sequence substitution grammar. It is then shown how to obtain a weighted local multi bottom-up tree transducer from a bilingual and biparsed corpus. Finally, the problem of non-preservation of regularity is addressed. Three properties that ensure preservation are introduced, and it is discussed how to adjust the rule extraction process such that they are automatically fulfilled.
Survey: Weighted Extended Top-down Tree Transducers Part II -- Application in Machine Translation
, 2011
Abstract
In this second part of the survey, we present the application of weighted extended top-down tree transducers in machine translation, which is the automatic translation of natural language texts. We present several formal properties that are relevant in machine translation and evaluate the weighted extended top-down tree transducer along those criteria. In addition, we demonstrate how to extract rules for an extended top-down tree transducer from existing linguistic data and how to obtain suitable rule weights automatically from similar information. Overall, the aim of the survey is twofold. First, it should provide a synopsis that illustrates how theory (tree transducers) and practice (machine translation) interact on this particular example. Second, it presents a uniform and simplified treatment of the rule extraction and training algorithms that is accessible to the nonexpert. Additional details can be found in the original results that are referenced throughout the text.
Composing extended top-down tree transducers
Abstract
A composition procedure for linear and nondeleting extended top-down tree transducers is presented. It is demonstrated that the new procedure is more widely applicable than the existing methods. In general, the result of the composition is an extended top-down tree transducer that is no longer linear or nondeleting, but in a number of cases these properties can easily be recovered by a post-processing step.
Varro: An Algorithm and Toolkit for Regular Structure Discovery in
Abstract
The Varro toolkit is a system for identifying and counting a major class of regularity in treebanks and annotated natural language data in the form of tree structures: frequently recurring unordered subtrees. This software has been designed for use in linguistics to be maximally applicable to actually existing treebanks and other stores of tree-structurable natural language data. It minimizes memory use so that moderately large treebanks are tractable on commonly available computer hardware. This article introduces condensed canonically ordered trees as a data structure for efficiently discovering frequently recurring unordered subtrees.
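The general idea of canonical ordering behind counting unordered subtrees can be sketched as follows. This is not the Varro algorithm and does not implement its condensed canonically ordered trees. It is a simplified illustration under assumed conventions: a tree is a `(label, children)` pair, only complete subtrees (a node together with all of its descendants) are counted, and the function names are hypothetical. Sorting the children recursively gives every unordered tree a single representative, so two trees that differ only in sibling order map to the same key.

```python
from collections import Counter

def canonical(tree):
    """Return a hashable canonical form of an unordered labeled tree.

    Assumed encoding: tree = (label, [children]). Children are sorted
    recursively, so sibling order no longer matters.
    """
    label, children = tree
    return (label, tuple(sorted(canonical(child) for child in children)))

def count_subtrees(tree, counts=None):
    """Count every complete subtree of the tree by its canonical form.

    Recomputing canonical() at each node keeps the sketch short but is
    quadratic in the worst case; a practical tool would cache the forms.
    """
    counts = Counter() if counts is None else counts
    counts[canonical(tree)] += 1
    for child in tree[1]:
        count_subtrees(child, counts)
    return counts

# Two hypothetical parse trees that differ only in sibling order.
t1 = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
t2 = ("S", [("VP", [("NP", []), ("V", [])]), ("NP", [])])
```

With this encoding, `canonical(t1) == canonical(t2)` holds, and `count_subtrees(t1)` records the bare `NP` leaf twice.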
Tree Parsing with Synchronous Tree-Adjoining Grammars
, 2011
Abstract
Restricting the input or the output of a grammar-induced translation to a given set of trees plays an important role in statistical machine translation. The problem for practical systems is to find a compact (and in particular, finite) representation of said restriction. For the class of synchronous tree-adjoining grammars, partial solutions to this problem have been described, some being restricted to the unweighted case, some to the monolingual case. We introduce a formulation of this class of grammars which is effectively closed under input and output restrictions to regular tree languages, i.e., the restricted translations can again be represented by grammars. Moreover, we present an algorithm that constructs these grammars for input and output restriction, which is inspired by Earley’s algorithm.

