Better k-best parsing
, 2005
Cited by 190 (17 self)
We discuss the relevance of k-best parsing to recent applications in natural language processing, and develop efficient algorithms for k-best trees in the framework of hypergraph parsing. To demonstrate the efficiency, scalability and accuracy of these algorithms, we present experiments on Bikel’s implementation of Collins’ lexicalized PCFG model, and on Chiang’s CFG-based decoder for hierarchical phrase-based translation. We show in particular how the improved output of our algorithms has the potential to improve results from parse reranking systems and other applications.
A survey of statistical machine translation
, 2007
Cited by 86 (6 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Synchronous binarization for machine translation
 In Proc. HLT-NAACL
, 2006
Cited by 51 (11 self)
Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary reorderings between the two languages, and rules extracted from parallel corpora can be quite large. We devise a linear-time algorithm for factoring syntactic reorderings by binarizing synchronous rules when possible and show that the resulting rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system.
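As a rough illustration of what "binarizing synchronous rules when possible" can mean, the sketch below tests whether the reordering in a rule can be built entirely from binary merges of spans that are adjacent on both sides. It is a minimal shift-reduce sketch written for this listing, not the paper's implementation: the function name `is_binarizable` and the encoding of a rule's reordering as a list of target positions (one per source-side nonterminal, in source order) are assumptions.

```python
from typing import List, Tuple


def is_binarizable(permutation: List[int]) -> bool:
    """Return True if the reordering can be built from binary merges of adjacent spans.

    permutation[i] is the (hypothetical) target-side position of the i-th
    source-side nonterminal of a synchronous rule.
    """
    # Each stack entry is the (lowest, highest) target position covered so far.
    stack: List[Tuple[int, int]] = []
    for position in permutation:
        # Shift: the next source-side item starts as a one-element span.
        stack.append((position, position))
        # Reduce: merge the top two spans while they are contiguous on the target side.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # A single remaining span means every merge was binary and contiguous.
    return len(stack) == 1
```

Each item is pushed once and every merge shrinks the stack, so the loop runs in time linear in the rule length. Under this encoding, `[2, 1, 4, 3]` is accepted, while `[2, 4, 1, 3]` is rejected because no two neighbouring source items are adjacent on the target side.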
THE POWER OF EXTENDED TOP-DOWN TREE TRANSDUCERS
Cited by 37 (23 self)
Extended top-down tree transducers (transducteurs généralisés descendants [Arnold, Dauchet: Bi-transductions de forêts. ICALP'76. Edinburgh University Press. 1976]) received renewed interest in the field of Natural Language Processing. Here those transducers are extensively and systematically studied. Their main properties are identified and their relation to classical top-down tree transducers is exactly characterized. The obtained properties completely explain the Hasse diagram of the induced classes of tree transformations. In addition, it is shown that most interesting classes of transformations computed by extended top-down tree transducers are not closed under composition.
Tiburon: A Weighted Tree Automata Toolkit
, 2006
Cited by 32 (7 self)
The availability of weighted finite-state string automata toolkits made possible great advances in natural language processing. However, recent advances in syntax-based NLP model design are unsuitable for these toolkits. To combat this problem, we introduce a weighted finite-state tree automata toolkit, which incorporates recent developments in weighted tree automata theory and is useful for natural language applications such as machine translation, sentence compression, question answering, and many more.
Capturing Practical Natural Language Transformations
Cited by 29 (0 self)
We study automata for capturing transformations employed by practical natural language processing systems, such as those that translate between human languages. For several variations of finite-state string and tree transducers, we ask formal questions about expressiveness, modularity, teachability, and generalization.
The power of extended top-down tree transducers
 SIAM J. COMPUT
, 2008
Cited by 20 (16 self)
Unfortunately, the class of transformations computed by linear extended top-down tree transducers with regular lookahead is not closed under composition. It is shown that the class of transformations computed by certain linear bimorphisms coincides with the previously mentioned class. Moreover, it is demonstrated that every linear epsilon-free extended top-down tree transducer with regular lookahead can be implemented by a linear multi bottom-up tree transducer. The class of transformations computed by the latter device is shown to be closed under composition, and to be included in the composition of the class of transformations computed by top-down tree transducers with itself. More precisely, it constitutes the composition closure of the class of transformations computed by finite-copying top-down tree transducers.
An Introduction to Synchronous Grammars
, 2006
Cited by 18 (0 self)
Synchronous context-free grammars are a generalization of context-free grammars (CFGs) that generate ...
Robust web extraction: an approach based on a probabilistic tree-edit model
 In SIGMOD
Cited by 16 (3 self)
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus the tree structure evolve over time, causing wrappers to break repeatedly, and resulting in a high cost of maintaining wrappers. In this paper, we explore a novel approach: we use temporal snapshots of web pages to develop a tree-edit model of HTML, and use this model to improve wrapper construction. We view the changes to the tree structure as suppositions of a series of edit operations: deleting nodes, inserting nodes and substituting labels of nodes. The tree structures evolve by choosing these edit operations stochastically. Our model is attractive in that the probability that a source tree has evolved into a target tree can be estimated efficiently, in quadratic time in the size of the trees, making it a potentially useful tool for a variety of tree-evolution problems. We give an algorithm to learn the probabilistic model from training examples consisting of pairs of trees, and apply this algorithm to collections of webpage snapshots to derive HTML-specific tree-edit models. Finally, we describe a novel wrapper-construction framework that takes the tree-edit model into account, and compare the quality of resulting wrappers to that of traditional wrappers on synthetic and real HTML document examples.
Backward and forward bisimulation minimisation of tree automata
, 2007
Cited by 13 (5 self)
We improve an existing bisimulation minimisation algorithm for tree automata by introducing backward and forward bisimulations and developing minimisation algorithms for them. Minimisation via forward bisimulation is also effective for deterministic automata and faster than the previous algorithm. Minimisation via backward bisimulation generalises the previous algorithm and is thus more effective but just as fast. We demonstrate implementations of these algorithms on a typical task in natural language processing.