Results 11-20 of 50
Learning for semantic parsing using statistical machine translation techniques
, 2005
Cited by 9 (1 self)
Semantic parsing is the construction of a complete, formal, symbolic meaning representation of a sentence. While it is crucial to natural language understanding, the problem of semantic parsing has received relatively little attention from the machine learning community. Recent work on natural language understanding has mainly focused on shallow semantic analysis, such as word-sense disambiguation and semantic role labeling. Semantic parsing, on the other hand, involves deep semantic analysis in which word senses, semantic roles and other components are combined to produce useful meaning representations for a particular application domain (e.g. database query). Prior research in machine learning for semantic parsing is mainly based on inductive logic programming or deterministic parsing, which lack some of the robustness that characterizes statistical learning. Existing statistical approaches to semantic parsing, however, are mostly concerned with relatively simple application domains in which a meaning representation is no more than a single semantic frame. In this proposal, we present a novel statistical approach to semantic parsing, WASP, which can handle meaning representations with a nested structure. The WASP algorithm learns a semantic parser given a set of sentences annotated with their correct meaning representations. The parsing model is based on the
Robust web extraction: an approach based on a probabilistic tree-edit model
In SIGMOD
Cited by 8 (2 self)
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus the tree structure evolve over time, causing wrappers to break repeatedly, and resulting in a high cost of maintaining wrappers. In this paper, we explore a novel approach: we use temporal snapshots of web pages to develop a tree-edit model of HTML, and use this model to improve wrapper construction. We view the changes to the tree structure as suppositions of a series of edit operations: deleting nodes, inserting nodes and substituting labels of nodes. The tree structures evolve by choosing these edit operations stochastically. Our model is attractive in that the probability that a source tree has evolved into a target tree can be estimated efficiently (in quadratic time in the size of the trees), making it a potentially useful tool for a variety of tree-evolution problems. We give an algorithm to learn the probabilistic model from training examples consisting of pairs of trees, and apply this algorithm to collections of web-page snapshots to derive HTML-specific tree-edit models. Finally, we describe a novel wrapper-construction framework that takes the tree-edit model into account, and compare the quality of resulting wrappers to that of traditional wrappers on synthetic and real HTML document examples.
Bisimulation Minimisation for Weighted Tree Automata
, 2007
Cited by 8 (6 self)
We generalise existing forward and backward bisimulation minimisation algorithms for tree automata to weighted tree automata. The obtained algorithms work for all semirings and retain the time complexity of their unweighted variants for all additively cancellative semirings. On all other semirings the time complexity is slightly higher (linear instead of logarithmic in the number of states). We discuss implementations of these algorithms on a typical task in natural language processing.
Bayesian inference for finite-state transducers
In HLT-NAACL
, 2010
Cited by 7 (4 self)
We describe a Bayesian inference algorithm that can be used to train any cascade of weighted finite-state transducers on end-to-end data. We also investigate the problem of automatically selecting from among multiple training runs. Our experiments on four different tasks demonstrate the genericity of this framework, and, where applicable, large improvements in performance over EM. We also show, for unsupervised part-of-speech tagging, that automatic run selection gives a large improvement over previous Bayesian approaches.
Learning for Semantic Parsing and Natural Language Generation Using Statistical Machine Translation Techniques
, 2007
Minimizing Deterministic Weighted Tree Automata
, 2008
Cited by 5 (4 self)
The problem of efficiently minimizing deterministic weighted tree automata (wta) is investigated. Such automata have found promising applications as language models in Natural Language Processing. A polynomial-time algorithm is presented that, given a deterministic wta over a commutative semifield, of which all operations including the computation of the inverses are polynomial, constructs an equivalent minimal (with respect to the number of states) deterministic and total wta. If the semifield operations can be performed in constant time, then the algorithm runs in time O(rmn^4) where r is the maximal rank of the input symbols, m is the number of transitions, and n is the number of states of the input wta.
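To make the object being minimized concrete, here is a minimal sketch of how a deterministic bottom-up wta might assign a weight to a tree. It is not the minimization algorithm from the abstract. The transition table `DELTA`, the state name `q`, and the choice of the real numbers with ordinary multiplication as the semifield are all illustrative assumptions, and final weights are ignored.

```python
from typing import Dict, Optional, Tuple

# A tree is (symbol, children); a leaf has an empty tuple of children.
Tree = Tuple[str, tuple]

# Hypothetical transition table of a deterministic bottom-up wta:
# (symbol, states of the children) -> (target state, transition weight)
Transitions = Dict[Tuple[str, Tuple[str, ...]], Tuple[str, float]]


def run(tree: Tree, delta: Transitions) -> Optional[Tuple[str, float]]:
    """Return (state, weight) reached on `tree`, or None if no transition applies."""
    symbol, children = tree
    states = []
    weight = 1.0  # multiplicative identity of the assumed semifield
    for child in children:
        result = run(child, delta)
        if result is None:
            return None  # the automaton is not total on this tree
        state, child_weight = result
        states.append(state)
        weight *= child_weight
    key = (symbol, tuple(states))
    if key not in delta:
        return None
    target, transition_weight = delta[key]
    return target, weight * transition_weight


# Toy automaton (assumed for illustration): one state, three transitions.
DELTA: Transitions = {
    ("a", ()): ("q", 0.5),
    ("b", ()): ("q", 0.25),
    ("f", ("q", "q")): ("q", 2.0),
}

EXAMPLE: Tree = ("f", (("a", ()), ("b", ())))
```

Because the automaton is deterministic, each tree reaches at most one state, so a single bottom-up pass suffices. In this toy table, r = 2, m = 3 and n = 1 in the notation of the abstract.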
Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
Cited by 3 (2 self)
A novel and robust approach to improving statistical machine translation fluency is developed within a minimum Bayes-risk decoding framework. By segmenting translation lattices according to confidence measures over the maximum likelihood translation hypothesis we are able to focus on regions with potential translation errors. Hypothesis space constraints based on monolingual coverage are applied to the low confidence regions to improve overall translation fluency.
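The general idea of minimum Bayes-risk decoding can be sketched over a small N-best list: choose the hypothesis with the lowest expected loss under the model's posterior, rather than the most probable one. This is only an illustration. The abstract describes decoding over lattices with confidence-based segmentation and fluency constraints, none of which is modelled here, and the hypotheses, probabilities, and loss function below are invented for the example.

```python
from typing import Callable, List, Tuple


def position_mismatch_loss(a: str, b: str) -> int:
    """Toy loss (an assumption): mismatched token positions plus length difference."""
    x, y = a.split(), b.split()
    mismatches = sum(1 for s, t in zip(x, y) if s != t)
    return mismatches + abs(len(x) - len(y))


def mbr_decode(
    hypotheses: List[Tuple[str, float]],
    loss: Callable[[str, str], float],
) -> str:
    """Return the hypothesis minimizing expected loss under the given posterior."""

    def expected_loss(candidate: str) -> float:
        # Each competing hypothesis acts as a possible reference,
        # weighted by its posterior probability.
        return sum(p * loss(candidate, other) for other, p in hypotheses)

    return min((h for h, _ in hypotheses), key=expected_loss)


# Hypothetical N-best list with posterior probabilities summing to 1.
HYPS = [("the cat sat", 0.4), ("a cat sat", 0.3), ("a cat sits", 0.3)]
```

With this toy list, the most probable hypothesis is "the cat sat", but `mbr_decode(HYPS, position_mismatch_loss)` selects "a cat sat", which has the lower expected loss because it is close to both of the other hypotheses.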
A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Cited by 3 (1 self)
A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
MACHINE TRANSLATION BY PATTERN MATCHING
, 2008
Cited by 3 (0 self)
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and the current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but left some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main
Weighted Extended Tree Transducers
, 2010
Cited by 3 (3 self)
The first systematic treatment of weighted extended tree transducers (wxtt) over countably complete semirings is provided. It is proved that the extension in the left-hand sides of a wxtt can be simulated by the inverse of a linear and nondeleting tree homomorphism. In addition, a characterization of weighted tree transformations computed by bottom-up wxtt in terms of bimorphisms is provided. Backward and forward application of wxtt to recognizable weighted tree languages are considered. It is shown that the backward application of a linear wxtt preserves recognizability and that the domain of an arbitrary bottom-up wxtt is recognizable. Examples demonstrate that neither backward nor forward application of arbitrary wxtt preserves recognizability. Finally, a Hasse diagram relates most of the important subclasses of weighted tree transformations computed by wxtt.