Results 1–10 of 93
Hierarchical phrase-based translation
 Computational Linguistics
, 2007
Abstract

Cited by 375 (7 self)
We present a statistical machine translation model that uses hierarchical phrases—phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations. Thus it can be seen as combining fundamental ideas from both syntax-based translation and phrase-based translation. We describe our system’s training and decoding methods in detail, and evaluate it for translation speed and translation accuracy. Using BLEU as a metric of translation accuracy, we find that our system performs significantly better than the Alignment Template System, a state-of-the-art phrase-based system.
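The hierarchical rules described in this abstract pair source and target right-hand sides through co-indexed nonterminals. A minimal sketch of applying one such rule, assuming a toy list representation (the rule, words, and `apply_rule` function below are illustrative, not the paper's implementation):

```python
# A toy representation of a synchronous CFG rule: integers are co-indexed
# nonterminal slots shared between the source and target sides.
rule = {
    "src": ["yu", 1, "you", 2],     # source side:  yu X1 you X2
    "tgt": ["have", 2, "with", 1],  # target side:  have X2 with X1
}

def apply_rule(rule, subs):
    """subs[i-1] = (source_words, target_words) derived by nonterminal i."""
    def fill(side, which):
        out = []
        for sym in side:
            if isinstance(sym, int):
                out.extend(subs[sym - 1][which])  # splice in the subderivation
            else:
                out.append(sym)                   # copy the terminal through
        return out
    return fill(rule["src"], 0), fill(rule["tgt"], 1)

subs = [(["Aozhou"], ["Australia"]),
        (["bangjiao"], ["diplomatic", "relations"])]
src, tgt = apply_rule(rule, subs)
# src == ['yu', 'Aozhou', 'you', 'bangjiao']
# tgt == ['have', 'diplomatic', 'relations', 'with', 'Australia']
```

Note how the reordering (X2 before "with X1") falls out of the co-indexing alone, with no separate distortion model.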
Verification on Infinite Structures
, 2000
Abstract

Cited by 69 (2 self)
In this chapter, we present a hierarchy of infinite-state systems based on the primitive operations of sequential and parallel composition; the hierarchy includes a variety of commonly studied classes of systems such as context-free and pushdown automata, and Petri net processes. We then examine the equivalence and regularity checking problems for these classes, with special emphasis on bisimulation equivalence, stressing the structural techniques which have been devised for solving these problems. Finally, we explore the model checking problem over these classes with respect to various linear and branching-time temporal logics.
Robust Grammatical Analysis for Spoken Dialogue Systems
 Natural Language Engineering
, 1997
Abstract

Cited by 50 (8 self)
We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.
On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing
 In Proc. EMNLP
, 2010
Abstract

Cited by 48 (2 self)
This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for subproblems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.
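The agreement mechanism the abstract describes can be sketched on a toy problem: two independent scorers must agree on one tag per position, and Lagrange multipliers are updated by subgradient steps until their argmaxes coincide. Everything below (the scorers, tag set, and step-size schedule) is an illustrative assumption, not the paper's models:

```python
tags = ["N", "V"]
# Hypothetical per-position scores for two "models" over three positions.
score_a = [{"N": 1.0, "V": 0.0}, {"N": 0.2, "V": 0.5}, {"N": 0.3, "V": 0.4}]
score_b = [{"N": 0.9, "V": 0.1}, {"N": 0.6, "V": 0.1}, {"N": 0.0, "V": 0.7}]

def decode(scores, u, sign):
    # Each oracle independently maximizes its dual-adjusted score.
    return [max(tags, key=lambda t: s[t] + sign * u[i][t])
            for i, s in enumerate(scores)]

def dual_decompose(steps=50, rate=0.5):
    u = [{t: 0.0 for t in tags} for _ in score_a]
    for k in range(steps):
        ya = decode(score_a, u, +1)
        yb = decode(score_b, u, -1)
        if ya == yb:
            return ya  # agreement certifies an exact joint solution
        for i in range(len(u)):
            for t in tags:
                # Subgradient step: penalize tags chosen by one oracle only.
                u[i][t] -= rate / (k + 1) * ((ya[i] == t) - (yb[i] == t))
    return ya  # no agreement reached: fall back to one oracle's solution
```

Here `dual_decompose()` converges to `["N", "N", "V"]`, which is also the per-position argmax of the summed scores, matching the "often exact" behaviour the abstract reports.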
Recognition can be Harder than Parsing
 Computational Intelligence
, 1992
Abstract

Cited by 39 (0 self)
The aim of this paper is to discuss the scope and limitations of this approach, and to examine the suitability of several syntactic formalisms on the criterion of their ability to handle it.
Propositional Dynamic Logic of Nonregular Programs
 Journal of Computer and System Sciences
, 1983
Abstract

Cited by 27 (2 self)
The results of this paper indicate that this line is extremely close to the original regular PDL.
The intersection of Finite State Automata and Definite Clause Grammars
, 1995
Abstract

Cited by 20 (6 self)
Bernard Lang defines parsing as the calculation of the intersection of an FSA (the input) and a CFG. Viewing the input for parsing as an FSA rather than as a string combines well with some approaches in speech understanding systems, in which parsing takes a word lattice as input (rather than a word string). Furthermore, certain techniques for robust parsing can be modelled as finite state transducers.
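Lang's view of parsing as FSA–CFG intersection can be sketched as a chart parser whose items are Bar-Hillel triples (state, nonterminal, state) rather than string spans. The grammar, lattice, and `lattice_parse` function below are toy assumptions for illustration, and the grammar is assumed to be in Chomsky normal form:

```python
# A minimal sketch of parsing-as-intersection: chart items (i, A, j) mean
# nonterminal A derives some word sequence driving the FSA from state i to j.
def lattice_parse(edges, lexical, binary, start_sym, q0, qf):
    """edges: set of (i, word, j); lexical: word -> set of nonterminals;
    binary: (B, C) -> set of A for rules A -> B C."""
    chart = {(i, A, j) for (i, w, j) in edges for A in lexical.get(w, ())}
    changed = True
    while changed:  # naive closure; a real parser would use an agenda
        changed = False
        for (i, B, k) in list(chart):
            for (k2, C, j) in list(chart):
                if k2 != k:
                    continue
                for A in binary.get((B, C), ()):
                    if (i, A, j) not in chart:
                        chart.add((i, A, j))
                        changed = True
    return (q0, start_sym, qf) in chart

# A two-word "lattice" that happens to be a plain string: he runs
edges = {(0, "he", 1), (1, "runs", 2)}
lexical = {"he": {"NP"}, "runs": {"VP"}}
binary = {("NP", "VP"): {"S"}}
# lattice_parse(edges, lexical, binary, "S", 0, 2) -> True
```

Because the input is just a set of labelled edges, ambiguous word lattices from a speech recognizer drop in with no change to the parser, which is precisely the advantage the abstract points to.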
Famous trails to Paul Erdős
 Mathematical Intelligencer
, 1999
Abstract

Cited by 19 (0 self)
The notion of Erdős number has floated around the mathematical research community for more than thirty years, as a way to quantify the common knowledge that mathematical and scientific research has become a very collaborative process in the twentieth century, not an activity engaged in solely by isolated individuals. In this paper we explore some (fairly short) collaboration paths that one can follow from Paul Erdős to researchers inside and outside of mathematics. In particular, we find that all the Fields Medalists up through 1998 have Erdős numbers less than 6, and that over 60 Nobel Prize winners in physics, chemistry, economics, and medicine have Erdős numbers less than 9.
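An Erdős number is simply a shortest-path distance in the coauthorship graph, so it can be computed by breadth-first search. The graph below uses hypothetical placeholder names; the real collaboration graph is of course far larger:

```python
from collections import deque

def erdos_numbers(coauthors, root="Erdos"):
    """BFS from the root author; dist[x] is x's Erdős number."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        a = queue.popleft()
        for b in coauthors.get(a, ()):
            if b not in dist:           # first visit = shortest path
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

# Toy symmetric coauthorship graph with hypothetical authors A, B, C.
graph = {
    "Erdos": ["A", "B"],
    "A": ["Erdos", "C"],
    "B": ["Erdos"],
    "C": ["A"],
}
# erdos_numbers(graph) -> {"Erdos": 0, "A": 1, "B": 1, "C": 2}
```

Authors unreachable from the root simply never enter `dist`, matching the convention that their Erdős number is infinite (undefined).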
A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
Abstract

Cited by 17 (1 self)
Via an oracle experiment, we show that the upper bound on accuracy of a CCG parser is significantly lowered when its search space is pruned using a supertagger, though the supertagger also prunes many bad parses. Inspired by this analysis, we design a single model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting increase in complexity, we experiment with both belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. On CCGbank we achieve a labelled dependency F-measure of 88.8% on gold POS tags, and 86.7% on automatic part-of-speech tags, the best reported results for this task.
Predicate invention and learning from positive examples only
 In Proceedings of the Tenth European Conference on Machine Learning
, 1998
Abstract

Cited by 15 (1 self)
Previous bias shift approaches to predicate invention are not applicable to learning from positive examples only, if a complete hypothesis can be found in the given language, as negative examples are required to determine whether new predicates should be invented or not. One approach to this problem is presented, MERLIN 2.0, which is a successor of a system in which predicate invention is guided by sequences of input clauses in SLD-refutations of positive and negative examples w.r.t. an overly general theory. In contrast to its predecessor, which searches for the minimal finite-state automaton that can generate all positive and no negative sequences, MERLIN 2.0 uses a technique for inducing Hidden Markov Models from positive sequences only. This enables the system to invent new predicates without being triggered by negative examples. Another advantage of using this induction technique is that it allows for incremental learning. Experimental results are presented comparing MERLIN 2.0 with the positive-only learning framework of Progol 4.2, and comparing the original induction technique with a new version that produces deterministic Hidden Markov Models. The results show that predicate invention may indeed be both necessary and possible when learning from positive examples only, and that it can be beneficial to keep the induced model deterministic.