A hierarchical phrase-based model for statistical machine translation
In ACL, 2005
Cited by 257 (7 self)
We present a statistical phrase-based translation model that uses hierarchical phrases (phrases that contain subphrases). The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system.
The Computational Analysis of the Syntax and Interpretation of "Free" Word Order in Turkish
1995
Statistical Machine Translation by Parsing
2004
Cited by 55 (6 self)
In an ordinary syntactic parser, the input is a string, and the grammar ranges over strings. This paper explores generalizations of ordinary parsing algorithms that allow the input to consist of string tuples and/or the grammar to range over string tuples. Such algorithms can infer the synchronous structures hidden in parallel texts. It turns out that these generalized parsers can do most of the work required to train and apply a syntax-aware statistical machine translation system.
Multitext Grammars and Synchronous Parsers
In Proceedings of the Human Language Technology Conference and the North American Association for Computational Linguistics (HLT-NAACL), 2003
Cited by 46 (5 self)
Multitext Grammars (MTGs) generate arbitrarily many parallel texts via production rules of arbitrary length. Both ordinary MTGs and their bilexical subclass admit relatively efficient parsers. Yet, MTGs are more expressive than other synchronous formalisms for which parsers have been described in the literature. The combination of greater expressive power and relatively low cost of inference makes MTGs an attractive foundation for practical models of translational equivalence.
Global inference for sentence compression: An integer linear programming approach
Journal of Artificial Intelligence Research (JAIR), 2008
Cited by 41 (2 self)
Sentence compression holds promise for many applications ranging from summarization to subtitle generation. Our work views sentence compression as an optimization problem and uses integer linear programming (ILP) to infer globally optimal compressions in the presence of linguistically motivated constraints. We show how previous formulations of sentence compression can be recast as ILPs and extend these models with novel global constraints. Experimental results on written and spoken texts demonstrate improvements over state-of-the-art models.
Unsupervised Language Acquisition: Theory and Practice
2001
Cited by 32 (0 self)
In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem, which purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, which can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with nonconcatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.
A survey of statistical machine translation
2007
Cited by 30 (3 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Lexicalized Markov grammars for sentence compression
2007
Cited by 24 (1 self)
We present a sentence compression system based on synchronous context-free grammars (SCFG), following the successful noisy-channel approach of (Knight and Marcu, 2000). We define a head-driven Markovization formulation of SCFG deletion rules, which allows us to lexicalize probabilities of constituent deletions. We also use a robust approach for tree-to-tree alignment between arbitrary document-abstract parallel corpora, which lets us train lexicalized models with much more data than previous approaches relying exclusively on scarcely available document-compression corpora. Finally, we evaluate different Markovized models, and find that our selected best model is one that exploits head-modifier bilexicalization to accurately distinguish adjuncts from complements, and that produces sentences that were judged more grammatical than those generated by previous work.
Generalized Multitext Grammar
2004
Cited by 23 (8 self)
Generalized Multitext Grammar (GMTG) is a synchronous grammar formalism that is weakly equivalent to
Synchronous Models of Language
In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL), 1996
Cited by 17 (4 self)
In synchronous rewriting, the productions of two rewriting systems are paired and applied synchronously in the derivation of a pair of strings. We present a new synchronous rewriting system and argue that it can handle certain phenomena that are not covered by existing synchronous systems.

