Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages

by Vamshi Ambati, Alon Lavie, Jaime Carbonell
Citations: 5 (1 self)

BibTeX

@MISC{Ambati_extractionof,
    author = {Vamshi Ambati and Alon Lavie and Jaime Carbonell},
    title = {Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages},
    year = {}
}


Abstract

We propose a generic rule induction framework that is informed by syntax from both sides of a parsed parallel corpus, expressed as sets of structural, boundary, and labeling-related constraints. Factoring syntax in this manner allows our framework to work with independent annotations coming from multiple resources, and not necessarily a single syntactic structure. We then explore the issue of lexical coverage of translation models learned in different scenarios, using syntax from one side vs. both sides. We specifically look at how the non-isomorphic nature of the parse trees for the two languages affects coverage. We propose a novel technique for restructuring target-side parse trees that generates alternate isomorphic target trees, which preserve the syntactic boundaries of constituents that were aligned in the original parse trees. We also show that combining rules extracted by restructuring syntactic trees on both sides produces significantly better translation models. The improved precision and coverage of our syntax tables particularly compensate for the lack of lexical coverage in syntax-based machine translation approaches.
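The coverage effect described in the abstract can be illustrated with a small sketch. It is not the authors' framework: it only applies the standard alignment-consistency check to constituent spans and compares how many span pairs survive when a target-side constituent is also required. The function names, the toy trees, and the toy alignment are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) word indices
Link = Tuple[int, int]  # (source index, target index) word alignment link


def project_span(src_span: Span, alignment: Set[Link]) -> Optional[Span]:
    """Project a source span onto the target side through the alignment.

    Returns the smallest covering target span, or None if the source span
    is unaligned or a word inside the target span links outside src_span.
    """
    s_lo, s_hi = src_span
    targets = [t for s, t in alignment if s_lo <= s <= s_hi]
    if not targets:
        return None
    t_lo, t_hi = min(targets), max(targets)
    for s, t in alignment:
        if t_lo <= t <= t_hi and not (s_lo <= s <= s_hi):
            return None  # boundary violated: not alignment-consistent
    return (t_lo, t_hi)


def extract_span_pairs(
    src_constituents: Set[Span],
    tgt_constituents: Set[Span],
    alignment: Set[Link],
    require_target_syntax: bool,
) -> List[Tuple[Span, Span]]:
    """Collect consistent (source span, target span) pairs.

    With require_target_syntax=True, the projected target span must also
    be a constituent of the target tree (syntax from both sides).
    """
    pairs = []
    for src_span in sorted(src_constituents):
        tgt_span = project_span(src_span, alignment)
        if tgt_span is None:
            continue
        if require_target_syntax and tgt_span not in tgt_constituents:
            continue
        pairs.append((src_span, tgt_span))
    return pairs


# Toy example (hypothetical): source tree (s0 (s1 s2)), target tree ((t0 t1) t2),
# with s1 and s2 swapped on the target side.
src_tree = {(0, 0), (1, 1), (2, 2), (1, 2), (0, 2)}
tgt_tree = {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)}
links = {(0, 0), (1, 2), (2, 1)}

one_side = extract_span_pairs(src_tree, tgt_tree, links, require_target_syntax=False)
both_sides = extract_span_pairs(src_tree, tgt_tree, links, require_target_syntax=True)
print(len(one_side), len(both_sides))
```

In this toy case the source constituent (1, 2) projects to a consistent target span that the target tree does not bracket, so the pair is lost once target syntax is required. That is the kind of gap that restructuring the target tree around aligned boundaries is meant to close.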

Citations

805 A systematic comparison of various statistical alignment models - Och, Ney - 2003
257 A hierarchical phrase-based model for statistical machine translation - Chiang - 2005
202 A syntax-based statistical translation model - Yamada, Knight - 2001
162 What’s in a translation rule? - Galley, Hopkins, et al. - 2004
63 SPMT: Statistical machine translation with syntactified target language phrases - Marcu, Wang, et al. - 2006
19 Binarizing syntax trees to improve syntax-based machine translation accuracy - Wang, Knight, et al.
9 Moses: Open source toolkit for statistical machine translation - Koehn, Hoang, et al. - 2007
9 Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora - Lavie, Parlikar, et al. - 2008
8 Robust language-pair independent sub-tree alignment - Tinsley, Zhechev, et al. - 2007
2 Decoding with syntactic and non-syntactic phrases in a syntax-based machine translation system - Hanneman, Lavie - 2009
2 Fast exact inference with a factored model for natural language parsing - Klein, Manning - 2002
2 SALM: Suffix array and its applications in empirical language processing - Zhang, Vogel - 2006