Faster beam-search decoding for phrasal statistical machine translation
In Proceedings of MT Summit XI, 2007
Cited by 10 (4 self)
Pharaoh is a widely-used state-of-the-art decoder for phrasal statistical machine translation. In this paper, we present two modifications to the algorithm used by Pharaoh that together permit much faster decoding without losing translation quality as measured by BLEU score. The first modification improves the estimated translation model score used by Pharaoh to evaluate partial hypotheses, by incorporating an estimate of the distortion penalty to be incurred in translating the rest of the sentence. The second modification uses early pruning of possible next-phrase translations to cut down the overall size of the search space. These modifications enable decoding speed-ups of an order of magnitude or more, with no reduction in the BLEU score of the resulting translations.
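The abstract does not say how next-phrase translations are pruned, so the following is only a minimal sketch of one common approach: keep a fixed number of the best-scoring options, and drop any that fall too far below the best one. The function name `prune_options`, the parameters `max_options` and `beam`, and the toy scores are all assumptions made for illustration; they are not taken from the paper or from Pharaoh.

```python
def prune_options(options, max_options=3, beam=2.0):
    """Prune candidate phrase translations before hypothesis expansion.

    options: list of (target_phrase, log_score) pairs (hypothetical format).
    Keeps at most `max_options` candidates, and only those whose log score
    is within `beam` of the best candidate.
    """
    if not options:
        return []
    # Rank candidates from best (highest log score) to worst.
    ranked = sorted(options, key=lambda option: option[1], reverse=True)
    best_score = ranked[0][1]
    # Apply the count limit first, then the score threshold.
    return [o for o in ranked[:max_options] if o[1] >= best_score - beam]


# Toy candidates for a single source phrase (made-up scores).
candidates = [("the house", -0.5), ("house", -1.0), ("home", -3.1), ("a house", -2.0)]
kept = prune_options(candidates, max_options=3, beam=2.0)
```

Here `kept` holds the three best candidates and drops `"home"`. Pruning at this stage means fewer partial hypotheses are created in the first place, which is the general idea behind shrinking the search space early.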
Statistical query translation models for cross-language information retrieval
ACM Transactions on Asian Language Information Processing (TALIP), 2006
Cited by 6 (0 self)
Query translation is an important task in cross-language information retrieval (CLIR), which aims to determine the best translation words and weights for a query. This paper presents three statistical query translation models that focus on resolution of query translation ambiguities. All the models assume that the selection of the translation of a query term depends on the translations of other terms in the query. They differ in the way linguistic structures are detected and exploited. The co-occurrence model treats a query as a bag of words, and uses all the other terms in the query as the context for translation disambiguation. The other two models exploit linguistic dependencies among terms. The noun phrase (NP) translation model detects NPs in a query, and translates each NP as a unit by assuming that the translation of a term only depends on other terms within the same NP. Similarly, the dependency translation model detects and translates dependency triples, such as verb-object, as units. The evaluations show that linguistic structures always lead to more precise translations. The CLIR experiments on TREC Chinese collections show that all three models have a positive impact on query translation, and lead to significant improvements in CLIR performance over the simple dictionary-based translation method. The best results are obtained by combining the three models.
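The idea that each term's translation depends on the translations chosen for the other query terms can be illustrated with a small sketch. It is not the paper's model: the exhaustive search, the additive pairwise score, the function name `best_translations`, and the toy dictionary and co-occurrence values are all assumptions made for illustration.

```python
from itertools import product


def best_translations(candidates, cooccurrence):
    """Choose one translation per query term, treating the query as a bag of words.

    candidates: dict mapping each source term to its candidate translations.
    cooccurrence: dict mapping frozenset({t1, t2}) to a co-occurrence score
        for two target-language words (hypothetical, made-up values).
    Returns the assignment that maximizes the summed pairwise score.
    """
    terms = list(candidates)
    best_combo, best_score = None, float("-inf")
    # Enumerate every combination of candidate translations (fine for short queries).
    for combo in product(*(candidates[term] for term in terms)):
        score = sum(
            cooccurrence.get(frozenset((a, b)), 0.0)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if score > best_score:
            best_combo, best_score = combo, score
    # best_combo stays None if some term has no candidates at all.
    return dict(zip(terms, best_combo)) if best_combo is not None else {}


# Toy English-to-French example with invented scores.
candidates = {"bank": ["banque", "rive"], "river": ["fleuve"]}
cooccurrence = {
    frozenset(("rive", "fleuve")): 0.8,
    frozenset(("banque", "fleuve")): 0.1,
}
chosen = best_translations(candidates, cooccurrence)
```

With these toy scores, `chosen` maps `"bank"` to `"rive"` because that candidate co-occurs more strongly with the translation of `"river"`. Exhaustive enumeration grows exponentially with query length, so a real system would need a cheaper search.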
Spoken language technologies applied to digital talking books
In Proceedings of Interspeech, 2006
Cited by 2 (0 self)
Digital Talking Books (DTBs) offer visually impaired users an evolution of analogue talking books that mimics the interaction possibilities of print books. This paper describes a new DTB player which tries to improve the usability and accessibility of current players, through the combination of the possibilities offered by multimodal interaction and interface adaptability, and the integration of several language processing components. Besides the potential for a greater enjoyment of the reader in general, these modifications also pave the way to the use of DTBs in different domains, from e-inclusion to e-learning applications. Index Terms: digital talking books, Portuguese.
Web-Based Machine Translation
2003
Cited by 2 (1 self)
This chapter has two main aims: (i) to present the state-of-the-art in Machine Translation (MT), namely Phrase-Based Statistical MT, together with the major competing paradigms used in MT research and development today; and (ii) to provide an overview of the MT research carried out by my team here at DCU, characterised here in terms of ‘hybrid MT’. In addition, we provide our views on the directions that MT research might take in the near future, and conclude the chapter with lists of further reading for the interested reader.
Textual Representations for Corpus-Based Bilingual Retrieval
2008
Cited by 1 (0 self)
The traditional approach to information retrieval is based on using words as the indexing and search terms for documents. However, word-based representations have difficulty addressing morphological processes that confound retrieval, such as inflection, derivation, and compounding. One part of this research investigates alternative methods for representing text, including a method based on overlapping sequences of characters called n-gram tokenization. N-grams are studied in depth and one notable finding is that they achieve a 20% improvement in retrieval effectiveness over words in certain situations. The other focus of this research is improving retrieval performance when foreign language documents must be searched and translation is required. In this scenario bilingual dictionaries are often used to translate user queries; however, even among the most commonly spoken languages, for which large bilingual lexicons exist, dictionary-based translation suffers from several significant problems. These include: difficulty handling proper names, which are often missing; issues related to morphological variation since entries, or query terms, may not be lemmatized; and an inability to robustly handle multiword phrases, especially non-compositional expressions. These problems can be addressed when
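The n-gram tokenization mentioned in this abstract, overlapping sequences of characters, can be sketched as below. The default length `n=4`, the lowercasing, the whitespace normalization, and the function name `char_ngrams` are assumptions made for illustration; the abstract does not specify these details.

```python
def char_ngrams(text, n=4):
    """Split text into overlapping character n-grams.

    Lowercases the text and collapses runs of whitespace first
    (both are assumptions of this sketch, not from the abstract).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Text shorter than n is kept as a single token (or nothing if empty).
        return [normalized] if normalized else []
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


grams = char_ngrams("retrieval", n=4)
# grams == ['retr', 'etri', 'trie', 'riev', 'ieva', 'eval']
```

Because inflected forms of a word such as "retrieve" and "retrieval" share several of these n-grams, they can still match at indexing time without a stemmer or lemmatizer.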

