## Iterative translation disambiguation for cross-language information retrieval (2005)

### Download Links

- www.dcs.qmul.ac.uk
- www.eecs.qmul.ac.uk
- www.umiacs.umd.edu
- DBLP

### Other Repositories/Bibliography

Venue: SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Citations: 20 (0 self)

### BibTeX

@INPROCEEDINGS{Monz05iterativetranslation,
  author    = {Christof Monz},
  title     = {Iterative translation disambiguation for cross-language information retrieval},
  booktitle = {SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval},
  year      = {2005},
  pages     = {520--527},
  publisher = {ACM Press}
}

### Abstract

Finding a proper distribution of translation probabilities is one of the most important factors impacting the effectiveness of a cross-language information retrieval system. In this paper we present a new approach that computes translation probabilities for a given query by using only a bilingual dictionary and a monolingual corpus in the target language. The algorithm combines term association measures with an iterative machine learning approach based on expectation maximization. Our approach considers only pairs of translation candidates and is therefore less sensitive to data-sparseness issues than approaches using higher n-grams. The learned translation probabilities are used as query term weights and integrated into a vector-space retrieval system. Results for English-German cross-lingual retrieval show substantial improvements over a baseline using dictionary lookup without term weighting.

### Citations

9054 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...-strength computation with the prior probability of a translation given the other words in the query by computing them iteratively, in a fashion similar to the Expectation Maximization (EM) algorithm [7]. Initially, all possible translations of a source term are considered equally likely, where we associate with each translation candidate a weight w_T(·|·) that it is indeed the appropriate translatio...
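
The iterative scheme described in this snippet (uniform initialization, then re-estimation of each candidate's weight from its association with the weighted candidates of the other query terms) can be sketched roughly as follows. The function names, the fixed iteration count, and the association callback are illustrative assumptions, not the paper's implementation:

```python
def iterate_weights(candidates, assoc, iterations=10):
    """candidates: dict mapping each source term to its list of translation
    candidates. assoc: callback giving an association score for two
    target-language terms. Both are hypothetical interfaces."""
    # Initialization: all translations of a source term are equally likely.
    w = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(iterations):
        new_w = {}
        for s, ts in candidates.items():
            scores = {}
            for t in ts:
                # Score each candidate by its association with the currently
                # weighted candidates of the *other* source terms.
                scores[t] = sum(
                    w[s2][t2] * assoc(t, t2)
                    for s2, ts2 in candidates.items() if s2 != s
                    for t2 in ts2
                )
            # Normalize so the weights per source term sum to one.
            total = sum(scores.values()) or 1.0
            new_w[s] = {t: scores[t] / total for t in ts}
        w = new_w
    return w
```

A candidate that co-occurs strongly with the likely translations of the other query terms thus accumulates weight over iterations, which is the disambiguation effect the abstract describes.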

2970 | WordNet: an Electronic Lexical Database
- Fellbaum
- 1998
Citation Context: ...at are associated with the appropriate word sense. Unfortunately, word-sense disambiguation is a non-trivial enterprise and for most languages the appropriate resources, e.g., ontologies like WordNet [10], do not exist. Also, sense-annotated corpora that are used to train a word-sense disambiguation system are rare in foreign languages, and the process of building them is very laborious. Our alternativ...

2401 | The PageRank citation ranking: Bringing order to the web
- Page, Brin, et al.
- 1999
Citation Context: ...ctor, and |V_k| is the absolute value of V_k. Then, the iteration stops if |w^n_T − w^{n−1}_T|_1 < θ. Note that the algorithm described above can also be considered a modification of the PageRank algorithm [21], allowing for nodes in the network to be clustered. There are a number of ways to compute the association strength between two terms. We focus here on three alternatives: Pointwise mutual information...
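
The stopping criterion |w^n_T − w^{n−1}_T|_1 < θ quoted above is an L1 distance between successive weight assignments. A minimal sketch, assuming weights are stored as nested dicts keyed by source term and translation candidate (an assumed representation, not the paper's):

```python
def l1_distance(w_new, w_old):
    """L1 distance between two weight assignments over the same candidates,
    usable as the convergence test |w^n_T - w^{n-1}_T|_1 < theta."""
    return sum(
        abs(w_new[s][t] - w_old[s][t])
        for s in w_new for t in w_new[s]
    )
```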

1976 | An algorithm for suffix stripping
- Porter
- 1980
Citation Context: ...h is a part-of-speech tagger that also provides the lemma (or base form) for each word. This form of morphological normalization is less aggressive than a rule-based stemmer, such as Porter’s stemmer [23]. Since the target language is German—a morphologically more complex language than English—additional normalization steps are required. The translation candidates need not be mapped to their...

1091 | Bootstrap methods: Another look at the jackknife
- Efron
- 1979
Citation Context: ...e whether the observed differences between two retrieval approaches are statistically significant and not just caused by chance, we used the bootstrap method, a powerful non-parametric inference test [9]. The method was previously applied to retrieval evaluation [24, 28]. The basic idea of the bootstrap is to simulate the underlying distribution by randomly drawing (with replacement) a large number o...

940 | An empirical study of smoothing techniques for language modeling (Computer Speech and Language)
- Chen, Goodman
- 1999
Citation Context: ...for higher n-gram models. The other approach for tackling data sparseness is smoothing. Several smoothing techniques have been developed and many of them are successfully applied in language modeling [4]. The problem is that smoothing techniques are generally evaluated with respect to bi- or tri-gram models. It is unclear to what extent these techniques scale up successfully to models using a larger...

900 | Word association norms, mutual information, and lexicography
- Church, Hanks
- 1990
Citation Context: ...erms. We focus here on three alternatives: Pointwise mutual information, Dice coefficient, and Log Likelihood Ratio. The pointwise mutual information between two terms t and t′ is defined as follows [5]: MI(t, t′) = log2( p(t, t′) / (p(t) · p(t′)) ), where p(t, t′) is the probability that the terms t and t′ occur in the same document. Thus, as this value gets larger, the joint probability of t a...
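
The pointwise mutual information defined in this snippet can be computed directly from document frequencies. A small sketch in which `df`, `codf`, and `n_docs` are assumed data structures, not anything prescribed by the paper:

```python
import math

def pmi(df, codf, n_docs, t1, t2):
    """Pointwise mutual information over document-level co-occurrence.
    df[t]: number of documents containing term t.
    codf[(t1, t2)]: number of documents containing both terms."""
    p1 = df[t1] / n_docs
    p2 = df[t2] / n_docs
    p12 = codf[(t1, t2)] / n_docs
    # MI(t1, t2) = log2( p(t1, t2) / (p(t1) * p(t2)) )
    return math.log2(p12 / (p1 * p2))
```

Positive values mean the terms co-occur more often than independence would predict, which is what makes a translation pair mutually supportive.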

872 | Accurate methods for the statistics of surprise and coincidence
- Dunning
- 1993
Citation Context: ...formalizes independence between w2 and w1. H2 states that the two probabilities are not the same and hence w2 and w1 do not occur independent of each other. The log likelihood is then defined as [8]: log λ = log( L(H1) / L(H2) ) = log L(c12, c1, p) + log L(c2 − c12, N − c1, p) − log L(c12, c1, p1) − log L(c2 − c12, N − c1, p2), where p, p1, and p2 are defined as in H1 and H2 above, c1 is the frequency o...
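
Dunning's log-likelihood ratio quoted above reduces to four evaluations of the binomial log-likelihood log L(k, n, p) = k·log p + (n − k)·log(1 − p). A sketch under that reading, with illustrative argument names:

```python
import math

def log_l(k, n, p):
    """Binomial log-likelihood log L(k, n, p), with the 0*log(0) = 0 convention."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def llr(c12, c1, c2, n):
    """Dunning-style log ratio for co-occurrence of two terms: c1, c2 are the
    individual counts, c12 the joint count, n the total sample size."""
    p = c2 / n              # H1: term 2 occurs independently of term 1
    p1 = c12 / c1           # H2: separate estimates for the two contexts
    p2 = (c2 - c12) / (n - c1)
    return (log_l(c12, c1, p) + log_l(c2 - c12, n - c1, p)
            - log_l(c12, c1, p1) - log_l(c2 - c12, n - c1, p2))
```

log λ is 0 under exact independence and negative otherwise; in practice the statistic −2·log λ is used, which is asymptotically chi-squared distributed.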

737 | Probabilistic part-of-speech tagging using decision trees
- Schmid
- 1994
Citation Context: ...tion for verbs and plural information for nouns. Since the dictionary only contains base forms, the words in the topics must be mapped to their respective base forms as well. Here, we used TreeTagger [25], which is a part-of-speech tagger that also provides the lemma (or base form) for each word. This form of morphological normalization is less aggressive than a rule-based stemmer, such as Porter’s st...

534 | Bootstrap Methods and Their Application
- Davison, Hinkley
- 1997
Citation Context: ...mples are called bootstrap samples; we set the number of bootstrap samples to 2,000 as using the standard size of 1,000 has been shown to be a less reliable approach to inducing a normal distribution [6]. The mean and the standard error of the bootstrap samples allow computation of a confidence interval for different levels of confidence (typically 0.95 and higher). We compare two retrieval methods a...
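
The bootstrap procedure described in these snippets (resample topics with replacement, collect the statistic, read off a confidence interval) can be sketched as follows. The percentile interval and all names are illustrative choices, not necessarily the exact variant used in the paper:

```python
import random

def bootstrap_diff(scores_a, scores_b, n_samples=2000, seed=0):
    """Paired bootstrap over per-topic effectiveness scores of two runs:
    resample topic indices with replacement and collect the mean difference."""
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    # 95% percentile confidence interval for the mean difference.
    lo = diffs[int(0.025 * n_samples)]
    hi = diffs[int(0.975 * n_samples) - 1]
    return lo, hi
```

If the resulting interval excludes 0, the difference between the two runs is taken to be significant at the 0.95 level.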

402 | The Alignment Template Approach to Statistical Machine Translation
- Och, Ney
- 2004
Citation Context: ...e frequency of co-occurrences between a source-language word and a target-language word in the parallel corpus. Most successful statistical machine-translation systems exploit parallel corpora (e.g., [20, 27]). On the other hand, there are several drawbacks to the parallel-corpus approach. First, although parallel corpora are available for many of the European languages as well as for Arabic and Chinese, t...

154 | Using the web to obtain frequencies for unseen bigrams
- Keller, Lapata
- 2003
Citation Context: ...use Internet search engines to compute frequencies for larger sizes of co-occurring terms. Researchers have shown that the Internet can be used to address the problem of data-sparseness for bi-grams [14] but it is unclear to what extent this approach will resolve the issue of data-sparseness for higher n-gram models. The other approach for tackling data sparseness is smoothing. Several smoothing tech...

149 | New Retrieval Approaches Using SMART: TREC 4
- Buckley, Singhal, et al.
- 1996
Citation Context: ...he CLEF 2003 test set. 4.1.3 Retrieval Model. The model underlying our retrieval system is the standard vector space model. All our mono- and bi-lingual runs were based on the Lnu.ltc weighting scheme [2]. That is, to compute the similarity between a query (q) and a document (d): sim(q, d) = Σ_{i∈q∩d} [ (1 + log(freq_{i,d})) / (1 + log(avg_{j∈d} freq_{j,d})) ] · [ freq_{i,q} / max_{j∈q} freq_{j,q} ] · log(N / n_i) ...

106 | The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval
- Pirkola
- 1998
Citation Context: ...ently evolved to the point where they are ripe for investigation in the context of cross-language retrieval, although others have not yet used these to the extent that we have. For example, Pirkola’s [22] approach does not consider disambiguation during query formulation at all. Pirkola uses structured queries to cluster together all translations of a word or phrase in the source topic. Disambiguation...

48 | Shallow Morphological Analysis in Monolingual Information Retrieval for
- Monz, de Rijke
- 2001
Citation Context: ...ual dictionary which contains only the base forms. On the other hand, compounds are very frequent in German and it has been shown that de-compounding can improve retrieval effectiveness substantially [19]. Instead of de-compounding, we use character n-grams, an approach that yields almost the same retrieval performance as de-compounding. Specifically, it has been shown that using 5-grams leads to the b...

43 | Statistical inference in retrieval effectiveness evaluation
- Savoy
- 1997
Citation Context: ...ches are statistically significant and not just caused by chance, we used the bootstrap method, a powerful non-parametric inference test [9]. The method was previously applied to retrieval evaluation [24, 28]. The basic idea of the bootstrap is to simulate the underlying distribution by randomly drawing (with replacement) a large number of samples of size N from the original sample of N observations. Thes...

40 | Combining Query Translation and Document Translation in Cross-Language Retrieval
- Chen, Gey
- 2003
Citation Context: ...translated query on the original document collection in the target language. The second approach, query translation, is by far the most common cross-lingual retrieval approach. However, Chen and Gey [3] showed that translating the entire document collection outperforms query translation and also that a combination of query translation and document translation can lead to further improvements in retr...

32 | Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations
- Gao, Nie, et al.
- 2002
Citation Context: ...the maximum similarity scores between translation candidates for different query terms. Similar to the approach by Jang et al., her approach does not benefit from using multiple iterations. Gao et al. [11] use a decaying mutual-information score in combination with syntactic dependency relations. The decay factor is based on the average distance between two words in the target language. In our model, w...

32 | Monolingual document retrieval for European languages
- Hollink, Kamps, et al.
Citation Context: ...formance as de-compounding. Specifically, it has been shown that using 5-grams leads to the best performance among n-gram approaches, almost equalling the performance of a de-compounding approach, see [12]. Thus, we split all tokens in documents and translated queries into 5-grams, without crossing word boundaries, for all mono-lingual and cross-lingual runs. For the runs involving term weights, we mus...
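
The 5-gram tokenization described here (splitting each token into character n-grams without crossing word boundaries) might look like the sketch below; how tokens shorter than n are handled is an assumption, since the snippet does not say:

```python
def char_ngrams(text, n=5):
    """Split each whitespace-delimited token into character n-grams without
    crossing word boundaries; tokens of length <= n are kept whole (assumed)."""
    grams = []
    for token in text.split():
        if len(token) <= n:
            grams.append(token)
        else:
            # Overlapping n-grams within a single token only.
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams
```

For German this approximates de-compounding because the n-grams of a compound overlap with the n-grams of its constituent words.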

29 | Using Mutual Information to Resolve Query Translation Ambiguities and Query Term Weighting
- Jang, Myaeng, et al.
- 1999
Citation Context: ...ions of other source words from the topic. This bias reduces the effect that co-occurrence with translations of other source words has on selecting an appropriate translation. The work by Jang et al. [13] is closely related to ours in that they also use a word-association measure, mutual information in their case, to re-compute translation probabilities for cross-language retrieval. Their approach dif...

27 | Effective phrase translation extraction from alignment models
- Venugopal, Vogel, et al.
- 2004
Citation Context: ...e frequency of co-occurrences between a source-language word and a target-language word in the parallel corpus. Most successful statistical machine-translation systems exploit parallel corpora (e.g., [20, 27]). On the other hand, there are several drawbacks to the parallel-corpus approach. First, although parallel corpora are available for many of the European languages as well as for Arabic and Chinese, t...

20 | Using statistical term similarity for sense disambiguation in cross-language information retrieval
- Adriani
Citation Context: ...nefit from the power of multiple iterations, as in our approach, where disambiguated information from a previous iteration induces more accurate decisions in the current iteration. Adriani’s approach [1] is similar to the approach by Jang et al. in that her approach also only uses the maximum similarity scores between translation candidates for different query terms. Similar to the approach by Jang e...

19 | Query term disambiguation for Web cross-language information retrieval using a search engine
- Maeda, Sadat, et al.
- 2000
Citation Context: ...ics) are not in this form, but are typically just simple lists of noun phrases. For that reason we did not try to carry out any deeper linguistic analysis between the words in the topic. Maeda et al. [17] compare a number of co-occurrence statistics with respect to their usefulness for improving retrieval effectiveness. As in our own approach, they consider all pairs of possible translations of words...

13 | Non-parametric significance tests of retrieval performance comparisons
- Wilbur
- 1994
Citation Context: ...ches are statistically significant and not just caused by chance, we used the bootstrap method, a powerful non-parametric inference test [9]. The method was previously applied to retrieval evaluation [24, 28]. The basic idea of the bootstrap is to simulate the underlying distribution by randomly drawing (with replacement) a large number of samples of size N from the original sample of N observations. Thes...

3 | Exploring statistics: a modern introduction to data analysis and inference (2nd edition)
- Kitchens
- 1998
Citation Context: ...is computed as described in Section 3. 4.1.4 Statistical Significance. There are many techniques for drawing statistical inferences. The paired t-test is probably the best-known technique (see, e.g., [16]). Many of the inference techniques make certain assumptions about the data to which they are applied. The most common assumption, which also underlies the paired t-test, is that the data is taken fro...

2 | Term-list translation using monolingual word co-occurrence vectors
- Kikui
- 1998
Citation Context: ...s of a word in the source topic. By contrast, our approach allows for a more fine-grained estimation of the usefulness of a particular translation in the context of the given topic. The work by Kikui [15] is also closely related to our work as it relies, in addition to a dictionary, only on monolingual resources in the target language in order to estimate translation weights. This approach computes th...