Results 11 - 20 of 34
A Probabilistic Multimedia Retrieval Model and Its Evaluation
- EURASIP Journal on Applied Signal Processing, 2003
Cited by 18 (11 self)
In this paper we present a probabilistic model for the retrieval of multimodal documents. The model is based on Bayesian decision theory and combines models for text-based search with models for visual search. The textual model is based on the language modelling approach to text retrieval and the visual information is modelled as a mixture of Gaussian densities. Both models have proved successful on various standard retrieval tasks. We evaluate the multimodal model on the search task of TREC's video track. We found that the disclosure of video material based on visual information only is still too difficult. Even with purely visual information needs, text-based retrieval still outperforms visual approaches. The probabilistic model is useful for text, visual and multimedia retrieval. Unfortunately, simplifying assumptions that reduce its computational complexity degrade retrieval effectiveness. Regarding the question whether the model can effectively combine information from different modalities, we conclude that whenever both modalities yield reasonable scores, a combined run outperforms the individual runs.
Information Fusion For Spoken Document Retrieval
- in Proc. ICASSP, 2000
Cited by 9 (0 self)
In this paper we investigate the fusion of different information sources with the goal of improving performance on spoken document retrieval (SDR) tasks. In particular, we explore the use of multiple transcriptions from different automatic speech recognizers, the combination of different types of subword unit indexing terms, and the combination of word and subword-based units. To perform retrieval, we use a novel probabilistic information retrieval model which retrieves documents based on maximum likelihood ratio scores. Experiments on the 1998 TREC-7 SDR task show that the use of these different information fusion approaches can result in significantly improved retrieval performance. 1. INTRODUCTION Spoken document retrieval (SDR) is the task of searching a static collection of recorded speech messages in response to a user-specified natural language text query and returning an ordered list of messages ranked according to their relevance to the query. The development of automatic met...
Empirical Development of an Exponential Probabilistic Model for Text Retrieval: Using Textual Analysis to Build a Better Model
- In Proceedings of the 26th Annual ACM Conference on Research and Development in Information Retrieval, 2003
Cited by 9 (0 self)
Much work in information retrieval focuses on using a model of documents and queries to derive retrieval algorithms. Model based development is a useful alternative to heuristic development because in a model the assumptions are explicit and can be examined and refined independent of the particular retrieval algorithm. We explore the explicit assumptions underlying the naive Bayesian framework by performing computational analysis of actual corpora and queries to devise a generative document model that closely matches text. Our thesis is that a model so developed will be more accurate than existing models, and thus more useful in retrieval, as well as other applications. We test this by learning from a corpus the best document model. We find the learned model better predicts the existence of text data and has improved performance on certain IR tasks.
Relating the New Language Models of Information Retrieval to the Traditional Retrieval Models
- 2000
Cited by 8 (0 self)
During the last two years, exciting new approaches to information retrieval were introduced by a number of different research groups that use statistical language models for retrieval.
Relevance Feedback for Best Match Term Weighting Algorithms in Information Retrieval
- Dublin City University, 2001
Cited by 4 (0 self)
Personalisation in full text retrieval or full text filtering implies reweighting of the query terms based on some explicit or implicit feedback from the user. Relevance feedback inputs the user's judgements on previously retrieved documents to construct a personalised query or user profile. This paper studies relevance feedback within two probabilistic models of information retrieval: the first based on statistical language models and the second based on the binary independence probabilistic model. The paper shows the resemblance of the approaches to relevance feedback of these models, introduces new approaches to relevance feedback for both models, and evaluates the new relevance feedback algorithms on the TREC collection. The paper shows that there are no significant differences between simple and sophisticated approaches to relevance feedback.
ITC-irst at CLEF 2000: Italian Monolingual Track
- Cross-Language Information Retrieval and Evaluation, LNCS 2069, 2000
Cited by 3 (3 self)
This paper presents work on document retrieval for Italian carried out at ITC-irst. Two different approaches to information retrieval were investigated, one based on the Okapi weighting formula and one based on a statistical model. Development experiments were carried out using the Italian sample of the TREC-8 CLIR track. Performance evaluation was done on the Cross Language Evaluation Forum (CLEF) 2000 Italian monolingual track. The two methods achieved mean average precisions of 49.0% and 47.5%, respectively, which were the two best scores of their track.
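The mean average precision figures quoted in this abstract refer to a standard evaluation measure. The following is a minimal Python sketch of how it is commonly computed, not the evaluation code used at CLEF; the function names `average_precision` and `mean_average_precision` and the toy rankings are assumptions made purely for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-query average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)


# Hypothetical toy data: two queries with their ranked results and relevant sets.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),            # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
print(f"MAP = {map_score:.4f}")
```

A score such as 49.0% corresponds to a value of 0.490 on this scale, averaged over all topics of the track.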
Improving Information Retrieval with Textual Analysis: Bayesian Models and Beyond
- Master’s thesis, MIT, 2001
Cited by 3 (0 self)
Information retrieval (IR) is a difficult problem. While many have attempted to model text documents and improve search results by doing so, the most successful text retrieval to date has been developed in an ad-hoc manner. One possible reason for this is that in developing these models very little focus has been placed on the actual properties of text. In this thesis, we discuss a principled Bayesian approach we take to information retrieval, which we base on the standard IR probabilistic model. Not surprisingly, we find this approach to be less successful than traditional ad-hoc retrieval. Using data analysis to highlight the discrepancies between our model and the actual properties of text documents, we hope to arrive at a better model for our corpus, and thus a better information retrieval strategy. Specifically, we believe we will find it is inaccurate to assume that whether a term occurs in a document is independent of whether it has already occurred, and we will suggest a way to improve upon this without adding complexity to the solution.
The Dual Role of Smoothing in the Language Modeling Approach
- Proceedings of the Workshop on Language Models for Information Retrieval (LMIR) 2001, 2001
Cited by 3 (0 self)
In this paper, we study the role of smoothing in the language modeling approach to text retrieval. We show that the involving of the collection language model in smoothing generally forces an implicit TF-IDF weighting in the retrieval formula. We empirically compare several different smoothing methods and examine the sensitivity of retrieval precision to the setting of smoothing parameters. Experiment results indicate that retrieval performance can be very sensitive to smoothing. The results also suggest that smoothing plays two quite different roles in the query likelihood ranking function. One role is to avoid assigning zero probabilities to words that have not occurred in a document and the other is to accommodate generation of common words in a query. We propose a two-stage smoothing strategy to explicitly de-couple these two roles of smoothing, and show that this strategy not only gives us better control over the smoothing parameters, but also often outperforms any of the s...
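To illustrate the first role of smoothing named in this abstract (avoiding zero probabilities for words absent from a document), the following is a minimal Python sketch of query-likelihood scoring with Dirichlet-prior smoothing against a collection language model. It is not the two-stage method the authors propose; the function name `query_log_likelihood`, the value `mu=2000.0`, the handling of out-of-collection terms, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log P(query | doc) under a Dirichlet-smoothed unigram document model."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            # Assumption of this sketch: ignore terms unseen in the whole collection.
            continue
        # Smoothed estimate: document count interpolated with the collection model,
        # so a term missing from the document still gets non-zero probability.
        p_term = (doc_counts.get(term, 0) + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score


# Hypothetical toy collection.
docs = {
    "d1": "language models for text retrieval".split(),
    "d2": "video retrieval with visual features".split(),
}
collection_counts = Counter(term for doc in docs.values() for term in doc)
collection_len = sum(collection_counts.values())

query = "language retrieval".split()
scores = {
    name: query_log_likelihood(query, doc, collection_counts, collection_len)
    for name, doc in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch `d2` does not contain "language" yet still receives a finite score, which is the zero-probability role of smoothing; the collection term `mu * p_coll` also down-weights common words, which relates to the implicit TF-IDF effect the abstract mentions.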
Nymble: a high performance learning name-finder
- Proceedings of the fifth Conference on Applied Language Processing, 1997
Cited by 3 (1 self)
In this report, we unify two quite distinct approaches to information retrieval: region models and language models. Region models were developed for structured document retrieval. They provide a well-defined behaviour as well as a simple query language that allows application developers to rapidly develop applications. Language models are particularly useful to reason about the ranking of search results, and for developing new ranking approaches. The unified model allows application developers to define complex language modeling approaches as logical queries on a textual database. We show a remarkable one-to-one relationship between region queries and the language models they represent for a wide variety of applications: simple ad-hoc search, cross-language retrieval, video retrieval, and web search.
ITC-irst at CLEF 2001: Monolingual and Bilingual Tracks
- Evaluation of Cross-Language Information Retrieval Systems, LNCS 2406, 2002
Cited by 2 (2 self)
This paper reports on the participation of ITC-irst in the Cross Language Evaluation Forum (CLEF) of 2001. ITC-irst took part in two tracks: the monolingual retrieval task and the bilingual retrieval task. In both cases, Italian was chosen as the query language, while English was chosen as the document language of the bilingual task. The retrieval engine combines scores computed by an Okapi model and a statistical language model. The cross-language system employs a statistical query translation model, which is estimated on the target document collection and on a translation dictionary.

