A Maximum Likelihood Ratio Information Retrieval Model
, 1999
Cited by 43 (3 self)
... model that scores documents based on the relative change in the document likelihoods, expressed as the ratio of the conditional probability of the document given the query and the prior probability of the document before the query is specified. The document likelihoods are computed using statistical language modeling techniques and the model parameters are estimated automatically and dynamically for each query to optimize well-specified (maximum likelihood) objective functions. We derive the basic retrieval model, describe the details of the model, and present some extensions to the model including a method to perform automatic feedback. Development experiments are performed using the TREC-6 ad hoc text retrieval task and performance is measured using the TREC-7 ad hoc task. Official evaluation results on the 1999 TREC-8 ad hoc task are also reported. The performance results demonstrate that the model is competitive with current state-of-the-art retrieval approaches.
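The scoring ratio described in this abstract can be written out as a short worked equation. This is only a sketch: the symbols D (document), Q (query), and S (score) are assumed here for illustration, and the second equality is simply Bayes' rule applied to the ratio, not necessarily the derivation used in the paper.

```latex
\[
S(D, Q) \;=\; \frac{P(D \mid Q)}{P(D)} \;=\; \frac{P(Q \mid D)}{P(Q)}
\]
```

Under this reading, a score above 1 means the query made the document more likely than it was a priori.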
Subword-based Approaches for Spoken Document Retrieval
, 2000
Cited by 40 (0 self)
This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language; and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four
Ontology suitability for uncertain extraction of information from multi-record web documents
- In Proceedings of the Workshop on Agenten, Datenbanken und Information Retrieval (ADI’99)
, 1999
Cited by 5 (3 self)
Ontology-based data extraction from multi-record Web documents works well [ECLS98, ECJ+98, ECJ+99, EJN99], but only if the ontology is suitable for the Web document. How do we know whether the ontology is suitable? To resolve this question, we present an approach based on three heuristics: density, schema, and grouping. We encode the first heuristic as a density function and use probabilistic models for the second and third. We argue that these heuristics and our computational models for these heuristics correctly determine the suitability of a Web document for a given ontology.
Working Session: Information Retrieval Based Approaches in Software Evolution
Cited by 1 (0 self)
During software evolution, a collection of related artifacts with different representations is created. Some of these are composed of structured data (e.g., analysis data), some contain semi-structured information (e.g., source code), and many include unstructured information (e.g., text). Research efforts exist that are trying to extract, represent, and analyze the unstructured information in software. Information retrieval (IR) techniques have been used quite successfully in recent years to represent and extract textual information from software artifacts, with application to many maintenance tasks. This working session will focus on the state of the art in the application of IR-based techniques to support software maintenance activities. The session aims to identify the main research and practical issues in the field, to determine future work directions, and to foster collaborations among the participants.
CLIPS at TREC-11: Experiments in Filtering
At the TREC9 conference, we presented a new adaptive filtering system called RELIEFS. This system, which is based on the idea of resonance, combines, for each term t, the relative frequency of relevance knowing t and the relative frequency of t knowing relevance. On the basis of other experiments, several changes have been made. We improved our threshold adaptation, slightly changed our relevance evaluation function, and gave up the use of conjunctions and the thesaurus. The system now focuses more exclusively on the combination of the two reverse frequencies, which we believe represent the fundamental aspects of relevance estimation. This year we used the new version of the system and tested it on the Reuters corpus. Focusing on the combination of the two frequencies, we varied their relative importance.
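The two "reverse frequencies" mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how RELIEFS actually combines them, so the weighted geometric mean below, the `alpha` parameter, and both function names are hypothetical choices made only to show what "varying their relative importance" could look like.

```python
def reverse_frequencies(n_rel_with_t, n_with_t, n_rel):
    """Estimate (P(relevant | t), P(t | relevant)) as relative frequencies.

    n_rel_with_t: relevant documents seen so far that contain term t
    n_with_t:     all documents seen so far that contain term t
    n_rel:        all relevant documents seen so far
    """
    p_rel_given_t = n_rel_with_t / n_with_t if n_with_t else 0.0
    p_t_given_rel = n_rel_with_t / n_rel if n_rel else 0.0
    return p_rel_given_t, p_t_given_rel


def combine(p_rel_given_t, p_t_given_rel, alpha=0.5):
    """Hypothetical combination: a weighted geometric mean.

    alpha = 1.0 uses only P(relevant | t); alpha = 0.0 uses only
    P(t | relevant). This is an assumption for illustration, not the
    formula used by RELIEFS.
    """
    return (p_rel_given_t ** alpha) * (p_t_given_rel ** (1.0 - alpha))


# Example: term t occurs in 4 documents, 3 of them relevant,
# out of 6 relevant documents overall.
p1, p2 = reverse_frequencies(3, 4, 6)
term_weight = combine(p1, p2, alpha=0.5)
```

Sweeping `alpha` between 0 and 1 shifts the weight from one frequency to the other, which is one simple way to experiment with their relative importance.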

