Results 1 - 10 of 785
Cumulated Gain-based Evaluation of IR Techniques
- ACM Transactions on Information Systems, 2002
"... Modern large retrieval environments tend to overwhelm their users by their large output. Since all documents are not of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation to the users. In order to develop IR techniques to this direction, i ..."
Cited by 694 (3 self)
measures are defined and discussed and then their use is demonstrated in a case study using TREC data - sample system run results for 20 queries in TREC-7. As relevance base we used novel graded relevance assessments on a four-point scale. The test results indicate that the proposed measures credit IR
An extensive empirical study of feature selection metrics for text classification
- J. of Machine Learning Research, 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Cited by 496 (15 self)
of twelve feature selection methods (e.g. Information Gain) evaluated on a benchmark of 229 text classification problem instances that were gathered from Reuters, TREC, OHSUMED, etc. The results are analyzed from multiple goal perspectives—accuracy, F-measure, precision, and recall—since each is appropriate
Query clustering and IR system detection. Experiments on TREC data
"... This paper investigates two aspects in this experiment. Linguistic techniques are used to categorize queries in a first step. This classification is then used to analyze system performance in a TREC context. More precisely, we cluster TREC topics with 13 linguistic features (Mothe et al., 2005), a ..."
Collection selection and results merging with topically organized U.S. patents and TREC data
- In CIKM 2000, 2000
"... We investigate three issues in distributed information retrieval, considering both TREC data and U.S. Patents: (1) topical organization of large text collections, (2) collection ranking and selection with topically organized collections, (3) results merging, particularly document score normalization, ..."
Cited by 50 (8 self)
A Probabilistic Model of Information Retrieval: Development and Status
- 1998
"... The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Eac ..."
Cited by 360 (25 self)
Information Retrieval as Statistical Translation
"... We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a statistical model of how a user might distill or "translate" a given document into a query. To assess the rele ..."
Cited by 313 (6 self)
by Ponte and Croft. In a series of experiments on TREC data, a simple translation-based retrieval system performs well in compari...
Predicting Query Performance
- 2002
"... We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language usage in documents whose models are likely to generate the query. ..."
Cited by 269 (16 self)
information. We develop an algorithm for automatically setting the clarity score threshold between predicted poorly-performing queries and acceptable queries and validate it using TREC data. In particular, we compare the automatic thresholds to optimum thresholds and also check how frequently results as good
A Markov random field model for term dependencies
"... This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as evidence. In particular, we make use of features based on occurrences of single terms, ordered phrases, and unordered phrases. W ..."
Cited by 289 (55 self)
We explore full independence, sequential dependence, and full dependence variants of the model. A novel approach is developed to train the model that directly maximizes the mean average precision rather than maximizing the likelihood of the training data. Ad hoc retrieval experiments are presented
The TREC-5 Filtering Track
- The Fifth Text REtrieval Conference (TREC-5), 1997
"... The TREC-5 filtering track, an evaluation of binary text classification systems, was a repeat of the filtering evaluation run in a trial version for TREC-4, with only the data set and participants changing. Seven sites took part, submitting a total of ten runs. We review the nature of the task, the ..."
Cited by 41 (0 self)
TREC Genomics Track Overview
- 2003
"... The first year of TREC Genomics Track featured two tasks: ad hoc retrieval and information extraction. Both tasks centered around the Gene Reference into Function (GeneRIF) resource of the National Library of Medicine, which was used as both pseudorelevance judgments for ad hoc document retrieval as ..."
Cited by 43 (1 self)
with the growth of new information needs (e.g., question-answering, cross-lingual), data types (e.g., video) and platforms (e.g., the Web). This paper describes the events leading up to the first year of TREC Genomics Track, the first year’s results, and future directions for subsequent years. Genomics