Results 1 - 10
of
34
Web-centric language models
- In Proceedings of the Fourteenth ACM Conference on Information and Knowledge Management (CIKM 2005
, 2005
"... We investigates language models for informational and navigational web search. Retrieval on the web is a task that differs substantially from ordinary ad hoc retrieval. We perform an analysis of prior probability of relevance for a wide range of non-content features, shedding further light on the im ..."
Abstract
-
Cited by 12 (8 self)
- Add to MetaCart
We investigates language models for informational and navigational web search. Retrieval on the web is a task that differs substantially from ordinary ad hoc retrieval. We perform an analysis of prior probability of relevance for a wide range of non-content features, shedding further light on the importance of non-content features for web retrieval. Language models can naturally incorporate multiple document representations, as well as non-content information. For the former, we employ mixture language models based on document full-text, incoming anchor-text, and document titles. For the latter, we study a range of priors based on document length, URL structure, and link topology. We look at three types of topics—distillation, home page, and named page— as well as for a mixed query set. We find that the mixture models lead to considerable improvement of retrieval effectiveness for all topic types. The web-centric priors generally lead to further improvement of retrieval effectiveness.
LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval
"... LETOR is a benchmark collection for the research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of researches. Specifically, we describe how the documen ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
LETOR is a benchmark collection for the research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of researches. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performances, and make discussions on the results. After that, we discuss possible new research topics that can be supported by LETOR, in addition to algorithm comparison. We hope that this paper can help people to gain deeper understanding of LETOR, and enable more interesting research projects on learning to rank and related topics.
Beyond the Web: Retrieval in Social Information Spaces
- IN PROCEEDINGS OF THE 28 TH EUROPEAN CONFERENCE ON INFORMATION RETRIEVAL (ECIR 2006
, 2006
"... We research whether the inclusion of information about an information user’s social environment and his position in the social network of his peers leads to an improval in search effectiveness. Traditional information retrieval methods fail to address the fact that information production and consum ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
We research whether the inclusion of information about an information user’s social environment and his position in the social network of his peers leads to an improval in search effectiveness. Traditional information retrieval methods fail to address the fact that information production and consumption are social activities. We ameliorate this problem by extending the domain model of information retrieval to include social networks. We describe a technique for information retrieval in such an enviroment and evaluate it in comparison to vector space retrieval.
Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs
"... Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose Web search systems. Such classification becomes critical if the system must route queries to a subset of topic-specific and resource-constrained back-end databases. Successful query c ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose Web search systems. Such classification becomes critical if the system must route queries to a subset of topic-specific and resource-constrained back-end databases. Successful query classification poses a challenging problem, as Web queries are short, thus providing few features. This feature sparseness, coupled with the constantly changing distribution and vocabulary of queries, hinders traditional text classification. We attack this problem by combining multiple classifiers, including exact lookup and partial matching in databases of manually classified frequent queries, linear models trained by supervised learning, and a novel approach based on mining selectional preferences from a large unlabeled query log. Our approach classifies queries without using external sources of information, such as online Web directories or the contents of retrieved pages, making it viable for use in demanding operational environments, such as large-scale Web search services. We evaluate our approach using a large sample of queries from an operational Web search engine and show that our combined method increases recall by nearly 40 % over the best single method while maintaining adequate precision. Additionally, we compare our results to those from the 2005 KDD Cup and find that we perform competitively despite our operational restrictions. This suggests it is possible to topically classify a significant portion of the query stream without requiring external sources of information, allowing for deployment in operationally restricted environments.
A Machine Learning Approach for Improved BM25 Retrieval
"... Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields such as anchor ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields such as anchor text and query click information no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that learns, using LambdaRank, from the input attributes of BM25. Our model significantly improves retrieval effectiveness over BM25 and BM25F. Our data-driven approach is fast, effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures. We demonstrate the advantages of our model on a very large real-world Web data collection.
Combination of Document Priors in Web Information Retrieval
- In Proceedings of ECIR 2007
, 2007
"... Query-independent features (also called document priors), such as the number of incoming links to a document, its Page-Rank, or the type of its associated URL, have been successfully integrated into Web Information Retrieval systems in order to enhance the retrieval effectiveness. The combination of ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Query-independent features (also called document priors), such as the number of incoming links to a document, its Page-Rank, or the type of its associated URL, have been successfully integrated into Web Information Retrieval systems in order to enhance the retrieval effectiveness. The combination of several document priors could further enhance the retrieval performance. However, most current combination of priors approaches are based on heuristics, and often ignore the possible dependence between the document priors. In this paper, we present a novel and robust method for conditionally combining document priors in a principled way. The approach adjusts the distribution of document priors for one source of evidence according to the distribution of document priors for other sources of evidence. We investigate the retrieval performance attainable by our combination of priors method, in comparison to the use of single priors and to a heuristic combination of document priors method, which assumes that document priors are independent. Furthermore, we investigate how sensitive the proposed method is to the training data. Using two standard Web test collections, including the large-scale.GOV2 test collection, we find that some of the document priors used in our experiments, have a considerably high correlation, suggesting that the dependency between documents priors should indeed be taken into account. Through extensive experiments on these two large-scale collections, we observe that our proposed conditional combination method is overall effective and robust. 1
An Outranking Approach for Rank Aggregation in Information Retrieval
, 2007
"... Research in Information Retrieval usually shows performance improvement when many sources of evidence are combined to produce a ranking of documents (e.g., texts, pictures, sounds, etc.). In this paper, we focus on the rank aggregation problem, also called data fusion problem, where rankings of docu ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Research in Information Retrieval usually shows performance improvement when many sources of evidence are combined to produce a ranking of documents (e.g., texts, pictures, sounds, etc.). In this paper, we focus on the rank aggregation problem, also called data fusion problem, where rankings of documents, searched into the same collection and provided by multiple methods, are combined in order to produce a new ranking. In this context, we propose a rank aggregation method within a multiple criteria framework using aggregation mechanisms based on decision rules identifying positive and negative reasons for judging whether a document should get a better rank than another. We show that the proposed method deals well with the Information Retrieval distinctive features. Experimental results are reported showing that the suggested method performs better than the well-known CombSUM and CombMNZ operators.
Learning to select a ranking function
"... Abstract. Learning To Rank (LTR) techniques aim to learn an effective document ranking function by combining several document features. While the function learned may be uniformly applied to all queries, many studies have shown that different ranking functions favour different queries, and the retri ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Abstract. Learning To Rank (LTR) techniques aim to learn an effective document ranking function by combining several document features. While the function learned may be uniformly applied to all queries, many studies have shown that different ranking functions favour different queries, and the retrieval performance can be significantly enhanced if an appropriate ranking function is selected for each individual query. In this paper, we propose a novel Learning To Select framework that selectively applies an appropriate ranking function on a per-query basis. The approach employs a query feature to identify similar training queries for an unseen query. The ranking function which performs the best on this identified training query set is then chosen for the unseen query. In particular, we propose the use of divergence, which measures the extent that a document ranking function alters the scores of an initial ranking of documents for a given query, as a query feature. We evaluate our method using tasks from the TREC Web and Million Query tracks, in combination with the LETOR 3.0 and LETOR 4.0 feature sets. Our experimental results show that our proposed method is effective and robust for selecting an appropriate ranking function on a per-query basis. In particular, it always outperforms three state-of-the-art LTR techniques, namely Ranking SVM, AdaRank, and the automatic feature selection method. 1
Intent-aware search result diversification
- In Proceedings of the 34th ACM SIGIR
, 2011
"... Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches.
A Meta-Learning Approach for Robust Rank Learning
"... Learning effective feature-based ranking functions is a fundamental task for search engines, and has recently become an active area of research [10, 3, 2]. Many of these recent algorithms are based on the pairwise preference framework, in which instead of taking documents in isolation, document pair ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
Learning effective feature-based ranking functions is a fundamental task for search engines, and has recently become an active area of research [10, 3, 2]. Many of these recent algorithms are based on the pairwise preference framework, in which instead of taking documents in isolation, document pairs are used as instances in the learning process. One disadvantage of this process is that a noisy relevance judgment on a single document can lead to a large number of mis-labeled document pairs. This can jeopardize robustness and deteriorate overall ranking performance. In this paper we study the effects of outlying pairs in rank learning with pairwise preferences and introduce a new meta-learning algorithm capable of suppressing these undesirable effects. This algorithm works as a second optimization step in which any linear baseline ranker can be used as input. Experiments on eight different ranking datasets show that this optimization step produces statistically significant performance gains over various state-of-the-art baseline rankers.

