Results 1 - 10
of
22
Query Reformulation Using Anchor Text
"... Query reformulation techniques based on query logs have been studied as a method of capturing user intent and improving retrieval effectiveness. The evaluation of these techniques has primarily, however, focused on proprietary query logs and selected samples of queries. In this paper, we suggest tha ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
Query reformulation techniques based on query logs have been studied as a method of capturing user intent and improving retrieval effectiveness. The evaluation of these techniques has primarily, however, focused on proprietary query logs and selected samples of queries. In this paper, we suggest that anchor text, which is readily available, can be an effective substitute for a query log and study the effectiveness of a range of query reformulation techniques (including log-based stemming, substitution, and expansion) using standard TREC collections. Our results show that logbased query reformulation techniques are indeed effective with standard collections, but expansion is a much safer form of query modification than word substitution. We also show that using anchor text as a simulated query log is as least as effective as a real log for these techniques.
Reducing Long Queries Using Query Quality Predictors
"... Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description quer ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries can lead to an average improvement of 30 % in mean average precision. Our approach involves transforming the reduction problem into a problem of learning to rank all sub-sets of the original query (sub-queries) based on their predicted quality, and select the top sub-query. We use various measures of query quality described in the literature as features to represent sub-queries, and train a classifier. Replacing the original long query with the top-ranked subquery chosen by the ranking classifier results in a statistically significant average improvement of 8 % on our test sets. Analysis of the results shows that query reduction is wellsuited for moderately-performing long queries, and a small set of query quality predictors are well-suited for the task of ranking sub-queries.
Learning Concept Importance Using a Weighted Dependence Model
"... Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where relevance at high ranks is particularly critical. Most previous work, however, treats all concepts as equally important, an ..."
Abstract
-
Cited by 9 (6 self)
- Add to MetaCart
Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where relevance at high ranks is particularly critical. Most previous work, however, treats all concepts as equally important, an assumption that often does not hold, especially for longer, more complex queries. In this paper, we show that one of the most effective existing term dependence models can be naturally extended by assigning weights to concepts. We demonstrate that the weighted dependence model can be trained using existing learning-to-rank techniques, even with a relatively small number of training queries. Our study compares the effectiveness of both endogenous (collectionbased) and exogenous (based on external sources) features for determining concept importance. To test the weighted dependence model, we perform experiments on both publicly available TREC corpora and a proprietary web corpus. Our experimental results indicate that our model consistently and significantly outperforms both the standard bag-of-words model and the unweighted term dependence model, and that combining endogenous and exogenous features generally results in the best retrieval effectiveness.
Eddi: Interactive Topic-based Browsing of Social Status Streams
"... Twitter streams are on overload: active users receive hundreds of items per day, and existing interfaces force us to march through a chronologically-ordered morass to find tweets of interest. We present an approach to organizing a user's own feed into coherently clustered trending topics for more di ..."
Abstract
-
Cited by 9 (6 self)
- Add to MetaCart
Twitter streams are on overload: active users receive hundreds of items per day, and existing interfaces force us to march through a chronologically-ordered morass to find tweets of interest. We present an approach to organizing a user's own feed into coherently clustered trending topics for more directed exploration. Our Twitter client, called Eddi, groups tweets in a user’s feed into topics mentioned explicitly or implicitly, which users can then browse for items of interest. To implement this topic clustering, we have developed a novel algorithm for discovering topics in short status updates powered by linguistic syntactic transformation and callouts to a search engine. An algorithm evaluation reveals that search engine callouts outperform other approaches when they employ simple syntactic transformation and backoff strategies. Active Twitter users evaluated Eddi and found it to be a more efficient and enjoyable way to browse an overwhelming status update feed than the standard chronological interface. ACM Classification: H5.2. Information interfaces and presentation (e.g., HCI): User interfaces.
A torrent of tweets: Managing information overload in online social streams
- In Workshop on Microblogging: What and How Can We Learn From It? (CHI ’10
, 2010
"... Copyright is held by the author/owner(s). ..."
Exploring Reductions for Long Web Queries
"... Long queries form a difficult, but increasingly important segment for web search engines. Query reduction, a technique for dropping unnecessary query terms from long queries, improves performance of ad-hoc retrieval on TREC collections. Also, it has great potential for improving long web queries (up ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Long queries form a difficult, but increasingly important segment for web search engines. Query reduction, a technique for dropping unnecessary query terms from long queries, improves performance of ad-hoc retrieval on TREC collections. Also, it has great potential for improving long web queries (upto 25 % improvement in
Improving verbose queries using subset distribution
- In Proc. CIKM
, 2010
"... Dealing with verbose (or long) queries poses a new challenge for information retrieval. Selecting a subset of the original query (a “sub-query”) has been shown to be an effective method for improving these queries. In this paper, the distribution of sub-queries (“subset distribution”) is formally mo ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Dealing with verbose (or long) queries poses a new challenge for information retrieval. Selecting a subset of the original query (a “sub-query”) has been shown to be an effective method for improving these queries. In this paper, the distribution of sub-queries (“subset distribution”) is formally modeled within a well-grounded framework. Specifically, sub-query selection is considered as a sequential labeling problem, where each query word in a verbose query is assigned a label of “keep ” or “don’t keep”. A novel Conditional Random Field model is proposed to generate the distribution of sub-queries. This model captures the local and global dependencies between query words and directly optimizes the expected retrieval performance on a training set. The experiments, based on different retrieval models and performance measures, show that the proposed model can generate high-quality sub-query distributions and can significantly outperform state-of-the-art techniques.
Term Necessity Prediction
"... The probability that a term appears in relevant documents (P� � | ��) is a fundamental quantity in several probabilistic retrieval models, however it is difficult to estimate without relevance judgments or a relevance model. We call this value term necessity because it measures the percentage of re ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
The probability that a term appears in relevant documents (P� � | ��) is a fundamental quantity in several probabilistic retrieval models, however it is difficult to estimate without relevance judgments or a relevance model. We call this value term necessity because it measures the percentage of relevant documents retrieved by the term – how necessary a term’s occurrence is to document relevance. Prior research typically either set this probability to a constant, or estimated it based on the term's inverse document frequency, neither of which was very effective. This paper identifies several factors that affect term necessity, for example, a term’s topic centrality, synonymy and abstractness. It develops term- and query-dependent features for each factor that enable supervised learning of a predictive model of term necessity from training data. Experiments with two popular retrieval models and 6 standard datasets demonstrate that using predicted term necessity estimates as user term weights of the original query terms leads to significant improvements in retrieval accuracy.
UMass Amherst and UT Austin @ The TREC 2009 Relevance Feedback Track
"... We present a new supervised method for estimating term-based retrieval models and apply it to weight expansion terms from relevance feedback. While previous work on supervised feedback [Cao et al., 2008] demonstrated significantly improved retrieval accuracy over standard unsupervised approaches [La ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We present a new supervised method for estimating term-based retrieval models and apply it to weight expansion terms from relevance feedback. While previous work on supervised feedback [Cao et al., 2008] demonstrated significantly improved retrieval accuracy over standard unsupervised approaches [Lavrenko and Croft, 2001, Zhai and Lafferty, 2001], feedback terms were assumed to be independent in order to reduce training time. In contrast, we adapt the AdaRank learning algorithm [Xu and Li, 2007] to simultaneously estimate parameterization of all feedback terms. While not evaluated here, the method can be more generally applied for joint estimation of both query and feedback terms. To apply our method to a large web collection, we also investigate use of sampling to reduce feature extraction time while maintaining robust learning. 1

