Results 1 - 10
of
15
Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs
"... Users frequently modify a previous search query in hope of retrieving better results. These modifications are called query reformulations or query refinements. Existing research has studied how web search engines can propose reformulations, but has given less attention to how people perform query re ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Users frequently modify a previous search query in hope of retrieving better results. These modifications are called query reformulations or query refinements. Existing research has studied how web search engines can propose reformulations, but has given less attention to how people perform query reformulations. In this paper, we aim to better understand how web searchers refine queries and form a theoretical foundation for query reformulation. We study users ’ reformulation strategies in the context of the AOL query logs. We create a taxonomy of query refinement strategies and build a high precision rule-based classifier to detect each type of reformulation. Effectiveness of reformulations is measured using user click behavior. Most reformulation strategies result in some benefit to the user. Certain strategies like add/remove words, word substitution, acronym expansion, and spelling correction are more likely to cause clicks, especially on higher ranked results. In contrast, users often click the same result as their previous query or select no results when forming acronyms and reordering words. Perhaps the most surprising finding is that some reformulations are better suited to helping users when the current results are already fruitful, while other reformulations are more effective when the results are lacking. Our findings inform the design of applications that can assist searchers; examples are described in this paper.
Two-stage query segmentation for information retrieval
- In Proc. of SIGIR
, 2009
"... Modeling term dependence has been shown to have a significant positive impact on retrieval. Current models, however, use sequential term dependencies, leading to an increased query latency, especially for long queries. In this paper, we examine two query segmentation models that reduce the number of ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Modeling term dependence has been shown to have a significant positive impact on retrieval. Current models, however, use sequential term dependencies, leading to an increased query latency, especially for long queries. In this paper, we examine two query segmentation models that reduce the number of dependencies. We find that two-stage segmentation based on both query syntactic structure and external information sources such as query logs, attains retrieval performance comparable to the sequential dependence model, while achieving a 50 % reduction in query latency.
Using Web-scale N-grams to Improve Base NP Parsing Performance
"... We use web-scale N-grams in a base NP parser that correctly analyzes 95.4 % of the base NPs in natural text. Web-scale data improves performance. That is, there is no data like more data. Performance scales log-linearly with the number of parameters in the model (the number of unique N-grams). The w ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
We use web-scale N-grams in a base NP parser that correctly analyzes 95.4 % of the base NPs in natural text. Web-scale data improves performance. That is, there is no data like more data. Performance scales log-linearly with the number of parameters in the model (the number of unique N-grams). The web-scale N-grams are particularly helpful in harder cases, such as NPs that contain conjunctions. 1
Improving verbose queries using subset distribution
- In Proc. CIKM
, 2010
"... Dealing with verbose (or long) queries poses a new challenge for information retrieval. Selecting a subset of the original query (a “sub-query”) has been shown to be an effective method for improving these queries. In this paper, the distribution of sub-queries (“subset distribution”) is formally mo ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Dealing with verbose (or long) queries poses a new challenge for information retrieval. Selecting a subset of the original query (a “sub-query”) has been shown to be an effective method for improving these queries. In this paper, the distribution of sub-queries (“subset distribution”) is formally modeled within a well-grounded framework. Specifically, sub-query selection is considered as a sequential labeling problem, where each query word in a verbose query is assigned a label of “keep ” or “don’t keep”. A novel Conditional Random Field model is proposed to generate the distribution of sub-queries. This model captures the local and global dependencies between query words and directly optimizes the expected retrieval performance on a training set. The experiments, based on different retrieval models and performance measures, show that the proposed model can generate high-quality sub-query distributions and can significantly outperform state-of-the-art techniques.
Structural annotation of search queries using pseudo-relevance feedback
- In Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10
, 2010
"... Marking up queries with annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of many approaches to query processing and understanding. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing annotation tools that are co ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Marking up queries with annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of many approaches to query processing and understanding. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing annotation tools that are commonly trained on full-length documents. To address this challenge, we view the query as an explicit representation of a latent information need, which allows us to use pseudorelevance feedback, and to leverage additional information from the document corpus, in order to improve the quality of query annotation.
Unsupervised Acquisition of Lexical Knowledge From N-grams: Final Report of the 2009 JHU CLSP Workshop
"... This report describes a variety of work that uses web-scale N-gram data. This ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This report describes a variety of work that uses web-scale N-gram data. This
Managing Misspelled Queries in IR Applications
"... Our work concerns the design of robust information retrieval environments that can successfully handle queries containing misspelled words. Our aim is to perform a comparative analysis of the efficacy of two possible strategies that can be adopted. A first strategy involves those approaches based on ..."
Abstract
- Add to MetaCart
Our work concerns the design of robust information retrieval environments that can successfully handle queries containing misspelled words. Our aim is to perform a comparative analysis of the efficacy of two possible strategies that can be adopted. A first strategy involves those approaches based on correcting the misspelled query, thus requiring the integration of linguistic information in the system. This solution has been studied from complementary standpoints, according to whether contextual information of a linguistic nature is integrated in the process or not, the former implying a higher degree of complexity. A second strategy involves the use of character n-grams as the basic indexing unit, which guarantees the robustness of the information retrieval process whilst at the same time eliminating the need for a specific query correction stage. This is a knowledgelight and language-independent solution which requires no linguistic information for its application. Both strategies have been subjected to experimental testing, with Spanish being
Context-Aware Query Classification
"... Understanding users ’ search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. Most previous QC algorithms classify individual queries without considering their context information. ..."
Abstract
- Add to MetaCart
Understanding users ’ search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. Most previous QC algorithms classify individual queries without considering their context information. However, as exemplified by the well-known example on query “jaguar”, many Web queries are short and ambiguous, whose real meanings are uncertain without the context information. In this paper, we incorporate context information into the problem of query classification by using conditional random field (CRF) models. In our approach, we use neighboring queries and their corresponding clicked URLs (Web pages) in search sessions as the context information. We perform extensive experiments on real world search logs and validate the effectiveness and efficiency of our approach. We show that we can improve the F1 score by 52 % as compared to other state-of-the-art baselines.
Search and Retrieval—Query formulation
"... Query recommendation has been considered as an effective way to help search users in their information seeking activities. Traditional approaches mainly focused on recommending alternative queries with close search intent to the original query. However, to only take relevance into account may genera ..."
Abstract
- Add to MetaCart
Query recommendation has been considered as an effective way to help search users in their information seeking activities. Traditional approaches mainly focused on recommending alternative queries with close search intent to the original query. However, to only take relevance into account may generate redundant recommendations to users. It is better to provide diverse as well as relevant query recommendations, so that we can cover multiple potential search intents of users and minimize the risk that users will not be satisfied. Besides, previous query recommendation approaches mostly relied on measuring the relevance or similarity between queries in the Euclidean space. However, there is no convincing evidence that the query space is Euclidean. It is more natural and reasonable to assume that the query space is a manifold. In this paper, therefore, we aim to recommend diverse and relevant queries based on the intrinsic query manifold. We propose a unified model, named manifold ranking with stop points, for query recommendation. By turning ranked queries into stop points on the query manifold, our approach can generate query recommendations by simultaneously considering both diversity and relevance in a unified way. Empirical experimental results on a large scale query log of a commercial search engine show that our approach can effectively generate highly diverse as well as closely related query recommendations.
Joint Annotation of Search Queries
"... Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to exist ..."
Abstract
- Add to MetaCart
Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries. 1

