Results 1 - 10
of
77
Learning query intent from regularized click graphs
- In SIGIR 2008
, 2008
"... This work presents the use of click graphs in improving query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. Previous works on query classification have primarily focused on improving feature representation ..."
Abstract
-
Cited by 39 (10 self)
- Add to MetaCart
This work presents the use of click graphs in improving query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. Previous works on query classification have primarily focused on improving feature representation of queries, e.g., by augmenting queries with search engine results. In this work, we investigate a completely orthogonal approach — instead of enriching feature representation, we aim at drastically increasing the amounts of training data by semi-supervised learning with click graphs. Specifically, we infer class memberships of unlabeled queries from those of labeled ones according to their proximities in a click graph. Moreover, we regularize the learning with click graphs by content-based classification to avoid propagating erroneous labels. We demonstrate the effectiveness of our algorithms in two different applications, product intent and job intent classification. In both cases, we expand the training data with automatically labeled queries by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that with a large amount of training data obtained in this fashion, classifiers using only query words/phrases as features can work remarkably well.
Query suggestion using hitting time
- in Proc. of conf. on Inf. and Knowledge Manage. (CIKM’08
"... Generating alternative queries, also known as query suggestion, has long been proved useful to help a user explore and express his information need. In many scenarios, such suggestions can be generated from a large scale graph of queries and other accessory information, such as the clickthrough. How ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
Generating alternative queries, also known as query suggestion, has long been proved useful to help a user explore and express his information need. In many scenarios, such suggestions can be generated from a large scale graph of queries and other accessory information, such as the clickthrough. However, how to generate suggestions while ensuring their semantic consistency with the original query remains a challenging problem. In this work, we propose a novel query suggestion algorithm based on ranking queries with the hitting time on a large scale bipartite graph. Without involvement of twisted heuristics or heavy tuning of parameters, this method clearly captures the semantic consistency between the suggested query and the original query. Empirical experiments on a large scale query log of a commercial search engine and a scientific literature collection show that hitting time is effective to generate semantically consistent query suggestions. The proposed algorithm and its variations can successfully boost long tail queries, accommodating personalized query suggestion, as well as finding related authors in research.
Using the wisdom of the crowds for keyword generation
- In WWW
, 2008
"... In the sponsored search model, search engines are paid by businesses that are interested in displaying ads for their site alongside the search results. Businesses bid for keywords, and their ad is displayed when the keyword is queried to the search engine. An important problem in this process is key ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
In the sponsored search model, search engines are paid by businesses that are interested in displaying ads for their site alongside the search results. Businesses bid for keywords, and their ad is displayed when the keyword is queried to the search engine. An important problem in this process is keyword generation: given a business that is interested in launching a campaign, suggest keywords that are related to that campaign. We address this problem by making use of the query logs of the search engine. We identify queries related to a campaign by exploiting the associations between queries and URLs as they are captured by the user’s clicks. These queries form good keyword suggestions since they capture the “wisdom of the crowd ” as to what is related to a site. We formulate the problem as a semi-supervised learning problem, and propose algorithms within the Markov Random Field model. We perform experiments with real query logs, and we demonstrate that our algorithms scale to large query logs and produce meaningful results.
The Query-flow Graph: Model and Applications
, 2008
"... Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search-engine results. Mining the wealth of information available in ..."
Abstract
-
Cited by 25 (13 self)
- Add to MetaCart
Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search-engine results. Mining the wealth of information available in the query logs has many important applications including querylog analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same “search mission”. Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world queryflow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely, finding logical sessions, and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.
Query Suggestions Using Query-Flow Graphs
"... The query-flow graph [Boldi et al., CIKM 2008] is an aggregated representation of the latent querying behavior contained in a query log. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same search mission. Any pat ..."
Abstract
-
Cited by 18 (8 self)
- Add to MetaCart
The query-flow graph [Boldi et al., CIKM 2008] is an aggregated representation of the latent querying behavior contained in a query log. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same search mission. Any path over the query-flow graph may be seen as a possible search task, whose likelihood is given by the strength of the edges along the path. An edge (qi, qj) is also labelled with some information: e.g., the probability that user moves from qi to qj, or the type of the transition, for instance, the fact that qj is a specialization of qi. In this paper we propose, and experimentally study, query recommendations based on short random walks on the queryflow graph. Our experiments show that these methods can match in precision, and often improve, recommendations based on query-click graphs, without using users ’ clicks. Our experiments also show that it is important to consider transition-type labels on edges for having good quality recommendations. Finally, one feature that we had in mind while devising our methods was that of providing diverse sets of recommendations: the experimentation that we conducted provides encouraging results in this sense. 1.
Smoothing Clickthrough Data for Web Search Ranking
"... Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web search applications. Such benefits, however, are severely limited by the data sparseness problem, i.e., many queries and doc ..."
Abstract
-
Cited by 14 (6 self)
- Add to MetaCart
Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web search applications. Such benefits, however, are severely limited by the data sparseness problem, i.e., many queries and documents have no or very few clicks. The ranker thus cannot rely strongly on clickthrough features for document ranking. This paper presents two smoothing methods to expand clickthrough data: query clustering via Random Walk on click graphs and a discounting method inspired by the Good-Turing estimator. Both methods are evaluated on real-world data in three Web search domains. Experimental results show that the ranking models trained on smoothed clickthrough features consistently outperform those trained on unsmoothed features. This study demonstrates both the importance and the benefits of dealing with the sparseness problem in clickthrough data.
From “dango” to “japanese cakes”: Query reformulation models and patterns. Submitted for publication
, 2008
"... Abstract—Understanding query reformulation patterns is a key step towards next generation web search engines: it can help improving users ’ web-search experience by predicting their intent, and thus helping them to locate information more effectively. As a step in this direction, we build an accurat ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
Abstract—Understanding query reformulation patterns is a key step towards next generation web search engines: it can help improving users ’ web-search experience by predicting their intent, and thus helping them to locate information more effectively. As a step in this direction, we build an accurate model for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92 % accuracy. We apply the model to automatically label two large query logs, creating annotated queryflow graphs. We study the resulting reformulation patterns, finding results consistent with previous studies done on smaller manually annotated datasets, and discovering new interesting patterns, including connections between reformulation types and topical categories. Finally, applying our findings to a third query log that is publicly available for research purposes, we demonstrate that our reformulation classifier leads to improved recommendations in a query recommendation system. I.
Clustering query refinements by user intent
- In 19th International World Wide Web Conference, WWW
, 2010
"... We address the problem of clustering the refinements of a user search query. The clusters computed by our proposed algorithm can be used to improve the selection and placement of the query suggestions proposed by a search engine, and can also serve to summarize the different aspects of information r ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
We address the problem of clustering the refinements of a user search query. The clusters computed by our proposed algorithm can be used to improve the selection and placement of the query suggestions proposed by a search engine, and can also serve to summarize the different aspects of information relevant to the original user query. Our algorithm clusters refinements based on their likely underlying user intents by combining document click and session cooccurrence information. At its core, our algorithm operates by performing multiple random walks on a Markov graph that approximates user search behavior. A user study performed on top search engine queries shows that our clusters are rated better than corresponding clusters computed using approaches that use only document click or only sessions co-occurrence information. 1.
Ready to Buy or Just Browsing? Detecting Web Searcher Goals from Interaction Data
"... An improved understanding of the relationship between search intent, result quality, and searcher behavior is crucial for improving the effectiveness of web search. While recent progress in user behavior mining has been largely focused on aggregate server-side click logs, we present a new class of s ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
An improved understanding of the relationship between search intent, result quality, and searcher behavior is crucial for improving the effectiveness of web search. While recent progress in user behavior mining has been largely focused on aggregate server-side click logs, we present a new class of search behavior models that also exploit fine-grained user interactions with the search results. We show that mining these interactions, such as mouse movements and scrolling, can enable more effective detection of the user’s search goals. Potential applications include automatic search evaluation, improving search ranking, result presentation, and search advertising. We describe extensive experimental evaluation over both controlled user studies, and logs of interaction data collected from hundreds of real users. The results show that our method is more effective than the current state-of-the-art techniques, both for detection of searcher goals, and for an important practical application of predicting ad clicks for a given search session.
On Social Networks and Collaborative Recommendation
"... Social network systems, like last.fm, play a significant role in Web 2.0, containing large amounts of multimedia-enriched data that are enhanced both by explicit user-provided annotations and implicit aggregated feedback describing the personal preferences of each user. It is also a common tendency ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Social network systems, like last.fm, play a significant role in Web 2.0, containing large amounts of multimedia-enriched data that are enhanced both by explicit user-provided annotations and implicit aggregated feedback describing the personal preferences of each user. It is also a common tendency for these systems to encourage the creation of virtual networks among their users by allowing them to establish bonds of friendship and thus provide a novel and direct medium for the exchange of data. We investigate the role of these additional relationships in developing a track recommendation system. Taking into account both the social annotation and friendships inherent in the social graph established among users, items and tags, we created a collaborative recommendation system that effectively adapts to the personal information needs of each user. We adopt the generic framework of Random Walk with Restarts in order to provide with a more natural and efficient way to represent social networks. In this work we collected a representative enough portion of the music social network last.fm, capturing explicitly expressed bonds of friendship of the user as well as social tags. We performed a series of comparison experiments between the Random Walk with Restarts model and a user-based collaborative filtering method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method.

