Results 1 - 10
of
64
Personalizing search via automated analysis of interests and activities
, 2005
"... We formulate and study search algorithms that consider a user’s prior interactions with a wide variety of content to personalize that user’s current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that ..."
Abstract
-
Cited by 134 (18 self)
- Add to MetaCart
We formulate and study search algorithms that consider a user’s prior interactions with a wide variety of content to personalize that user’s current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that leverage implicit information about the user’s interests. This information is used to re-rank Web search results within a relevance feedback framework. We explore rich models of user interests, built from both search-related information, such as previously issued queries and previously visited Web pages, and other information about the user such as documents and email the user has read and created. Our research suggests that rich representations of the user and the corpus are important for personalization, but that it is possible to approximate these representations and provide efficient client-side algorithms for personalizing search. We show that such personalization algorithms can significantly improve on current Web search.
Generating query substitutions
- In WWW
, 2006
"... We introduce the notion of query substitution, that is, generating a new query to replace a user’s original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries. In this way the new query is strongly related to the original query, containi ..."
Abstract
-
Cited by 124 (5 self)
- Add to MetaCart
We introduce the notion of query substitution, that is, generating a new query to replace a user’s original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries. In this way the new query is strongly related to the original query, containing terms closely related to all of the original terms. This contrasts with query expansion through pseudo-relevance feedback, which is costly and can lead to query drift. This also contrasts with query relaxation through boolean or TFIDF retrieval, which reduces the specificity of the query. We define a scale for evaluating query substitution, and show that our method performs well at generating new queries related to the original queries. We build a model for selecting between candidates, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions. This further improves the quality of the candidates generated. Experiments show that our techniques significantly increase coverage and effectiveness in the setting of sponsored search.
Studying the use of popular destinations to enhance Web search interaction
- ACM SIGIR '07. ACM
, 2007
"... We present a novel Web search interaction feature which, for a given query, provides links to websites frequently visited by other users with similar information needs. These popular destinations complement traditional search results, allowing direct navigation to authoritative resources for the que ..."
Abstract
-
Cited by 44 (10 self)
- Add to MetaCart
We present a novel Web search interaction feature which, for a given query, provides links to websites frequently visited by other users with similar information needs. These popular destinations complement traditional search results, allowing direct navigation to authoritative resources for the query topic. Destinations are identified using the history of search and browsing behavior of many users over an extended time period, whose collective behavior provides a basis for computing source authority. We describe a user study which compared the suggestion of destinations with the previously proposed suggestion of related queries, as well as with traditional, unaided Web search. Results show that search enhanced by destination suggestions outperforms other systems for exploratory tasks, with best performance obtained from mining past user behavior at query-level granularity.
Mining Anchor Text for Query Refinement
- WWW2004
, 2004
"... When searching large hypertext document collections, it is often possible that there are too many results available for ambiguous queries. Query refinement is an interactive process of query modification that can be used to narrow down the scope of search results. We propose a new method for automat ..."
Abstract
-
Cited by 39 (1 self)
- Add to MetaCart
When searching large hypertext document collections, it is often possible that there are too many results available for ambiguous queries. Query refinement is an interactive process of query modification that can be used to narrow down the scope of search results. We propose a new method for automatically generating refinements or related terms to queries by mining anchor text for a large hypertext document collection. We show that the usage of anchor text as a basis for query refinement produces high quality refinement suggestions that are significantly better in terms of perceived usefulness compared to refinements that are derived using the document content. Furthermore, our study suggests that anchor text refinements can also be used to augment traditional query refinement algorithms based on query logs, since they typically differ in coverage and produce different refinements. Our results are based on experiments on an anchor text collection of a large corporate intranet.
Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs
- In Conference on Information and Knowledge Management (CIKM
, 2008
"... Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tas ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tasks at a variety of granularities, issuing multiple queries as they attempt to accomplish tasks. In this work we study real sessions manually labeled into hierarchical tasks, and show that timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%. We report on properties of this search task hierarchy, as seen in a random sample of user interactions from a major web search engine’s log, annotated by human editors, learning that 17 % of tasks are interleaved, and 20 % are hierarchically organized. No previous work has analyzed or addressed automatic identification of interleaved and hierarchically organized search tasks. We propose and evaluate a method for the automated segmentation of users’ query streams into hierarchical units. Our classifiers can improve on timeout segmentation, as well as other previously published approaches, bringing the accuracy up to 92% for identifying fine-grained task boundaries, and 89-97 % for identifying pairs of queries from the same task when tasks are interleaved hierarchically. This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure. The ability to perform this kind of segmentation paves the way for evaluating search engines in terms of user task completion.
A novel log-based relevance feedback technique in content-based image retrieval
, 2004
"... Relevance feedback has been proposed as an important technique to boost the retrieval performance in content-based image retrieval (CBIR). However, since there exists a semantic gap between low-level features and high-level semantic concepts in CBIR, typical relevance feedback techniques need to per ..."
Abstract
-
Cited by 29 (10 self)
- Add to MetaCart
Relevance feedback has been proposed as an important technique to boost the retrieval performance in content-based image retrieval (CBIR). However, since there exists a semantic gap between low-level features and high-level semantic concepts in CBIR, typical relevance feedback techniques need to perform a lot of rounds of feedback for achieving satisfactory results. These procedures are time-consuming and may make the users bored in the retrieval tasks. For a long-term study purpose in CBIR, we notice that the users’ feedback logs can be available and employed for helping the retrieval tasks in CBIR systems. In this paper, we propose a novel scheme to study the log-based relevance feedback (LRF) technique for improving retrieval performance and reducing the semantic gap in CBIR. In order to effectively incorporate the users ’ feedback logs, we propose a modified support vector machine (SVM) technique called soft label support vector machine (SLSVM) to construct the LRF algorithm in CBIR. We conduct extensive experiments to evaluate the performance of our proposed algorithm. Compared with the typical approach using query expansion (QEX) technique, we demonstrate that our proposed scheme can significantly improve the retrieval performance of semantic image retrieval from detailed experiments.
Search advertising using web relevance feedback
- In Proc 17th. Intl. Conf. on Information and Knowledge Management
, 2008
"... The business of Web search, a $10 billion industry, relies heavily on sponsored search, whereas a few carefully-selected paid advertisements are displayed alongside algorithmic search results. A key technical challenge in sponsored search is to select ads that are relevant for the user’s query. Iden ..."
Abstract
-
Cited by 25 (10 self)
- Add to MetaCart
The business of Web search, a $10 billion industry, relies heavily on sponsored search, whereas a few carefully-selected paid advertisements are displayed alongside algorithmic search results. A key technical challenge in sponsored search is to select ads that are relevant for the user’s query. Identifying relevant ads is challenging because queries are usually very short, and because users, consciously or not, choose terms intended to lead to optimal Web search results and not to optimal ads. Furthermore, the ads themselves are short and usually formulated to capture the reader’s attention rather than to facilitate query matching. Traditionally, matching of ads to queries employed standard information retrieval techniques using the bag of words approach. Here we propose to go beyond the bag of words, and augment both queries and ads with additional knowledgerich features. We use Web search results initially returned for the query to create a pool of relevant documents. Classifying these documents with respect to an external taxonomy and identifying salient named entities give rise to two new feature types. Empirical evaluation based on over 9,000 query-ad pairwise judgments confirms that using augmented queries produces highly relevant ads. Our methodology also relaxes the requirement for each ad to explicitly specify the exhaustive list of queries (“bid phrases”) that can trigger it.
A unified log-based relevance feedback scheme for image retrieval
- IEEE Transactions on Knowledge and Data Engineering
, 2006
"... Abstract—Relevance feedback has emerged as a powerful tool to boost the retrieval performance in content-based image retrieval (CBIR). In the past, most research efforts in this field have focused on designing effective algorithms for traditional relevance feedback. Given that a CBIR system can coll ..."
Abstract
-
Cited by 22 (9 self)
- Add to MetaCart
Abstract—Relevance feedback has emerged as a powerful tool to boost the retrieval performance in content-based image retrieval (CBIR). In the past, most research efforts in this field have focused on designing effective algorithms for traditional relevance feedback. Given that a CBIR system can collect and store users ’ relevance feedback information in a history log, an image retrieval system should be able to take advantage of the log data of users ’ feedback to enhance its retrieval performance. In this paper, we propose a unified framework for log-based relevance feedback that integrates the log of feedback data into the traditional relevance feedback schemes to learn effectively the correlation between low-level image features and high-level concepts. Given the error-prone nature of log data, we present a novel learning technique, named Soft Label Support Vector Machine, to tackle the noisy data problem. Extensive experiments are designed and conducted to evaluate the proposed algorithms based on the COREL image data set. The promising experimental results validate the effectiveness of our log-based relevance feedback scheme empirically. Index Terms—Content-based image retrieval, relevance feedback, log-based relevance feedback, log data, user issues, semantic gap, support vector machines.
Optimizing relevance and revenue in ad search: A query substitution approach
- In SIGIR’08
, 2008
"... The primary business model behind Web search is based on textual advertising, where contextually relevant ads are displayed alongside search results. We address the problem of selecting these ads so that they are both relevant to the queries and profitable to the search engine, showing that optimizi ..."
Abstract
-
Cited by 20 (8 self)
- Add to MetaCart
The primary business model behind Web search is based on textual advertising, where contextually relevant ads are displayed alongside search results. We address the problem of selecting these ads so that they are both relevant to the queries and profitable to the search engine, showing that optimizing ad relevance and revenue is not equivalent. Selecting the best ads that satisfy these constraints also naturally incurs high computational costs, and time constraints can lead to reduced relevance and profitability. We propose a novel two-stage approach, which conducts most of the analysis ahead of time. An offline preprocessing phase leverages additional knowledge that is impractical to use in real time, and rewrites frequent queries in a way that subsequently facilitates fast and accurate online matching. Empirical evaluation shows that our method optimized for relevance matches a state-of-the-art method while improving expected revenue. When optimizing for revenue, we see even more substantial improvements in expected revenue.
The loquacious user: A document-independent source of terms for query expansion
- In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval
, 2005
"... [dianek | vijayd | fu] @ email.unc.edu In this paper we investigate the effectiveness of a documentindependent technique for eliciting feedback from users about their information problems. We propose that such a technique can be used to elicit terms from users for use in query expansion and as a fo ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
[dianek | vijayd | fu] @ email.unc.edu In this paper we investigate the effectiveness of a documentindependent technique for eliciting feedback from users about their information problems. We propose that such a technique can be used to elicit terms from users for use in query expansion and as a follow-up when ambiguous queries are initially posed by users. We design a feedback form to obtain additional information from users, administer the form to users after initial querying, and create a series of experimental runs based on the information that we obtained from the form. Results demonstrate that the form was successful at eliciting more information from users and that this additional information significantly improved retrieval performance. Our results further demonstrate a strong relationship between query length and performance.

