Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search
- ACM Transactions on Information Systems (TOIS), 2007
Cited by 64 (8 self)
This paper examines the reliability of implicit feedback generated from clickthrough data and query reformulations in WWW search. Analyzing the users’ decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average. We find that such relative preferences are accurate not only between results from an individual query, but across multiple sets of results within chains of query reformulations.
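The abstract does not list the rules used to turn clicks into relative preferences. One strategy of this kind, usually called "Click > Skip Above", treats a clicked result as preferred over every unclicked result ranked above it. The Python sketch below illustrates that idea under this assumption. It is not presented as the paper's exact procedure, and the function name `click_skip_above` and the document IDs are hypothetical.

```python
from typing import List, Set, Tuple


def click_skip_above(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Derive (preferred, less_preferred) pairs from one result page.

    A clicked result is taken to be preferred over every result ranked
    above it that the user skipped (did not click).
    """
    prefs: List[Tuple[str, str]] = []
    for rank, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for above in ranking[:rank]:
            if above not in clicked:
                prefs.append((doc, above))
    return prefs


# Hypothetical result page with clicks on the 1st, 3rd and 5th results.
print(click_skip_above(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"}))
# [('d3', 'd2'), ('d5', 'd2'), ('d5', 'd4')]
```

The output contains only relative judgments. A click on d3 is read as "d3 beat the skipped d2", not as "d3 is relevant". This matches the abstract's point that clicks are more trustworthy as relative preferences than as absolute relevance labels.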
Coverage, Relevance, and Ranking: The Impact of Query Operators on . . .
- ACM Transactions on Information Systems, 2003
HyperLex: Lexical Cartography for Information Retrieval
- To appear in Computer Speech and Language, special issue on Word Sense Disambiguation
Cited by 24 (0 self)
This article describes an algorithm called HyperLex that is capable of automatically determining word uses in a textbase without recourse to a dictionary. The algorithm makes use of the specific properties of word cooccurrence graphs, which are shown to have "small world" properties. Unlike earlier dictionary-free methods based on word vectors, it can isolate highly infrequent uses (as rare as 1% of all occurrences) by detecting "hubs" and high-density components in the cooccurrence graphs. The algorithm is applied here to information retrieval on the Web, using a set of highly ambiguous test words. An evaluation of the algorithm showed that it omitted only a very small number of relevant uses. In addition, HyperLex offers automatic tagging of word uses in context with excellent precision (97%, compared to 73% for baseline tagging, with an 82% recall rate). Remarkably good precision (96%) was also achieved on a selection of the 25 most relevant pages for each use (including highly infrequent ones). Finally, HyperLex is combined with a graphic display technique that allows the user to navigate visually through the lexicon and explore the various domains detected for each word use.
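The core idea of detecting "hubs" in a cooccurrence graph can be illustrated with a much-simplified Python sketch. It builds an unweighted graph linking words that share a context. It then greedily takes the best-connected word as a hub and removes that word's neighbourhood, so the next hub must come from a different dense region, that is, a different word use. The published algorithm is more involved: for example, it weights edges and applies selection thresholds. Using plain node degree here is an assumption, and all names and example contexts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]


def cooccurrence_graph(contexts: Iterable[List[str]]) -> Graph:
    """Link every pair of words that appear together in the same context."""
    graph: Graph = defaultdict(set)
    for context in contexts:
        for a, b in combinations(sorted(set(context)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_hubs(graph: Graph, min_degree: int = 2) -> List[str]:
    """Greedy hub selection (simplified): repeatedly take the node with the
    most remaining neighbours, then drop it and its neighbours so the next
    hub must come from a different dense region of the graph."""
    remaining = set(graph)
    hubs: List[str] = []
    while remaining:
        # Highest remaining degree first; ties broken alphabetically.
        hub = min(remaining, key=lambda w: (-len(graph[w] & remaining), w))
        if len(graph[hub] & remaining) < min_degree:
            break  # no sufficiently dense region left
        hubs.append(hub)
        remaining -= graph[hub] | {hub}
    return hubs


# Hypothetical contexts of an ambiguous word such as "bank" (word itself removed).
contexts = [
    ["money", "loan", "account"],
    ["money", "account", "interest"],
    ["loan", "interest", "money"],
    ["river", "water", "fish"],
    ["river", "fish", "shore"],
]
print(find_hubs(cooccurrence_graph(contexts)))  # ['account', 'fish']
```

Each returned hub stands for one detected use: here, a financial sense and a riverside sense.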
Understanding the Relationship between Searchers’ Queries and Information Goals
Cited by 21 (4 self)
We describe results from Web search log studies aimed at elucidating user behaviors associated with queries and destination URLs that appear with different frequencies. We note the diversity of information goals that searchers have and the differing ways that goals are specified. We examine rare and common information goals that are specified using rare or common queries. We identify several significant differences in user behavior depending on the rarity of the query and the destination URL. We find that searchers are more likely to be successful when the frequencies of the query and destination URL are similar. We also establish that the behavioral differences observed for queries and goals of varying rarity persist even after accounting for potential confounding variables, including query length, search engine ranking, session duration, and task difficulty. Finally, using an information-theoretic measure of search difficulty, we show that the benefits obtained by search and navigation actions depend on the frequency of the information goal.
Evaluating implicit feedback models using searcher simulations
- ACM Transactions on Information Systems, 2005
Cited by 20 (4 self)
In this article we describe an evaluation of relevance feedback (RF) algorithms using searcher simulations. Since these algorithms select additional terms for query modification based on inferences made from searcher interaction, not on relevance information searchers explicitly provide (as in traditional RF), we refer to them as implicit feedback models. We introduce six different models that base their decisions on the interactions of searchers and use different approaches to rank query modification terms. The aim of this article is to determine which of these models should be used to assist searchers in the systems we develop. To evaluate these models we used searcher simulations that afforded us more control over the experimental conditions than experiments with human subjects and allowed complex interaction to be modeled without the need for costly human experimentation. The simulation-based evaluation methodology measures how well the models learn the distribution of terms across relevant documents (i.e., learn what information is relevant) and how well they improve search effectiveness (i.e., create effective search queries). Our findings show that an implicit feedback model based on Jeffrey’s rule of conditioning outperformed other …
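The abstract names Jeffrey's rule of conditioning but does not say how the winning model applies it. The rule itself covers evidence that shifts the probabilities of a partition E_1, ..., E_n to new values Q(E_i) without making any outcome certain. In that case the revised probability of A is Q(A) = Σ_i P(A | E_i) Q(E_i), with the conditionals P(A | E_i) held fixed. The Python sketch below computes exactly that. The function name, the relevant/non-relevant partition and all numbers are hypothetical illustrations, not the article's model.

```python
from typing import Dict


def jeffrey_update(p_a_given_e: Dict[str, float], q_e: Dict[str, float]) -> float:
    """Jeffrey's rule of conditioning.

    The outcomes E_i in q_e must form a partition; q_e holds their revised
    (uncertain) probabilities and p_a_given_e holds P(A | E_i), which the rule
    assumes stays fixed. Returns Q(A) = sum_i P(A | E_i) * Q(E_i).
    """
    if abs(sum(q_e.values()) - 1.0) > 1e-9:
        raise ValueError("revised probabilities of the partition must sum to 1")
    return sum(p_a_given_e[e] * q for e, q in q_e.items())


# Hypothetical numbers: interaction evidence suggests a viewed document is
# relevant with probability 0.7; a candidate query term is useful with
# probability 0.6 if the document is relevant and 0.1 if it is not.
p_term = {"relevant": 0.6, "nonrelevant": 0.1}
q_doc = {"relevant": 0.7, "nonrelevant": 0.3}
print(round(jeffrey_update(p_term, q_doc), 2))  # 0.45
```

Unlike ordinary conditioning, the evidence here never forces "relevant" to probability 1. This is why the rule suits implicit feedback, where interaction only suggests relevance.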
Interest-based personalized search
- ACM Transactions on Information Systems
Cited by 12 (0 self)
Web search engines typically provide search results without considering user interests or context. We propose a personalized search approach that can easily extend a conventional search engine on the client side. Our mapping framework automatically maps a set of known user interests onto a group of categories in the Open Directory Project (ODP) and takes advantage of manually edited data available in ODP for training text classifiers that correspond to, and therefore categorize and personalize search results according to, user interests. In two sets of controlled experiments, we compare our personalized categorization system (PCAT) with a list interface system (LIST) that mimics a typical search engine and with a nonpersonalized categorization system (CAT). In both experiments, we analyze system performance on the basis of the type of task and query length. We find that PCAT is preferable to LIST for information-gathering tasks and for searches with short queries, and that PCAT outperforms CAT in both information-gathering and finding tasks, and for searches associated with free-form queries. From the subjects’ answers to a questionnaire, we find that PCAT is perceived as a system that can find relevant Web pages quicker and easier …
Ranking function optimization for effective web search by genetic programming: An empirical study
- In Proceedings of the 37th Hawaii International Conference on System Sciences, 2004
Cited by 12 (8 self)
Web search engines have become indispensable in our daily lives for finding the information we need. Although search engines respond very quickly, their effectiveness in placing useful and relevant documents at the top of the search hit list needs to be improved. In this paper, we report our experience applying Genetic Programming (GP) to the ranking function discovery problem, leveraging the structural information of HTML documents. Our empirical experiments using the Web track data from recent TREC conferences show that we can discover better ranking functions than existing well-known ranking strategies from IR, such as Okapi and Ptfidf. The performance is even comparable to that obtained by a Support Vector Machine.
Effective Profiling of Consumer Information Retrieval Needs: A Unified Framework and Empirical Comparison
Cited by 10 (4 self)
Due to the overwhelming volume of information that is increasingly available, many people rely on current awareness systems to keep abreast of the latest developments in the fields that they are interested in, as evidenced by the popularity of subscriptions to news-monitoring and digital library services. The success of these services, however, often requires effective acquisition of users' personal standing interests as represented in personal profiles. Our objective in this paper is twofold. First, we introduce a new method for profile generation and compare it against other well-known methods, with promising results. Second, although various methods have been proposed in the information retrieval and machine learning literature to address profiling, a unified framework and systematic cross-system comparison to help users, especially service providers, determine the most effective way of profiling consumers is still lacking. We try to fill this gap by looking at these methods from a more integrated point of view based on statistical contingency theory. Variations of these methods are then systematically tested on three well-known routing systems, and the results are analyzed and reported.
The Comparative Effectiveness of Sponsored and Non-Sponsored Links for Web Ecommerce Queries
- ACM Transactions on the Web, 2007
Cited by 9 (3 self)
The predominant business model for Web search engines is sponsored search, which generates billions in yearly revenue. But are sponsored links providing online consumers with relevant choices for products and services? We address this and related issues by investigating the relevance of sponsored and nonsponsored links for e-commerce queries on the major search engines. The results show that average relevance ratings for sponsored and nonsponsored links are practically the same, although the relevance ratings for sponsored links are statistically higher. We used 108 e-commerce queries and 8,256 retrieved links for these queries from three major Web search engines: Yahoo!, Google, and MSN. In addition to relevance measures, we qualitatively analyzed the e-commerce queries, deriving five categorizations of underlying information needs. Product-specific queries are the most prevalent (48%). Title (62%) and summary (33%) are the primary bases for evaluating sponsored links, with URL a distant third (2%). To gauge the effectiveness of sponsored search campaigns, we analyzed the sponsored links from various viewpoints. It appears that links from organizations with large sponsored search campaigns are more relevant than the average sponsored link. We discuss the implications for Web search engines and sponsored search as a long-term business model and as a mechanism for finding relevant information for searchers.
The Use of Dynamic Contexts to Improve Casual Internet Searching
- ACM Transactions on Information Systems, 2003

