Results 1-10 of 104
The Query-flow Graph: Model and Applications, 2008
Cited by 112 (19 self)
Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search-engine results. Mining the wealth of information available in the query logs has many important applications, including query-log analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same “search mission”. Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world query-flow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely finding logical sessions and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.
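A minimal sketch of the path-likelihood idea described above, assuming a simplified graph: the abstract says edge strengths come from mining time and textual information, but here each edge weight is just the fraction of observed transitions out of qi that go to qj, aggregated across users. The function names (`build_query_flow_graph`, `path_likelihood`, `recommend`) are hypothetical and this is not the authors' method.

```python
from collections import defaultdict


def build_query_flow_graph(sessions):
    """Build a directed, weighted graph from per-user query sequences.

    Assumption: w(qi, qj) = (transitions qi -> qj) / (all transitions out of qi),
    a frequency-based stand-in for the paper's mined edge strength.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for queries in sessions:  # one list of queries per user, in time order
        for qi, qj in zip(queries, queries[1:]):
            if qi != qj:  # ignore immediate repeats of the same query
                counts[qi][qj] += 1
    graph = {}
    for qi, successors in counts.items():
        total = sum(successors.values())
        graph[qi] = {qj: c / total for qj, c in successors.items()}
    return graph


def path_likelihood(graph, path):
    """Multiply edge strengths along a path; 0.0 if any edge is missing."""
    p = 1.0
    for qi, qj in zip(path, path[1:]):
        p *= graph.get(qi, {}).get(qj, 0.0)
    return p


def recommend(graph, query, k=3):
    """Return up to k successors of `query`, strongest edge first."""
    successors = graph.get(query, {})
    return sorted(successors, key=lambda q: (-successors[q], q))[:k]


sessions = [
    ["jaguar", "jaguar car", "jaguar price"],
    ["jaguar", "jaguar animal"],
    ["jaguar", "jaguar car"],
]
graph = build_query_flow_graph(sessions)
print(path_likelihood(graph, ["jaguar", "jaguar car", "jaguar price"]))
print(recommend(graph, "jaguar"))
```

Because the path score is a product of weights in [0, 1], longer paths can only keep or lower their likelihood; query recommendation then falls out as reading off the strongest outgoing edges.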
M.R.: #twittersearch: a comparison of microblog search and web search. In: Proceedings of the fourth ACM international conference on Web search and data mining, 2011
Cited by 90 (4 self)
Social networking Web sites are not just places to maintain relationships; they can also be valuable information sources. However, little is known about how and why people search socially-generated content. In this paper we explore search behavior on the popular microblogging/social networking site Twitter. Using analysis of large-scale query logs and supplemental qualitative data, we observe that people search Twitter to find temporally relevant information (e.g., breaking news, real-time content, and popular trends) and information related to people (e.g., content directed at the searcher, information about people of interest, and general sentiment and opinion). Twitter queries are shorter, more popular, and less likely to evolve as part of a session than Web queries. It appears people repeat Twitter queries to monitor the associated search results, while changing and developing Web queries to learn about a topic. The results returned from the different corpora support these different uses, with Twitter results including more social chatter and social events, and Web results containing more basic facts and navigational content. We discuss the implications of these findings for the design of next-generation Web search tools that incorporate social media.
Large Scale Analysis of Web Revisitation Patterns
Cited by 74 (9 self)
Our work examines Web revisitation patterns. Everybody revisits Web pages, but their reasons for doing so can differ depending on the particular Web page, their topic of interest, and their intent. To characterize how people revisit Web content, we analyzed five weeks of Web interaction logs of over 612,000 users. We supplemented these findings with a survey intended to identify the intent behind the observed revisitation. Our analysis reveals four primary revisitation patterns, each with unique behavioral, content, and structural characteristics. Through our analysis we illustrate how understanding revisitation patterns can enable Web sites to provide improved navigation, Web browsers to predict users’ destinations, and search engines to better support fast, fresh, and effective finding and re-finding.
Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs
Cited by 51 (2 self)
Users frequently modify a previous search query in the hope of retrieving better results. These modifications are called query reformulations or query refinements. Existing research has studied how web search engines can propose reformulations, but has given less attention to how people perform query reformulations. In this paper, we aim to better understand how web searchers refine queries and form a theoretical foundation for query reformulation. We study users’ reformulation strategies in the context of the AOL query logs. We create a taxonomy of query refinement strategies and build a high-precision rule-based classifier to detect each type of reformulation. The effectiveness of reformulations is measured using user click behavior. Most reformulation strategies result in some benefit to the user. Certain strategies, like adding or removing words, word substitution, acronym expansion, and spelling correction, are more likely to cause clicks, especially on higher-ranked results. In contrast, users often click the same result as for their previous query, or select no results, when forming acronyms and reordering words. Perhaps the most surprising finding is that some reformulations are better suited to helping users when the current results are already fruitful, while other reformulations are more effective when the results are lacking. Our findings inform the design of applications that can assist searchers; examples are described in this paper.
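To make "rule-based classifier" concrete, here is a minimal sketch that labels a pair of consecutive queries with some of the strategy names the abstract lists. The rules, their order, and the 0.8 string-similarity threshold are all hypothetical choices for illustration; the paper's actual taxonomy and rules are not reproduced here.

```python
import difflib


def initials(words):
    """Concatenate the first letter of each word."""
    return "".join(w[0] for w in words)


def classify_reformulation(prev, curr):
    """Label how `curr` reformulates `prev` (hypothetical rules, checked in order)."""
    a, b = prev.lower().split(), curr.lower().split()
    if a == b:
        return "same"
    if sorted(a) == sorted(b):
        return "reorder"
    sa, sb = set(a), set(b)
    if sa < sb:  # every old word kept, at least one new word added
        return "add_words"
    if sb < sa:
        return "remove_words"
    if len(b) == 1 and len(a) > 1 and b[0] == initials(a):
        return "form_acronym"
    if len(a) == 1 and len(b) > 1 and a[0] == initials(b):
        return "expand_acronym"
    if len(a) == len(b):
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            x, y = diffs[0]
            # A near-identical replacement word is treated as a spelling fix.
            if difflib.SequenceMatcher(None, x, y).ratio() >= 0.8:
                return "spelling_correction"
            return "word_substitution"
    return "other"


print(classify_reformulation("britney spers", "britney spears"))
print(classify_reformulation("united nations", "un"))
```

Checking the cheap structural cases (reordering, subset changes) before the character-level similarity test keeps each rule narrow, which is one way to favour precision over recall.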
Large Scale Query Log Analysis of Re-Finding
Cited by 45 (6 self)
Although Web search engines are targeted towards helping people find new information, people regularly use them to re-find Web pages they have seen before. Researchers have noted the existence of this phenomenon, but relatively little is understood about how re-finding behavior differs from the finding of new information. This paper dives deeply into the differences via analysis of three large-scale data sources: 1) query logs (queries, clicks, result impressions), 2) Web browsing logs (URL visits), and 3) a daily Web crawl (page content). It appears that people learn valuable information about the pages they find that helps them re-find what they are looking for later; compared to the initial finding query, re-finding queries are typically shorter, and rank the re-found URL higher. While many instances of re-finding probably serve as a type of bookmark for a known URL, others seem to represent the resumption of a previous task; results clicked at the end of a session are more likely than those at the beginning to be re-found during a later session, while re-finding is more likely to happen at the beginning of a session than at the end. Additionally, we observe differences in cross-session and intra-session re-finding that may indicate different types of re-finding tasks. Our findings suggest there is a rich opportunity for search engines to take advantage of re-finding behavior as a means to improve the search experience.
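A minimal sketch of the finding/re-finding comparison described above, assuming a simplified log: a click counts as re-finding if the same user clicked the same URL at any earlier time. The tuple layout `(user, timestamp, query, url, rank)` and the function names are hypothetical; the paper's actual definitions and data sources are richer.

```python
def label_refinding(clicks):
    """Flag each click as re-finding if this user clicked this URL before.

    clicks: iterable of (user, timestamp, query, url, rank) tuples, any order.
    """
    seen = set()
    labelled = []
    for user, ts, query, url, rank in sorted(clicks, key=lambda c: (c[0], c[1])):
        labelled.append({"user": user, "ts": ts, "query": query, "url": url,
                         "rank": rank, "refind": (user, url) in seen})
        seen.add((user, url))
    return labelled


def compare(labelled):
    """Mean query length (in words) and mean clicked rank, finding vs re-finding."""
    stats = {}
    for is_refind in (False, True):
        rows = [r for r in labelled if r["refind"] == is_refind]
        if rows:
            stats["refind" if is_refind else "find"] = {
                "mean_query_words": sum(len(r["query"].split()) for r in rows) / len(rows),
                "mean_rank": sum(r["rank"] for r in rows) / len(rows),
            }
    return stats


clicks = [
    ("u1", 1, "university of washington computer science department", "cs.washington.edu", 3),
    ("u1", 5, "uw cse", "cs.washington.edu", 1),
    ("u2", 2, "weather seattle", "weather.com", 2),
]
print(compare(label_refinding(clicks)))
```

Keying the history on `(user, url)` rather than on the query text is what lets a short, different query ("uw cse") still be recognised as re-finding the page reached by a longer earlier query.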
The Web Changes Everything: Understanding the Dynamics of Web Content
Cited by 44 (7 self)
The Web is a dynamic, ever-changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different user visitation patterns. Although change over long intervals has been explored on random (and potentially unvisited) samples of Web pages, little is known about the nature of finer-grained changes to pages that are actively consumed by users, such as those in our sample. We describe algorithms, analyses, and models for characterizing changes in Web content, focusing on both time (by using hourly and sub-hourly crawls) and structure (by looking at page-, DOM-, and term-level changes). Change rates are higher in our behavior-based sample than found in previous work on randomly sampled pages, with a large portion of pages changing more than hourly. Detailed content and structure analyses identify stable and dynamic content within each page. The understanding of Web change we develop in this paper has implications for tools designed to help people interact with dynamic Web content, such as search engines, advertising, and Web browsers.
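One way to picture a term-level change analysis is to compare successive crawls of a page by their word sets. The sketch below uses the Dice coefficient over term sets as its similarity measure and treats terms present in every snapshot as "stable" content; both are illustrative assumptions, not a claim about the exact measures used in the paper, and the function names are hypothetical.

```python
import re


def terms(text):
    """Lower-cased alphanumeric tokens of a page snapshot, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) over the two term sets."""
    ta, tb = terms(a), terms(b)
    if not ta and not tb:
        return 1.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def change_profile(snapshots):
    """For each pair of consecutive crawls: similarity plus added/removed terms."""
    profile = []
    for before, after in zip(snapshots, snapshots[1:]):
        tb, ta = terms(before), terms(after)
        profile.append({"dice": dice(before, after),
                        "added": sorted(ta - tb),
                        "removed": sorted(tb - ta)})
    return profile


def stable_terms(snapshots):
    """Terms present in every snapshot: a crude proxy for a page's stable content."""
    sets = [terms(s) for s in snapshots]
    return set.intersection(*sets) if sets else set()


hourly = [
    "Breaking news: storm hits coast",
    "Breaking news: storm leaves coast",
    "Breaking news: markets rally",
]
for step in change_profile(hourly):
    print(step)
print(stable_terms(hourly))
```

A Dice score of 1.0 means the term set did not change between crawls; counting how often consecutive scores fall below 1.0 gives a simple per-page change rate at the crawl interval.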
Diversifying Web Search Results
Cited by 36 (0 self)
Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets in the first page of the results. However, the underlying questions of how ‘diversity’ interplays with ‘quality’ and when preference should be given to one or both are not well understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is the correlation between pages, which we estimate using the textual contents of pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and queries chosen from the disambiguation topics of Wikipedia. Our algorithm improves upon Google in terms of the diversity of random queries, retrieving 14% to 38% more aspects of queries in the top 5, while maintaining a precision very close to Google's. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of precision and diversity of the results, and significantly outperforms another baseline system for result diversification.
Resonance on the web: web dynamics and revisitation patterns, 2009
Cited by 34 (4 self)
The Web is a dynamic, ever-changing collection of information accessed in a dynamic way. This paper explores the relationship between Web page content change (obtained from an hourly crawl of over 40K pages) and people’s revisitation to those pages (collected via a large-scale log analysis of 2.3M users). We identify the relationship, or resonance, between revisitation behavior and the amount and type of changes on those pages. By coupling our large-scale log analysis with a complementary user study, we explore the intent behind the revisitation behavior we observed. Using the notion of resonance to identify the likely content of interest, we describe a number of ways interaction with changing and revisited information can be better supported. We illustrate how understanding the association between change and revisitation might improve browser, crawler, and search engine design, and present a specific example of how knowledge of both can enable relevant content to be highlighted.
A characterization of online browsing behavior. WWW, 2010
Cited by 33 (0 self)
In this paper, we undertake a large-scale study of online user behavior based on search and toolbar logs. We propose a new CCS taxonomy of pageviews consisting of Content (news, portals, games, verticals, multimedia), Communication (email, social networking, forums, blogs, chat), and Search (Web search, item search, multimedia search). We show that roughly half of all pageviews online are content, one-third are communications, and the remaining one-sixth are search. We then give further breakdowns to characterize the pageviews within each high-level category. Next, we study the extent to which pages of certain types are revisited by the same user over time, and the mechanisms by which users move from page to page, within and across hosts, and within and across page types. We consider robust schemes for assigning responsibility for a pageview to ancestors along the chain of referrals. We show that mail, news, and social networking pageviews are insular in nature, appearing primarily in homogeneous sessions of one type. Search pageviews, on the other hand, appear on the path to a disproportionate number of pageviews, but cannot be viewed as the principal mechanism by which those pageviews were reached. Finally, we study the burstiness of pageviews associated with a URL, and show that by and large, online browsing behavior is not significantly affected by "breaking" material with non-uniform visit frequency.
Potential for Personalization, 2009
Cited by 31 (5 self)
Current Web search tools do a good job of retrieving documents that satisfy the most common intentions associated with a query, but do not do a very good job of discerning different individuals’ unique search goals. We explore the variation in what different people consider relevant to the same query by mining three data sources: 1) explicit relevance judgments, 2) clicks on search results (a behavior-based implicit measure of relevance), and 3) the similarity of desktop content to search results (a content-based implicit measure of relevance). We find that people’s explicit judgments for the same queries differ greatly. As a result, there is a large gap between how well search engines could perform if they were to tailor results to the individual, and how well they currently perform by returning results designed to satisfy everyone. We call this gap the potential for personalization. The two implicit indicators we studied provide complementary value for approximating this variation in result relevance among people. We discuss several uses of our findings, including a personalized search system that takes advantage of the implicit measures by ranking personally relevant results more highly and improving click-through rates.
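The "potential for personalization" can be illustrated numerically. This sketch assumes graded explicit judgments per person and scores rankings with normalized DCG (an assumed metric choice for illustration). A per-person ideal ranking scores 1.0 for everyone, so the gap is 1 minus the best mean nDCG a single shared ranking can reach. The function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i (0-based) divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranking, ratings):
    """nDCG of `ranking` for one person's {doc: gain} judgments."""
    ideal = dcg(sorted(ratings.values(), reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([ratings.get(doc, 0) for doc in ranking]) / ideal


def potential_for_personalization(judgments):
    """judgments: {person: {doc: gain}} -> (group_ranking, group_ndcg, gap)."""
    people = list(judgments)
    docs = sorted({d for r in judgments.values() for d in r})
    # Scaling each person's gains by 1/IDCG makes mean nDCG linear in these scores,
    # so sorting by the summed scores maximizes mean nDCG (rearrangement inequality).
    weight = {}
    for p in people:
        ideal = dcg(sorted(judgments[p].values(), reverse=True))
        weight[p] = 1 / ideal if ideal else 0.0
    score = {d: sum(weight[p] * judgments[p].get(d, 0) for p in people) for d in docs}
    group_ranking = sorted(docs, key=lambda d: (-score[d], d))
    group = sum(ndcg(group_ranking, judgments[p]) for p in people) / len(people)
    return group_ranking, group, 1.0 - group


disagree = {"alice": {"a": 1, "b": 0}, "bob": {"a": 0, "b": 1}}
print(potential_for_personalization(disagree))
```

When two people rate the same results in opposite ways, no single ranking can serve both, and the gap is positive; when everyone agrees, the shared ranking is already ideal and the gap is zero.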