Searching the Web: The Public and Their Queries
2001
Cited by 188 (7 self)
Abstract
In studying actual Web searching by the public at large, we analyzed over one million Web queries by users of the Excite search engine. We found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. A small number of search terms are used with high frequency, and a great many terms are unique; the language of Web queries is distinctive. Queries about recreation and entertainment rank highest. Findings are compared to data from two other large studies of Web queries. This study provides an insight into the public practices and choices in Web searching.
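The query-level measures mentioned above include terms per query, how often terms recur, and how many terms are unique. The Python sketch below shows how such measures could be computed. It assumes a hypothetical log format of one raw query string per entry and naive lowercase whitespace tokenization. The study's own parsing rules are not described here.

```python
from collections import Counter


def query_log_stats(queries):
    """Summarize a list of raw query strings (hypothetical log: one string per query)."""
    # Naive tokenization: lowercase, split on whitespace.
    term_lists = [q.lower().split() for q in queries]
    lengths = [len(terms) for terms in term_lists]
    term_counts = Counter(t for terms in term_lists for t in terms)
    # Terms that occur exactly once across the whole log.
    singletons = sum(1 for c in term_counts.values() if c == 1)
    return {
        "mean_terms_per_query": sum(lengths) / len(lengths) if lengths else 0.0,
        "distinct_terms": len(term_counts),
        "terms_used_once": singletons,
        "top_terms": term_counts.most_common(3),
    }


stats = query_log_stats(["free music", "Music lyrics", "weather", "free games online"])
print(stats["mean_terms_per_query"], stats["terms_used_once"])
```

On a real log, sorting `term_counts` by frequency gives the rank/frequency distribution. A few heavily used terms followed by a long tail of single-use terms would match the pattern the abstract reports.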
Agglomerative Clustering of a Search Engine Query Log
In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
Cited by 173 (0 self)
Abstract
This paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is "clickthrough data": each record consists of a user's query to a search engine along with the URL which the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs. One noteworthy feature of the proposed algorithm is that it is "content-ignorant": the algorithm makes no use of the actual content of the queries or URLs, but only how they co-occur within the clickthrough data. We describe how to enlist the discovered clusters to assist users in Web search, and measure the effectiveness of the discovered clusters in the Lycos search engine...
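The abstract describes "content-ignorant" clustering over a query/URL bipartite graph. The Python sketch below is a simplified illustration of that idea, not the paper's algorithm. It clusters only the query side. Similarity is the Jaccard overlap of clicked-URL sets, and merging stops at a threshold. The record format, the similarity measure and `threshold` are all assumptions.

```python
from itertools import combinations


def cluster_queries(clicks, threshold=0.5):
    """Greedy agglomerative clustering of queries using only click co-occurrence.

    clicks: iterable of (query, url) pairs (hypothetical clickthrough records).
    Returns a list of clusters, each a frozenset of queries.
    """
    # Query side of the bipartite graph: each cluster maps to its clicked URLs.
    neighbors = {}
    for query, url in clicks:
        neighbors.setdefault(frozenset([query]), set()).add(url)

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    while len(neighbors) > 1:
        # Most similar pair of clusters, judged only by shared URLs (no text used).
        (c1, c2), best = max(
            ((pair, jaccard(neighbors[pair[0]], neighbors[pair[1]]))
             for pair in combinations(neighbors, 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # Merge: the new cluster links to the union of both URL sets.
        merged_urls = neighbors.pop(c1) | neighbors.pop(c2)
        neighbors[c1 | c2] = merged_urls
    return list(neighbors)
```

With clicks such as `("cheap flights", "a.com")` and `("airfare", "a.com")`, the two queries merge although they share no words. That is the property the abstract calls content-ignorant.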
Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web
Information Processing and Management, 2000
Cited by 166 (22 self)
Abstract
We analyzed transaction logs containing 51,473 queries posed by 18,113 users of Excite, a major Internet search service. We provide data on: (i) sessions (changes in queries during a session, number of pages viewed, and use of relevance feedback); (ii) queries (the number of search terms, and the use of logic and modifiers); and (iii) terms (their rank/frequency distribution and the most highly used search terms). We then shift the focus of analysis from the query to the user to gain insight into the characteristics of the Web user. With these characteristics as a basis, we then conducted a failure ...
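The abstract shifts the analysis from individual queries to users. The Python sketch below shows one way to aggregate a transaction log per user. The `(user_id, query, pages_viewed)` record schema is hypothetical. "Modified query" is defined here, as an assumption, as a query that differs from the same user's previous query.

```python
from collections import defaultdict


def per_user_stats(records):
    """Aggregate a transaction log per user.

    records: (user_id, query, pages_viewed) tuples in time order (hypothetical schema).
    """
    by_user = defaultdict(list)
    for user, query, pages in records:
        by_user[user].append((query, pages))
    stats = {}
    for user, entries in by_user.items():
        queries = [q for q, _ in entries]
        # Assumed definition: a query that differs from the one just before it.
        modified = sum(1 for prev, cur in zip(queries, queries[1:]) if cur != prev)
        stats[user] = {
            "queries": len(queries),
            "modified_queries": modified,
            "pages_viewed": sum(p for _, p in entries),
        }
    return stats
```

In this sketch, a repeated identical query is not counted as a modification. In some logs, such a repeat records a request for the next page of results rather than a new information need.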
Building efficient and effective metasearch engines
ACM Computing Surveys, 2002
Cited by 107 (9 self)
Abstract
Frequently, the information a user needs is stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool, such as increasing the search coverage of the Web and improving the scalability of the search. In this article, we survey techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents to a given query. The document selection problem is to determine what documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We also point out some problems that need further research.
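The database selection problem above can be illustrated with one simple heuristic, sketched in Python below. It is not a technique taken from the survey. Each engine has a hypothetical summary: document count plus per-term document frequencies. Assuming terms occur independently, an engine's usefulness is estimated as the expected number of its documents containing every query term.

```python
def select_engines(query_terms, representatives, k=2):
    """Rank search engines by a crude usefulness estimate for a query.

    representatives: engine name -> {"num_docs": int, "df": {term: doc_freq}}
    (a hypothetical per-engine summary; real metasearch representatives vary).
    """
    scores = {}
    for engine, rep in representatives.items():
        n = rep["num_docs"]
        if n == 0:
            scores[engine] = 0.0
            continue
        # Expected number of documents containing all query terms,
        # under an independence assumption: N * prod(df_t / N).
        estimate = float(n)
        for term in query_terms:
            estimate *= rep["df"].get(term, 0) / n
        scores[engine] = estimate
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


reps = {
    "news": {"num_docs": 1000, "df": {"election": 200, "results": 100}},
    "sports": {"num_docs": 500, "df": {"results": 250}},
}
print(select_engines(["election", "results"], reps, k=1))
```

An engine that lacks any query term scores zero under this estimator. A real selector would likely also need smoothing.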
Combining Approaches to Information Retrieval
Cited by 76 (1 self)
Abstract
The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
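The Python sketch below shows one common way to combine the outputs of several retrieval runs. Each run's scores are min-max normalized and then fused with CombMNZ: the summed score is multiplied by the number of runs that retrieved the document. This rule was chosen for illustration. It is not the inference net or language-model framework the paper proposes, and the input format is an assumption.

```python
def comb_mnz(runs):
    """Fuse ranked runs using min-max normalization and CombMNZ.

    runs: list of dicts mapping doc_id -> raw score from one system or representation.
    Returns doc_ids ordered by fused score, best first.
    """
    fused, hits = {}, {}
    for run in runs:
        if not run:
            continue
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + (score - lo) / span  # CombSUM part
            hits[doc] = hits.get(doc, 0) + 1
    # CombMNZ: reward documents retrieved by several runs.
    final = {doc: fused[doc] * hits[doc] for doc in fused}
    return sorted(final, key=final.get, reverse=True)


run_a = {"d1": 10, "d2": 5, "d3": 0}
run_b = {"d2": 3.0, "d4": 1.0}
print(comb_mnz([run_a, run_b]))
```

Here `d2` outranks `d1` even though `d1` tops one run, because both runs retrieve `d2`. This matches the intuition that agreement between representations is evidence of relevance.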
Results and Challenges in Web Search Evaluation
1999
Cited by 59 (7 self)
Abstract
A frozen 18.5 million page snapshot of part of the Web has been created to enable and encourage meaningful and reproducible evaluation of Web search systems and techniques. This collection is being used in an evaluation framework within the Text Retrieval Conference (TREC) and will hopefully provide convincing answers to questions such as, "Can link information result in better rankings?", "Do longer queries result in better answers?", and, "Do TREC systems work well on Web data?" The snapshot and associated evaluation methods are described and an invitation is extended to participate. Preliminary results are presented for an effectiveness comparison of six TREC systems working on the snapshot collection against five well-known Web search systems working over the current Web. These suggest that the standard of document rankings produced by public Web search engines is by no means state-of-the-art.
Keywords: Evaluation; Search engines; Test collection; TREC; Methodology
The authors wi...
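Comparing systems against a test collection requires an effectiveness measure computed from relevance judgments. The Python sketch below uses precision at k averaged over queries. This measure is an illustrative assumption; the abstract does not say which measures the comparison used. The dictionaries `runs` (query to ranked document list) and `qrels` (query to relevant-document set) are hypothetical inputs.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k positions holding a document judged relevant.

    Divides by k even if fewer than k documents were returned.
    """
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def mean_precision_at_k(runs, qrels, k=10):
    """Average P@k over queries.

    runs: query -> ranked doc list; qrels: query -> set of relevant docs.
    """
    if not runs:
        return 0.0
    scores = [precision_at_k(ranking, qrels.get(q, set()), k) for q, ranking in runs.items()]
    return sum(scores) / len(scores)
```

Running the same `qrels` against two systems' `runs` gives directly comparable numbers. The frozen snapshot is what makes such comparisons reproducible.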
Locality in Search Engine Queries and Its Implications for Caching
In IEEE Infocom 2002
Cited by 56 (0 self)
Abstract
Caching is a popular technique for reducing both server load and user response time in distributed systems. In this paper, we consider the question of whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis results show that: (1) Queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are from the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. (2) If caching is to be done at the user side, short-term caching for hours will be enough to cover query temporal locality, while server/proxy caching should use longer periods, such as days. (3) Most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user side caching, prefetching based on user lexicon looks promising.
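The findings above imply that popular queries recur often enough to be served from a cache. The Python sketch below shows a small LRU cache of query results that counts hits and misses while a trace is replayed. The capacity, the key normalization (lowercase, collapsed whitespace) and the `compute` callback are illustrative assumptions, not the paper's design.

```python
from collections import OrderedDict


class QueryResultCache:
    """LRU cache of search results keyed by a normalized query string (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query, compute):
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(key)  # e.g. run the query against the index
        self._store[key] = result
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = QueryResultCache(capacity=2)
for q in ["weather", "news", "Weather", "sports", "news", "weather"]:
    cache.get(q, lambda key: ["results for " + key])
print(cache.hit_ratio())
```

Replaying a real trace at several capacities would show how hit ratio grows with cache size. Under a Zipf-like query distribution, a small cache should capture a disproportionate share of requests.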
Query recommendation using query logs in search engines
In International Workshop on Clustering Information over the Web (ClustWeb, in conjunction with EDBT), Crete, 2004
Cited by 50 (3 self)
Abstract
In this paper we propose a method that, given a query submitted to a search engine, suggests a list of related queries. The related queries are based on previously issued queries, and can be issued by the user to the search engine to tune or redirect the search process. The method proposed is based on a query clustering process in which groups of semantically similar queries are identified. The clustering process uses the content of historical preferences of users registered in the query log of the search engine. The method not only discovers the related queries, but also ranks them according to a relevance criterion. Finally, experiments over the query log of a search engine show the effectiveness of the method.
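The Python sketch below shows the recommendation step described above. Each past query is represented by a bag of words built from the text of documents users clicked for it. Suggestions are drawn from the same cluster and ranked by cosine similarity. Cluster assignments are taken as given (for example, from k-means). `query_docs`, `clusters` and the similarity-only ranking are assumptions; the paper's relevance criterion is not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(query, query_docs, clusters, n=3):
    """Suggest related past queries from the same cluster, most similar first.

    query_docs: past query -> text of documents clicked for it (hypothetical).
    clusters: past query -> cluster id, e.g. produced by k-means over these vectors.
    """
    vectors = {q: Counter(text.lower().split()) for q, text in query_docs.items()}
    target = vectors[query]
    same_cluster = [q for q in vectors if q != query and clusters[q] == clusters[query]]
    ranked = sorted(same_cluster, key=lambda q: cosine(target, vectors[q]), reverse=True)
    return ranked[:n]
```

Building the vectors from clicked-document text rather than query words lets queries with no shared terms still be recommended for one another.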
Interactive Internet search: Keyword, directory and query reformulation mechanisms compared
In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000
Cited by 39 (3 self)
Abstract
This article compares search effectiveness when using query-based Internet search (via the Google search engine), directory-based search (via Yahoo) and phrase-based query reformulation assisted search (via the Hyperindex browser) by means of a controlled, user-based experimental study. The focus was to evaluate aspects of the search process. Cognitive load was measured using a secondary digit-monitoring task to quantify the effort of the user in various search states; independent relevance judgements were employed to gauge the quality of the documents accessed during the search process. Time was monitored in various search states. Results indicated that directory-based search does not offer increased relevance over query-based search (with or without query formulation assistance), and also takes longer. Query reformulation does significantly improve the relevance of the documents through which the user must trawl versus standard query-based Internet search. However,...
Combining evidence for automatic Web session identification
Information Processing and Management, 2002
Cited by 37 (0 self)
Abstract
Contextual information provides an important basis for identifying and understanding users' information needs. Our previous work in traditional information retrieval systems has shown how using contextual information could improve retrieval performance. With the vast quantity and variety of information available on the Web, and the short query lengths within Web searches, it becomes even more crucial that appropriate contextual information is extracted to facilitate personalized services. However, finding users' contextual information is not straightforward, especially in the Web search environment where less is known about the individual users. In this paper, we present an approach that has significant potential for studying Web users' search contexts. The approach automatically groups a user's consecutive search activities on the same search topic into one session. It uses Dempster–Shafer theory to combine evidence extracted from two sources, each of which is based on the statistical data from Web search logs. The evaluation we have performed demonstrates that our approach has achieved a significant improvement...
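Dempster–Shafer theory combines evidence using Dempster's rule of combination. The Python sketch below applies the rule to a session-boundary decision with two hypotheses: "same session" and "new session". The two mass functions stand for two hypothetical evidence sources, a time gap and topic overlap, and their values are invented for illustration. The paper's actual sources and mass assignments are not reproduced here.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize the non-conflicting mass so it sums to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


SAME, NEW = frozenset({"same"}), frozenset({"new"})
THETA = SAME | NEW  # "don't know": mass on the whole frame
m_time = {SAME: 0.6, THETA: 0.4}   # hypothetical: short time gap favours the same session
m_topic = {NEW: 0.3, THETA: 0.7}   # hypothetical: low term overlap weakly favours a new session
m = combine(m_time, m_topic)
print(m[SAME] > m[NEW])
```

In this example, 0.18 of the product mass is conflicting and is normalized away. This leaves about 0.51 for "same session" and 0.15 for "new session". A session boundary could then be declared when the combined mass favours "new".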

