Improved Algorithms for Topic Distillation in a Hyperlinked Environment
, 1998
Cited by 373 (6 self)
This paper addresses the problem of topic distillation on the World Wide Web, namely, given a typical user query, to find quality documents related to the query topic. Connectivity analysis has been shown to be useful in identifying high-quality pages within a topic-specific graph of hyperlinked documents. The essence of our approach is to augment a previous connectivity-analysis-based algorithm with content analysis. We identify three problems with the existing approach and devise algorithms to tackle them. The results of a user evaluation are reported that show an improvement of precision at 10 documents by at least 45% over pure connectivity analysis.
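As an illustration of the metric named in this abstract, the following is a minimal sketch of precision at 10 documents and of a relative improvement between two rankings. The function names, document IDs, and relevance judgments are hypothetical and are not taken from the paper; the paper's own evaluation procedure may differ.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline` (0.45 means 45%)."""
    return (improved - baseline) / baseline


# Hypothetical relevance judgments and rankings, for illustration only.
relevant = {"a", "b", "c", "d", "e", "f"}
baseline_ranking = ["a", "x1", "b", "x2", "x3", "c", "x4", "x5", "d", "x6"]
improved_ranking = ["a", "b", "c", "x1", "d", "e", "x2", "f", "x3", "x4"]

p_base = precision_at_k(baseline_ranking, relevant)  # 4 relevant in top 10
p_new = precision_at_k(improved_ranking, relevant)   # 6 relevant in top 10
gain = relative_improvement(p_base, p_new)
print(p_base, p_new, round(gain, 2))
```

Dividing by k rather than by the number of documents returned is one common convention; it penalizes rankings that return fewer than k results.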
Searching the Web: a survey of EXCITE users
- Internet Research: Electronic Networking Applications and Policy
, 1999
Cited by 25 (7 self)
Web search services are now a major source of information for a growing number of people. We need to know more about how users search Web search engines to improve the effectiveness of their information retrieval. This paper reports results from a major study exploring users’ information searching behavior on the EXCITE Web search engine. The study is the first to investigate Web users’ successive searching behavior as they conduct related searches over time on the same or evolving topic. A total of 316 EXCITE users responded to an interactive survey accessed through EXCITE’s homepage. Users provided information on their search topics, intended query terms, search frequency for information on their topic, and demographic data. Results show that, when searching the Web, users tend to employ simple search strategies and often conduct more than one search (successive searches) over time to find information related to a particular topic. Implications for the design of Web search services are discussed.
Form and function: The impact of query term and operator usage on Web search results
- Journal of the American Society for Information Science and Technology
, 2002
Cited by 15 (2 self)
Conventional wisdom holds that queries to information retrieval systems will yield more relevant results if they contain multiple topic-related terms and use Boolean and phrase operators to enhance interpretation. Although studies have shown that the users of Web-based search engines typically enter short, term-based queries and rarely use search operators, little information exists concerning the effects of term and operator usage on the relevancy of search results. In this study, search engine users formulated queries on eight search topics. Each query was submitted to the user-specified search engine, and relevancy ratings for the retrieved pages were assigned. Expert-formulated queries were also submitted and provided a basis for comparing relevancy ratings across search engines. Data analysis based on our research model of the term and operator factors affecting relevancy was then conducted. The results show that the difference in the number of terms between expert and nonexpert searches, the percentage of matching terms between those searches, and the erroneous use of nonsupported operators in nonexpert searches explain most of the variation in the relevancy of search results. These findings highlight the need for designing search engine interfaces that provide greater support in the areas of term selection and operator usage.
FIGI: The Architecture of an Internet-based Financial Information Gathering Infrastructure
- In Proceedings of the International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems
, 1999
Cited by 11 (8 self)
In this paper we present the architecture of a Financial Information Gathering Infrastructure (FIGI). FIGI helps investors collect, filter, combine and integrate portfolio-related information provided through various Internet services like World-Wide Web sites and Web-databases. FIGI is being developed with Java-based Mobile Agent technology by Mitsubishi Electric Information Technology Center. The employment of Java and Mobile Agents provides us with a framework for unifying the various financial information services currently available on the Internet and for sustaining continuous information provision even to mobile users.
Towards Information Retrieval Measures for Evaluation of Web Search Engines
, 1999
Cited by 11 (0 self)
Information retrieval on the Web is very different from retrieval in traditional indexed databases. This difference arises from: the high degree of dynamism of the Web; its hyper-linked character; the absence of a controlled indexing vocabulary; the heterogeneity of document types and authoring styles; the easy access that different types of users may have to it. Thus, since Web retrieval is substantially different from information retrieval, new or revised evaluative measures are required to assess retrieval performance using Web search engines. This paper suggests a number of different measures to evaluate information retrieval from the Web. The motivation behind each of these measures is presented, along with their descriptions and definitions. In the second part of the paper, application of these measures is illustrated in the evaluation of three search engines. The purpose of this paper is not to give the definite prescription for evaluating information retrieval from the Web, but rather to present some examples and to initiate a wider discussion of how to enhance measures of Web search performance.
Emerging semantic communities in peer web search
- In P2PIR ’06: Proceedings of the international workshop on Information retrieval in peer-to-peer networks
, 2006
Cited by 11 (2 self)
Peer network systems are becoming an increasingly important development in Web search technology. Many studies show that peer search systems perform better when a query is sent to a group of peers semantically similar to the query. This suggests that semantic communities should form so that a query can quickly propagate to many appropriate peers. For the network to be functional, its dynamic communication topology must match the semantic clustering of peers. We introduce two criteria to evaluate a peer search network based on the concept of semantic locality: first, the “small-world” topology of the network; second, we use topical semantic similarity to monitor the quality of a peer’s neighbors over time by looking at whether a peer chooses semantically appropriate neighbors to route its queries. We present several simulation experiments conducted with different peer search algorithms on our peer Web search system, 6S. The results suggest that 6S, despite its use of an unstructured overlay network, can effectively foster the spontaneous formation of semantic communities through local peer interactions alone.
Precision among World Wide Web search services (search engines)
- Department of Computer Science, University of Minnesota
, 1997
Cited by 10 (0 self)
This study was conducted in part to correct the problems present in Leighton's early study (Leighton, 1995). In that study, the test suite was inadequate, the statistical model was inappropriate, and the methods were subject to possible bias. This project completed Mr. Leighton's Master's in Computer Science, and the professor who had critiqued the original study, Dr. Lilja, was on the committee that approved the oral defense. Acknowledgements: We would like to thank Carol Blumberg and Brant Deppa of the Mathematics and Statistics Department of Winona State University for their assistance in the design of the project and the statistical analysis of the data. We would also like to thank Don Byrd of the University of Massachusetts at Amherst for his advice and suggestions on related literature in the field. H. Vernon Leighton 6/16/97 Five search engines, Alta Vista, Excite, Hotbot, Infoseek, and Lycos, are compared for precision on the first twenty results returned for fifteen queries. All searching was done from January 31 to March 12, 1997. In the study, steps are taken to ensure that bias has not unduly influenced the evaluation. Friedman's randomized block design is used to perform multiple comparisons for significance. Analysis
Precision Evaluation of Search Engines
- World Wide Web
, 2002
Cited by 6 (0 self)
In this paper, we present a general approach for statistically evaluating precision of search engines on the Web. Search engines are evaluated in two steps based on a large number of sample queries: (a) computing relevance scores of hits from each search engine, and (b) ranking the search engines based on statistical comparison of the relevance scores. In computing relevance scores of hits, we study four relevance scoring algorithms. Three of them are variations of algorithms widely used in the traditional information retrieval field. They are cover density ranking, Okapi similarity measurement, and vector space model algorithms. In addition, we develop a new three-level scoring algorithm to mimic commonly used manual approaches. In ranking the search engines in terms of precision, we apply a statistical metric called probability of win. In our experiments, six popular search engines, AltaVista, Fast, Google, Go, iWon, and NorthernLight, were evaluated based on queries from two domains of interest: parallel and distributed processing, and knowledge and data engineering. The first query set contains 1726 queries collected from the index terms of papers published in the IEEE Transactions on Knowledge and Data Engineering. The second set contains 1383 queries collected from the index terms of papers published in the IEEE Transactions on Parallel and Distributed Systems. Search engines were queried and compared in two different search modes: the default search mode and the exact phrase search mode. Our experimental results show that these six search engines performed differently under different search modes and scoring methods. Overall, Google was the best. NorthernLight was mostly second in the default search mode, whereas iWon was mostly second in the exact phrase search mode.
Information retrieval effectiveness of Turkish search engines
- Advances in Information Systems: Second International Conference, ADVIS 2002, İzmir, Turkey, October 23-25, 2002, Proceedings
, 2002
Cited by 4 (2 self)
This is an investigation of information retrieval performance of Turkish search engines with respect to precision, normalized recall, coverage and novelty ratios. We defined seventeen query topics for Arabul, Arama, Netbul and Superonline. These queries were carefully selected to assess the capability of a search engine for handling broad or narrow topic subjects, exclusion of particular information, identifying and indexing Turkish characters, retrieval of hub/authoritative pages, stemming of Turkish words, correct interpretation of Boolean operators. We classified each document in a retrieval output as being “relevant” or “nonrelevant” to calculate precision and normalized recall ratios at various cut-off points for each pair of query topic and search engine. We found the coverage and novelty ratios for each search engine. We also tested how search engines handle meta-tags and dead links. Arama appears to be the best Turkish search engine in terms of average precision and normalized recall ratios, and the coverage of Turkish sites. Turkish characters (and stemming as well) still cause bottlenecks for Turkish search engines. Superonline and Netbul make use of the indexing information in meta-tag fields to improve retrieval results.
Evaluation of Web search engines and the search for better ranking algorithms
- SIGIR99 Workshop on Evaluation of Web Retrieval
, 1999
Cited by 3 (0 self)
In this paper I will discuss two topics: The evaluation of Web search engines and the search for better ranking algorithms. In the first section I show the results from two relevance experiments. One is a comparison between three Web search engines, and the other is a comparison between two sizes of the number of indexed documents in a search engine. In the second section I discuss some ideas of how to enhance the ranking algorithms in a Web search engine. 1 Evaluation of Web search engines 1.1 Background Web search engines have their ancestors in the information retrieval (IR) systems developed during the last fifty years. IR methods include (among others) the Boolean search methods, the vector space methods, the probabilistic methods, and the clustering methods [1]. All these methods aim at finding the relevant documents for a given query. For evaluating such systems, recall (the number of relevant retrieved documents divided by the number of relevant documents) and precision (the num...
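The recall definition quoted in this excerpt can be sketched in a few lines. Because the excerpt is cut off before it finishes defining precision, the `precision` function below uses the standard textbook definition (relevant retrieved documents divided by all retrieved documents), which is an assumption about what the paper goes on to say. The function names and document IDs are hypothetical.

```python
def recall(retrieved, relevant):
    """Relevant retrieved documents divided by all relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0  # convention chosen here for an empty relevant set
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant retrieved documents divided by all retrieved documents
    (standard definition; the excerpt above is truncated at this point)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0  # convention chosen here for an empty result list
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical query result and relevance judgments, for illustration only.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]

print(recall(retrieved_docs, relevant_docs))     # 2 of 3 relevant documents found
print(precision(retrieved_docs, relevant_docs))  # 2 of 4 retrieved documents relevant
```

On the Web the full set of relevant documents is rarely known, so recall usually has to be estimated, while precision can be computed from judgments on the retrieved results alone.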

