Effective Site Finding using Link Anchor Information
, 2001
Cited by 108 (14 self)
Link-based ranking methods have been described in the literature and applied in commercial Web search engines. However, according to recent TREC experiments, they are no better than traditional content-based methods. We conduct a different type of experiment, in which the task is to find the main entry point of a specific Web site. In our experiments, ranking based on link anchor text is twice as effective as ranking based on document content, even though both methods used the same BM25 formula. We obtained these results using two sets of 100 queries on an 18.5 million document set and another set of 100 queries on a 0.4 million document set. This site-finding effectiveness begins to explain why many search engines have adopted link methods. It also opens a rich new area for effectiveness improvement, where traditional methods fail.
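The abstract states that both rankings used the same BM25 formula and differ only in which text is indexed for each page. A minimal sketch of that setup follows, assuming the common Okapi BM25 form with k1 = 1.2 and b = 0.75 (the paper's parameter values are not given here). The helper `anchor_surrogates` is hypothetical: it treats the concatenated anchor text of all links pointing at a page as that page's pseudo-document.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` (dict: id -> list of tokens) against
    the `query` tokens with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n if n else 0.0
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # The "+ 1" inside the log is one common variant that keeps idf non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[doc_id] = score
    return scores


def anchor_surrogates(links):
    """Hypothetical helper: build one pseudo-document per target page from
    the anchor text of every (source, target, anchor_text) link pointing to it."""
    surrogate = {}
    for _source, target, anchor in links:
        surrogate.setdefault(target, []).extend(anchor.lower().split())
    return surrogate
```

Scoring the same query once against page content and once against `anchor_surrogates(links)` reproduces the shape of the comparison: the formula is fixed and only the indexed text changes. On a toy collection where only the in-link anchors mention a site's name, the content ranking scores the entry page zero while the anchor ranking does not. This is an illustration of the mechanism, not a reproduction of the reported results.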
Information retrieval on the Web
- ACM Computing Surveys
, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited ...
A Survey on PageRank Computing
- Internet Mathematics
, 2005
Cited by 42 (0 self)
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure, from associated measures of convergence to link preprocessing.
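Since the abstract describes PageRank as authority weights derived solely from hyperlink structure, a small power-iteration sketch may help fix the idea. It assumes the commonly used damping factor 0.85 and one of several conventions for dangling pages (pages with no out-links spread their rank uniformly over all pages). It is not necessarily the formulation or acceleration scheme the survey recommends.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on a dict: node -> list of out-neighbors.
    Returns a dict node -> score; the scores sum to 1."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly (an assumed convention).
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        # Teleport term plus the uniformly spread dangling mass.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each page passes damping * rank evenly along its out-links.
        for v in nodes:
            targets = out_links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        # Stop once the L1 change between iterations is below tol.
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` ranks `c` (two in-links) above `a`, and `a` above `b`, which receives only the teleport share. The L1 error shrinks by roughly the damping factor per iteration. That slow convergence for damping near 1 is one reason acceleration is a research topic in its own right.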
Constructing, Organizing, and Visualizing Collections of Topically Related Web Resources
- ACM Transactions on Computer-Human Interaction
, 1999
Cited by 40 (5 self)
For many purposes, the Web page is too small a unit of interaction and analysis. Web sites are structured multimedia documents consisting of many pages, and users often are interested in obtaining and evaluating entire collections of topically related sites. Once such a collection is obtained, users face the challenge of exploring, comprehending, and organizing the items. We report four innovations that address these user needs:
- We replaced the Web page with the Web site as the basic unit of interaction and analysis.
- We defined a new information structure, the clan graph, that groups together sets of related sites.
- We augmented the representation of a site with a site profile, information about site structure and content that helps inform user evaluation of a site.
- We invented a new graph visualization, the auditorium visualization, that reveals important structural and content properties of sites within a clan graph.
Detailed analysis and user studies document the utility o...
Spectral Filtering for Resource Discovery
, 1998
Cited by 27 (2 self)
We develop a technique we call spectral filtering for discovering high-quality topical resources in hyperlinked corpora. Through relevance and quality judgements collected from 37 users, we show that, over 26 topics, spectral filtering usually finds web pages that are rated better than those returned by the hand-compiled Yahoo! resource list and by the AltaVista search engine.
An Empirical Evaluation of User Interfaces for Topic Management of Web Sites
, 1999
Cited by 25 (11 self)
Topic management is the task of gathering, evaluating, organizing, and sharing a set of web sites for a specific topic. Current web tools do not provide adequate support for this task. We created the TopicShop system to address this need. TopicShop includes (1) a web crawler that discovers relevant web sites and builds site profiles, and (2) user interfaces for exploring and organizing sites. We conducted an empirical study comparing user performance with TopicShop vs. Yahoo. TopicShop subjects found over 80% more high-quality sites (where quality was determined by independent expert judgements) while browsing only 81% as many sites and completing their task in 89% of the time. The site profile data that TopicShop provides (in particular, the number of pages on a site and the number of other sites that link to it) was the key to these results, as users exploited it to identify the most promising sites quickly and easily. Keywords: information access, information retrieval, informat...
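The two site-profile features credited with the result, the number of pages on a site and the number of other sites linking to it, can be derived from crawl output roughly as below. This is a sketch, not TopicShop's code. It assumes a site is identified by its host name (the hypothetical `site_of` rule) and that the crawler yields page URLs plus (source, target) link pairs. Links within the same site are ignored so that only *other* sites are counted.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Hypothetical site key: the URL's host name (urlsplit lower-cases it)."""
    return urlsplit(url).hostname or ""


def site_profiles(pages, links):
    """pages: iterable of crawled page URLs; links: iterable of (source_url, target_url).
    Returns site -> {"pages": distinct pages crawled on the site,
                     "in_sites": distinct other sites that link to it}."""
    pages_by_site = defaultdict(set)
    for url in pages:
        pages_by_site[site_of(url)].add(url)
    linking_sites = defaultdict(set)
    for src, dst in links:
        s, d = site_of(src), site_of(dst)
        if s != d:  # skip intra-site links; only other sites count
            linking_sites[d].add(s)
    sites = set(pages_by_site) | set(linking_sites)
    return {
        site: {"pages": len(pages_by_site[site]), "in_sites": len(linking_sites[site])}
        for site in sites
    }
```

Counting distinct linking *sites* rather than raw links is a deliberate choice in this sketch. It keeps one heavily self-linked or mirror-heavy site from inflating another site's apparent endorsement.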
Link Analysis in Web Information Retrieval
- IEEE Data Engineering Bulletin
, 2000
Cited by 25 (0 self)
The analysis of the hyperlink structure of the web has led to significant improvements in web information retrieval. This survey describes two successful link analysis algorithms and the state of the art of the field.
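The abstract does not name its two algorithms. For a concrete picture of hyperlink-structure analysis, here is a minimal sketch of Kleinberg's HITS hub/authority iteration, one widely known example. The fixed iteration count and L2 normalization are assumptions of this sketch. HITS is usually run on a query-specific subgraph rather than on a whole crawl.

```python
import math


def hits(out_links, iterations=50):
    """Hub/authority iteration on a dict: node -> list of out-neighbors.
    Returns (hub, authority) dicts, each scaled to unit L2 norm."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {v: 0.0 for v in nodes}
        for u, targets in out_links.items():
            for t in targets:
                auth[t] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0  # avoid 0/0 on edgeless graphs
        auth = {v: a / norm for v, a in auth.items()}
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {v: sum(auth[t] for t in out_links.get(v, [])) for v in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return hub, auth
```

With `hits({"h1": ["a1", "a2"], "h2": ["a1"], "h3": ["a1"]})`, page `a1` gets the highest authority score because every hub points to it, and `h1` gets the highest hub score because it points to both authorities. Unlike PageRank, the two scores reinforce each other, and the iteration converges toward the principal eigenvectors of A^T A and A A^T for adjacency matrix A.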
Collection Synthesis
, 2002
Cited by 19 (2 self)
The invention of the hyperlink and the HTTP transmission protocol caused an amazing new structure to appear on the Internet -- the World Wide Web. With the Web, there came spiders, robots, and Web crawlers, which go from one link to the next checking Web health, ferreting out information and resources, and imposing organization on the huge collection of information (and dross) residing on the net. This paper reports on the use of one such crawler to synthesize document collections on various topics in science, mathematics, engineering and technology. Such collections could be part of a digital library.

