Results 1 - 10
of
40
Topic-Sensitive PageRank
, 2002
"... In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we pr ..."
Abstract
-
Cited by 321 (10 self)
- Add to MetaCart
In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.
A Faster Algorithm for Betweenness Centrality
- Journal of Mathematical Sociology
, 2001
"... The betweenness centrality index is essential in the analysis of social networks, but costly to compute. Currently, the fastest known algorithms require #(n ) time and #(n ) space, where n is the number of actors in the network. ..."
Abstract
-
Cited by 169 (5 self)
- Add to MetaCart
The betweenness centrality index is essential in the analysis of social networks, but costly to compute. Currently, the fastest known algorithms require #(n ) time and #(n ) space, where n is the number of actors in the network.
Efficient Computation of PageRank
, 1999
"... This paper discusses efficient techniques for computing PageRank, a ranking metric for hypertext documents. We show that PageRank can be computed for very large subgraphs of the web (up to hundreds of millions of nodes) on machines with limited main memory. Running-time measurements on various memor ..."
Abstract
-
Cited by 118 (5 self)
- Add to MetaCart
This paper discusses efficient techniques for computing PageRank, a ranking metric for hypertext documents. We show that PageRank can be computed for very large subgraphs of the web (up to hundreds of millions of nodes) on machines with limited main memory. Running-time measurements on various memory configurations are presented for PageRank computation over the 24-million-page Stanford WebBase archive. We discuss several methods for analyzing the convergence of PageRank based on the induced ordering of the pages. We present convergence results helpful for determining the number of iterations necessary to achieve a useful PageRank assignment, both in the absence and presence of search queries.
Deeper inside pagerank
- Internet Mathematics
, 2004
"... Abstract. This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existe ..."
Abstract
-
Cited by 107 (4 self)
- Add to MetaCart
Abstract. This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research. 1.
Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search
- IEEE Transactions on Knowledge and Data Engineering
, 2003
"... Abstract—The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative “importance ” of Web pages, independent of any particular search query. To yield more accurate search results, we propose ..."
Abstract
-
Cited by 96 (2 self)
- Add to MetaCart
Abstract—The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative “importance ” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topicsensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large-scale search system based on the topic-sensitive PageRank scheme. Index Terms—Web search, web graph, link analysis, PageRank, search in context, personalized search, ranking algorithm.
Ranking the Web Frontier
, 2004
"... The celebrated PageRank algorithm has proved to be a very effective paradigm for ranking results of web search algorithms. In this paper we refine this basic paradigm to take into account several evolving prominent features of the web, and propose several algorithmic innovations. First, we analyze f ..."
Abstract
-
Cited by 85 (0 self)
- Add to MetaCart
The celebrated PageRank algorithm has proved to be a very effective paradigm for ranking results of web search algorithms. In this paper we refine this basic paradigm to take into account several evolving prominent features of the web, and propose several algorithmic innovations. First, we analyze features of the rapidly growing "frontier" of the web, namely the part of the web that crawlers are unable to cover for one reason or another. We analyze the effect of these pages and find it to be significant. We suggest ways to improve the quality of ranking by modeling the growing presence of "link rot" on the web as more sites and pages fall out of maintenance. Finally we suggest new methods of ranking that are motivated by the hierarchical structure of the web, are more efficient than PageRank, and may be more resistant to direct manipulation.
Query-Free News Search
, 2005
"... Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can be treated as one such stream of text; in this paper we discuss finding news articles on the web that are relevant to news ..."
Abstract
-
Cited by 39 (0 self)
- Add to MetaCart
Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can be treated as one such stream of text; in this paper we discuss finding news articles on the web that are relevant to news currently being broadcast. We evaluated a variety of algorithms for this problem, looking at the impact of inverse document frequency, stemming, compounds, history, and query length on the relevance and coverage of news articles returned in real time during a broadcast. We also evaluated several postprocessing techniques for improving the precision, including reranking using additional terms, reranking by document similarity, and filtering on document similarity. For the best algorithm, 84–91 % of the articles found were relevant, with at least 64 % of the articles being on the exact topic of the broadcast. In addition, a relevant article was found for at least 70 % of the topics.
Using ODP Metadata to Personalize Search
, 2005
"... The Open Directory Project is clearly one of the largest collaborative efforts to manually annotate web pages and export this information in RDF format. This effort involves over 65.000 editors and resulted in metadata specifying topic and importance for more than 4 million web pages. Still, given t ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
The Open Directory Project is clearly one of the largest collaborative efforts to manually annotate web pages and export this information in RDF format. This effort involves over 65.000 editors and resulted in metadata specifying topic and importance for more than 4 million web pages. Still, given that this number is just about 0.1 percent of the Web pages indexed by Google, is this effort enough to make a difference? We will start by discussing how these metadata can be used to personalize search, and then show that personalized search using ODP and other directory metadata is feasible already today. We will focus on two ways of doing this, first by directly using ODP and similar metadata, and second by biasing on and thus automatically extending these metadata to the whole web.
Learning Implicit User Interest Hierarchy for Context in Personalization
- In Proc. of International Conference on Intelligent User Interface (IUI
, 2003
"... To provide a more robust context for personalization, we desire to extract a continuum of general (long-term) to specific (short-term) interests of a user. Our proposed approach is to learn a user interest hierarchy (UIH) from a set of web pages visited by a user. We devise a divisive hierarchical c ..."
Abstract
-
Cited by 32 (4 self)
- Add to MetaCart
To provide a more robust context for personalization, we desire to extract a continuum of general (long-term) to specific (short-term) interests of a user. Our proposed approach is to learn a user interest hierarchy (UIH) from a set of web pages visited by a user. We devise a divisive hierarchical clustering (DHC) algorithm to group words (topics) into a hierarchy where more general interests are represented by a larger set of words. Each web page can then be assigned to nodes in the hierarchy for further processing in learning and predicting interests. This approach is analogous to building a subject taxonomy for a library catalog system and assigning books to the taxonomy. Our approach does not need user involvement and learns the UIH "implicitly." Furthermore, it allows the original objects, web pages, to be assigned to multiple topics (nodes in the hierarchy). In this paper, we focus on learning the UIH from a set of visited pages. We propose a few similarity functions and dynamic threshold-funding methods, and evaluate the resulting hierarchies according to their meaningfulhess and shape.
Characterization of national Web domains
- ACM Transactions on Internet Technology
, 2005
"... During the last few years, several studies on the characterization of the public Web space of various national domains have been published. The pages of a country are an interesting set for studying the characteristics of the Web, because at the same time these are diverse (as they are written by se ..."
Abstract
-
Cited by 22 (8 self)
- Add to MetaCart
During the last few years, several studies on the characterization of the public Web space of various national domains have been published. The pages of a country are an interesting set for studying the characteristics of the Web, because at the same time these are diverse (as they are written by several authors) and yet rather similar (as they share a common geographical, historical and cultural context). This paper discusses the methodologies used for presenting the results of Web characterization studies, including the granularity at which different aspects are presented, and a separation of concerns between contents, links, and technologies. Based on this, we present a side-by-side comparison of the results of 12 Web characterization studies comprising over 120 million pages from 24 countries. The comparison unveils similarities and differences between the collections, and sheds light on how certain results of a single Web characterization study on a sample may be valid in the context of the full Web.

