Results 1-10 of 49
Deeper inside PageRank
Internet Mathematics, 2004
Cited by 142 (4 self)
This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
Learning from Labeled and Unlabeled Data with Label Propagation
2002
Cited by 113 (0 self)
We investigate the use of unlabeled data to help labeled data in classification. We propose a simple iterative algorithm, label propagation, to propagate labels through the dataset along high density areas defined by unlabeled data. We give the analysis of the algorithm, show its solution, and its connection to several other algorithms. We also show how to learn parameters by a minimum spanning tree heuristic and entropy minimization, and the algorithm's ability to do feature selection. Experimental results are promising.
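The iterative idea in this abstract can be illustrated with a minimal sketch. It is not the paper's exact formulation: the Gaussian similarity, the row normalization, the fixed iteration count, and the names `label_propagation`, `sigma`, and `n_iter` are all assumptions chosen for illustration.

```python
import numpy as np

def label_propagation(X, y, n_classes, sigma=1.0, n_iter=200):
    """Spread labels over a similarity graph (illustrative sketch only).

    y holds a class index for each labeled point and -1 for unlabeled points.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0

    # Assumed similarity: Gaussian kernel on squared Euclidean distance.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / sigma**2)
    T = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

    # Start unlabeled points at a uniform distribution over classes.
    Y = np.full((len(X), n_classes), 1.0 / n_classes)
    clamp = np.eye(n_classes)[y[labeled]]
    Y[labeled] = clamp

    for _ in range(n_iter):
        Y = T @ Y            # each point averages its neighbours' label mass
        Y[labeled] = clamp   # labeled points keep their known labels
    return Y.argmax(axis=1)

# Two well-separated 1-D clusters with one labeled point in each.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [0, -1, -1, 1, -1, -1]
pred = label_propagation(X, y, n_classes=2)
```

Clamping the labeled rows after every step is what keeps the known labels acting as fixed sources while the remaining rows settle.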
Stable Algorithms for Link Analysis
2001
Cited by 107 (1 self)
The Kleinberg HITS and the Google PageRank algorithms are eigenvector methods for identifying "authoritative" or "influential" articles, given hyperlink or citation information. That such algorithms should give reliable or consistent answers is surely a desideratum, and in [10], we analyzed when they can be expected to give stable rankings under small perturbations to the linkage patterns. In this paper, we extend the analysis and show how it gives insight into ways of designing stable link analysis methods. This in turn motivates two new algorithms, whose performance we study empirically using citation data and web hyperlink data.
Finding Authorities and Hubs From Link Structures on the World Wide Web
In Proceedings of the 10th International World Wide Web Conference, Hong Kong, 2001
Cited by 71 (9 self)
Recently, there have been a number of algorithms proposed for analyzing hypertext link structure so as to determine the best "authorities" for a given topic or query. While such analysis is usually combined with content analysis, there is a sense in which some algorithms are deemed to be "more balanced" and others "more focused". We undertake a comparative study of hypertext link analysis algorithms. Guided by some experimental queries, we propose some formal criteria for evaluating and comparing link analysis algorithms. Keywords: link analysis, web searching, hubs, authorities, SALSA, Kleinberg's algorithm, threshold, Bayesian.
The Second Eigenvalue of the Google Matrix
2003
Cited by 70 (8 self)
We determine analytically the modulus of the second eigenvalue for the web hyperlink matrix used by Google for computing PageRank. Specifically, we prove the following statement: "For any matrix $A = [cP + (1 - c)E]^T$, where $P$ is an $n \times n$ row-stochastic matrix, $E$ is a nonnegative $n \times n$ rank-one row-stochastic matrix, and $0 \le c \le 1$, the second eigenvalue of $A$ has modulus $|\lambda_2| \le c$. Furthermore, if $P$ has at least two irreducible closed subsets, the second eigenvalue $\lambda_2 = c$." This statement has implications for the convergence rate of the standard PageRank algorithm as the web scales, for the stability of PageRank to perturbations to the link structure of the web, for the detection of Google spammers, and for the design of algorithms to speed up PageRank.
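The quoted statement can be checked numerically on a toy example. The sketch below assumes a uniform teleportation vector for the rank-one matrix E and a hand-built 4-page link matrix made of two disjoint 2-cycles, so that P has two irreducible closed subsets. The names `google_matrix` and `eigen_moduli` are hypothetical helpers, not taken from the paper.

```python
import numpy as np

def google_matrix(P, c, v=None):
    """Return A = [cP + (1 - c)E]^T with E = 1 v^T (rank-one, row-stochastic)."""
    n = P.shape[0]
    if v is None:
        v = np.full(n, 1.0 / n)       # assumed: uniform teleportation vector
    E = np.outer(np.ones(n), v)
    return (c * P + (1.0 - c) * E).T

def eigen_moduli(A):
    """Eigenvalue moduli of A, sorted from largest to smallest."""
    return np.sort(np.abs(np.linalg.eigvals(A)))[::-1]

# Two disjoint 2-cycles: pages {0, 1} and {2, 3} never link to each other.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
c = 0.85
moduli = eigen_moduli(google_matrix(P, c))
# moduli[0] is the dominant eigenvalue; moduli[1] is the second-largest modulus.
```

For this matrix the second-largest modulus coincides with c, which is the case the second half of the statement covers. Since the error of power iteration shrinks roughly like the k-th power of that modulus after k steps, this is the link to convergence rate mentioned in the abstract.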
A survey of eigenvector methods of web information retrieval
SIAM Rev
Cited by 66 (6 self)
Web information retrieval is significantly more challenging than traditional well-controlled, small document collection information retrieval. One main difference between traditional information retrieval and Web information retrieval is the Web’s hyperlink structure. This structure has been exploited by several of today’s leading Web search engines, particularly Google and Teoma. In this survey paper, we focus on Web information retrieval methods that use eigenvector computations, presenting the three popular methods of HITS, PageRank, and SALSA.
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
Cited by 66 (10 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; (vi) finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
A survey on PageRank computing
Internet Mathematics, 2005
Cited by 64 (0 self)
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure from associated measures of convergence to link preprocessing.
Link Mining: A Survey
SigKDD Explorations Special Issue on Link Mining, 2005
Cited by 47 (0 self)
Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single object type and link type, or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of homogeneous networks include single mode social networks, such as people connected by friendship links, or the WWW, a collection of linked web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. In this article, we review some of the common emerging themes.
Natural Communities in Large Linked Networks
2003
Cited by 46 (0 self)
We are interested in finding natural communities in large-scale linked networks. Our ultimate goal is to track changes over time in such communities. For such temporal tracking, we require a clustering algorithm that is relatively stable under small perturbations of the input data. We have developed an efficient, scalable agglomerative strategy and applied it to the citation graph of the NEC CiteSeer database (250,000 papers; 4.5 million citations). Agglomerative clustering techniques are known to be unstable on data in which the community structure is not strong. We find that some communities are essentially random and thus unstable while others are natural and will appear in most clusterings. These natural communities will enable us to track the evolution of communities over time.