Results 1-10 of 12
Block-level Link Analysis
 In SIGIR
, 2004
Abstract

Cited by 36 (5 self)
Link analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most existing link analysis algorithms treat a web page as a single node in the web graph. However, in most cases a web page contains multiple semantics, and hence the web page might not be considered the atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block and block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node represents exactly one semantic topic. This graph can better describe the semantic structure of the web. Based on block-level link analysis, we propose two new algorithms, Block Level PageRank and Block Level HITS, whose performance we study extensively using web data.
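As a rough illustration of the block-level idea, the sketch below builds tiny, made-up page-to-block (X, from layout analysis) and block-to-page (Z, from the links inside each block) matrices and runs a standard power-iteration PageRank over the induced block-level and page-level graphs. The matrices, their values, and the way they are combined are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

# Hypothetical toy data: 3 pages, 4 blocks (blocks 0 and 1 sit on page 0,
# block 2 on page 1, block 3 on page 2).
# X[p, b]: page-to-block weight from layout analysis (rows sum to 1).
# Z[b, p]: block-to-page weight from the links inside each block.
X = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
Z = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

def pagerank(W, d=0.85, iters=100):
    """Power iteration with teleportation on a row-stochastic matrix W."""
    n = W.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (W.T @ r)
    return r / r.sum()

W_block = Z @ X   # block-to-block graph: follow a link, land on a block
W_page = X @ Z    # page-to-page graph induced by the same relations
print(pagerank(W_block))  # block-level importance
print(pagerank(W_page))   # page-level importance
```

Because the rows of X and Z are stochastic, both products are row-stochastic, so the same PageRank routine applies at either granularity.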
Finding community structure in mega-scale social networks. arXiv physics e-prints, cs/0702048v1
, 2007
Abstract

Cited by 31 (0 self)
The community analysis algorithm proposed by Clauset, Newman, and Moore (the CNM algorithm) finds community structure in social networks. Unfortunately, the CNM algorithm does not scale well, and its use is practically limited to networks of up to 500,000 nodes. This paper identifies that the inefficiency is caused by merging communities in an unbalanced manner, and introduces three kinds of metrics (consolidation ratio) to control the process of community analysis by balancing the sizes of the communities being merged. Three flavors of the CNM algorithm are built incorporating these metrics. The proposed techniques are tested using data sets obtained from an existing social networking service that hosts 5.5 million users. All the methods exhibit dramatic improvements in execution efficiency over the original CNM algorithm and show high scalability. The fastest method processes a network with 1 million nodes in 5 minutes and a network with 4 million nodes in 35 minutes. Another processes a network with 500,000 nodes in 50 minutes (7 times faster than the original algorithm), finds community structures with improved modularity, and scales to a network with 5.5 million nodes.
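To make the greedy merging concrete, here is a minimal sketch of CNM-style modularity maximization on a toy graph: start with singleton communities and repeatedly merge the pair with the largest modularity gain. The graph is invented, and the balancing metrics (consolidation ratios) the paper introduces are not modeled.

```python
import itertools

# Toy undirected graph: two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
m = len(edges)
nodes = sorted({v for e in edges for v in e})

# CNM starting point: every node in its own community.
comm = {v: {v} for v in nodes}

def frac_between(c1, c2):
    """Fraction of all edges with one end in c1 and the other in c2."""
    return sum(1 for u, v in edges
               if (u in c1 and v in c2) or (u in c2 and v in c1)) / m

def a(c):
    """Fraction of edge endpoints attached to community c."""
    return sum(1 for u, v in edges for x in (u, v) if x in c) / (2 * m)

while len(comm) > 1:
    # Greedy step: pick the merge with the largest modularity gain
    # delta_Q = 2 * (e_ij - a_i * a_j); stop when no merge helps.
    i, j = max(itertools.combinations(comm, 2),
               key=lambda p: frac_between(comm[p[0]], comm[p[1]])
                             - a(comm[p[0]]) * a(comm[p[1]]))
    gain = 2 * (frac_between(comm[i], comm[j]) - a(comm[i]) * a(comm[j]))
    if gain <= 0:
        break
    comm[i] |= comm.pop(j)

print(sorted(sorted(c) for c in comm.values()))  # → [[0, 1, 2], [3, 4, 5]]
```

The quadratic pair scan is exactly the bottleneck the CNM heap structure avoids; the paper's contribution is steering which merges the heap performs so community sizes stay balanced.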
Navigation-Aided Retrieval
 In Proceedings of the 16th International World Wide Web Conference (WWW07)
, 2007
Abstract

Cited by 13 (0 self)
Users searching for information in hypermedia environments often perform querying followed by manual navigation. Yet, the conventional text/hypertext retrieval paradigm does not take post-query navigation into account. This paper proposes a new retrieval paradigm, called navigation-aided retrieval (NAR), which treats both querying and navigation as first-class activities. In the NAR paradigm, querying is seen as a means to identify starting points for navigation, and navigation is guided based on information supplied in the query. NAR is a generalization of the conventional probabilistic information retrieval paradigm, which implicitly assumes no navigation takes place. This paper
A Unified Framework for Web Link Analysis
 In Proceedings of the 3rd International Conference on Web Information Systems Engineering
, 2002
Abstract

Cited by 9 (2 self)
Web link analysis has been shown to significantly enhance the precision of web search in practice. Among existing approaches, Kleinberg's HITS and Google's PageRank are the two most representative algorithms that employ the explicit hyperlink structure among web pages to conduct link analysis, while DirectHit represents the other extreme, taking the user's access frequency as an implicit link to a web page when counting its importance. In this paper, we propose a novel link analysis algorithm that puts both explicit and implicit link structures under a unified framework, and show that HITS and DirectHit are essentially the two extreme instances of our proposed method. One important advantage of our method is its ability to analyze not only the hyperlinks between web pages but also the interactions between users and the Web at the same time. The importance of web pages and users can reinforce each other to improve web link analysis. Compared with the traditional HITS and DirectHit algorithms, our method further improves search precision by 11.8% and 25.3%, respectively.
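The mutual-reinforcement idea at the implicit-link extreme can be sketched with a HITS-style iteration on a user-page visit matrix: important users visit important pages, and pages visited by important users become important. The matrix and its values are invented for illustration; this is not the paper's actual data or exact update rule, and the explicit-hyperlink side of the framework is omitted.

```python
import numpy as np

# Hypothetical visit counts V[u, p]: how often user u accessed page p
# (DirectHit-style implicit links).
V = np.array([[3.0, 0.0, 1.0],
              [1.0, 2.0, 0.0],
              [0.0, 4.0, 2.0]])

# Alternating power iteration: user importance feeds page importance
# and vice versa, converging to dominant singular vectors of V.
page = np.ones(V.shape[1])
for _ in range(50):
    user = V @ page        # a user matters if they visit important pages
    page = V.T @ user      # a page matters if important users visit it
    user /= np.linalg.norm(user)
    page /= np.linalg.norm(page)

print(page)   # implicit-link page importance scores
```

Replacing V with a hyperlink adjacency matrix recovers ordinary HITS, which is the sense in which the two extremes fit one framework.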
Mining web informative structures and contents based on entropy analysis
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2004
Abstract

Cited by 8 (0 self)
In this paper, we study the problem of mining the informative structure of a news Web site that consists of thousands of hyperlinked documents. We define the informative structure of a news Web site as a set of index pages (or referred to as TOC, i.e.,
Authority Rankings from HITS, PageRank, and SALSA: Existence, Uniqueness, and Effect of Initialization
 SIAM Journal on Scientific Computing
, 2006
Abstract

Cited by 7 (0 self)
Algorithms such as Kleinberg’s HITS algorithm, the PageRank algorithm of Brin and Page, and the SALSA algorithm of Lempel and Moran use the link structure of a network of webpages to assign weights to each page in the network. The weights can then be used to rank the pages as authoritative sources. These algorithms share a common underpinning; they find a dominant eigenvector of a nonnegative matrix that describes the link structure of the given network and use the entries of this eigenvector as the page weights. We use this commonality to give a unified treatment, proving the existence of the required eigenvector for the PageRank, HITS, and SALSA algorithms, the uniqueness of the PageRank eigenvector, and the convergence of the algorithms to these eigenvectors. However, we show that the HITS and SALSA eigenvectors need not be unique. We examine how the initialization of the algorithms affects the final weightings produced. We give examples of networks that lead the HITS and SALSA algorithms to return nonunique or nonintuitive rankings. We characterize all such networks, in terms of the connectivity of the related HITS authority graph. We propose a modification, Exponentiated Input to HITS, to the adjacency matrix input to the HITS algorithm. We prove that Exponentiated Input to HITS returns a unique ranking, so long as the network is weakly connected. Our examples also show that SALSA can give inconsistent hub and authority weights, due to nonuniqueness. We also mention a small modification to the SALSA initialization which makes the hub and authority weights consistent.
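The shared eigenvector underpinning can be checked directly on a small example: HITS authority scores computed by power iteration coincide with a dominant eigenvector of A^T A. The link matrix A here is made up for illustration.

```python
import numpy as np

# Hypothetical toy link graph: A[i, j] = 1 if page i links to page j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# HITS: alternating power iteration for hub and authority scores.
h = np.ones(4)
for _ in range(100):
    a = A.T @ h          # authority = sum of hub scores of in-linking pages
    h = A @ a            # hub = sum of authority scores of linked-to pages
    a /= np.linalg.norm(a)
    h /= np.linalg.norm(h)

# The converged authority vector is a dominant eigenvector of A^T A.
w, V = np.linalg.eigh(A.T @ A)
dominant = np.abs(V[:, np.argmax(w)])
assert np.allclose(a, dominant, atol=1e-6)
print(a)
```

For this A the dominant eigenvalue of A^T A is simple, so the ranking is unique; the networks the paper characterizes are exactly those where that dominant eigenspace is degenerate and the initialization decides which eigenvector HITS returns.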
Evaluating Object-Oriented Designs with Link Analysis
 In Proceedings of the 26th International Conference on Software Engineering (ICSE'2004)
, 2004
Abstract

Cited by 5 (1 self)
The Hyperlink-Induced Topic Search algorithm, a method of link analysis primarily developed for retrieving information from the Web, is extended in this paper in order to evaluate one aspect of quality in an object-oriented model. Considering the number of discrete messages exchanged between classes, it is possible to identify “God” classes in the system, elements which imply a poorly designed model. The principal eigenvectors of matrices derived from the adjacency matrix of a modified class diagram are used to identify and quantify heavily loaded portions of an object-oriented design that deviate from the principle of distributed responsibilities. The non-principal eigenvectors are also employed in order to identify possible reusable components in the system. The methodology can be easily automated, as illustrated by a Java program that has been developed for this purpose.
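A sketch of the idea, under stated assumptions: treat a message-count matrix between classes as a weighted adjacency matrix and compute HITS-style hub and authority scores from its principal eigenvectors; a class with a dominant hub score sends heavy message traffic everywhere and is a candidate "God" class. The class names, counts, and the hub-score criterion are invented for illustration, not the paper's exact metric.

```python
import numpy as np

# Hypothetical message-count matrix M[i, j]: number of discrete messages
# class i sends to class j (weighted adjacency of a modified class diagram).
classes = ["Controller", "Parser", "Logger", "Config"]
M = np.array([[0, 5, 4, 6],   # Controller messages every other class
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)

# Authority scores: dominant eigenvector of M^T M -- classes that receive
# messages from heavy senders score high.
w_a, V_a = np.linalg.eigh(M.T @ M)
authority = np.abs(V_a[:, np.argmax(w_a)])

# Hub scores: dominant eigenvector of M M^T -- classes that send messages
# to many important receivers score high.
w_h, V_h = np.linalg.eigh(M @ M.T)
hub = np.abs(V_h[:, np.argmax(w_h)])

print(classes[int(np.argmax(hub))])  # the most heavily loaded sender
```

Here Controller dominates the hub vector because it concentrates outgoing message traffic, which is the load imbalance the methodology is meant to surface.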
Mathematical Assessment of Object-Oriented Design Quality
 In OOPSLA '91: Proceedings of the 6th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications
, 2003
Abstract

Cited by 3 (0 self)
A method of link analysis employed for retrieving information from the Web is extended in order to evaluate one aspect of quality in an object-oriented model. The principal eigenvectors of matrices derived from the adjacency matrix of a modified class diagram are used to identify and quantify heavily loaded portions of an object-oriented design that deviate from the principle of distributed responsibilities.
Recommendation as searching without queries: A new hybrid method for recommendation
, 2005
Abstract

Cited by 1 (1 self)
Abstract: The paper describes RankFeed, a new adaptive method of recommendation that benefits from similarities between searching and recommendation. Concepts widely used in searching, such as the initial ranking and positive and negative feedback, are applied to recommendation in order to enhance its coverage while maintaining high accuracy. Four principal factors determine the method’s behaviour: the quality document ranking, navigation patterns, textual similarity, and the list of recommended pages that have been ignored during navigation. In the evaluation, the behaviour of the RankFeed ranking on a local site is contrasted with PageRank, and the recommendation behaviour of RankFeed is evaluated against other classical approaches.
A Maximum Entropy Framework for Higher Order Link Analysis on Directed Graphs
 In Workshop on Link Analysis for Detecting Complex Behavior (LinkKDD
, 2003
Abstract

Cited by 1 (0 self)
Link-analysis-based techniques for ranking the vertices of a directed graph have been widely studied in the social networks and bibliometrics communities. More recently, they have been popularized in the context of web graphs by the PageRank [1] and HITS [2] algorithms, both of which under appropriate normalization correspond to different random walk (or "surfing") models. PageRank is maximally local in the sense that its equivalent surfer ignores the links of the surrounding vertices, whereas the corresponding surfing model for normalized HITS is a second-order model, as its behaviour is independent of the rest of the graph given the vertices a single hop away. In this paper we propose a way of generalizing these strategies by taking into account non-local effects of higher order, while remaining computationally efficient. The need for such an extension is motivated by the fact that PageRank and HITS have complementary biases: PageRank can only take advantage of direct endorsement, whereas HITS can only identify close-knit structures. The approach leads to a series of parameterized schemes, where the value of the parameter determines the weights assigned to the neighbouring vertices, connected either by forward or by backward links (edges). The parametric form allows us to select its value to optimize some desirable quality. Access to "correct" rankings would allow finding the optimal value of the parameter in a supervised learning setup. Typically, however, such data is difficult to come by, so we propose and provide solutions for two optimization criteria: (i) maximum entropy, which is motivated by the desire to make minimal extra assumptions, and (ii) maximum stability. The framework and techniques developed in this paper apply to a wide range of networks. We empirically ...
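The notion of a parameterized family of surfing models can be illustrated, very loosely, by blending two walk kernels on a toy graph: a PageRank-style forward-link step and a second-order step through the two-hop hub neighbourhood (A then A^T). The blending parameter, the kernels, and the graph are all illustrative assumptions; this is not the paper's actual generalization, only a sketch of what a parameterized scheme looks like.

```python
import numpy as np

def row_norm(M):
    """Normalize rows to sum to 1 (guard against all-zero dangling rows)."""
    s = M.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0
    return M / s

def stationary(P, iters=200):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi / pi.sum()

# Hypothetical link matrix: A[i, j] = 1 if vertex i links to vertex j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]], dtype=float)

forward = row_norm(A)            # first-order forward-link walk
back_forward = row_norm(A @ A.T)  # second-order walk via shared out-links

# beta in [0, 1] interpolates between the two surfing behaviours;
# a small teleportation term keeps the chain irreducible.
for beta in (0.0, 0.5, 1.0):
    P = beta * forward + (1 - beta) * back_forward
    print(beta, stationary(0.85 * P + 0.15 / 4))
```

Sweeping beta and scoring each resulting ranking is the kind of parameter selection the abstract's maximum-entropy and maximum-stability criteria are meant to automate.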