Results 11 - 20 of 112
PageRank, HITS and a Unified Framework for Link Analysis
Cited by 32 (2 self)
Two popular webpage ranking algorithms are HITS and PageRank. HITS emphasizes mutual reinforcement between authority and hub webpages, while PageRank emphasizes hyperlink weight normalization and web surfing based on random walk models. We systematically generalize/combine these concepts into a unified framework. The ranking framework contains a large algorithm space; HITS and PageRank are two extreme ends in this space. We study several normalized ranking algorithms which are intermediate between HITS and PageRank, and obtain closed-form solutions. We show that, to a first-order approximation, all ranking algorithms in this framework, including PageRank and HITS, lead to the same ranking, which is highly correlated with ranking by indegree.
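To make the contrast in this abstract concrete, here is a minimal sketch of textbook PageRank and HITS run by power iteration on a toy graph, alongside plain indegree. It is not the paper's unified framework or its closed-form solutions. The graph `TOY_LINKS`, the function names, the damping factor 0.85, and the iteration count are all assumptions chosen for illustration.

```python
# Toy directed graph (hypothetical): node -> list of pages it links to.
TOY_LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": ["c"],
    "e": ["c", "d"],
}


def indegree(links):
    """Count incoming links for every node."""
    deg = {v: 0 for v in links}
    for targets in links.values():
        for v in targets:
            deg[v] += 1
    return deg


def pagerank(links, damping=0.85, iters=100):
    """Textbook PageRank: each page splits its score evenly over its out-links."""
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in links}
        for u, targets in links.items():
            # A page with no out-links spreads its score over all pages.
            receivers = targets if targets else list(links)
            share = damping * rank[u] / len(receivers)
            for v in receivers:
                new[v] += share
        rank = new
    return rank


def _normalize(scores):
    """Scale a score dict to unit Euclidean length."""
    norm = sum(s * s for s in scores.values()) ** 0.5 or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits(links, iters=100):
    """Textbook HITS: authorities and hubs reinforce each other."""
    hub = {v: 1.0 for v in links}
    auth = {v: 1.0 for v in links}
    for _ in range(iters):
        # Authority score: sum of hub scores of pages linking in.
        auth = {v: 0.0 for v in links}
        for u, targets in links.items():
            for v in targets:
                auth[v] += hub[u]
        auth = _normalize(auth)
        # Hub score: sum of authority scores of pages linked to.
        hub = _normalize({u: sum(auth[v] for v in links[u]) for u in links})
    return auth, hub


if __name__ == "__main__":
    pr = pagerank(TOY_LINKS)
    auth, _hub = hits(TOY_LINKS)
    deg = indegree(TOY_LINKS)
    for name, scores in (("indegree", deg), ("pagerank", pr), ("authority", auth)):
        print(name, sorted(scores, key=scores.get, reverse=True))
```

On this small graph the top-ranked page is the same under all three scores, which illustrates the abstract's claim in one case only and does not demonstrate it in general.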
Natural Communities in Large Linked Networks
- 2003
Cited by 32 (0 self)
We are interested in finding natural communities in large-scale linked networks. Our ultimate goal is to track changes over time in such communities. For such temporal tracking, we require a clustering algorithm that is relatively stable under small perturbations of the input data. We have developed an efficient, scalable agglomerative strategy and applied it to the citation graph of the NEC CiteSeer database (250,000 papers; 4.5 million citations). Agglomerative clustering techniques are known to be unstable on data in which the community structure is not strong. We find that some communities are essentially random and thus unstable while others are natural and will appear in most clusterings. These natural communities will enable us to track the evolution of communities over time.
Data Mining
- TO APPEAR IN THE HANDBOOK OF TECHNOLOGY MANAGEMENT, H. BIDGOLI (ED.), 2010
Cited by 26 (1 self)
The amount of data being generated and stored is growing exponentially, due in large part to the continuing advances in computer technology. This presents tremendous opportunities for those who can
Link Analysis in Web Information Retrieval
- IEEE DATA ENGINEERING BULLETIN, 2000
Cited by 25 (0 self)
The analysis of the hyperlink structure of the web has led to significant improvements in web information retrieval. This survey describes two successful link analysis algorithms and the state of the art of the field.
Visualization of Bibliographic Networks with a Reshaped Landscape Metaphor
- PROC. 4TH JOINT EUROGRAPHICS - IEEE TVCG SYMP. VISUALIZATION (VISSYM ’02), 2002
Cited by 24 (5 self)
We describe a novel approach to visualize bibliographic networks that facilitates the simultaneous identification of clusters (e.g., topic areas) and prominent entities (e.g., surveys or landmark papers). While employing the landscape metaphor proposed in several earlier works, we introduce new means to determine relevant parameters of the landscape. Moreover, we are able to compute prominent entities, clustering of entities, and the landscape's surface in a surprisingly simple and uniform way. The effectiveness of our network visualizations is illustrated on data from the graph drawing literature.
Untangling Compound Documents on the Web
- In Proc. of the 14th ACM Conference on Hypertext and Hypermedia, 2003
Cited by 24 (1 self)
Most text analysis is designed to deal with the concept of a "document", namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the World Wide Web tend to have a much smaller granularity than text documents. We claim that the notions of "document" and "web node" are not synonymous, and that authors often tend to deploy documents as collections of URLs, which we call "compound documents". In this paper we present new techniques for identifying and working with such compound documents, and the results of some large-scale studies on such web documents. The primary motivation for this work stems from the fact that information retrieval techniques are better suited to working on documents than individual hypertext nodes.
Evaluating contents-link coupled web page clustering for web search results
- In Proc. 11th Intl. Conference on Information and Knowledge Management, 2002
PicASHOW: Pictorial Authority Search by Hyperlinks on the Web
- 2001
Cited by 23 (0 self)
We describe PicASHOW, a fully automated WWW image retrieval system that is based on several link-structure analyzing algorithms. Our basic premise is that a page p displays (or links to) an image when the author of p considers the image to be of value to the viewers of the page. We thus extend some well known link-based WWW page retrieval schemes to the context of image retrieval. PicASHOW's analysis of the link structure enables it to retrieve relevant images even when those are stored in files with meaningless names. The same analysis also allows it to identify image containers and image hubs. We define these as Web pages that are rich in relevant images, or from which many images are readily accessible. PicASHOW requires no image analysis whatsoever and no creation of taxonomies for pre-classification of the Web's images. It can be implemented by standard WWW search engines with reasonable overhead, in terms of both computations and storage, and with no change to user query formats. It can thus be used to easily add image retrieving capabilities to standard search engines. Our results demonstrate that PicASHOW, while relying almost exclusively on link analysis, compares well with dedicated WWW image retrieval systems. We conclude that link analysis, a bona fide effective technique for Web page search, can improve the performance of Web image retrieval, as well as extend its definition to include the retrieval of image hubs and containers.
Keywords: Image Retrieval; Link Structure Analysis; Hubs and Authorities; Image Hubs.
Toward a basic framework for webometrics
- Journal of the American Society for Information Science and Technology, 2004
Cited by 20 (1 self)
In this article, we define webometrics within the framework of informetric studies and bibliometrics, as belonging to library and information science, and as associated with cybermetrics as a generic subfield. We develop a consistent and detailed link typology and terminology and make explicit the distinction among different Web node levels when using the proposed conceptual framework. As a consequence, we propose a novel diagram notation to fully appreciate and investigate link structures between Web nodes in webometric analyses. We warn against taking the analogy between citation analyses and link analyses too far.
Simfusion: measuring similarity using unified relationship matrix
- In SIGIR, 2005
Cited by 20 (2 self)
In this paper we use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects (e.g., web pages, queries) and their interrelationships (e.g., hyperlinks, user clickthrough sequences). We claim that iterative computations over the URM can help overcome the data sparseness problem and detect latent relationships among heterogeneous data objects, and thus can improve the quality of information applications that require the combination of information from heterogeneous sources. To support our claim, we present a unified similarity-calculating algorithm, SimFusion. By iteratively computing over the URM, SimFusion can effectively integrate relationships from heterogeneous sources when measuring the similarity of two data objects. Experiments based on a web search engine query log and a web page collection demonstrate that SimFusion can improve similarity measurement of web objects over both traditional content-based algorithms and the cutting-edge SimRank algorithm.

