
## RankClus: Integrating clustering with ranking for heterogeneous information network analysis

Venue: EDBT’09

Citations: 60 (25 self-citations)

### Citations

4583 | The anatomy of a large-scale hypertextual web search engine
- Brin, Page
- 1998
Citation Context ...hat mathematically demonstrates characteristics of objects. With such functions, any two objects of the same type can be compared, either qualitatively or quantitatively, in a partial order. PageRank [2] and HITS [11], among others, are perhaps the most renowned ranking algorithms over information networks. On the other hand, clustering groups objects based on a certain proximity measure so that simi...
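
For readers unfamiliar with PageRank [2], a minimal power-iteration sketch follows. It is not taken from the RankClus paper; the damping factor 0.85, the uniform redistribution of rank held by dangling nodes, and the dict-of-successors input format are illustrative assumptions.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank (illustrative sketch, not from the RankClus paper).

    out_links maps every node to a list of its successors; every successor
    must itself be a key. Dangling nodes spread their rank uniformly
    (an assumed convention).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed to all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        # Each node splits its damped rank evenly among its successors.
        for v in nodes:
            succ = out_links[v]
            for w in succ:
                new[w] += damping * rank[v] / len(succ)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical toy graph: "c" receives links from both "a" and "b".
    print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

The scores always sum to 1, so they can be read as a probability distribution over nodes, the same view of ranking that the excerpts build on.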

3727 | Normalized cuts and image segmentation
- Shi, Malik
- 2000
Citation Context ...od ranking, but how do we get good clusters? A straightforward way is to first evaluate similarity between objects using a link-based method, such as SimRank [9], and then apply graph clustering methods [15, 12] or the like to generate clusters. However, evaluating similarity between objects in an arbitrary multi-typed information network is a difficult and time-consuming task. Instead, we propose RankClus ...
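
The graph clustering step mentioned here can be illustrated with the spectral relaxation of the normalized cut [15]. The sketch below is a two-way split only, written in plain Python with a deflated power iteration instead of a library eigensolver, and it splits by sign rather than searching for the best threshold as Shi and Malik do. The input format (a symmetric weight matrix with no isolated nodes), the starting vector, and the iteration count are assumptions; this is the two-stage baseline the excerpt argues against, not RankClus.

```python
import math


def ncut_bipartition(adj, iters=2000):
    """Two-way spectral split in the spirit of Shi & Malik's normalized cut.

    adj is a symmetric, non-negative weight matrix (list of lists) with no
    isolated nodes (assumed). Returns two sets of node indices.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Normalized affinity N = D^{-1/2} W D^{-1/2}; its top eigenvector is D^{1/2} 1.
    N = [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(x * x for x in top))
    top = [x / top_norm for x in top]

    def deflate(v):
        # Remove the component along the trivial top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        return [a - dot * b for a, b in zip(v, top)]

    v = deflate([float(i + 1) for i in range(n)])  # arbitrary starting vector
    for _ in range(iters):
        # (I + N) / 2 has eigenvalues in [0, 1], so with the top vector deflated,
        # power iteration converges to the second-largest eigenvector of N.
        w = [0.5 * (v[i] + sum(N[i][j] * v[j] for j in range(n))) for i in range(n)]
        w = deflate(w)
        w_norm = math.sqrt(sum(x * x for x in w))
        v = [x / w_norm for x in w]
    # y = D^{-1/2} v solves the relaxed problem (D - W) y = lambda D y.
    y = [v[i] * inv_sqrt[i] for i in range(n)]
    side = {i for i in range(n) if y[i] >= 0}
    return side, set(range(n)) - side


if __name__ == "__main__":
    # Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
    adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
           [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
    print(ncut_bipartition(adj))
```

Note that this route needs a full pairwise affinity matrix up front, which is exactly the cost the excerpt says is hard to obtain in a multi-typed network.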

3582 | Authoritative sources in a hyperlinked environment
- Kleinberg
- 1999
Citation Context ...cally demonstrates characteristics of objects. With such functions, any two objects of the same type can be compared, either qualitatively or quantitatively, in a partial order. PageRank [2] and HITS [11], among others, are perhaps the most renowned ranking algorithms over information networks. On the other hand, clustering groups objects based on a certain proximity measure so that similar objects ar...
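
HITS [11] can be sketched in the same spirit: authority scores sum the hub scores of pages linking in, hub scores sum the authority scores of pages linked to, and both are renormalized each round. This is a minimal illustration under assumed conventions (Euclidean normalization, a fixed iteration count, dict-of-successors input), not code from the cited paper.

```python
import math


def hits(out_links, iters=100):
    """Alternating hub/authority updates (illustrative HITS sketch)."""
    nodes = list(out_links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority of w: total hub score of the nodes that link to w.
        auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for w in out_links[u]:
                auth[w] += hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {v: x / norm for v, x in auth.items()}
        # Hub of u: total authority score of the nodes u links to.
        hub = {u: sum(auth[w] for w in out_links[u]) for u in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {v: x / norm for v, x in hub.items()}
    return hub, auth


if __name__ == "__main__":
    # Hypothetical graph: two hub pages pointing at two authority pages.
    print(hits({"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}))
```

The mutual reinforcement between hubs and authorities is the same pattern that the conference/author ranking rules in the excerpts follow.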

771 | Scatter/Gather: a cluster-based approach to browsing large document collections
- Cutting, Karger, et al.
- 1992
Citation Context ...is improved. What is more, ranking results can thus be enhanced further by these high-quality clusters. In short, instead of combining ranking and clustering in a two-stage procedure such as facet ranking [3, 18], RankClus lets the quality of clustering and ranking be mutually enhanced. In this paper, we propose RankClus, a novel framework that smoothly integrates clustering and ranking. Given a user-specif...

706 | An index to quantify an individual’s scientific research output
- Hirsch
- 2005
Citation Context ...will calculate the rank for each type of object separately (i.e., we do not compare ranks of two objects belonging to different types), rather than consider them in a unified framework. J. E. Hirsch [8] originally proposed the h-index in physics to characterize the scientific output of a researcher; it is defined as the largest number h such that the researcher has h papers with at least h citations each. Extens...
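
The h-index definition can be made concrete with a short function. The name `h_index` and the list-of-citation-counts input are illustrative choices, not taken from [8].

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    # After sorting in descending order, paper i (1-based) supports h = i
    # as long as it has at least i citations.
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations
```

The result is always an integer, which is the limitation the excerpts point out when contrasting the h-index with real-valued rank scores.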

678 | A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and
- Bilmes
- 1998
Citation Context ...uthors with many authors or many highly ranked authors. Using this new rule, we can revise Eq. (2) as

$$\vec{r}_Y(i) = \alpha \sum_{j=1}^{m} W_{YX}(i,j)\,\vec{r}_X(j) + (1-\alpha) \sum_{j=1}^{n} W_{YY}(i,j)\,\vec{r}_Y(j) \qquad (5)$$

where the parameter $\alpha \in [0, 1]$ determines how much weight to put on each factor based on one’s belief. Similarly, we can prove that $\vec{r}_Y$ should be the primary eigenvector of $\alpha W_{YX}W_{XY} + (1-\alpha)W_{YY}$, and $\vec{r}_X$ should be the primary ei...
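
Eq. (5) can be read as a power iteration. The sketch below takes X as conferences (m of them) and Y as authors (n of them), assumes $W_{YX} = W_{XY}^T$ and a co-author matrix $W_{YY}$, and rescales $\vec{r}_Y$ to sum to 1 after each step. It deliberately does not normalize $\vec{r}_X$ inside the update, so that the fixed point is the primary eigenvector of $\alpha W_{YX}W_{XY} + (1-\alpha)W_{YY}$. The helper names, the toy matrices, and α = 0.9 are hypothetical.

```python
def matvec(M, v):
    """Dense matrix-vector product on lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def normalize(v):
    """Rescale a non-negative vector to sum to 1."""
    s = sum(v)
    return [x / s for x in v]


def authority_rank(W_xy, W_yy, alpha=0.9, iters=500, tol=1e-12):
    """Hypothetical sketch of Eq. (5): r_Y <- alpha W_YX r_X + (1 - alpha) W_YY r_Y."""
    W_yx = transpose(W_xy)  # assumed: W_YX is the transpose of W_XY
    n = len(W_yy)
    r_y = [1.0 / n] * n
    for _ in range(iters):
        r_x_raw = matvec(W_xy, r_y)  # r_X = W_XY r_Y, left unnormalized here
        mixed = [alpha * a + (1.0 - alpha) * b
                 for a, b in zip(matvec(W_yx, r_x_raw), matvec(W_yy, r_y))]
        new = normalize(mixed)
        done = max(abs(a - b) for a, b in zip(new, r_y)) < tol
        r_y = new
        if done:
            break
    r_x = normalize(matvec(W_xy, r_y))  # final conference ranks
    return r_x, r_y


if __name__ == "__main__":
    W_xy = [[3, 1, 0], [0, 1, 2]]           # 2 conferences x 3 authors (toy counts)
    W_yy = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # toy co-author links
    print(authority_rank(W_xy, W_yy))
```

Setting `alpha=1.0` drops the co-author term and reduces the update to the plain conference/author mutual reinforcement.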

400 | Combating web spam with trustrank
- Gyöngyi, Garcia-Molina, et al.
- 2004
Citation Context ...ample, authority ranking can be spammed by bogus conferences that accept any submitted paper, because of their huge publication numbers. Techniques that make the best use of expert knowledge, such as TrustRank [7], which semi-automatically separates reputable, good objects from spam ones, could be used toward a more robust ranking scheme. [5. The RankClus Algorithm] In this section, we introduce the RankClus algorithm ...
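
The core idea of TrustRank [7], as commonly described, is a PageRank variant whose random jumps go only to a hand-verified seed set of good objects, so trust flows outward along links from those seeds. The sketch below assumes that formulation; the expert-driven seed selection that TrustRank also involves is not modeled, trust reaching a dangling node is simply not propagated further, and the damping value and graph are illustrative.

```python
def trustrank(out_links, trusted, damping=0.85, iters=200):
    """Biased PageRank that teleports only to trusted seeds (illustrative sketch)."""
    nodes = list(out_links)
    # Static jump distribution: uniform over the trusted seeds, zero elsewhere.
    seed = {v: (1.0 / len(trusted) if v in trusted else 0.0) for v in nodes}
    trust = dict(seed)
    for _ in range(iters):
        new = {v: (1.0 - damping) * seed[v] for v in nodes}
        # Trust splits evenly along out-links, as rank does in PageRank.
        for v in nodes:
            succ = out_links[v]
            for w in succ:
                new[w] += damping * trust[v] / len(succ)
        trust = new
    return trust


if __name__ == "__main__":
    # Hypothetical graph: s1 and s2 form a link farm unreachable from the seed g1.
    graph = {"g1": ["g2"], "g2": ["g1", "x"], "x": [], "s1": ["s2"], "s2": ["s1"]}
    print(trustrank(graph, trusted={"g1"}))
```

Objects that cannot be reached from any trusted seed, such as the mutually linking spam pair above, end up with zero trust no matter how many links they exchange.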

370 | Simrank: a measure of structural-context similarity
- Jeh, Widom
- 2002
Citation Context ... cluster. Obviously, good clusters promote good ranking, but how do we get good clusters? A straightforward way is to first evaluate similarity between objects using a link-based method, such as SimRank [9], and then apply graph clustering methods [15, 12] or the like to generate clusters. However, evaluating similarity between objects in an arbitrary multi-typed information network is a difficult and ...
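
A naive iterative SimRank [9] fits in a few lines: two distinct objects are similar to the extent that their in-neighbors are similar, damped by a constant C, and every object is fully similar to itself. The decay C = 0.8, the iteration count, and the in-neighbor dict format are illustrative assumptions.

```python
def simrank(in_links, C=0.8, iters=10):
    """Naive iterative SimRank on a small directed graph (illustrative sketch).

    in_links maps every node to the list of its in-neighbors. Returns a dict
    keyed by (a, b) pairs.
    """
    nodes = list(in_links)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0  # every node is maximally similar to itself
                    continue
                ia, ib = in_links[a], in_links[b]
                if not ia or not ib:
                    new[(a, b)] = 0.0  # no in-neighbors, no evidence of similarity
                    continue
                # Average similarity over all in-neighbor pairs, damped by C.
                total = sum(sim[(x, y)] for x in ia for y in ib)
                new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim


if __name__ == "__main__":
    # Hypothetical graph: u links to a and b; a links to c.
    s = simrank({"u": [], "a": ["u"], "b": ["u"], "c": ["a"]})
    print(s[("a", "b")])  # 0.8: a and b share their only in-neighbor u
```

Each iteration visits every object pair and every pair of their in-neighbors, which illustrates why computing pairwise similarity over a whole network is expensive.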

300 | Grouper: a dynamic clustering interface to web search results. Computer Networks
- Zamir, Etzioni
- 1999
Citation Context ...is improved. What is more, ranking results can thus be enhanced further by these high-quality clusters. In short, instead of combining ranking and clustering in a two-stage procedure such as facet ranking [3, 18], RankClus lets the quality of clustering and ranking be mutually enhanced. In this paper, we propose RankClus, a novel framework that smoothly integrates clustering and ranking. Given a user-specif...

222 | A tutorial on spectral clustering
- von Luxburg
Citation Context ...od ranking, but how to get good clusters? A straightforward way is to first evaluate similarity between objects using a link-based method, such as SimRank [9], and then apply graph clustering methods =-=[15, 12]-=- or the like to generate clusters. However, to evaluate similarity between objects in an arbitrary multi-typed information network is a difficult and time-consuming task. Instead, we propose RankClus ... |

100 | Object-level ranking: Bringing order to web objects
- Nie, Zhang, et al.
- 2005
Citation Context ...rinsic meaning of our ranking methods. However, both PageRank and HITS are designed for the network of web pages, which is a directed homogeneous network with binary edge weights. PopRank [13] aims at ranking the popularity of web objects. Its authors consider the different roles of web pages and thus turn the web pages into a heterogeneous network. They trained the propagation factor b...

46 | Generalized h-index for Disclosing Latent Facts in Citation Networks (arXiv eprint cs.DL/0607066, http://arxiv.org/abs/cs.DL/0607066)
- Sidiropoulos, Katsaros, et al.
- 2006
Citation Context ...h index originally in the area of physics to characterize the scientific output of a researcher; it is defined as the largest number h such that the researcher has h papers with at least h citations each. Extension work [16] shows that it also works well in the computer science area. However, the h-index assigns an integer value h to papers, authors, and publication forums, while our work requires that rank scores can be v...

38 | LinkClus: Efficient Clustering via Heterogeneous Semantic Links
- Yin, Han, et al.
- 2006
Citation Context ...d be applied first, which is an iterative PageRank-like method for computing structural similarity between objects. However, the time cost of SimRank is very high, and other methods such as LinkClus [17] have addressed this issue. Without calculating the pairwise similarity between objects of the same type, RankClus uses conditional ranking as the measure of clusters and only needs to calculate ...

10 | Knowledge discovery from transportation network data
- Jiang, Vaidya, et al.
Citation Context ...ific set of components, forming large, interconnected, and sophisticated networks. We call such interconnected networks information networks, with examples including the Internet, highway networks [10], electrical power grids, research collaboration networks [6], public health systems, biological networks [14], and so on. Clearly, information networks are ubiquitous and form a critical component of...

9 | Handbook of computational statistics: Concepts and methods, Metrika 67(2
- Gentle, Härdle, et al.
- 2008
Citation Context ... we get

$$\vec{r}_X = \frac{W_{XY}\vec{r}_Y}{\|W_{XY}\vec{r}_Y\|} = \frac{W_{XY}\dfrac{W_{YX}\vec{r}_X}{\|W_{YX}\vec{r}_X\|}}{\left\|W_{XY}\dfrac{W_{YX}\vec{r}_X}{\|W_{YX}\vec{r}_X\|}\right\|} = \frac{W_{XY}W_{YX}\vec{r}_X}{\|W_{XY}W_{YX}\vec{r}_X\|}$$

Thus, $\vec{r}_X$ is an eigenvector of $W_{XY}W_{YX}$. The iterative method is the power method [5], which converges to the primary eigenvector. Similarly, $\vec{r}_Y$ is the primary eigenvector of $W_{YX}W_{XY}$. When considering the co-author information, the scoring function can be further ...
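
The power-method argument in this derivation can be checked numerically. This sketch alternates the two normalized updates, assuming $W_{YX} = W_{XY}^T$ and Euclidean norms for $\|\cdot\|$ (the excerpt does not say which norm is used); the function name and toy matrix are hypothetical. After enough rounds, $\vec{r}_X$ should satisfy $W_{XY}W_{YX}\vec{r}_X = \lambda\vec{r}_X$ for the largest eigenvalue $\lambda$.

```python
import math


def matvec(M, v):
    """Dense matrix-vector product on lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def unit(v):
    """Scale v to Euclidean length 1 (assumed choice of norm)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mutual_rank(W_xy, iters=1000):
    """Alternate r_X = W_XY r_Y / ||.|| and r_Y = W_YX r_X / ||.|| (power method)."""
    W_yx = [list(col) for col in zip(*W_xy)]  # assumed: W_YX = W_XY^T
    r_y = unit([1.0] * len(W_yx))
    for _ in range(iters):
        r_x = unit(matvec(W_xy, r_y))
        r_y = unit(matvec(W_yx, r_x))
    return r_x, r_y


if __name__ == "__main__":
    W_xy = [[3, 1, 0], [0, 1, 2]]  # toy 2 x 3 link-weight matrix
    print(mutual_rank(W_xy))
```

Each full round multiplies $\vec{r}_X$ by $W_{XY}W_{YX}$ once, which is why the alternating scheme is simply the power method on that product.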

8 | The dblp computer science bibliography, http://www.informatik.uni-trier.de/~ley/db
- DBLP
Citation Context ... of (1) DB/DM (i.e., Database and Data Mining) and HW/CA (i.e., Hardware and Computer Architecture), each having 10 conferences, as shown in Table 1. Then we choose 100 authors in each area from DBLP [4]. With the ranking function specified in Sec. 4.2, our ranking-only algorithm gives the top-10 ranked results (Table 2). Clearly, the results are rather poor (because of the mixture of the areas) and are ...

8 | Integrative construction and analysis of condition-specific biological networks
- Roy, Lane, et al.
Citation Context ... networks as information networks, with examples including the Internet, highway networks [10], electrical power grids, research collaboration networks [6], public health systems, biological networks [14], and so on. Clearly, information networks are ubiquitous and form a critical component of modern information infrastructure. Among them, a heterogeneous network is a special type of network that contai...

5 | The future of citeseer
- Giles
Citation Context ...histicated networks. We call such interconnected networks information networks, with examples including the Internet, highway networks [10], electrical power grids, research collaboration networks [6], public health systems, biological networks [14], and so on. Clearly, information networks are ubiquitous and form a critical component of modern information infrastructure. Among them, a heterogeneous...