Results 1 -
1 of
1
Learning to cluster web search results
- In Proc. of SIGIR ’04
, 2004
"... In web search, surfers are often faced with the problem of selecting their most wanted information from the potential huge amount of search results. The clustering of web search results is the possible solution, but the traditional content based clustering is not sufficient since it ignores many uni ..."
Abstract
-
Cited by 93 (6 self)
- Add to MetaCart
In web search, surfers are often faced with the problem of selecting their most wanted information from the potential huge amount of search results. The clustering of web search results is the possible solution, but the traditional content based clustering is not sufficient since it ignores many unique features of web pages. The link structure, authority, quality, or trustfulness of search results can play even the higher role than the actual contents of the web pages in clustering. These possible extents are reflected by Google's PageRank algorithm, HITS algorithm and etc. The main goal of this project is to integrate the authoritative information such as PageRank, link structure (e.g. in-links and out-links) into the K-Means clustering of web search results. The PageRank, inlinks and out-links can be used to extend the vector representation of web pages, and the PageRank can also be considered in the initial centroids selection, or the web page with higher PageRank influences the centroid computation to a higher degree. The relevance of this modified K-Means clustering algorithm needs to be compared to the ones obtained by the content based K-Means clustering, and the effects of different authoritative information also needs to be analyzed.

