Results 1 - 2 of 2
Model-based clustering and visualization of navigation patterns on a web site
Data Mining and Knowledge Discovery, 2003
Abstract

Cited by 70 (0 self)
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
Keywords: Model-based clustering, sequence clustering, data visualization, Internet, web
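As a rough illustration of the clustering step this abstract describes, the sketch below fits a mixture of first-order Markov chains with EM on integer-coded category sequences. The function name `fit_markov_mixture`, the smoothing constant `alpha`, the random initialization, and the toy sessions are assumptions made for illustration. They are not taken from the paper or from WebCANVAS.

```python
import math
import random


def fit_markov_mixture(sequences, n_states, n_clusters, n_iter=30, alpha=0.01, seed=0):
    """EM for a mixture of first-order Markov chains (illustrative sketch).

    sequences: non-empty lists of ints in range(n_states).
    alpha: small smoothing count (an assumption) that keeps probabilities nonzero.
    Returns (weights, init, trans, resp), where resp[i][k] approximates
    P(cluster k | sequence i).
    """
    rng = random.Random(seed)
    # Start from random soft assignments of sequences to clusters.
    resp = []
    for _ in sequences:
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: responsibility-weighted counts for each cluster.
        weights = [alpha] * n_clusters
        init = [[alpha] * n_states for _ in range(n_clusters)]
        trans = [[[alpha] * n_states for _ in range(n_states)]
                 for _ in range(n_clusters)]
        for seq, r in zip(sequences, resp):
            for k in range(n_clusters):
                weights[k] += r[k]
                init[k][seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans[k][a][b] += r[k]
        w_total = sum(weights)
        weights = [w / w_total for w in weights]
        for k in range(n_clusters):
            s = sum(init[k])
            init[k] = [v / s for v in init[k]]
            for a in range(n_states):
                s = sum(trans[k][a])
                trans[k][a] = [v / s for v in trans[k][a]]

        # E-step: posterior cluster probabilities, computed in log space.
        for i, seq in enumerate(sequences):
            logp = []
            for k in range(n_clusters):
                lp = math.log(weights[k]) + math.log(init[k][seq[0]])
                for a, b in zip(seq, seq[1:]):
                    lp += math.log(trans[k][a][b])
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]

    return weights, init, trans, resp


# Toy sessions over two URL categories: "stay in 0" versus "alternate 0 and 1".
sessions = [[0, 0, 0, 0, 0, 0]] * 10 + [[0, 1, 0, 1, 0, 1]] * 10
weights, init, trans, resp = fit_markov_mixture(sessions, n_states=2, n_clusters=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

Each EM iteration visits every transition once per cluster, so the cost per iteration grows linearly with the number of clusters and with the total length of the sessions, which matches the scaling stated in the abstract.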
Density-based indexing for approximate nearest-neighbor queries
In Proc. KDD, 1999
Abstract

Cited by 35 (2 self)
We consider the problem of performing nearest-neighbor queries efficiently over large high-dimensional databases. To avoid a full database scan, we target constructing a multidimensional index structure. It is well accepted that traditional database indexing algorithms fail for high-dimensional data (say d > 10 or 20, depending on the scheme). Some arguments have advocated that nearest-neighbor queries do not even make sense for high-dimensional data. We show that these arguments are based on over-restrictive assumptions, and that in the general case it is meaningful and possible to build an index for such queries. Our approach, called DBIN, scales to high-dimensional databases by exploiting statistical properties of the data. The approach is based on statistically modeling the density of the content of the data table. DBIN uses the density model to derive a single index over the data table and requires physically rewriting data in a new table sorted by the newly created index (i.e., create a clustered index). The indexing scheme produces a mapping between a query point (a data record) and an ordering on the clustered index values. Data is then scanned according to the index. We present theoretical and empirical justification for DBIN. The scheme supports a family of distance functions which includes the traditional Euclidean distance measure.
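The general idea in this abstract (one index value per record, a table physically sorted by that value, and a query-dependent scan order) can be sketched as follows. This is a simplified stand-in and not DBIN itself. It assumes fixed cluster centers in place of a fitted density model, Euclidean distance, and a fixed number of probed clusters (`n_probe`) in place of whatever stopping rule the paper uses. All function and parameter names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def build_clustered_index(points, centers):
    """Tag each point with its nearest center and sort the table by that tag.

    Returns (table, keys): table is a list of (cluster_id, point) sorted by
    cluster_id, and keys is the parallel list of cluster ids for binary search.
    """
    def nearest(p):
        return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))

    table = sorted((nearest(p), p) for p in points)
    keys = [cid for cid, _ in table]
    return table, keys


def approx_nearest(query, table, keys, centers, n_probe=1):
    """Scan the n_probe clusters closest to the query; return (point, distance).

    The result is approximate: the true nearest neighbor is missed if it
    lies in a cluster that is not probed.
    """
    # Map the query to an ordering on the index values (cluster ids).
    order = sorted(range(len(centers)), key=lambda c: math.dist(query, centers[c]))
    best, best_d = None, math.inf
    for cid in order[:n_probe]:
        # Rows with the same index value are contiguous in the sorted table.
        lo, hi = bisect_left(keys, cid), bisect_right(keys, cid)
        for _, p in table[lo:hi]:
            d = math.dist(query, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d


# Toy data: two well-separated groups in 2-D, with hand-picked centers.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = [(0.3, 0.3), (10.3, 10.3)]
table, keys = build_clustered_index(points, centers)
best, best_d = approx_nearest((9.5, 10), table, keys, centers, n_probe=1)
```

Because rows sharing an index value sit next to each other, each probed cluster is read as one contiguous range, which is the practical benefit of sorting the table by the index.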