## Survey of clustering data mining techniques (2002)

Citations: 408 (0 self)

### Citations

12416 | Elements of Information Theory
- Cover, Thomas
- 1985
Citation Context ...that get in the same or in different clusters in each of two partitions. Hence it has O(N²) complexity and is not always feasible. Conditional entropy of a known label s given a clustering partition [CT90], H(S|J) = −Σ_j p_j Σ_s p_{s|j} log(p_{s|j}), is another measure used. Here p_j is the probability of cluster j, and p_{s|j} is the conditional probability of s given j. Other measures are also used, for example, an F-...
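
As an aside on the measure quoted in this excerpt: H(S|J) is straightforward to compute from label/cluster counts. A minimal sketch (function and variable names are ours, not from the survey):

```python
import math
from collections import Counter

def conditional_entropy(labels, clusters):
    """H(S|J) = -sum_j p_j sum_s p_{s|j} log p_{s|j}: the entropy of the
    external labels S that remains once the cluster assignment J is known
    (0 when every cluster is pure)."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))   # counts n_{j,s}
    sizes = Counter(clusters)                # counts n_j
    h = 0.0
    for (j, s), n_js in joint.items():
        p_j = sizes[j] / n
        p_s_given_j = n_js / sizes[j]
        h -= p_j * p_s_given_j * math.log(p_s_given_j)
    return h

# perfect agreement between labels and clusters gives H(S|J) = 0
print(conditional_entropy(["a", "a", "b", "b"], [0, 0, 1, 1]))  # → 0.0
```

Unlike the pair-counting measures mentioned first, this needs only the contingency table, so it avoids the O(N²) cost.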

11966 | Maximum likelihood from incomplete data via the em algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...d science (Massart & Kaufman [MK83]). The classic introduction to the pattern recognition framework is given in Duda & Hart [DH73]. For statistical approaches to pattern recognition see Dempster et al. [DLR77] and Fukunaga [Fu90]. Typical applications include speech and character recognition. Machine learning clustering algorithms were applied to image segmentation and computer vision in Jain & Flynn [JF...

10047 | Genetic Algorithms
- Goldberg
- 1989
Citation Context ...ion - surveillance monitoring of ground-based "entities" by airborne and ground-based sensors. Similar to simulated annealing is the so-called tabu search, Al-Sultan [AlS95]. Genetic Algorithms (GA) [Gol89] are also used for clustering. An example is GGA, the Genetically Guided Algorithm, for fuzzy and hard k-means by Hall et al. [HOB99]. This article can be used for further references. Sarafis et al. [SZT0...

4844 | Pattern classification and scene analysis
- Duda, Hart
- 1973
Citation Context ...nes. Clustering has always been used in statistics (Arabie & Hubert [AH96]) and science (Massart & Kaufman [MK83]). The classic introduction to the pattern recognition framework is given in Duda & Hart [DH73]. For statistical approaches to pattern recognition see Dempster et al. [DLR77] and Fukunaga [Fu90]. Typical applications include speech and character recognition. Machine learning clustering algori...

3782 | Introduction to statistical pattern recognition, 2nd ed
- Fukunaga
- 1990
Citation Context ...Kaufman [MK83]). The classic introduction to the pattern recognition framework is given in Duda & Hart [DH73]. For statistical approaches to pattern recognition see Dempster et al. [DLR77] and Fukunaga [Fu90]. Typical applications include speech and character recognition. Machine learning clustering algorithms were applied to image segmentation and computer vision in Jain & Flynn [JF96]. Clustering can ...

2750 | R-trees: a dynamic index structure for spatial searching
- Guttman
- 1984
Citation Context ... process data indices are constructed. Examples include the extension of CLARANS [EKX95] and the algorithm DBSCAN [EKSX96]. Index structures used for spatial data include KD-trees [FBF77], R-trees [Gut84], R*-trees [KSSB90]. A blend of attribute transformations (DFT, polynomials) and indexing techniques is presented in [KCPM01]. Other indices and numerous generalizations exist [BBK98], [BKS90], [FRM94...

2069 | Pattern Recognition with Fuzzy Objective Function Algorithms - Bezdek - 1981

1785 | A density-based algorithm for discovering clusters in large spatial databases with noise
- Ester, Kriegel, et al.
- 1996
Citation Context ...based methods is contained in [HK01]. Crucial concepts here are density and connectivity. The algorithm DBSCAN (Density Based Spatial Clustering of Applications with Noise) developed by Ester et al. [EKSX96] is the representative of this category and targets low-dimensional spatial data. Two input parameters, ε and MinPts, are used to define: 1) an ε-neighborhood N_ε(x) = {y ∈ X | d(x, y) ≤ ε}; 2) a core object (a point...
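
The two DBSCAN definitions quoted above translate directly into code. A minimal sketch (function names are ours), using plain Euclidean distance:

```python
import math

def eps_neighborhood(x, data, eps):
    """N_eps(x) = {y in X : d(x, y) <= eps}, with Euclidean d."""
    return [y for y in data if math.dist(x, y) <= eps]

def is_core(x, data, eps, min_pts):
    """A core object has at least min_pts points (itself included)
    in its eps-neighborhood."""
    return len(eps_neighborhood(x, data, eps)) >= min_pts

points = [(0, 0), (0, 1), (1, 0), (10, 10)]
print(is_core((0, 0), points, eps=1.5, min_pts=3))    # → True  (dense corner)
print(is_core((10, 10), points, eps=1.5, min_pts=3))  # → False (isolated point)
```

DBSCAN then grows clusters by connecting core objects whose neighborhoods overlap; the sketch covers only the two definitions the excerpt introduces.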

777 | Scatter/gather: A cluster-based approach to browsing large document collections.
- Cutting, Karger, et al.
- 1992
Citation Context ...cular applications, many important ideas are related to specific fields. Clustering in data mining was brought to life by intense developments in information retrieval and text mining (Cutting et al. [CKPT92], Steinbach et al. [SKK00], Dhillon et al. [DFG01]), spatial database applications, for example GIS (Xu et al. [XEKS98], Sander et al. [SEKX98], Ester et al. [EFKS00]), sequence and heterogeneous dat...

765 | Knowledge acquisition via incre-mental conceptual clustering
- Fisher
- 1987
Citation Context ...K-Means Methods). The merger decision is based on minimization of its effect on the objective function. The popular hierarchical clustering algorithm for categorical data COBWEB, developed by Fisher [Fis87], has two very important qualities. First, it is an example of incremental learning. Rather than following divisive or agglomerative approaches, it dynamically builds a dendrogram by processing one in...

764 | An algorithm for finding best matches in logarithmic expected time.
- Friedman, Bentley, et al.
- 1977
Citation Context ...o facilitate this process data indices are constructed. Examples include the extension of CLARANS [EKX95] and the algorithm DBSCAN [EKSX96]. Index structures used for spatial data include KD-trees [FBF77], R-trees [Gut84], R*-trees [KSSB90]. A blend of attribute transformations (DFT, polynomials) and indexing techniques is presented in [KCPM01]. Other indices and numerous generalizations exist [BBK98]...

724 | Automatic subspace clustering of high dimensional data for data mining applications
- Agrawal, Gehrke, et al.
- 1998
Citation Context ...ions. In this section we only cover techniques that are specifically designed to work with high dimensional data. The algorithm CLIQUE (Clustering in QUEst) for numerical attributes by Agrawal et al. [AGGR98] contains several significant features and is fundamental in subspace clustering. It marries the ideas of density-based clustering, grid-based clustering, and recursive induction through dimensions similar...

722 | CURE: An efficient clustering algorithm for large databases
- Guha, Rastogi, et al.
- 1998
Citation Context ... clustering of spatial data naturally predisposes to clusters of spherical shapes. Meanwhile, visual scanning of spatial images usually attests clusters with elongated or curvy appearance. Guha et al. [GRS98] introduced the hierarchical agglomerative clustering algorithm CURE (Clustering Using Representatives). This algorithm has a number of novel features of general significance: it takes special care wi...

676 | Using Linear Algebra for Intelligent Information Retrieval,
- Berry
- 1995
Citation Context ...oach is problematic since it leads to clusters with poor interpretability. Singular value decomposition (SVD) based techniques are used to reduce dimensionality in information retrieval (Berry et al. [BDLB95], Berry & Browne [BB99]) and statistics (Fukunaga [Fu90]). Low frequency Fourier harmonics in conjunction with Parseval's theorem are successfully used in analysis of time series, Agrawal et al. [AFS9...

567 | Bayesian classification (AutoClass): Theory and results
- Cheeseman, Stutz
- 1996
Citation Context ...Clusters?" for more details on this issue. The algorithm SNOB by Wallace & Dowe [WD94] uses a mixture model in conjunction with the MML principle (see section "How Many Clusters?"). Cheeseman & Stutz [CS96] proposed the algorithm AUTOCLASS, which utilizes a mixture model and covers a broad variety of distributions, including Bernoulli, Poisson, Gaussian, and log-normal distributions. Beyond fitting a partic...

567 | Data preparation for mining world wide web browsing patterns.
- Cooley, Mobasher, et al.
- 1999
Citation Context ...se applications, for example GIS (Xu et al. [XEKS98], Sander et al. [SEKX98], Ester et al. [EFKS00]), sequence and heterogeneous data analysis (Cadez et al. [CSM01]), Web applications (Cooley et al. [CMS99], Heer & Chi [HC01], Foss et al. [FWZ01]), DNA analysis in computational biology (Ben-Dor & Yakhini [BY99]), and many others. They resulted in a large amount of application-specific ideas that are beyo...

533 | Fast sub-sequence matching in time-series databases
- Faloutsos, Ranganathan, et al.
- 1994
Citation Context ...Gut84], R*-trees [KSSB90]. A blend of attribute transformations (DFT, polynomials) and indexing techniques is presented in [KCPM01]. Other indices and numerous generalizations exist [BBK98], [BKS90], [FRM94], [KH00], [KCP01]. The major application of such data structures is in nearest neighbors search. Preprocessing of multimedia data that is based on its embedding in Euclidean space (algorithm FastMap) ...

526 | OPTICS: ordering points to identify the clustering structure.
- Ankerst, Breunig
- 1999
Citation Context ...there is no straightforward way to fit them to data. Moreover, different parts of the data could require different parameters - the problem discussed earlier in conjunction with CnvmL[o. Ankerst et al. [ABKS99] suggested a way to adjust DBSCAN to this challenge. Their algorithm OPTICS (Ordering Points To Identify the Clustering Structure) builds an augmented ordering of data which is consistent with DBSCAN,...

516 | Lof: Identifying density-based local outliers.
- Breunig, Kriegel, et al.
- 2000
Citation Context ... it could be improved by eliminating parameter c. They rank all the points by their distance to the kth nearest neighbor and define the c fraction of points with highest ranks as outliers. Breunig et al. [BKNS00] made an attempt to analyze local outliers. In essence, different subsets of data have different densities and are governed by different distributions. A point close to a tight cluster can be a more pr...
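
The kth-nearest-neighbor distance ranking described at the start of this excerpt can be sketched as follows (names and the toy data are ours; this is the global ranking scheme, not Breunig et al.'s local outlier factor):

```python
import math

def knn_distance_outliers(data, k, frac):
    """Rank points by the distance to their k-th nearest neighbor and
    flag the top `frac` fraction as outliers (a global, not local, view)."""
    def kth_dist(i):
        d = sorted(math.dist(data[i], data[j])
                   for j in range(len(data)) if j != i)
        return d[k - 1]
    order = sorted(range(len(data)), key=kth_dist, reverse=True)
    n_out = max(1, int(frac * len(data)))
    return [data[i] for i in order[:n_out]]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
print(knn_distance_outliers(pts, k=2, frac=0.2))  # → [(8, 8)]
```

The LOF approach discussed in the excerpt replaces this single global ranking with density estimates relative to each point's own neighborhood.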

515 | Efficient similarity search in sequence databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context ...LB95], Berry & Browne [BB99]) and statistics (Fukunaga [Fu90]). Low frequency Fourier harmonics in conjunction with Parseval's theorem are successfully used in analysis of time series, Agrawal et al. [AFS93]. Keogh et al. [KCMP01] used wavelets and other transformations. The second technique divides the data into subsets, called canopies, using some inexpensive similarity measure, so that the high dimen...

502 | FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia databases
- Faloutsos, Lin
- 1995
Citation Context ...ing to a leaf. To mitigate this, a similar algorithm BUBBLE-FM is proposed that relies on an improvement: an approximate isometric map is used. This is possible due to the FastMap algorithm, Faloutsos & Lin [FL95]. In the context of hierarchical density-based clustering in VLDB, Breunig et al. [BKKS01] analyzed data reduction techniques such as sampling and BIRCH summarization, and noticed that they resulted in ...

472 | Model-based Gaussian and non-Gaussian clustering
- Banfield, Raftery
- 1993
Citation Context ...Information Criterion (BIC) [Sch78], [FR98], Akaike's Information Criterion (AIC) [Boz83], the Information Complexity Criterion (ICOMP) [Boz94], the Approximate Weight of Evidence (AWE) criterion [BF93], Bayes Factors [KR95], and others [Bock96]. All these criteria are expressed through combinations of log-likelihood L, number of clusters k, number of parameters per cluster, total number of estimated...

468 | Co-clustering documents and words using bipartite spectral graph partitioning - Dhillon - 2001

451 | Clustering gene expression patterns.
- Ben-Dor, A, et al.
- 2002
Citation Context ...ence and heterogeneous data analysis (Cadez et al. [CSM01]), Web applications (Cooley et al. [CMS99], Heer & Chi [HC01], Foss et al. [FWZ01]), DNA analysis in computational biology (Ben-Dor & Yakhini [BY99]), and many others. They resulted in a large amount of application-specific ideas that are beyond our scope, but also in some general techniques. These techniques and classic clustering algorithms that...

449 | How many clusters? Which clustering method? Answers via model-based cluster analysis.
- Fraley, AE
- 1998
Citation Context ...r of criteria were suggested, including Minimum Description Length (MDL) [Ris78], [Sch78], [Ris89], the Minimum Message Length (MML) criterion [WF87], [WD94], the Bayesian Information Criterion (BIC) [Sch78], [FR98], Akaike's Information Criterion (AIC) [Boz83], the Information Complexity Criterion (ICOMP) [Boz94], the Approximate Weight of Evidence (AWE) criterion [BF93], Bayes Factors [KR95], and others [B...

446 | ROCK: A robust clustering algorithm for categorical attributes, Information Systems
- Rastogi, Shim
- 2000
Citation Context ...ts c, number of partitions, and sample size. While the algorithm CURE works with numerical attributes (particularly low dimensional spatial data), the algorithm ROCK developed by the same researchers [GRS99] targets hierarchical agglomerative clustering for categorical attributes. It is surveyed in the section Co-Occurrence of Categorical Data. The hierarchical agglomerative algorithm CHAMELEON developed...

408 | When is "nearest neighbor" meaningful?
- Beyer, Goldstein, et al.
- 1999
Citation Context ...paration in high dimensional space. Mathematically, the nearest neighbor query becomes unstable: the distance to the nearest neighbor becomes indistinguishable from the distance to the majority of points [BGRS99]. This effect starts to be severe for dimensions greater than 15. Therefore, construction of clusters founded on the concept of proximity is doubtful in such situations. For interesting insights into ...
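
The instability described here is easy to observe empirically. A small experiment (our own construction, not taken from [BGRS99]) comparing the relative distance contrast in low and high dimensions:

```python
import random

def distance_contrast(dim, n=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min of distances from the
    origin to n uniform points in [0,1]^dim. It shrinks as the dimension
    grows, which is why nearest-neighbor queries become unstable."""
    rng = random.Random(seed)
    def norm(p):
        return sum(c * c for c in p) ** 0.5
    pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
    ds = sorted(norm(p) for p in pts)
    return (ds[-1] - ds[0]) / ds[0]

print(distance_contrast(2))    # large contrast in 2-D
print(distance_contrast(100))  # much smaller contrast in 100-D
```

In 100 dimensions all distances concentrate around √(dim/3), so the nearest and farthest points look nearly equidistant.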

316 | Refining Initial Points for K-Means Clustering - Bradley, Fayyad - 1998

304 | Scaling clustering algorithms to large databases.
- Bradley, Fayyad, et al.
- 1998
Citation Context ...es so-called harmonic means. We discuss scalability issues in the section Scalability and VLDB Extensions. For a comprehensive approach in relation to k-means see an excellent study by Bradley et al. [BFR98]. One generic method to achieve scalability is to preprocess or squash the data. Such preprocessing usually also takes care of outliers. Preprocessing has its drawbacks. It results in approximations t...

297 | Distributional clustering of words for text classification
- Baker, McCallum
- 1998
Citation Context ...que under the name information bottleneck method. Slonim & Tishby used this method in document clustering [ST00] and classification [ST01] by using agglomerative clustering of words. Baker & McCallum [BM98] showed that in predictive mining Naïve Bayes classification intimately relates to such attribute grouping. Berkhin & Becher [BB02] showed an algebraic connection of distributional clustering to k-means....

289 | Cluster analysis of multivariate data: efficiency versus interpretability of classifications
- Forgy
- 1965
Citation Context ..., and then recompute centroids of newly assembled groups. Iterations continue until a stopping criterion is achieved (for example, no reassignments happen). This version is known as Forgy's algorithm [For65] and has many advantages: it easily works with any Lp-norm; it allows straightforward parallelization [DM99]; it is insensitive with respect to data ordering. Another version of k-means iterative optimi...
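
The Forgy iteration loop described in this excerpt is only a few lines of code. A sketch under our own naming (initial centroids are supplied by the caller):

```python
import math

def forgy_kmeans(data, centroids, max_iter=100):
    """Forgy-style batch k-means: assign every point to its nearest
    centroid, recompute centroids, and stop when assignments stabilize."""
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(centroids)),
                          key=lambda j: math.dist(x, centroids[j]))
                      for x in data]
        if new_assign == assign:   # stopping criterion: no reassignments
            break
        assign = new_assign
        for j in range(len(centroids)):
            members = [x for x, a in zip(data, assign) if a == j]
            if members:            # leave the centroid of an empty cluster as-is
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, assign

cents, labels = forgy_kmeans([(0, 0), (0, 1), (9, 9), (9, 10)], [(0, 0), (9, 9)])
print(cents, labels)  # → [(0.0, 0.5), (9.0, 9.5)] [0, 0, 1, 1]
```

With Euclidean distance each recomputed centroid is the coordinate-wise mean; swapping the distance for another Lp-norm changes only the assignment step, which is the flexibility the excerpt points out.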

228 | Models of incremental concept formation
- Gennari, Langley, et al.
- 1989
Citation Context ...though it depends non-linearly on tree characteristics packed into a constant t. A similar incremental hierarchical algorithm for all-numerical attributes, CLASSIT, was developed by Gennari et al. [GLF89]. CLASSIT associates normal distributions with cluster nodes. Both algorithms can result in highly unbalanced trees. Chiu et al. [CFCW01] proposed another conceptual or model-based approach to hiera...

226 | Computer Immunology
- Forrest, Hofmeyr, et al.
- 1997
Citation Context ...on rules). Outliers are also a primary commodity in fraud detection, network security, anomaly detection, and computer immunology. Some connections and further references can be found in Forrest et al. [FHS97], Lee & Stolfo [LS98], Ghosh et al. [GSS99]. Acknowledgements: Cooperation with Jonathan Becher was essential for the appearance of this text. It resulted in numerous discussions and various improvemen...

208 | The pyramid-technique: Towards breaking the curse of dimensionality
- Berchtold, Böhm, et al.
- 1998
Citation Context ...[FBF77], R-trees [Gut84], R*-trees [KSSB90]. A blend of attribute transformations (DFT, polynomials) and indexing techniques is presented in [KCPM01]. Other indices and numerous generalizations exist [BBK98], [BKS90], [FRM94], [KH00], [KCP01]. The major application of such data structures is in nearest neighbors search. Preprocessing of multimedia data that is based on its embedding in Euclidean space (a...

194 | Finding generalized projected clusters in high dimensional spaces
- Aggarwal, Yu
Citation Context ...l data pass follows after the iterative stage is finished to refine clusters, including subspaces associated with the medoids. The algorithm ORCLUS (ORiented projected CLUSter generation) by Aggarwal & Yu [AY00] uses a similar concept of projected clusters, but employs non-axis-parallel subspaces of high dimensional space. In fact, both developments address a more generic issue: even in a low dimensional spa...

180 | Clustering categorical data: an approach based on dynamical systems
- Gibson, Kleinberg, et al.
- 2000
Citation Context ...mong all association rules involving the item-set. A solution to the problem of k-way partitioning of a hyper-graph is provided by the algorithm HMETIS [KAKS97]. A beautiful development of Gibson et al. [GKR98], the algorithm STIRR (Sieving Through Iterated Reinforcement), deals with co-occurrence for d-dimensional categorical objects, tuples. (Extension to transactional data is obvious.) The authors consider a ...

162 | Entropy-based subspace clustering for mining numerical data, in KDD
- Cheng, Fu, et al.
- 1999
Citation Context ...selected, the complexity of dense unit generation is O(const^k + kN). Identification of clusters is a quadratic task in terms of units. The algorithm ENCLUS (ENtropy-based CLUStering) by Cheng et al. [CFZ99] follows in the footsteps of CLIQUE, but uses a different criterion for subspace selection. The criterion is derived from entropy-related considerations: the subspace spanned by attributes A1, ..., Ak wi...

155 | Understanding Search Engines: Mathematical Modeling and Text Retrieval
- Berry, Browne
- 2005
Citation Context ... taxonomies are frequently used. For example, linear algebra methods based on singular value decomposition (SVD) are broadly used in collaborative filtering and information retrieval (Berry & Browne [BB99]). SVD application to hierarchical divisive clustering of document collections resulted in the PDDP (Principal Direction Divisive Partitioning) algorithm developed by Boley [Bol98]. In our notations o...

133 | Principal direction divisive partitioning.
- Boley
- 1998
Citation Context ...eval (Berry & Browne [BB99]). SVD application to hierarchical divisive clustering of document collections resulted in the PDDP (Principal Direction Divisive Partitioning) algorithm developed by Boley [Bol98]. In our notation, object x corresponds to a document, the lth attribute corresponds to a word (or index term), and the matrix entry x_l is a measure of l-term frequency in document x (for example, TF-IDF). PDDP...

120 | Iterative optimization and simplification of hierarchical clusterings.
- Fisher
- 1996
Citation Context ...reflects the greedy approach utilized in hierarchical clustering. Though COBWEB does reconsider its decisions, it is so inexpensive that the resulting classification tree can have sub-par quality. Fisher [Fis96] studied iterative hierarchical cluster redistribution to improve a once-constructed dendrogram. Karypis et al. [KHK99b] also researched refinement for hierarchical clustering. In particular, they broug...

115 | Efficient clustering of very large document collections. Data Mining for Scientific and Engineering Applications
- Dhillon, Fan, et al.
- 2001
Citation Context ...d to specific fields. Clustering in data mining was brought to life by intense developments in information retrieval and text mining (Cutting et al. [CKPT92], Steinbach et al. [SKK00], Dhillon et al. [DFG01]), spatial database applications, for example GIS (Xu et al. [XEKS98], Sander et al. [SEKX98], Ester et al. [EFKS00]), sequence and heterogeneous data analysis (Cadez et al. [CSM01]), Web application...

111 | Convergence properties of the k-means algorithms.
- Bottou, Bengio
- 1995
Citation Context ...c_j^{t+1} = c_j^t + a_t Σ_{i=1:N} (x_i − c_j^t) w_{ij}, or c_j^{t+1} = c_j^t + a_t (x − c_j^t), in the direction of gradient descent. Ideas related to gradient descent methods for k-means are discussed in Bottou & Bengio [BB95]. Here the scalar sequence a_t satisfies certain monotone asymptotic behavior and converges to zero; the coefficients w are defined through ... and in the second case one x is selected randomly. Such updates are also us...
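
The online update rule discussed in this excerpt, c_j ← c_j + a_t (x − c_j) with a_t → 0, can be sketched for a single incoming point. Names and the particular rate schedule a_t = 1/(t+1) are our choices, not prescribed by the survey:

```python
def online_kmeans_step(x, centroids, t):
    """One stochastic k-means update: move the centroid nearest to x
    by c_j <- c_j + a_t (x - c_j), with learning rate a_t = 1 / (t + 1)."""
    a_t = 1.0 / (t + 1)
    j = min(range(len(centroids)),
            key=lambda k: sum((xi - ci) ** 2
                              for xi, ci in zip(x, centroids[k])))
    centroids[j] = tuple(ci + a_t * (xi - ci)
                         for xi, ci in zip(x, centroids[j]))
    return centroids

print(online_kmeans_step((2.0, 0.0), [(0.0, 0.0), (10.0, 10.0)], t=0))
# → [(2.0, 0.0), (10.0, 10.0)]
```

At t = 0 the rate is 1, so the winning centroid jumps all the way to x; as t grows the steps shrink, matching the requirement that a_t converge to zero.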

108 | Spatial clustering methods in data mining: A survey
- Han, Kamber, et al.
- 2001
Citation Context ...lude Hartigan [Har75], Spath [Spa80], Jain & Dubes [JD88], Kaufman & Rousseeuw [KR90], Dubes [Dub93], Everitt [Eve93], Mirkin [Mir96], Jain et al. [JMF99], Fasulo [Fas99], Kolatch [Kol01], Han et al. [HKT01], Ghosh [Gho02]. A very good introduction to contemporary data mining clustering techniques can be found in the textbook Han & Kamber [HK01]. There is a close relationship between clustering techniques...

107 | Efficient algorithms for agglomerative hierarchical clustering methods
- Day, Edelsbrunner
- 1984
Citation Context ...-Williams formula is of utmost importance since it makes manipulation with linkage metrics computationally feasible. A survey of linkage metrics can be found in Murtagh [Mur83] and Day & Edelsbrunner [DE84]. As mentioned above, when the base measure is a distance, these methods capture inter-cluster closeness. However, a similarity-based view is also possible and results in inter-cluster connectivity co...

101 | Clustering based on association rule hypergraphs.
- Han, Karypis, et al.
- 1997
Citation Context ...tering (section Co-Clustering). This preprocessing step, formally speaking, becomes the major concern, while the following data clustering is a lesser issue. We start with the development of Han et al. [HKKM97] dealing with transactional data that concentrates on clustering items. After items are clustered, the authors use a very simple method to cluster the transactions themselves: each transaction T is assign...

96 | Clustering with a Genetically Optimized Approach
- Hall, Ozyurt, Bezdek
- 1999
Citation Context ... the so-called tabu search, Al-Sultan [AlS95]. Genetic Algorithms (GA) [Gol89] are also used for clustering. An example is GGA, the Genetically Guided Algorithm, for fuzzy and hard k-means by Hall et al. [HOB99]. This article can be used for further references. Sarafis et al. [SZT02] applied GA in the context of the k-means objective function. A population is a set of k-means systems, though represented by grid ...

86 | An Analysis of Recent Work on Clustering Algorithms,
- Fasulo
- 1999
Citation Context ...l references regarding clustering include Hartigan [Har75], Spath [Spa80], Jain & Dubes [JD88], Kaufman & Rousseeuw [KR90], Dubes [Dub93], Everitt [Eve93], Mirkin [Mir96], Jain et al. [JMF99], Fasulo [Fas99], Kolatch [Kol01], Han et al. [HKT01], Ghosh [Gho02]. A very good introduction to contemporary data mining clustering techniques can be found in the textbook Han & Kamber [HK01]. There is a close relat...

84 | MAFIA: efficient and scalable subspace clustering for very large data sets
- Goil, Nagesh, et al.
- 1999
Citation Context ...w entropy subspace corresponds to a skewed distribution of unit densities. The computational costs of ENCLUS are significant. The algorithm MAFIA (Merging of Adaptive Finite Intervals) by Goil et al. [GNC99], [NGC01] significantly modifies CLIQUE. It starts with one data pass to construct adaptive grids in each dimension. Many (1000) bins are used to compute histograms by adding blocks of data in core, w...

81 | A general probabilistic framework for clustering individuals and objects.
- Cadez, Gaffney, et al.
- 2000
Citation Context ...ble length dynamic data (e.g., customer profiles). The dynamic data can represent finite sequences subject to a first-order Markov model with a transition matrix dependent on the cluster. Cadez et al. [CGS00] further developed this framework to an individual data instance consisting of several sequences, where the number n_i of sequences per x_i is subject to a geometric distribution. To emulate sessions of diff...

72 | CACTUS - Clustering Categorical Data Using Summaries
- Ganti, Gehrke, et al.
- 1999
Citation Context ...d nearest neighbors in clustering was suggested by Jarvis & Patrick [JP73] in 1973! See also Gowda & Krishna [GK78]. The algorithm CACTUS (Clustering Categorical Data Using Summaries) by Ganti et al. [GGR99] looks for hyper-rectangular clusters (called interval regions) in point-by-attribute data with categorical attributes. In our terminology such clusters are segments. CACTUS utilizes the idea of co-occ...

67 | A Robust and Scalable Clustering Algorithm for Mixed Type Attributes in Large Database Environment
- Chiu, Fang, et al.
- 2001
Citation Context ...numerical attributes, CLASSIT, was developed by Gennari et al. [GLF89]. CLASSIT associates normal distributions with cluster nodes. Both algorithms can result in highly unbalanced trees. Chiu et al. [CFCW01] proposed another conceptual or model-based approach to hierarchical clustering. This development contains several different useful features, such as the extension of BIRCH-like preprocessing to categ...

67 | Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data
- Ertoz, Steinbach, et al.
- 2003
Citation Context ...a. It has a complexity O(c_m · N_sample + N_sample · log(N_sample)), where the coefficient c_m is a product of the average and maximum number of neighbors. The algorithm SNN (Shared Nearest Neighbors) by Ertoz et al. [ESK02] blends a density-based approach with the idea of ROCK. SNN sparsifies the similarity matrix (therefore, unfortunately, resulting in O(N²) complexity) by only keeping k nearest neighbors, and thus derive...
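
The SNN sparsification step described here (keep only mutual k-nearest-neighbor links, weighted by shared neighbors) can be sketched as follows, with our own names; computing the k-NN lists themselves is the O(N²) part the excerpt laments:

```python
def snn_strengths(neighbors):
    """Shared-nearest-neighbor link strengths: keep a link between x and y
    only if each appears in the other's k-NN list, and weight it by the
    overlap of their neighbor lists.

    neighbors: dict mapping each point id to the set of its k nearest
    neighbors."""
    strength = {}
    for x, nx in neighbors.items():
        for y in nx:
            if x in neighbors.get(y, set()):      # mutual links only
                strength[frozenset((x, y))] = len(nx & neighbors[y])
    return strength

knn = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(snn_strengths(knn))  # "d" gets no link: "a" does not list it back
```

Clusters are then grown over the strong links, which makes the method robust to clusters of varying density.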

65 | Agglomerative clustering using the concept of mutual nearest neighborhood
- Gowda, Krishna
- 1978
Citation Context ...thus derives the total strength of links for each x. For this matter, the idea to use shared nearest neighbors in clustering was suggested by Jarvis & Patrick [JP73] in 1973! See also Gowda & Krishna [GK78]. The algorithm CACTUS (Clustering Categorical Data Using Summaries) by Ganti et al. [GGR99] looks for hyper-rectangular clusters (called interval regions) in point-by-attribute data with categorical a...

59 | A near optimal initial seed value selection in k-means algorithm using a genetic algorithm
- Babu, Murty
- 1993
Citation Context ...f samples. Centroids of the best system constructed this way are suggested as intelligent initial guesses to ignite the k-means algorithm on the full data. Another interesting attempt by Babu & Murty [BM93] is based on GA (see below). No initialization actually guarantees a global minimum for k-means. As is common to any combinatorial optimization, a logical attempt to cure this problem is to use simulate...

58 | A database interface for clustering in large spatial databases.
- Ester, Kriegel, et al.
- 1995
Citation Context ...alue numlocal=2 is recommended). The best node (set of medoids) is returned for the formation of a resulting partition. The complexity of CLARANS is O(N²) in terms of the number of points. Ester et al. [EKX95] extended CLARANS to spatial VLDB. They used R*-trees [BKS90] to relax the original requirement that all the data reside in core memory, which allowed focusing exploration on the relevant part of the...

56 | Clustering large datasets in arbitrary metric spaces - Ganti, Ramakrishnan, et al. - 1999

55 | Enhanced word clustering for hierarchical text classification,
- Dhillon, Mallela, et al.
- 2002
Citation Context ...an analog of KL-distance as an iterative step in the algorithm SIMPLIFYRELATION that gradually co-clusters points and attributes. This development has industrial applications in Web analysis. Dhillon et al. [DMK02] used Jensen-Shannon divergence to cluster attributes in k-means fashion in text classification. Besides text and Web data clustering, the idea of co-clustering finds its way into clustering of gene mi...

52 | Using the fractal dimension to cluster datasets
- Barbará, Chen
- 2000
Citation Context ...of the log-log plot of the number of cells N(r) occupied by the set as a function of a grid size r. A fast algorithm (box counting) for computing fractal dimension was introduced in [LT89]. In Barbara & Chen [BC00] these concepts were used to develop the FC (Fractal Clustering) algorithm, which was designed for numeric attributes and works with several layers of grids (the cardinality of each dimension is increased...
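
Box counting as described here fits a slope to log N(r) versus log(1/r). A sketch with our own names; the toy data is 100 points on a line segment, whose box-counting dimension should come out near 1:

```python
import math

def box_counting_dimension(points, sizes):
    """Estimate the fractal dimension as the least-squares slope of
    log N(r) versus log(1/r), where N(r) is the number of grid cells
    of side r occupied by the point set."""
    xs, ys = [], []
    for r in sizes:
        cells = {tuple(int(c // r) for c in p) for p in points}
        xs.append(math.log(1.0 / r))
        ys.append(math.log(len(cells)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

line = [(i / 100, 0.0) for i in range(100)]
print(box_counting_dimension(line, [0.5, 0.25, 0.125]))  # → 1.0
```

FC uses the same occupied-cell bookkeeping incrementally: a point is assigned to the cluster whose fractal dimension it changes least.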

51 | Squashing flat files flatter
- DuMouchel, Volinsky, et al.
- 1999
Citation Context ...is makes it very appealing for dynamic VLDB. Some further tools can be used to improve the obtained clusters. Data squashing techniques scan data to compute certain data summaries (sufficient statistics) [DVJ99]. The obtained summaries are then used instead of the original data for further clustering. The pivotal role here belongs to the algorithm BIRCH (Balanced Iterative Reduction and Clustering using Hier...

50 | Explaining basic categories: Feature predictability and information
- Corter, Gluck
- 1992
Citation Context ...ree construction, every new point is descended along the tree and the tree is potentially updated (by an insert/split/merge/create operation). Decisions are based on an analysis of the category utility [CG92], CU = (1/k) Σ_j CU(C_j), CU(C_j) = Σ_{l,p} (Pr(A_l = v_lp | C_j)² − Pr(A_l = v_lp)²), similar to the GINI index. It rewards clusters C_j for increases in predictability of the categorical attribute values v_lp. Being incremental, COBW...
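
The category utility criterion quoted here can be computed from value frequencies. A sketch in the standard COBWEB form, which weights each cluster's term by P(C_j) (function and variable names are ours):

```python
from collections import Counter

def category_utility(clusters):
    """CU = (1/k) * sum_j P(C_j) * sum_{l,p} [P(A_l=v|C_j)^2 - P(A_l=v)^2].

    clusters: list of clusters, each a list of equal-length tuples of
    categorical attribute values."""
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    n_attrs = len(clusters[0][0])

    def sq_sum(rows, l):
        # sum of squared value probabilities of attribute l over rows
        counts = Counter(row[l] for row in rows)
        return sum((cnt / len(rows)) ** 2 for cnt in counts.values())

    all_rows = [row for c in clusters for row in c]
    base = sum(sq_sum(all_rows, l) for l in range(n_attrs))
    cu = sum((len(c) / n) *
             (sum(sq_sum(c, l) for l in range(n_attrs)) - base)
             for c in clusters)
    return cu / k

pure = [[("a",), ("a",)], [("b",), ("b",)]]
print(category_utility(pure))  # → 0.25
```

The inner difference is exactly the predictability gain the excerpt mentions: how much better attribute values can be guessed inside a cluster than overall.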

50 | MCLUST: Software for model-based clustering, discriminant analysis and density estimation.
- Fraley, Raftery
- 2002
Citation Context ...probability distribution complexity to the likelihood expression previously dependent only on parameter values. This algorithm has a history of industrial usage. The algorithm MCLUST by Fraley & Raftery [FR99] is a software package (commercially linked with S-PLUS) for hierarchical, mixture model clustering, and discriminant analysis using BIC (see section "How Many Clusters?") for estimation of goodness o...

40 | On the Surprising Behavior of Distance Metrics
- Aggarwal, Hinneburg, et al.
- 2001
(Show Context)
Citation Context ...n 15. Therefore, construction of clusters founded on the concept of proximity is doubtful in such situations. For interesting insights into complications of high dimensional data, see Aggarwal et al. =-=[AHK00]-=-. Basic exploratory data analysis (e.g., attribute selection) preceding the clustering step is the best way to address the first problem. We consider this topic in the section General Algorithmic Issu... |

39 |
An overview of combinatorial data analysis.
- Arabie, Hubert
- 1996
(Show Context)
Citation Context ...an be found in the textbook Han & Kamber [HK01]. There is a close relationship between clustering techniques and many other disciplines. Clustering has always been used in statistics (Arabie & Hubert =-=[AH96]-=-) and science (Massart & Kaufman [MK83]). The classic introduction into the pattern recognition framework is given in Duda & Hart [DH73]. For statistical approaches to pattern recognition see Dempster et ...

39 | Constrained K-Means Clustering,
- Bennett, Bradley, et al.
- 2000
(Show Context)
Citation Context ... of very different sizes (in some versions it could result in empty clusters). Is it possible to guarantee clusters of balanced size? The answer is yes. For corresponding research see Bradley et al. =-=[BBD00]-=- and Banerjee & Ghosh [BG02]. 4. Density-Based Partitioning An open set in Euclidean space can be factorized into a set of its connected components. The implementation of this idea for partitioning of...

39 |
Fitting equations to data: computer analysis of multifactor data
- Daniel, Wood
- 1999
(Show Context)
Citation Context ... data compression in image processing, which is also known as vector quantization (Gersho & Gray [GG92]). Data fitting in numerical analysis provides another venue in data modeling; see Daniel & Wood =-=[DW80]-=-. This survey's emphasis is on clustering in data mining. Such clustering is characterized by large datasets with many attributes of different types. Though we do not even try to review particular app... |

37 | Data bubbles: quality preserving performance boosting for hierarchical clustering.
- Breunig, Kriegel, et al.
- 2001
(Show Context)
Citation Context ... on an improvement: an approximate isometric map is used. This is possible due to the FastMap algorithm, Faloutsos & Lin [FL95]. In the context of hierarchical density-based clustering in VLDB, Breunig et al. =-=[BKKS01]-=- analyzed data reduction techniques such as sampling and BIRCH summarization, and noticed that they resulted in deterioration of cluster quality. To mitigate this they introduced the data bubble, another ...

36 | Clustering spatial data using random walks.
- Harel, Koren
- 2001
(Show Context)
Citation Context ...a (e.g., GIS databases) an algorithm AMOEBA (Estivill-Castro & Lee [EL00]) uses the Delaunay diagram (the dual of the Voronoi diagram) to represent data proximity and has O(N log(N)) complexity. Harel & Koren =-=[HKo01]-=- suggested another approach related to agglomerative hierarchical graph methodology that they showed to successfully find local clusters in 2D. The vertices X of the graph G = (X,E) are points, while edge weig...

33 | A scalable algorithm for clustering sequential data
- Guralnik, Karypis
- 2001

32 |
Clustering with evolution strategies
- Babu, Murty
(Show Context)
Citation Context ...fit data and have high computational costs that limit their application in data mining. However, usage of combined strategies (e.g., generation of an initial guess for k-means) has been attempted [BM93], =-=[BM94]-=-. A recent publication by Lee & Antonsson [LA00] deals with simultaneous improvements of k-means centroids and k itself, and uses GA with a variable length genome. This has merit compared with running m...

32 |
ISODATA, a novel method of data analysis and classification, Research Report AD699616,
- Ball, Hall
- 1965
(Show Context)
Citation Context ...s, there have been other attempts to find the minimum of the k-means objective function. One early attempt to optimize by means of merging and splitting of intermediate clusters is implemented by Ball & Hall =-=[BH65]-=- in the algorithm ISODATA. The k-means algorithm suffers from all the usual suspects: • The result strongly depends on the initial guess of centroids (or assignments) • Computed local optimum is known to be a...

32 |
A Data Clustering Algorithm on Distributed Memory Multiprocessors,” Large-Scale Parallel Data Mining,
- Dhillon, Modha
- 2000
(Show Context)
Citation Context ...achieved (for example, no reassignments happen). This version is known as Forgy's algorithm [For65] and has many advantages: • It easily works with any Lp-norm • It allows straightforward parallelization =-=[DM99]-=- • It is insensitive with respect to data ordering. Another version of k-means iterative optimization reassigns points based on a more detailed analysis of effects on the objective function caused by movi...
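The batch (Forgy-style) iteration described above can be sketched compactly: reassign every point to its nearest centroid, recompute all centroids, and stop when no reassignments happen. The function names and toy data are invented for illustration:

```python
import random

def nearest(pt, centroids):
    """Index of the centroid closest to pt in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))

def forgy_kmeans(points, k, iters=100, seed=0):
    """Batch (Forgy-style) k-means: reassign all points, then recompute
    every centroid, until the assignment stabilizes."""
    centroids = random.Random(seed).sample(points, k)
    assign = None
    for _ in range(iters):
        new_assign = [nearest(pt, centroids) for pt in points]
        if new_assign == assign:          # converged: no reassignments
            break
        assign = new_assign
        for j in range(k):
            members = [pt for pt, a in zip(points, assign) if a == j]
            if members:                   # keep old centroid if cluster empties
                centroids[j] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return centroids, assign

pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = forgy_kmeans(pts, k=2)
```

Because the reassignment pass is a whole-dataset map followed by independent per-cluster reductions, the parallelization noted in [DM99] falls out naturally.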

32 | Spatial data mining: Database primitives, algorithms and efficient dbms support
- Ester, Frommelt, et al.
(Show Context)
Citation Context ... and text mining (Cutting et al. [CKPT92], Steinbach et al. [SKK00], Dhillon et al. [DFG01]), spatial database applications, for example GIS, (Xu et al. [XEKS98], Sander et al. [SEKX98], Ester et al. =-=[EFKS00]-=-), sequence and heterogeneous data analysis (Cadez et al. [CSM01]), Web applications (Cooley et al. [CMS99], Heer & Chi [HC01], Foss et al. [FWZ01]), DNA analysis in computational biology (Ben-Dor & Y... |

30 |
Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Informational Measure of Complexity
- Bozdogan
- 1994
(Show Context)
Citation Context ... Message Length (MML) criterion [WF87], [WD94], • Bayesian Information Criterion (BIC) [Sch78], [FR98], • Akaike's Information Criterion (AIC) [Boz83], • Non-coding Information Theoretic Criterion (ICOMP) =-=[Boz94]-=-, • Approximate Weight of Evidence (AWE) criterion [BF93], • Bayes Factors [KR95], and others [Bock96]. All these criteria are expressed through combinations of log-likelihood L, number of clusters k, num...
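The shared shape of these criteria (a fit term plus a complexity penalty) can be illustrated with the textbook AIC and BIC formulas. The function names are ours, and the other criteria (MML, ICOMP, AWE) add further penalty terms not shown here:

```python
import math

def aic(log_likelihood, p):
    """Akaike's Information Criterion; lower is better.
    p is the total number of estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * p

def bic(log_likelihood, p, n):
    """Bayesian Information Criterion; penalizes parameters more heavily
    than AIC once n exceeds about e^2 (roughly 7.4) observations."""
    return -2.0 * log_likelihood + p * math.log(n)

# With n = 100 points, extra parameters must buy enough likelihood:
# here a 10-parameter model gains 5 in log-likelihood over a
# 5-parameter one, which is not enough under BIC.
simple = bic(log_likelihood=-250.0, p=5, n=100)
complex_ = bic(log_likelihood=-245.0, p=10, n=100)
```

Scanning such a criterion over candidate values of k is how the "How Many Clusters?" question is typically answered in the mixture-model setting.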

30 | Probabilistic modeling of transactional data with applications to profiling, Visualization, and Prediction,
- CADEZ, SMYTH, et al.
- 2001
(Show Context)
Citation Context ...], Dhillon et al. [DFG01]), spatial database applications, for example GIS, (Xu et al. [XEKS98], Sander et al. [SEKX98], Ester et al. [EFKS00]), sequence and heterogeneous data analysis (Cadez et al. =-=[CSM01]-=-), Web applications (Cooley et al. [CMS99], Heer & Chi [HC01], Foss et al. [FWZ01]), DNA analysis in computational biology (Ben-Dor & Yakhini [BY99]), and many others. They resulted in a large amount ... |

30 |
Percentage points of a test for clusters.
- ENGELMAN
- 1969
(Show Context)
Citation Context ...me industrial applications (SAS, NeoVista) report a pseudo F-statistic. This only makes sense for k-means clustering in the context of ANOVA. Earlier publications on the subject include Engelman & Hartigan =-=[EH69]-=-, Milligan & Cooper [MC85]. They analyzed cluster separation for different k. For instance, average weighted (by cluster weights) distance between centroids (medoids) of all cluster pairs normalize...

28 | Amoeba: Hierarchical clustering based on spatial proximity using Delaunay diagram
- Estivill-Castro, Lee
- 2000
(Show Context)
Citation Context ...ents Among the many other developments, those that qualify for data mining have to achieve reasonable performance. For 2D spatial data (e.g., GIS databases) an algorithm AMOEBA (Estivill-Castro & Lee =-=[EL00]-=-) uses the Delaunay diagram (the dual of the Voronoi diagram) to represent data proximity and has O(N log(N)) complexity. Harel & Koren [HKo01] suggested another approach related to agglomerative hierarchical ...

27 | A new method for similarity indexing of market basket data.
- Aggarwal, Wolf, et al.
- 1999
(Show Context)
Citation Context ...nce analysis, text mining, and pattern recognition. Extended Jaccard coefficient is advocated in [Gho02]. For construction of similarity measures for market basket analysis see Aggarwal, Wolf, and Yu =-=[AWY99]-=-; see also Baeza-Yates [FBY92]. For an axiomatic approach based on information-theoretical considerations regarding similarity, see Lin [Lin98]. The last two references contain material related to str... |

24 | A practical application of simulated annealing to clustering
- Brown, Huntley
- 1992
(Show Context)
Citation Context ...ve resonance theory (by Carpenter & Grossberg), are related to clustering. For further directions and introduction see Jain & Mao [JM96]. 7.3. Evolutionary Methods The article by Brown & Huntley =-=[BH91]-=- contains substantial information on simulated annealing in the context of partitioning (main focus) or hierarchical clustering, and presents the algorithm SINICC (Simulation of Near-optima for Intern...

24 |
Cluster analysis (3rd ed.).
- Everitt
- 1993
(Show Context)
Citation Context ... 1.2. Review of Clustering Bibliography General references regarding clustering include Hartigan [Har75], Spath [Spa80], Jain & Dubes [JD88], Kaufman & Rousseeuw [KR90], Dubes [Dub93], Everitt =-=[Eve93]-=-, Mirkin [Mir96], Jain et al. [JMF99], Fasulo [Fas99], Kolatch [Ko101], Han et al. [HKT01], Ghosh [Gho02]. A very good introduction to contemporary data mining clustering techniques can be found in the

24 | A non-parametric approach to Web log analysis.
- FOSS, WANG, et al.
- 2001
(Show Context)
Citation Context ... al. [XEKS98], Sander et al. [SEKX98], Ester et al. [EFKS00]), sequence and heterogeneous data analysis (Cadez et al. [CSM01]), Web applications (Cooley et al. [CMS99], Heer & Chi [HC01], Foss et al. =-=[FWZ01]-=-), DNA analysis in computational biology (Ben-Dor & Yakhini [BY99]), and many others. They resulted in a large amount of applicationspecific ideas that are beyond our scope, but also in some general t... |

22 | Double conjugated clustering applied to leukemia microarray data.
- Busygin, al
- 2002
(Show Context)
Citation Context ...gence to cluster attributes in k-means fashion in text classification. Besides text and Web data clustering, the idea of co-clustering finds its way into clustering of gene microarrays, Busygin et al. =-=[BJK02]-=-. 11. General Algorithmic Issues We have presented many different clustering techniques. However, there are common issues that must be addressed to make any clustering algorithm successful. Some ar...

20 |
Simultaneous clustering of rows and columns
- Govaert
- 1995
(Show Context)
Citation Context ...y or contingency matrix. In applications it can reflect an occurrence of an item in a basket, the frequency of visits to a page, or the amount of a sale in a store per item category. Govaert =-=[Gov95]-=- researched simultaneous block clustering of the rows and columns of contingency tables. This article also contains a review of earlier work. Dhillon [Dhi01] proposed an advanced algebraic approach t...

16 | Learning simple relations: theory and applications.
- Berkhin, Becher
- 2002
(Show Context)
Citation Context ... is computationally feasible, because the outlined analysis requires an inner loop over other member points of involved clusters affected by centroid shifts. However, in the L2 case it is known [DH73], =-=[BB02]-=- that all computations can be algebraically reduced to simply computing a single distance. Therefore, in this case both versions have the same computational complexity. There is experimental eviden...

13 |
Vector Quantization and Signal Compression. Communications and Information Theory.
- GERSHO, GRAY
- 1992
(Show Context)
Citation Context ... subject of traditional multivariate statistical estimation (Scott [Sco92]). It is also widely used for data compression in image processing, which is also known as vector quantization (Gersho & Gray =-=[GG92]-=-). Data fitting in numerical analysis provides another venue in data modeling; see Daniel & Wood [DW80]. This survey's emphasis is on clustering in data mining. Such clustering is characterized by lar... |

11 | A Framework for Finding Projected Clusters in High Dimensional Spaces
- Aggarwal, Procopiuc, et al.
- 1999
(Show Context)
Citation Context ...criminates two significantly dense half-spaces. Several cutting planes are chosen, and recursion continues with each subset of data. The algorithm PROCLUS (PROjected CLUstering) by Aggarwal et al. =-=[APW99]-=- associates with the subset C a low-dimensional subspace such that the projection of C into the subspace is a tight cluster. The subset and subspace pair constitutes a projected cluster. The number k ...

11 | An information-theoretical approach to clustering categorical databases using genetic algorithms. in:
- Simovici
- 2002
(Show Context)
Citation Context ...ules. Unlike the normal k-means, clusters can have different size and elongation; however, shapes are restricted to segments, a far cry from density-based methods. In work by Cristofor and Simovici =-=[CS02]-=-, GA is applied to clustering of categorical data. The authors use so-called generalized entropy to define the concept of dissimilarity. Evolutionary techniques rely on certain parameters to empirically fi...

10 |
Probability Models in Partitional Cluster Analysis
- BOCK
(Show Context)
Citation Context ...8], • Akaike's Information Criterion (AIC) [Boz83], • Non-coding Information Theoretic Criterion (ICOMP) [Boz94], • Approximate Weight of Evidence (AWE) criterion [BF93], • Bayes Factors [KR95], and others =-=[Bock96]-=-. All these criteria are expressed through combinations of log-likelihood L, number of clusters k, number of parameters per cluster, total number of estimated parameters p, and different flavors of Fis...

9 | Efficient algorithms for normalized edit distance
- Arslan, Egecioglu
- 2000
(Show Context)
Citation Context ... last two references contain material related to string similarity (biology is one application). With respect to strings over the same alphabet, edit distance is a frequent choice (Arslan & Egecioglu =-=[AE00]-=-). It is based on the complexity of a sequence of transformations (such as insertion, deletion, transposition, etc.) that are necessary for mapping one string into another. A classic Hamming distance [CT...
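The transformation-counting idea can be illustrated with the classic Levenshtein dynamic program (a sketch of the plain form, without the transposition operation and without the normalization studied in [AE00]):

```python
def edit_distance(s, t):
    """Levenshtein edit distance: the minimum number of insertions,
    deletions, and substitutions mapping s into t.
    Dynamic programming, O(|s| * |t|) time, O(|t|) space."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))          # distances from "" to prefixes of t
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion from s
                         cur[j - 1] + 1,       # insertion into s
                         prev[j - 1] + cost)   # substitution or match
        prev = cur
    return prev[n]
```

For equal-length strings the edit distance never exceeds the Hamming distance, since substitutions alone always suffice.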

9 |
Cluster analysis and related issues, in: Handbook of Pattern Recognition & Computer Vision
- Dubes
- 1993
(Show Context)
Citation Context ... 1.2. Review of Clustering Bibliography General references regarding clustering include Hartigan [Har75], Spath [Spa80], Jain & Dubes [JD88], Kaufman & Rousseeuw [KR90], Dubes =-=[Dub93]-=-, Everitt [Eve93], Mirkin [Mir96], Jain et al. [JMF99], Fasulo [Fas99], Kolatch [Ko101], Han et al. [HKT01], Ghosh [Gho02]. A very good introduction to contemporary data mining clustering techniques ca...

8 |
Determining the number of component clusters in the standardmultivariate normal mixture model using model-selection criteria. Technical Report UIC/DQM/A83-1 ARO Contract DAAG29-82-K-0155, Quantitative Methods Department,University of Illinois at Chicago.
- Bozdogan
- 1983
(Show Context)
Citation Context ... Description Length (MDL) [Ris78], [Sch78], [Ris89], • Minimum Message Length (MML) criterion [WF87], [WD94], • Bayesian Information Criterion (BIC) [Sch78], [FR98], • Akaike's Information Criterion (AIC) =-=[Boz83]-=-, • Non-coding Information Theoretic Criterion (ICOMP) [Boz94], • Approximate Weight of Evidence (AWE) criterion [BF93], • Bayes Factors [KR95], and others [Bock96]. All these criteria are expressed throug...

7 | Automating exploratory data analysis for efficient data mining
- Becher, Berkhin, et al.
- 2000
(Show Context)
Citation Context ...t proximity measures and clusters lose their spherical shapes. Therefore, sound exploratory data analysis (EDA) is essential. An overall framework for EDA can be found in Becher, Berkhin, and Freeman =-=[BBF00]-=-. As its first order of business, EDA eliminates inappropriate attributes and reduces the cardinality of the retained categorical attributes. Next it provides attribute selection. Different attribute ... |

4 |
A tabu search approach to the clustering problem
- Al-Sultan
- 1995
(Show Context)
Citation Context ...s development has a real application - surveillance monitoring of ground-based "entities" by airborne and ground-based sensors. Similar to simulated annealing is the so-called tabu search, =-=Al-Sultan [A1S95]-=-. Genetic Algorithms (GA) [Go189] are also used for clustering. An example is GGA, Genetically Guided Algorithm, for fuzzy and hard k-means by Hall et al. [HOB99]. This article can be used for further...

2 |
Clustering to minimize the maximum intercluster distance
- Gonzalez
- 1985
(Show Context)
Citation Context ...of admissible k. The tremendous popularity of k-means has brought to life many extensions and modifications. Mao & Jain [MJ96] used Mahalanobis distance to handle hyper-ellipsoidal clusters. Gonzalez =-=[Gon85]-=- used the maximum of intra-cluster variances instead of the sum. Almost every industrial implementation of k-means somehow resolves the issue of categorical attributes. Huang [Hua98] described a possi...

1 |
Cluster Analysis andApplications
- Anderberg
- 1973
(Show Context)
Citation Context ...t grouping columns as well. This utilizes a canonical duality contained in the concept of point-by-attribute representation. The idea of co-clustering of data points and attributes is old (Anderberg =-=[And73]-=-, Hartigan [Har75]) and is known under the names simultaneous clustering, bi-dimensional clustering, block clustering, conjugate clustering, distributional clustering, and information bottleneck metho...

1 |
On scaling up balanced clustering algorithms. 2
- Banerjee, Ghosh
- 2002
(Show Context)
Citation Context ...some versions it could result in empty clusters). Is it possible to guarantee clusters of balanced size? The answer is yes. For corresponding research see Bradley et al. [BBD00] and Banerjee & Ghosh =-=[BG02]-=-. 4. Density-Based Partitioning An open set in Euclidean space can be factorized into a set of its connected components. The implementation of this idea for partitioning of a finite set of points requ...

1 |
The R*-tree: An efficient access method for points and rectangles
- BECKMANN, KRIEGEL, et al.
- 1990
(Show Context)
Citation Context ...ds) is returned for the formation of a resulting partition. The complexity of CLARANS is O(N 2) in terms of number of points. Ester et al. [EKX95] extended CLARANS to spatial VLDB. They used R*-trees =-=[BKS90]-=- to relax the original requirement that all the data resides in core memory, which allowed focusing exploration on the relevant part of the database that resides at a branch of the whole data tree. 3.... |

1 |
An efficient algorithm for a complete link method
- Defays
- 1977
(Show Context)
Citation Context ...{d(x,y) | x ∈ C1, y ∈ C2}. Early examples include the algorithm SLINK by Sibson [Sib73], which implements single link, Voorhees' method [Voo86], which implements average link, and the algorithm CLINK by Defays =-=[Def77]-=-, which implements complete link. Of these SLINK is referenced the most. It is related to the problem of finding the Euclidean minimal spanning tree, Yao [Yao82], and has O(N 2) computational complexity....

1 |
Refining clusters in high dimensional data. 2 "d SI.4MICDM,, Workshop on clustering high dimensional data
- Dhillon, Guan, et al.
- 2002
(Show Context)
Citation Context ...versions have the same computational complexity. There is experimental evidence that, compared with Forgy's algorithm, this version frequently yields better results [LA99], [SKK00]. Dhillon et al. =-=[DGK02]-=- noticed that Forgy's spherical k-means (using cosine similarity instead of Euclidean distance) has a tendency to get stuck when applied to document collections. They noticed that a version reassign...
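The spherical reassignment step referred to here, with cosine similarity in place of Euclidean distance, can be sketched as follows (function names and toy term-frequency vectors are invented for illustration):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def assign_spherical(docs, centroids):
    """Spherical k-means assignment step: each document goes to the
    centroid with the HIGHEST cosine similarity, not the smallest
    Euclidean distance."""
    return [max(range(len(centroids)),
                key=lambda j: cosine_sim(d, centroids[j]))
            for d in docs]

# Direction matters, magnitude does not: (10, 1) lands with (1, 0).
docs = [(1.0, 0.0), (10.0, 1.0), (0.0, 1.0), (1.0, 8.0)]
labels = assign_spherical(docs, centroids=[(1.0, 0.1), (0.1, 1.0)])
```

Ignoring vector length is what makes this variant suitable for document collections, where term-frequency vectors of long and short documents should still cluster by topic direction.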

1 |
Clustering Methods. Private communication
- Ghosh
(Show Context)
Citation Context ...Har75], Spath [Spa80], Jain & Dubes [JD88], Kaufman & Rousseeuw [KR90], Dubes [Dub93], Everitt [Eve93], Mirkin [Mir96], Jain et al. [JMF99], Fasulo [Fas99], Kolatch [Ko101], Han et al. [HKT01], Ghosh =-=[Gho02]-=-. A very good introduction to contemporary data mining clustering techniques can be found in the textbook Han & Kamber [HK01]. There is a close relationship between clustering techniques and many other...

1 |
Learning program behavior profiles for intrusion detection
- unknown authors
- 1999
(Show Context)
Citation Context ...ity in fraud detection, network security, anomaly detection, and computer immunology. Some connections and further references can be found in Forrest et al. [FHS97], Lee & Stolfo [LS98], Ghosh et al. =-=[GSS99]-=-. Acknowledgements Cooperation with Jonathan Becher was essential for the appearance of this text. It resulted in numerous discussions and various improvements. I am very much thankful to Sue Krouscup... |