Results 1-10 of 65
Survey of clustering data mining techniques
, 2002
Cited by 286 (0 self)
Abstract
Accrue Software, Inc.
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective, clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique ...
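The following minimal sketch of Lloyd's k-means iteration in plain Python makes "representing the data by fewer clusters" concrete. k-means is our choice of example, not a method this abstract singles out. The function `kmeans`, its parameters, and the toy points are hypothetical. Each point ends up summarized by its cluster's centroid, which is the trade of fine detail for simplification that the abstract describes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], init: Sequence[Point], iters: int = 100
           ) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in init]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members)
                                           for coord in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical usage: two well-separated groups, seeded with one point from each.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
cents, labs = kmeans(pts, init=[pts[0], pts[3]])
```

The result depends on the initial centroids. Practical implementations usually add several restarts or a seeding heuristic.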
ADE4: a multivariate analysis and graphical display software
 Stat. Comput.
, 1997
Cited by 109 (12 self)
Abstract
... searching, zooming, selection of points, and display of data values on factor maps. The user interface is simple and homogeneous across all the programs; this makes ADE4 very easy to use for non-specialists in statistics, data analysis, or computer science.
Keywords: Multivariate analysis, principal component analysis, correspondence analysis, instrumental variables, canonical correspondence analysis, partial least squares regression, co-inertia analysis, graphics, multivariate graphics, interactive graphics, Macintosh, HyperCard, Windows 95.
1. Introduction
ADE4 is multivariate analysis and graphical display software for Apple Macintosh and Windows 95 microcomputers. It is made up of several stand-alone applications, called modules, that feature a wide range of multivariate analysis methods, from simple one-table analysis to three-way table analysis and two-table coupling methods. It also provides many possibilities ...
The analysis of vegetation-environment relationships by canonical correspondence analysis
, 1987
Cited by 48 (1 self)
Abstract
Canonical correspondence analysis (CCA) is introduced as a multivariate extension of weighted averaging ordination, which is a simple method for arranging species along environmental variables. CCA constructs those linear combinations of environmental variables along which the distributions of the species are maximally separated. The eigenvalues produced by CCA measure this separation. As its name suggests, CCA is also a correspondence analysis technique, but one in which the ordination axes are constrained to be linear combinations of environmental variables. The ordination diagram generated by CCA visualizes not only a pattern of community variation (as in standard ordination) but also the main features of the distributions of species along the environmental variables. Applications demonstrate that CCA can be used both for detecting species-environment relations and for investigating specific questions about the response of species to environmental variables. Questions in community ecology that have typically been studied by 'indirect' gradient analysis (i.e. ordination followed by external interpretation of the axes) can now be answered more directly by CCA.
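The abstract builds CCA on weighted averaging ordination. The sketch below shows only that underlying weighted-averaging step. It assumes that a species' score on one environmental variable is its abundance-weighted mean of that variable over sites, u_k = sum_i y_ik z_i / sum_i y_ik. This is not CCA itself. CCA additionally alternates between site and species scores and constrains the site scores to be linear combinations of the environmental variables. The function name and the site-by-species counts are hypothetical.

```python
from typing import List, Sequence


def weighted_average_scores(abundance: Sequence[Sequence[float]],
                            env: Sequence[float]) -> List[float]:
    """Score each species by the abundance-weighted mean of one environmental
    variable across sites: u_k = sum_i y_ik * z_i / sum_i y_ik.

    abundance[i][k] is the abundance of species k at site i;
    env[i] is the value of the environmental variable at site i.
    """
    n_species = len(abundance[0])
    scores = []
    for k in range(n_species):
        total = sum(row[k] for row in abundance)
        if total == 0:
            raise ValueError(f"species {k} is absent from every site")
        scores.append(sum(row[k] * z for row, z in zip(abundance, env)) / total)
    return scores


# Hypothetical data: 4 sites along a moisture gradient, 3 species (columns).
moisture = [1.0, 2.0, 3.0, 4.0]
counts = [
    [5, 1, 0],
    [3, 2, 0],
    [0, 2, 3],
    [0, 1, 5],
]
species_scores = weighted_average_scores(counts, moisture)
```

Sorting species by these scores arranges them along the gradient. In this toy table the first species is centred on dry sites and the third on wet ones.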
Document clustering via adaptive subspace iteration
 In SIGIR
, 2004
Cited by 29 (6 self)
Abstract
Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm, ASI, which explicitly models the subspace structure associated with each cluster. ASI simultaneously performs data reduction and subspace identification via an iterative alternating optimization procedure. Motivated by the optimization procedure, we then provide a novel method to determine the number of clusters. We also discuss the connections of ASI with various existing clustering approaches. Finally, extensive experimental results on real data sets show the effectiveness of the ASI algorithm.
An analysis and synthesis of multiple correspondence analysis, optimal scaling, dual scaling, homogeneity analysis and other methods for quantifying categorical multivariate data
 Psychometrika
, 1985
Cited by 28 (1 self)
Abstract
We discuss a variety of methods for quantifying categorical multivariate data. These methods have been proposed in many different countries, by many different authors, under many different names. In the first major section of the paper we analyze the many different methods and show that they all lead to the same equations for analyzing the same data. In the second major section of the paper we introduce the notion of a duality diagram, and use this diagram to synthesize the many superficially different methods into a single method.
Key words: multiple correspondence analysis, optimal scaling, dual scaling, homogeneity analysis, categorical multivariate data.
This paper has two major sections. In the first section we discuss a variety of apparently different data analysis methods and show that they all lead to the same equations for analyzing the same data. In the second section we use the notion of a duality diagram to systematize and organize the relationships between these superficially different methods ...
The Gifi System Of Descriptive Multivariate Analysis
 Statistical Science
, 1998
Cited by 22 (3 self)
Abstract
The Gifi system of analyzing categorical data through nonlinear varieties of classical multivariate analysis techniques is reviewed. The system is characterized by the optimal scaling of categorical variables which is implemented through alternating least squares algorithms. The main technique of homogeneity analysis is presented, along with its extensions and generalizations leading to nonmetric principal components analysis and canonical correlation analysis. A brief account of stability issues and areas of applications of the techniques is also given.
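Homogeneity analysis, the core Gifi technique named here, can be fitted by alternating least squares. The sketch below is a one-dimensional version built on three assumptions:

- Each category quantification is the mean score of the objects in that category.
- Each object score is then the mean of its categories' quantifications.
- The scores are centred and normalised so that their sum of squares equals the number of objects (one common normalisation).

The function name and toy data are hypothetical. A full implementation would fit several dimensions under orthogonality constraints.

```python
from typing import Dict, List, Sequence


def homogeneity_analysis_1d(data: Sequence[Sequence[str]], iters: int = 500
                            ) -> List[float]:
    """One-dimensional homogeneity analysis by alternating least squares.

    data[i][j] is the category of object i on categorical variable j.
    Returns object scores x, centred (sum x = 0) and scaled so sum x^2 = n.
    """
    n, m = len(data), len(data[0])
    # Arbitrary non-constant start; centring removes the trivial solution.
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        # Step 1: category quantification = mean score of objects in category.
        quant: List[Dict[str, float]] = []
        for j in range(m):
            sums: Dict[str, float] = {}
            counts: Dict[str, int] = {}
            for i in range(n):
                c = data[i][j]
                sums[c] = sums.get(c, 0.0) + x[i]
                counts[c] = counts.get(c, 0) + 1
            quant.append({c: sums[c] / counts[c] for c in sums})
        # Step 2: object score = mean of its categories' quantifications.
        x = [sum(quant[j][data[i][j]] for j in range(m)) / m for i in range(n)]
        # Step 3: centre and normalise (constraints sum x = 0, sum x^2 = n).
        mean = sum(x) / n
        x = [v - mean for v in x]
        norm = (sum(v * v for v in x) / n) ** 0.5
        x = [v / norm for v in x]
    return x


# Hypothetical data: 6 objects described by 2 categorical variables.
objects = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y"), ("b", "z"), ("b", "z")]
scores = homogeneity_analysis_1d(objects)
```

Each pass is a linear map followed by centring and normalisation, so the loop behaves like a power iteration and settles on the first non-trivial dimension. Objects with identical category profiles receive identical scores.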
The relative effects of maternal and child problems on the quality of attachment: A meta-analysis of attachment in clinical samples
 Child Development
, 1992
Cited by 10 (0 self)
Abstract
... meta-analysis of 34 clinical studies on attachment, the hypothesis is tested that maternal problems such as mental illness lead to more deviating attachment classification distributions than child problems such as deafness. A correspondence analysis on 21 North American studies with normal subjects produced a baseline against which the clinical samples could be evaluated. Separate analyses were carried out on studies containing the traditional A, B, C classifications and on studies that also included the recently discovered D or A/C category. Results show that groups with a primary identification of maternal problems show attachment classification distributions highly divergent from the normal distributions, whereas groups with a primary identification of child problems show distributions that are similar to the distributions of normal samples. The introduction of the D or A/C classifications (about 15% in normal samples) reveals an overrepresentation of D or A/C in the child problem groups, but the resulting distribution is still much closer to the normal distributions than are the distributions of the samples with maternal problems. In clinical samples, the mother appears to play a more important role than the child in shaping the quality of the infant-mother attachment relationship. The Strange Situation and its associated classification scheme (Ainsworth, Blehar, ...
Partitioning Networks by Eigenvectors
, 1995
Cited by 7 (1 self)
Abstract
A survey of published methods for partitioning sparse arrays is presented. These include early attempts to describe the partitioning properties of eigenvectors of the adjacency matrix. More direct methods of partitioning are developed by introducing the Laplacian of the adjacency matrix via the directed (signed) edge-vertex incidence matrix. It is shown that the Laplacian solves the minimization of the total length of connections between adjacent nodes, which induces clustering of connected nodes by partitioning the underlying graph. Another matrix derived from the adjacency matrix is also introduced via the unsigned edge-vertex matrix. This matrix (the Normal matrix) is not symmetric, and it is also shown to solve the minimization of total length in its own non-Euclidean metric. In this case, partitions are induced by clustering the connected nodes. The Normal matrix is closely related to Correspondence Analysis.
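The following sketch illustrates the Laplacian-based partitioning the abstract describes. It builds L = B^T B from a signed edge-vertex incidence matrix B, then splits the vertices by the sign of the eigenvector for the second-smallest eigenvalue of L (the Fiedler vector). The eigenvector is approximated by power iteration on cI - L with the constant vector projected out. That choice is made here only to stay within the Python standard library and is not taken from the paper. Function names and the example graph are hypothetical, and the non-symmetric Normal matrix variant is not shown.

```python
from typing import List, Sequence, Tuple


def laplacian_from_incidence(n: int, edges: Sequence[Tuple[int, int]]
                             ) -> List[List[float]]:
    """Build L = B^T B, where B is the signed edge-vertex incidence matrix
    (one row per edge: +1 at one endpoint, -1 at the other)."""
    B = []
    for u, v in edges:
        row = [0.0] * n
        row[u], row[v] = 1.0, -1.0
        B.append(row)
    return [[sum(B[e][i] * B[e][j] for e in range(len(B))) for j in range(n)]
            for i in range(n)]


def fiedler_partition(L: List[List[float]], iters: int = 500
                      ) -> Tuple[List[int], List[int]]:
    """Split vertices by the sign of an approximate Fiedler vector
    (eigenvector of the second-smallest eigenvalue of the Laplacian)."""
    n = len(L)
    # Shift so the smallest eigenvalues of L become the largest of M = cI - L;
    # every Laplacian eigenvalue is at most 2 * (maximum degree).
    c = 2.0 * max(L[i][i] for i in range(n)) + 1.0
    x = [float(i) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        # Project out the constant vector (eigenvalue 0 of L).
        mean = sum(x) / n
        x = [v - mean for v in x]
        # One power-iteration step with M = cI - L, then normalise.
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    left = [i for i in range(n) if x[i] < 0]
    right = [i for i in range(n) if x[i] >= 0]
    return left, right


# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L = laplacian_from_incidence(6, edges)
parts = fiedler_partition(L)
```

The product B^T B reproduces D - A: vertex degrees on the diagonal and -1 for each edge. Splitting on the Fiedler vector's sign cuts the single bridging edge. Which side is labelled "left" depends on the sign the iteration happens to settle on.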
Relationships among several methods of linearly constrained correspondence analysis
 Psychometrika 56
, 1991
Cited by 7 (0 self)
Abstract
This paper shows essential equivalences among several methods of linearly constrained correspondence analysis. They include Fisher’s method of additive scoring, Hayashi’s second type of quantification method, ter Braak’s canonical correspondence analysis, Nishisato’s ANOVA of categorical data, correspondence analysis of manipulated contingency tables, Böckenholt and Böckenholt’s least squares canonical analysis with linear constraints, and van der Heijden and Meijerink’s zero average restrictions. These methods fall into one of two classes, corresponding to two alternative ways of imposing linear constraints: the reparametrization method and the null space method. A connection between the two is established through Khatri’s lemma.
Key words: canonical correlation analysis, generalized singular value decomposition (GSVD), the method of additive scoring, the second type of quantification method (Q2), canonical correspondence analysis (CCA), ANOVA of categorical data, canonical analysis with linear constraints (CALC), zero average restrictions, Khatri’s lemma.