• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

An Analysis of Recent Work on Clustering Algorithms (1999)

by Daniel Fasulo
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 42
Next 10 →

Survey of clustering data mining techniques

by Pavel Berkhin , 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract - Cited by 177 (0 self) - Add to MetaCart
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique

On affine invariant clustering and automatic cast listing in movies

by Andrew Fitzgibbon, Andrew Zisserman - In Proc. ECCV , 2002
"... Abstract We develop a distance metric for clustering and classification algorithms which is invariant to affine transformations and includes priors on the transformation parameters. Such clustering requirements are generic to a number of problems in computer vision. We extend existing techniques for ..."
Abstract - Cited by 57 (13 self) - Add to MetaCart
Abstract We develop a distance metric for clustering and classification algorithms which is invariant to affine transformations and includes priors on the transformation parameters. Such clustering requirements are generic to a number of problems in computer vision. We extend existing techniques for affine-invariant clustering, and show that the new distance metric outperforms existing approximations to affine invariant distance computation, particularly under large transformations. In addition, we incorporate prior probabilities on the transformation parameters. This further regularizes the solution, mitigating a rare but serious tendency of the existing solutions to diverge. For the particular special case of corresponding point sets we demonstrate that the affine invariant measure we introduced may be obtained in closed form. As an application of these ideas we demonstrate that the faces of the principal cast of a feature film can be generated automatically using clustering with appropriate invariance. This is a very demanding test as it involves detecting and clustering over tens of thousands of images with the variances including changes in viewpoint, lighting, scale and expression. 1

Network tomography: recent developments

by Rui Castro, Mark Coates, Gang Liang, Robert Nowak, Bin Yu - Statistical Science , 2004
"... Today's Int ernet is a massive, dist([/#][ net work which cont inuest o explode in size as ecommerce andrelatH actH]M/# grow. Thehet([H(/#]H( and largelyunregulatS stregula of t/ Int/HH3 renderstnde such as dynamicroutc/[ opt2]3fl/ service provision, service level verificatflH( and det(2][/ of anoma ..."
Abstract - Cited by 49 (3 self) - Add to MetaCart
Today's Int ernet is a massive, dist([/#][ net work which cont inuest o explode in size as ecommerce andrelatH actH]M/# grow. Thehet([H(/#]H( and largelyunregulatS stregula of t/ Int/HH3 renderstnde such as dynamicroutc/[ opt2]3fl/ service provision, service level verificatflH( and det(2][/ of anomalous/malicious behaviorext/[(22 challenging. The problem is compounded bytS fact tct onecannot rely ont[ cooperatH2 of individual servers and routSS t aid intS collect[3 of net workt/[S measurement vits fort/]3 t/]3] In many ways, net workmonit]/#[ and inference problems bear a st[fl[ resemblancet otnc "inverse problems" in which key aspect of asystfl are not direct/ observable. Familiar signal processing orst[]23/#[S problems such ast omographic imagereconst[/#[S] and phylogenet# tog identn/HH2[M have int erest3/ connect[HU t tonn arising in net working. This artflMM int/ ducesnet workt/H3]S]/ y, a new field which we believe will benefit greatU from tm wealt of stH2](/#S( ttH2 andalgorit#S( It focuses especially on recent development s int2 field includingtl applicat[fl of pseudolikelihoodmetfl ds andt reeestfl3](/# formulat]M23 Keyw ords:Net workt/HflS33/ y, pseudo-likelihood,t opology identn/]H22(/ tn est/]H tst 1 Introducti6 Nonet work is an island, ent/S ofitS[S] everynet work is a piece of an int/]SS work, a part of t/ main . Alt[]][ administHSHSS of small-scale net works can monit( localt ra#ccondit][/ and ident ify congest/# point s and performance botU((2/ ks, very few net works are complet/# # Rui Castroan Robert Nowak are with theDepartmen t of Electricalan ComputerEnterX Rice Unc ersity,Houston TX; Mark Coates is with the Departmen t of Electricalan ComputerEnterX McGill UnG ersity,Mon treal, Quebec,Can Gan Lian an Bin Yu are with theDepartmen t of Statistics,...

Cluster Analysis for Gene Expression Data: A Survey

by Daxin Jiang, Chun Tang, Aidong Zhang - IEEE Transactions on Knowledge and Data Engineering , 2004
"... Abstract—DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity f ..."
Abstract - Cited by 48 (3 self) - Add to MetaCart
Abstract—DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increases the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and also new algorithms have recently been proposed specifically aiming at gene expression data. These clustering algorithms have been proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest the promising trends in this field. Index Terms—Microarray technology, gene expression data, clustering.

From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot

by Paul Fitzpatrick, Paul Michael Fitzpatrick, Paul Michael Fitzpatrick , 2003
"... This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain ..."
Abstract - Cited by 35 (6 self) - Add to MetaCart
This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply `pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively.

Pattern Recognition Techniques in Microarray Data Analysis: A Survey. Annals of the New York Academy of Sciences

by Faramarz Valafar - of Sciences, techniques in Bioinformatics and Medical Informatics , 2002
"... analysis Abstract: Recent development of technologies (e.g. microarray technology) that are capable of producing massive amounts of genetic data has highlighted the need for new pattern recognition techniques that can mine and discover “biologically meaningful ” knowledge in large data sets. Many re ..."
Abstract - Cited by 21 (0 self) - Add to MetaCart
analysis Abstract: Recent development of technologies (e.g. microarray technology) that are capable of producing massive amounts of genetic data has highlighted the need for new pattern recognition techniques that can mine and discover “biologically meaningful ” knowledge in large data sets. Many researchers have begun an endeavor in this direction to devise such datamining techniques. As such, there is a need for survey articles that periodically review and summarize the work that has been done in the area. This article presents one such survey. The first portion of the paper is meant to provide the basic biology (mostly for non-biologists) that is required in such a project. This part is only meant to be a starting point for those experts in the technical fields who wish to embark on this new area of bioinformatics. The second portion of the paper is a survey of various data mining techniques that have been used in mining microarray data for biological knowledge and information (such as sequence information). This survey is not meant to be treated as complete in any form, as the area is currently one of the most active, and the body of research is very large. Furthermore, the applications of the techniques mentioned here are not meant to be taken as the most significant applications of the techniques, but simply as some examples among many. Molecular Genome Biology

Locating and tracking multiple dynamic optima by a particle swarm model using speciation

by Daniel Parrott, Xiaodong Li - IEEE Trans. Evol. Comput , 2006
"... Abstract—This paper proposes an improved particle swarm optimizer using the notion of species to determine its neighborhood best values for solving multimodal optimization problems and for tracking multiple optima in a dynamic environment. In the proposed species-based particle swam optimization (SP ..."
Abstract - Cited by 18 (1 self) - Add to MetaCart
Abstract—This paper proposes an improved particle swarm optimizer using the notion of species to determine its neighborhood best values for solving multimodal optimization problems and for tracking multiple optima in a dynamic environment. In the proposed species-based particle swam optimization (SPSO), the swarm population is divided into species subpopulations based on their similarity. Each species is grouped around a dominating particle called the species seed. At each iteration step, species seeds are identified from the entire population, and then adopted as neighborhood bests for these individual species groups separately. Species are formed adaptively at each step based on the feedback obtained from the multimodal fitness landscape. Over successive iterations, species are able to simultaneously optimize toward multiple optima, regardless of whether they are global or local optima. Our experiments on using the SPSO to locate multiple optima in a static environment and a dynamic SPSO (DSPSO) to track multiple changing optima in a dynamic environment have demonstrated that SPSO is very effective in dealing with multimodal optimization functions in both environments. Index Terms—Multimodal optimization, optimization in dynamic environments, particle swam optimization (PSO), tracking optima in dynamic environments. I.

Inferring hierarchical descriptions

by Eric Glover, David M. Pennock, Steve Lawrence, Robert Krovetz - In Proceedings of the 11th International Conference on Informaiton and Knowledge Management (2002
"... We create a statistical model for inferring hierarchical term relationships about a topic, given only a small set of example web pages on the topic, without prior knowledge of any hierarchical information. The model can utilize either the full text of the pages in the cluster or the context of links ..."
Abstract - Cited by 18 (2 self) - Add to MetaCart
We create a statistical model for inferring hierarchical term relationships about a topic, given only a small set of example web pages on the topic, without prior knowledge of any hierarchical information. The model can utilize either the full text of the pages in the cluster or the context of links to the pages. To support the model, we use “ground truth ” data taken from the category labels in the Open Directory. We show that the model accurately separates terms in the following classes: self terms describing the cluster, parent terms describing more general concepts, and child terms describing specializations of the cluster. For example, for a set of biology pages, sample parent, self, and child terms are science, biology, and genetics respectively. We create an algorithm to predict parent, self, and child terms using the new model, and compare the predictions to the ground truth data. The algorithm accurately ranks a majority of the ground truth terms highly, and identifies additional complementary terms missing in the Open Directory.

The feature signatures of evolving programs

by Daniel R. Licata, Christopher D. Harris, Shriram Krishnamurthi - Automated Software Engineering , 2003
"... As programs evolve, their code increasingly becomes tangled by programmers and requirements. This mosaic quality complicates program comprehension and maintenance. Many of these activities can benefit from viewing the program as a collection of features. We introduce an inexpensive and easily compre ..."
Abstract - Cited by 15 (0 self) - Add to MetaCart
As programs evolve, their code increasingly becomes tangled by programmers and requirements. This mosaic quality complicates program comprehension and maintenance. Many of these activities can benefit from viewing the program as a collection of features. We introduce an inexpensive and easily comprehensible summary of program changes called the feature signature and investigate its properties. We find a remarkable similarity in the nature of feature signatures across multiple non-trivial programs, developers and magnitudes of changes. This indicates that feature signatures are a meaningful notion worth studying. We then show numerous applications of feature signatures, establishing their utility. 1

A Particle Swarm Model for Tracking Multiple Peaks In a Dynamic . . .

by Daniel Parrott, et al. , 2004
"... A particle swarm optimisation model for tracking multiple peaks in a continuously varying dynamic environment is described. To achieve this, a form of speciation allowing development of parallel subpopulations is used. The model employs a mechanism to encourage simulataneous tracking of multiple pea ..."
Abstract - Cited by 13 (1 self) - Add to MetaCart
A particle swarm optimisation model for tracking multiple peaks in a continuously varying dynamic environment is described. To achieve this, a form of speciation allowing development of parallel subpopulations is used. The model employs a mechanism to encourage simulataneous tracking of multiple peaks by preventing overcrowding at peaks. Possible metrics for evaluating the performance of algorithms in dynamic, multimodal environments are put forward. Results are appraised in terms of the proposed metrics, showing that the technique is capable of tracking multiple peaks and that its performance is enhanced by preventing overcrowding. Directions for further research suggested by these results are put forward.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University