Results 1 - 10
of
80
Maximum margin clustering
- Advances in Neural Information Processing Systems 17
, 2005
"... We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the ha ..."
Abstract
-
Cited by 49 (3 self)
- Add to MetaCart
We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
A mixture model of clustering ensembles
- Proc. SIAM Intl. Conf. on Data Mining
, 2004
"... Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial or stati ..."
Abstract
-
Cited by 39 (4 self)
- Add to MetaCart
Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial or statistical perspectives. We offer a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm. The excellent scalability of this algorithm and comprehensible underlying model are particularly important for clustering of large datasets. This study compares the performance of the EM consensus algorithm with other fusion approaches for clustering ensembles. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed method on large real-world datasets.
A Tree Kernel to Analyze Phylogenetic Profiles
, 2002
"... Motivation: The phylogenetic profile of a protein is a string that encodes the presence or absence of the protein in every fully sequenced genome. Because proteins that participate in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion, the phylogenetic prof ..."
Abstract
-
Cited by 35 (1 self)
- Add to MetaCart
Motivation: The phylogenetic profile of a protein is a string that encodes the presence or absence of the protein in every fully sequenced genome. Because proteins that participate in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion, the phylogenetic profiles of such proteins are often "similar" or at least "related" to each other. The question we address in this paper is the following: how to measure the "similarity" between two profiles, in an evolutionarily relevant way, in order to develop efficient function prediction methods? Results: We show how the profiles can be mapped to a high-dimensional vector space which incorporates evolutionarily relevant information, and we provide an algorithm to compute efficiently the inner product in that space, which we call the tree kernel. The tree kernel can be used by any kernel-based analysis method for classification or data mining of phylogenetic profiles.
Kernels and Distances for Structured Data
- Machine Learning
, 2004
"... This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theo ..."
Abstract
-
Cited by 33 (2 self)
- Add to MetaCart
This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of real-world datasets. By converting our kernel to a distance pseudo-metric for 1-nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene dataset by more than 10%.
Graph-Driven Features Extraction From Microarray Data
, 2002
"... Gene function prediction from microarray data is a first step toward better understanding the machinery of the cell from relatively cheap and easy-to-produce data. In this paper we investigate whether the knowledge of many metabolic pathways and their catalyzing enzymes accumulated over the years ca ..."
Abstract
-
Cited by 33 (2 self)
- Add to MetaCart
Gene function prediction from microarray data is a first step toward better understanding the machinery of the cell from relatively cheap and easy-to-produce data. In this paper we investigate whether the knowledge of many metabolic pathways and their catalyzing enzymes accumulated over the years can help improve the performance of classifiers for this problem.
On Bregman Voronoi Diagrams
- in "Proc. 18th ACM-SIAM Sympos. Discrete Algorithms
, 2007
"... The Voronoi diagram of a point set is a fundamental geometric structure that partitions the space into elementary regions of influence defining a discrete proximity graph and dually a well-shaped Delaunay triangulation. In this paper, we investigate a framework for defining and building the Voronoi ..."
Abstract
-
Cited by 31 (16 self)
- Add to MetaCart
The Voronoi diagram of a point set is a fundamental geometric structure that partitions the space into elementary regions of influence defining a discrete proximity graph and dually a well-shaped Delaunay triangulation. In this paper, we investigate a framework for defining and building the Voronoi diagrams for a broad class of distortion measures called Bregman divergences, that includes not only the traditional (squared) Euclidean distance, but also various divergence measures based on entropic functions. As a by-product, Bregman Voronoi diagrams allow one to define information-theoretic Voronoi diagrams in statistical parametric spaces based on the relative entropy of distributions. We show that for a given Bregman divergence, one can define several types of Voronoi diagrams related to each other
Approximate Minimum Enclosing Balls in High Dimensions Using Core-Sets
, 2003
"... this paper can be downloaded from http://www.compgeom.com/meb/. P. Kumar and J. Mitchell are partially supported by a grant from the National Science Foundation (CCR0098172) . J. Mitchell is also partially supported by grants from the Honda Fundamental Research Labs, Metron Aviation, NASAAmes Resear ..."
Abstract
-
Cited by 29 (8 self)
- Add to MetaCart
this paper can be downloaded from http://www.compgeom.com/meb/. P. Kumar and J. Mitchell are partially supported by a grant from the National Science Foundation (CCR0098172) . J. Mitchell is also partially supported by grants from the Honda Fundamental Research Labs, Metron Aviation, NASAAmes Research (NAG2-1325), and the US-Israel Binational Science Foundation. E. A. Yldrm is partially supported by an NSF CAREER award (DMI-0237415)
Clustering ensembles: Models of consensus and weak partitions
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2005
"... Clustering ensembles have emerged as a powerful method for improving both the robustness as well as the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
Clustering ensembles have emerged as a powerful method for improving both the robustness as well as the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intra-class variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world datasets.
Kernel-based methods for hyperspectral image classification
- IEEE Transactions on Geoscience and Remote Sensing
, 2005
"... Abstract—This paper presents the framework of kernel-based methods in the context of hyperspectral image classification, illustrating from a general viewpoint the main characteristics of different kernel-based approaches and analyzing their properties in the hyperspectral domain. In particular, we a ..."
Abstract
-
Cited by 21 (5 self)
- Add to MetaCart
Abstract—This paper presents the framework of kernel-based methods in the context of hyperspectral image classification, illustrating from a general viewpoint the main characteristics of different kernel-based approaches and analyzing their properties in the hyperspectral domain. In particular, we assess performance of regularized radial basis function neural networks (Reg-RBFNN), standard support vector machines (SVMs), kernel Fisher discriminant (KFD) analysis, and regularized AdaBoost (Reg-AB). The novelty of this work consists in: 1) introducing Reg-RBFNN and Reg-AB for hyperspectral image classification; 2) comparing kernel-based methods by taking into account the peculiarities of hyperspectral images; and 3) clarifying their theoretical relationships. To these purposes, we focus on the accuracy of methods when working in noisy environments, high input dimension, and limited training sets. In addition, some other important issues are discussed, such as the sparsity of the solutions, the computational burden, and the capability of the methods to provide outputs that can be directly interpreted as probabilities. Index Terms—AdaBoost, feature space, hyperspectral classification, kernel-based methods, kernel Fisher discriminant analysis, radial basis function neural networks, regularization, support vector machines. I.
A Needle in a Haystack: Local One-Class Optimization
- In In Proc. ICML
, 2004
"... This paper addresses the problem of finding a small and coherent subset of points in a given data. This problem, sometimes referred to as one-class or set covering, requires to find a small-radius ball that covers as many data points as possible. It rises naturally in a wide range of applicati ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
This paper addresses the problem of finding a small and coherent subset of points in a given data. This problem, sometimes referred to as one-class or set covering, requires to find a small-radius ball that covers as many data points as possible. It rises naturally in a wide range of applications, from finding gene-modules to extracting documents' topics, where many data points are irrelevant to the task at hand, or in applications where only positive examples are available. Most previous approaches to this problem focus on identifying and discarding a possible set of outliers. In this paper we adopt an opposite approach which directly aims to find a small set of coherently structured regions, by using a loss function that focuses on local properties of the data. We formalize the learning task as an optimization problem using the InformationBottleneck principle. An algorithm to solve this optimization problem is then derived and analyzed.

