Results 1 - 9 of 9
Multiway Spectral Clustering: A Margin-based Perspective
, 2008
Cited by 5 (2 self)
Spectral clustering is a broad class of clustering procedures in which an intractable combinatorial optimization formulation of clustering is “relaxed” into a tractable eigenvector problem, and in which the relaxed solution is subsequently “rounded” into an approximate discrete solution to the original problem. In this paper we present a novel margin-based perspective on multiway spectral clustering. We show that the margin-based perspective illuminates both the relaxation and rounding aspects of spectral clustering, providing a unified analysis of existing algorithms and guiding the design of new algorithms. We also present connections between spectral clustering and several other topics in statistics, specifically minimum-variance clustering, Procrustes analysis and Gaussian intrinsic autoregression.
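As a rough illustration of the relax-and-round idea this abstract describes, the sketch below partitions a toy graph in two. It is not the margin-based multiway method of the paper: it assumes the unnormalized Laplacian L = D - W, relaxes the cut problem to finding the eigenvector with the second-smallest eigenvalue (via shifted power iteration), and rounds by sign. The graph and the names `toy_graph`, `fiedler_vector` and `labels` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: two-way relax-then-round on a toy graph.
# Assumptions (not from the paper): unnormalized Laplacian L = D - W,
# shifted power iteration for the eigenvector, sign-based rounding.

def toy_graph():
    """Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    n = 6
    weights = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        weights[i][j] = 1.0
        weights[j][i] = 1.0
    return weights


def fiedler_vector(weights, iterations=200):
    """Approximate the eigenvector of L with the second-smallest eigenvalue."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    shift = 2.0 * max(degrees)  # upper bound on the largest eigenvalue of L
    # Start from a non-constant vector with zero mean.
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # Multiply by (shift * I - L) = shift * I - D + W.
        y = [
            (shift - degrees[i]) * x[i]
            + sum(weights[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        # Project out the constant vector (the trivial eigenvector of L).
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# Relaxation: a real-valued eigenvector replaces the discrete cut indicator.
relaxed = fiedler_vector(toy_graph())
# Rounding: threshold the relaxed solution at zero to get discrete labels.
labels = [1 if v > 0 else 0 for v in relaxed]
print(labels)
```

On this toy graph the sign pattern separates the two triangles. Multiway variants typically use several eigenvectors and a more elaborate rounding step.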
Semi-supervised Learning of a Markovian Metric
The role of a distance metric in many supervised and semi-supervised learning applications is central to the success of clustering algorithms. Since existing metrics such as the Euclidean metric do not necessarily reflect the true structure (clusters or manifolds) in the data, it becomes imperative that an appropriate metric be learned from training or labeled data. Metric learning is a relatively new topic in data mining and machine learning, and most work on it learns a suitable linear transformation of the original data. This transformation is usually learned using training data and has been shown to improve test data classification accuracy. In this paper we present a Markov random walk-based semi-supervised method for metric learning. Our method differs from the aforementioned techniques in that we use minimal labeled data and we do not assume any Mahalanobis-type metric structure on the data. We create a computationally efficient nearest-neighbor graph representation of the data and pose a semidefinite program that learns the random walk on the associated graph. This is used to generate a distance measure between all unlabeled points, and the performance is compared against other important metrics using the k-NN classification rule.
unknown title
, 2007
This toolbox is an extension to the SpectraLib package written by Deepak Verma. It mainly consists of two parts:
F.1. OVERVIEW
The overall goal of the Abstraction Based Complexity Management seedling was to research existing drivers of complexity in aerospace systems, and notionalize a new design paradigm that could significantly reduce the cost and schedule associated with creating these systems. The fundamental areas of research considered as part of this activity included:
• Develop a measure of complexity that utilizes available systems parameters.
• Assess the uncertainty of complex hybrid systems and the relationship to complexity.
• Define an abstraction-based design method to provide formalism to the definition of the system.
• Determine a method of architecture synthesis that can be used to explore the complete design space available during early conceptual design.
• Assess analytical methods available to identify weakly connected areas in a complex system for the purpose of clustering.
Each of these topics is discussed in detail in the following sections of this Appendix. Each section comprises a draft of a stand-alone paper with the intention that each will be presented at an appropriate future industry conference. A brief discussion of the major topics of the seedling
CLUSTERING CONTEXT-SPECIFIC GENE REGULATORY NETWORKS
Pacific Symposium on Biocomputing 15:444-455 (2010)
Gene regulatory networks (GRNs) learned from high-throughput genomic data are often hard to visualize due to the large number of nodes and edges involved, rendering them difficult to appreciate. This becomes an important issue when modular structures are inherent in the inferred networks, such as in the recently proposed context-specific GRNs [12]. In this study, we investigate the application of graph clustering techniques to discern modularity in such highly complex graphs, focusing on context-specific GRNs. Identified modules are then associated with a subset of samples and the key pathways enriched in the module. Specifically, we study the use of Markov clustering and spectral clustering on cancer datasets to yield evidence on the possible association amongst different tumor types. Two sets of gene expression profiling data were analyzed to reveal context-specificity as well as modularity in genomic regulations.
Detecting Communities via Simultaneous Clustering of Graphs and Folksonomies
We present a simple technique for detecting communities by utilizing both the link structure and folksonomy (or tag) information. A simple way to describe our approach is by defining a community as a set of nodes in a graph that link more frequently within this set than outside it and that share similar tags. Our technique is based on the Normalized Cut (NCut) algorithm and can be easily and efficiently implemented. We validate our method by using a real network of blogs and tag information obtained from a social bookmarking site. We also verify our results on a citation network for which we have access to ground truth cluster information. Our method, Simultaneous Cut (SimCut), has the advantage that it can group related tags and cluster the nodes simultaneously.
Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
This paper considers the problem of embedding directed graphs in Euclidean space while retaining directional information. We model the observed graph as a sample from a manifold endowed with a vector field, and we design an algorithm that separates and recovers the features of this process: the geometry of the manifold, the data density and the vector field. The algorithm is motivated by our analysis of Laplacian-type operators and their continuous limit as generators of diffusions on a manifold. We illustrate the recovery algorithm on both artificially constructed and real data.
ISOTROPY CRITERIA AND ALGORITHMS FOR DATA CLUSTERING
, 2011
Given a set of points, the goal of data clustering is to group them into clusters, such that the internal homogeneity of points within each cluster contrasts with inter-cluster heterogeneity. Over the last fifty years, many methods for data clustering have been developed in diverse scientific communities. However, many of these methods suffer from several shortcomings, and are unable to handle the rich diversity of cluster structures that are usually present in data. We develop an unsupervised, nonparametric approach to data clustering that addresses these shortcomings. Our goal is to build on the strengths of these methods, while simultaneously offering innovative solutions to their limitations. In our cluster model, clusters are seen as groups of points, with overlapping neighborhoods, that have similar spatial structures that are in contrast with their surroundings. We use the isotropy of a point distribution to characterize spatial structure. We argue that identifying the isotropic density neighborhoods of a point helps in the detection of a diversity of cluster structures that are challenging to many other methods. We develop three different criteria for identifying neighborhoods with isotropic density. The first criterion is based on examining properties of one-dimensional projections in a hyperspherical neighborhood with uniform point distribution. The second and third criteria are based on the analysis of the force

