Consensus Clustering: A resampling-based method for class discovery and visualization of gene expression microarray data (2003)

by S Monti, P Tamayo, J Mesirov, T Golub
Venue: Machine Learning Journal

Results 11 - 20 of 62

Self-organizing map-based discovery and visualization of human endogenous retroviral sequence groups

by Merja Oja, Göran O. Sperber, Jonas Blomberg, Samuel Kaski - International Journal of Neural Systems, 2005
Cited by 4 (1 self)

Learning states and rules for detecting anomalies in time series

by Stan Salvador, Philip Chan - Applied Intelligence
Cited by 4 (1 self)
The normal operation of a device can be characterized in different temporal states. To identify these states, we introduce a segmentation algorithm called Gecko that can determine a reasonable number of segments using our proposed L method. We then use the RIPPER classification algorithm to describe these states in logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko segmentation algorithm is comparable to a human expert in identifying states, and our L method performs better than the existing permutation tests method when determining the number of segments to return in segmentation algorithms. Empirical results have also shown that our overall system can track normal behavior and detect anomalies.
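
The abstract does not spell out how the L method picks the number of segments. As a rough sketch, assuming the method locates the knee of an evaluation curve (error versus number of segments) by fitting one straight line to each side of every candidate split and keeping the split with the lowest size-weighted RMSE, the idea could look like this in Python. The names `fit_line_rmse` and `find_knee` and the sample curve are hypothetical and are not taken from the paper.

```python
from math import sqrt


def fit_line_rmse(xs, ys):
    """Least-squares line through (xs, ys); returns the RMSE of the fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sqrt(squared / n)


def find_knee(xs, ys):
    """Return the x value that ends the left line of the best two-line fit.

    Assumption: each line must cover at least two points, and the two
    RMSE values are weighted by the share of points each line covers.
    """
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four points to fit two lines")
    best_x, best_err = None, float("inf")
    for split in range(2, n - 1):  # left = first `split` points
        left = fit_line_rmse(xs[:split], ys[:split])
        right = fit_line_rmse(xs[split:], ys[split:])
        err = (split / n) * left + ((n - split) / n) * right
        if err < best_err:
            best_x, best_err = xs[split - 1], err
    return best_x


# Hypothetical evaluation curve: steep drop up to x = 4, then nearly flat.
xs = list(range(1, 9))
ys = [100, 70, 40, 10, 6, 5, 4, 3]
knee = find_knee(xs, ys)
print(knee)  # 4
```

On this made-up curve both fitted lines are exact when the left line ends at x = 4, so that value is returned as the suggested number of segments.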

Learning States for Detecting Anomalies in Time Series

by Stan Weidner Salvador, Philip K. Chan, Ph.D., Georgios C. Anagnostopoulos, Ph.D., William D. Shoaff, Ph.D., 2004
Cited by 3 (0 self)
Abstract not found

Correlation Clustering for Learning Mixtures of Canonical Correlation Models

by X. Z. Fern, C. E. Brodley, M. A. Friedl
Cited by 3 (0 self)
This paper addresses the task of analyzing the correlation between two related domains X and Y. Our research is motivated by an Earth Science task that studies the relationship between vegetation and precipitation. A standard statistical technique for such problems is Canonical Correlation Analysis (CCA). A critical limitation of CCA is that it can only detect linear correlation between the two domains that is globally valid throughout both data sets. Our approach addresses this limitation by constructing a mixture of local linear CCA models through a process we name correlation clustering. In correlation clustering, both data sets are clustered simultaneously according to the data's correlation structure such that, within a cluster, domain X and domain Y are linearly correlated in the same way. Each cluster is then analyzed using the traditional CCA to construct local linear correlation models. We present results on both artificial data sets and Earth Science data sets to demonstrate that the proposed approach can detect useful correlation patterns, which traditional CCA fails to discover.
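
The limitation described above can be illustrated with a toy case. The sketch below is not the authors' algorithm: it assumes one variable per domain, where the canonical correlation reduces to the absolute Pearson correlation, and it takes the two clusters as given rather than learning them. The data and the name `pearson` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical cluster A: Y rises with X.
xs_a, ys_a = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]
# Hypothetical cluster B: Y falls as X rises.
xs_b, ys_b = [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]

global_r = pearson(xs_a + xs_b, ys_a + ys_b)  # one model for all points
local_a = pearson(xs_a, ys_a)                 # model for cluster A only
local_b = pearson(xs_b, ys_b)                 # model for cluster B only

print(global_r, local_a, local_b)
```

The pooled data show no linear correlation, while each cluster on its own is perfectly correlated (positively in A, negatively in B). A single global linear model misses exactly this kind of structure.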

Bayesian cluster ensembles

by Hongjun Wang, Hanhuai Shan, Arindam Banerjee - In Proceedings of the 9th SIAM International Conference on Data Mining, 2009
Cited by 3 (0 self)
Cluster ensembles provide a framework for combining multiple base clusterings of a dataset to generate a stable and robust consensus clustering. There are important variants of the basic cluster ensemble problem, notably including cluster ensembles with missing values, as well as row-distributed or column-distributed cluster ensembles. Existing cluster ensemble algorithms are applicable only to a small subset of these variants. In this paper, we propose Bayesian Cluster Ensembles (BCE), which is a mixed-membership model for learning cluster ensembles, and is applicable to all the primary variants of the problem. We propose two methods, respectively based on variational approximation and Gibbs sampling, for learning a Bayesian cluster ensemble. We compare BCE extensively with several other cluster ensemble algorithms, and demonstrate that BCE is not only versatile in terms of its applicability, but also outperforms the other algorithms in terms of stability and accuracy.
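
To make the notion of combining base clusterings concrete, here is a minimal sketch of a simple co-association baseline. It is not the BCE model from the abstract. It assumes every base clustering labels the same items, and that items co-clustered in more than half of the runs belong together. The function names and the threshold are hypothetical choices for illustration.

```python
def co_association(clusterings):
    """Fraction of base clusterings that put items i and j in one cluster."""
    n = len(clusterings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        if len(labels) != n:
            raise ValueError("every clustering must label the same items")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(clusterings)
    return [[value / runs for value in row] for row in counts]


def consensus_groups(matrix, threshold=0.5):
    """Connected components of the graph linking pairs above the threshold."""
    n = len(matrix)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Three hypothetical base clusterings of five items; label names are
# arbitrary, only co-membership matters.
base = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
matrix = co_association(base)
consensus = consensus_groups(matrix)
print(consensus)  # [0, 0, 0, 1, 1]
```

Item 2 is grouped with items 0 and 1 in two of the three runs, so the majority view places it in their group even though one base clustering disagrees.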

Uncovering Groups via Heterogeneous Interaction Analysis

by Lei Tang, Xufei Wang, Huan Liu
Cited by 3 (3 self)
With the pervasive availability of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their own favorite content. Users can also connect to each other, and subscribe to or become a fan or a follower of others. These diverse individual activities result in a multi-dimensional network among actors, forming cross-dimension group structures with group members sharing certain similarities. It is challenging to effectively integrate the network information of multiple dimensions in order to discover cross-dimension group structures. In this work, we propose a two-phase strategy to identify the hidden structures shared across dimensions in multi-dimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them all to find a robust community structure among actors. Experiments on synthetic and real-world data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities at large scale.

Exploring Biological Network Dynamics with Ensembles of Graph Partitions

by Saket Navlakha, Carl Kingsford - In Proceedings of the PSB Pacific Symposium on Biocomputing, 2010
Cited by 3 (2 self)
Unveiling the modular structure of biological networks can reveal important organizational patterns in the cell. Many graph partitioning algorithms have been proposed towards this end. However, most approaches only consider a single, optimal decomposition of the network. In this work, we make use of the multitude of near-optimal clusterings in order to explore the dynamics of network clusterings and how those dynamics relate to the structure of the underlying network. We recast the modularity optimization problem as an integer linear program with diversity constraints. These constraints produce an ensemble of dissimilar but still highly modular clusterings. We apply our approach to four social and biological networks and show how optimal and near-optimal solutions can be used in conjunction to identify deeper community structure in the network, including inter-community dynamics, communities that are especially resilient to change, and core-and-peripheral community members.
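
The modularity score that such approaches optimize can be computed directly for a given partition. The sketch below assumes the standard Newman modularity for a simple undirected, unweighted graph without self-loops. It only evaluates a partition and does not implement the integer linear program from the abstract. The graph and the name `modularity` are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of a simple undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges that fall inside a community.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # Expected fraction under random rewiring that preserves degrees.
    degree_sum = {}
    for node, d in degree.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + d
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return inside - expected


# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)  # 6/7 - 1/2, about 0.357
q_single = modularity(edges, one_group)  # 0.0
print(q_split, q_single)
```

Splitting the graph into its two triangles scores higher than lumping all nodes together. An ensemble method in the spirit of the abstract would compare many partitions whose scores lie close to the best one.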

Uncovering Cross-Dimension Group Structures in Multi-Dimensional Networks

by Lei Tang, Huan Liu
Cited by 2 (2 self)
With the proliferation of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their own favorite content. Users can also connect to friends, and subscribe to or become a fan of other users. These diverse individual activities result in a multi-dimensional network among actors, forming cross-dimension group structures with group members focusing on similar topics. It is challenging to effectively integrate the network information of multiple dimensions to find the cross-dimension group structure. In this work, we propose a two-phase strategy to identify the hidden structures shared across dimensions in multi-dimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them to find a robust community structure among actors. Experiments on synthetic and real-world data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities at large scale.

A novel approach to phylogenetic tree construction using stochastic optimization and clustering

by Ling Qin, Yixin Chen, Yi Pan, Ling Chen - BMC Bioinformatics (BioMed Central, Open Access), 2006
Cited by 2 (2 self)

Annotation-based Distance Measures for Patient Subgroup Discovery in Clinical Microarray Studies

by Claudio Lottaz, Joern Toedling, Rainer Spang
Cited by 1 (0 self)
Motivation: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been suggested, but little attention has been given to the distance measure between patients. Even with the Euclidean metric, including and excluding genes from the analysis leads to different distances between the same objects, and consequently different clustering results. Results: We describe a new clustering algorithm, in which gene selection is used to derive biologically meaningful clusterings of samples by combining expression profiles and functional annotation data. According to gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients, leading to different clusterings. These clusterings are filtered using a resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definition. Conclusions: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes, as well as new subgroupings. We conjecture that our method has the potential to reveal so far unknown, clinically relevant classes of patients in an unsupervised manner. Availability: We provide the R package adSplit as part of Bioconductor release 1.9 and on