Biclustering Algorithms: A Survey
In Handbook of Computational Molecular Biology, edited by S. Aluru. Chapman & Hall/CRC Computer and Information Science Series, 2005
Cited by 23 (0 self)
Analysis of large-scale genomics data, notably gene expression, initially focused on clustering methods. Recently, biclustering techniques were proposed for revealing submatrices showing unique patterns. We review some of the algorithmic approaches to biclustering and discuss their properties.
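As an illustration of what a "submatrix showing a unique pattern" can mean in practice, the sketch below scores a candidate bicluster with the mean squared residue of Cheng and Church, one well-known coherence criterion. The choice of this score and the function name `mean_squared_residue` are illustrative assumptions, not a method taken from the survey itself.

```python
import numpy as np

def mean_squared_residue(X, rows, cols):
    """Mean squared residue of the submatrix X[rows, cols] (hypothetical helper).

    A score of 0 means every entry equals row mean + column mean - overall mean,
    i.e. the submatrix is a perfectly additive (coherent) bicluster.
    """
    sub = np.asarray(X, dtype=float)[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    overall = sub.mean()
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())
```

A biclustering search would then look for large row/column subsets whose score stays below some threshold; how that search is organized is exactly where the surveyed algorithms differ.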
Going weighted: Parameterized algorithms for cluster editing
Theoretical Computer Science
Cited by 19 (3 self)
The goal of the Cluster Editing problem is to make the fewest changes to the edge set of an input graph such that the resulting graph is a disjoint union of cliques. This problem is NP-complete, but several parameterized algorithms have recently been proposed. In this paper we present a surprisingly simple branching strategy for Cluster Editing. We generalize the problem by assuming that edge insertion and deletion costs are positive integers. We show that the resulting search tree has size O(1.82^k) for edit cost k, resulting in the currently fastest parameterized algorithm for this problem. We have implemented and evaluated our approach and find that it outperforms other parameterized algorithms for the problem.
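For orientation, here is a minimal sketch of the classic search tree for (weighted) Cluster Editing. It branches on a conflict triple, i.e. vertices u, v, w with edges uv and uw but no edge vw: any solution must delete uv, delete uw, or insert vw, so trying these three options within the remaining budget is complete. This simple scheme gives an O(3^k) search tree; it is not the refined O(1.82^k) strategy of the paper. Integer edit costs are modeled by a hypothetical `cost` callable, and every pair that has been edited is frozen so that no pair is modified twice.

```python
from itertools import combinations

def find_conflict(vertices, edges):
    """Return a conflict triple (u, v, w): uv, uw are edges but vw is not."""
    for u in vertices:
        nbrs = [x for x in vertices if x != u and frozenset((u, x)) in edges]
        for v, w in combinations(nbrs, 2):
            if frozenset((v, w)) not in edges:
                return u, v, w
    return None  # no induced P3: the graph is already a disjoint union of cliques

def cluster_edit(vertices, edges, budget, cost=lambda pair: 1, fixed=frozenset()):
    """Return a set of pairs to toggle, of total cost <= budget, that yields a
    cluster graph, or None if no such set exists (hypothetical helper)."""
    conflict = find_conflict(vertices, edges)
    if conflict is None:
        return set()
    u, v, w = conflict
    for pair in (frozenset((u, v)), frozenset((u, w)), frozenset((v, w))):
        c = cost(pair)
        if pair in fixed or c > budget:
            continue  # pair already edited, or too expensive for this branch
        new_edges = edges ^ {pair}  # toggling deletes uv/uw or inserts vw
        rest = cluster_edit(vertices, new_edges, budget - c, cost, fixed | {pair})
        if rest is not None:
            return {pair} | rest
    return None
```

On the path a-b-c with unit costs, a budget of 1 suffices (one deletion), while a budget of 0 fails. Making deletions expensive steers the search toward inserting the edge ac instead.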
Exact algorithms for cluster editing: Evaluation and experiments
Algorithmica
Cited by 14 (1 self)
We present empirical results for the Cluster Editing problem using exact methods from fixed-parameter algorithmics and linear programming. We investigate parameter-independent data reduction methods and find that effective preprocessing is possible if the number of edge modifications k is smaller than some multiple of |V|. In particular, by combining parameter-dependent data reduction with lower and upper bounds, we can effectively reduce graphs satisfying k ≤ 25|V|. In addition to the fastest known fixed-parameter branching strategy for the problem, we investigate an integer linear program (ILP) formulation of the problem using a cutting-plane approach. Our results indicate that both approaches are capable of solving large graphs with 1000 vertices and several thousand edge modifications. For the first time, complex and very large graphs, such as biological instances, can be solved exactly using a combination of the above techniques.
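The abstract does not spell out the ILP. A standard formulation of this kind, given here as a sketch that may differ in detail from the paper's, uses a binary variable x_uv that is 1 exactly when u and v end up in the same cluster, with c_uv the cost of changing the pair {u, v}:

```latex
\begin{aligned}
\min_{x}\quad & \sum_{\{u,v\} \in E} c_{uv}\,(1 - x_{uv}) \;+\; \sum_{\{u,v\} \notin E} c_{uv}\, x_{uv} \\
\text{s.t.}\quad & x_{uv} + x_{vw} - x_{uw} \le 1 \qquad \text{for all distinct } u, v, w \in V, \\
& x_{uv} \in \{0, 1\} \qquad \text{for all } \{u, v\} \subseteq V .
\end{aligned}
```

The triangle (transitivity) constraints force "same cluster" to be an equivalence relation. There are cubically many of them, which is why a cutting-plane approach is attractive: solve the LP relaxation with a subset of the constraints and add violated triangle inequalities only as they are found.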
Gene Expression Profile Classification: A Review
Current Bioinformatics, 2006
Cited by 9 (0 self)
In this review, we discuss the class-prediction and class-discovery methods that are applied to gene expression data, along with the implications of the findings. We attempt to present a unified approach that considers both class prediction and class discovery. We devote a substantial part of this review to an overview of pattern classification/recognition methods and discuss important issues such as preprocessing of gene expression data, the curse of dimensionality, feature extraction/selection, and measuring or estimating classifier performance. To provide a quick and concise reference, we discuss and summarize important properties of different class-predictor design approaches, such as generalizability (sensitivity to overtraining), built-in feature selection, ability to report prediction strength, and transparency (ease of understanding of the operation). We also cover in detail biclustering, an emerging clustering method that processes the entries of the gene expression data matrix in both the gene and sample directions simultaneously.
Graph-Based Data Clustering with Overlaps
To appear in Discrete Optimization, 2010
Cited by 8 (4 self)
We introduce overlap cluster graph modification problems where, unlike in most previous work, the clusters of the target graph may overlap. More precisely, the studied graph problems ask for a minimum number of edge modifications such that the resulting graph consists of clusters (that is, maximal cliques) that may overlap up to a certain amount specified by the overlap number s. In the case of s-vertex-overlap, each vertex may be part of at most s maximal cliques; s-edge-overlap is defined analogously in terms of edges. We provide a complexity dichotomy (polynomial-time solvable versus NP-hard) for the underlying edge modification problems, develop forbidden subgraph characterizations of “cluster graphs with overlaps”, and study the parameterized complexity in terms of the number of allowed edge modifications, achieving fixed-parameter tractability (for constant values of s) and parameterized hardness (for unbounded values of s).
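To make the s-vertex-overlap condition concrete, the sketch below only checks whether a given graph already satisfies it; it does not solve the edge modification problem studied in the paper. It enumerates maximal cliques with the Bron-Kerbosch algorithm (with pivoting) and counts how many contain each vertex. The helper names and the adjacency-dict representation are assumptions for illustration.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting.

    adj maps each vertex to the set of its neighbours (undirected, no loops).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def has_s_vertex_overlap(adj, s):
    """True if every vertex lies in at most s maximal cliques (hypothetical helper)."""
    count = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            count[v] += 1
    return all(c <= s for c in count.values())
```

With s = 1 the check reduces to recognizing an ordinary cluster graph: a vertex with two non-adjacent neighbours necessarily lies in two different maximal cliques. Two triangles sharing one vertex (a "bowtie") satisfy 2-vertex-overlap but not 1-vertex-overlap.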
Binary Matrix Factorization with Applications
Cited by 6 (0 self)
An interesting problem in Nonnegative Matrix Factorization (NMF) is to factorize a matrix X that belongs to some specific class, for example, binary matrices. In this paper, we extend the standard NMF to Binary Matrix Factorization (BMF): given a binary matrix X, we want to factorize X into two binary matrices W and H (thus preserving the most important integer property of the target matrix X) satisfying X ≈ WH. Two algorithms are studied and compared. These methods rely on a fundamental boundedness property of NMF, which we propose and prove. This new property also provides a natural normalization scheme that eliminates the bias of factor matrices. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of BMF.
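The abstract names the goal (binary W, H with X ≈ WH) but not the two algorithms. The sketch below swaps in a simple heuristic of our own, not the paper's methods: run plain Lee-Seung multiplicative-update NMF, move the scale from W to H so each column of W peaks at 1 (leaving WH unchanged), then threshold both factors at a hypothetical cut-off of 0.5. All function names and defaults are assumptions.

```python
import numpy as np

def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Plain Lee-Seung multiplicative updates for X ~= W @ H (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def binary_factorize(X, k, threshold=0.5, **kwargs):
    """Heuristic BMF sketch: NMF, rescale, then round both factors to {0, 1}."""
    X = np.asarray(X, dtype=float)
    W, H = nmf(X, k, **kwargs)
    # Shift scale from W to H so each column of W peaks at 1; W @ H is unchanged.
    scale = W.max(axis=0) + 1e-12
    W, H = W / scale, H * scale[:, None]
    return (W >= threshold).astype(int), (H >= threshold).astype(int)

def reconstruction_error(X, W, H):
    """L1 distance between X and the ordinary integer product W @ H."""
    return int(np.abs(np.asarray(X) - W @ H).sum())
```

Thresholding after an unconstrained NMF can land far from the best binary factorization; the paper's normalization based on its boundedness property is presumably meant to make this rounding step better behaved.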
Fixed-Parameter Enumerability of Cluster Editing and Related Problems
Theory Comp. Systems
Cited by 4 (2 self)
Cluster Editing is the problem of transforming a graph, by at most k edge insertions or deletions, into a disjoint union of cliques. This problem is fixed-parameter tractable (FPT). Here we compute concise enumerations of all minimal solutions in O(2.27^k + k^2 n + m) time. Such enumerations support efficient inference procedures, but also the optimization of further objectives such as minimizing the number of clusters. In an extended problem version, target graphs may have a limited number of overlaps of cliques, measured by the number t of edges that remain when twin vertices are merged. This problem is still in FPT with respect to the combined parameter k and t. The result is based on a property of twin-free graphs. We also give FPT results for problem versions that avoid certain artificial clusterings. Furthermore, we prove that all solutions with minimal edit sequences differ on a so-called full kernel with at most k^2/4 + O(k) vertices, which can be found in polynomial time. The size bound is tight. We also obtain a bound on the number of edges in the full kernel, which is optimal up to a (large) constant factor. Numerous open problems are mentioned.
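The overlap measure t can be illustrated by merging twins and counting the edges that survive. The sketch below assumes that "twins" means vertices with identical closed neighbourhoods N[v] (so twins are always adjacent); the helper name `merge_twins` is hypothetical. In a plain cluster graph every clique collapses to a single vertex, so t = 0.

```python
def merge_twins(adj):
    """Merge vertices that have identical closed neighbourhoods N[v].

    adj maps each vertex to the set of its neighbours. Returns
    (representatives, edges) of the quotient graph, where each edge is a
    frozenset of two representatives.
    """
    groups = {}
    for v, nbrs in adj.items():
        groups.setdefault(frozenset(nbrs) | {v}, []).append(v)
    rep = {v: members[0] for members in groups.values() for v in members}
    edges = {
        frozenset((rep[u], rep[w]))
        for u in adj
        for w in adj[u]
        if rep[u] != rep[w]
    }
    return set(rep.values()), edges
```

For two triangles sharing one vertex, the outer pairs of each triangle merge into single vertices, leaving a path on three vertices, so t = 2 under this assumed definition.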
COMPACT: A Comparative Package for Clustering Assessment
In: Lecture Notes in Computer Science, vol. 3759. Springer-Verlag, 2005
Cited by 3 (2 self)
There exist numerous algorithms that cluster data points from large-scale genomic experiments such as sequencing, gene expression and proteomics. Such algorithms may employ distinct principles and lead to different performance and results. The appropriate choice of a clustering method is a significant and often overlooked aspect of extracting information from large-scale datasets. Evidently, such a choice may significantly influence the biological interpretation of the data. We present an easy-to-use and intuitive tool that compares several clustering methods within the same framework. The interface is named COMPACT, for Comparative Package for Clustering Assessment. COMPACT first reduces the dataset's dimensionality using the Singular Value Decomposition (SVD) method, and only then applies various clustering techniques. Besides being simple and performing well on high-dimensional data, it provides visualization tools for evaluating the results. COMPACT was tested on a variety of datasets, from classical benchmarks to large-scale gene-expression experiments. COMPACT is configurable and extensible to newly added algorithms.
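The "reduce first, then cluster" pipeline can be sketched with NumPy's SVD followed by one clustering method. Here we assume column centring before the SVD and use k-means with a deterministic farthest-first start as the example clusterer; COMPACT's actual preprocessing and set of methods may differ, and `svd_reduce` and `kmeans` are hypothetical helpers.

```python
import numpy as np

def svd_reduce(X, r):
    """Centre the columns of X, then keep the top-r SVD coordinates of each row."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * S[:r]  # equals Xc @ Vt[:r].T

def kmeans(Y, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

Clustering in the reduced r-dimensional space is cheaper and less noisy than in the original space, and different clustering routines can be swapped in after the same `svd_reduce` step for comparison.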
Clustering gene expression data using graph separators
In review for In Silico Biology, 2007
Cited by 2 (2 self)
Recent work has used graphs to model expression data from microarray experiments, with a view to partitioning the genes into clusters. In this paper, we introduce the use of a decomposition by clique separators. Our aim is to improve the classical clustering methods in two ways: first, we want to allow overlap between clusters, as this seems biologically sound; second, we want to be guided by the structure of the graph to define the number of clusters. We test this approach on a well-known yeast (Saccharomyces cerevisiae) database. The results are good: the expression profiles of the clusters we find are very coherent. Moreover, we are able to organize the clusters we find into another graph and order them in a way that turns out to respect the chronological order defined by the sporulation process.
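The reason a clique-separator decomposition yields overlapping clusters is visible in a single decomposition step: if a clique S disconnects the graph, each remaining component C becomes the piece C ∪ S, so all pieces share S. The sketch below performs only that one step for a given S; it does not search for separators or run the full decomposition, and the helper names are hypothetical.

```python
from itertools import combinations

def is_clique(adj, S):
    return all(v in adj[u] for u, v in combinations(S, 2))

def components(adj, removed=()):
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def split_on_clique_separator(adj, S):
    """If S is a clique whose removal increases the number of components,
    return the overlapping pieces C | S for each remaining component C."""
    S = set(S)
    if not is_clique(adj, S):
        return None
    parts = components(adj, S)
    if len(parts) <= len(components(adj)):
        return None  # S does not separate anything
    return [comp | S for comp in parts]
```

In two triangles sharing vertex 2, the separator {2} yields the pieces {0, 1, 2} and {2, 3, 4}, which overlap exactly in the separator.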
Generalized graph clustering: recognizing (p, q)-cluster graphs
In Proc. 36th WG, LNCS, 2010
Cited by 2 (1 self)
Cluster Editing is a classical graph-theoretic approach to the problem of data set clustering: it consists of modifying a similarity graph into a disjoint union of cliques, i.e., clusters. As pointed out in a number of recent papers, the cluster editing model is too rigid to capture common features of real data sets. Several generalizations have therefore been proposed. In this paper, we introduce (p, q)-cluster graphs, where each cluster misses at most p edges to be a clique, and there are at most q edges between a cluster and other clusters. Our generalization is the first one that allows a large number of false positives and negatives in total, while bounding the number of these locally for each cluster by p and q. We show that recognizing (p, q)-cluster graphs is NP-complete when p and q are part of the input. On the positive side, we show that (0, q)-cluster, (p, 1)-cluster, (p, 2)-cluster, and (1, 3)-cluster graphs can be recognized in polynomial time.
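Although finding a witnessing partition is NP-complete in general, verifying a proposed partition against the two (p, q) conditions is easy, which is what places the recognition problem in NP. A minimal sketch of that check, with a hypothetical helper name and adjacency-dict representation:

```python
from itertools import combinations

def is_pq_clustering(adj, clusters, p, q):
    """Check a proposed partition `clusters` against the (p, q)-cluster conditions."""
    flat = [v for c in clusters for v in c]
    if len(flat) != len(set(flat)) or set(flat) != set(adj):
        return False  # not a partition of the vertex set
    for cluster in clusters:
        members = set(cluster)
        # edges missing inside the cluster (non-adjacent pairs of members)
        missing = sum(1 for u, v in combinations(members, 2) if v not in adj[u])
        # edges with exactly one endpoint in the cluster
        leaving = sum(1 for u in members for w in adj[u] if w not in members)
        if missing > p or leaving > q:
            return False
    return True
```

For two triangles sharing vertex 2, the partition {0, 1, 2}, {3, 4} has no missing edges and two edges leaving each cluster, so it witnesses a (0, 2)-cluster graph but not a (0, 1)-cluster graph.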