Biclustering Algorithms: A Survey
 In Handbook of Computational Molecular Biology, edited by S. Aluru. Chapman & Hall/CRC Computer and Information Science Series, 2005
Cited by 35 (0 self)
Analysis of large-scale genomics data, notably gene expression, initially focused on clustering methods. More recently, biclustering techniques were proposed for revealing submatrices that show unique patterns. We review some of the algorithmic approaches to biclustering and discuss their properties.
Going weighted: Parameterized algorithms for cluster editing
 Theoretical Computer Science
Cited by 23 (3 self)
The goal of the Cluster Editing problem is to make the fewest changes to the edge set of an input graph such that the resulting graph is a disjoint union of cliques. This problem is NP-complete, but several parameterized algorithms have recently been proposed. In this paper we present a surprisingly simple branching strategy for Cluster Editing. We generalize the problem by assuming that edge insertion and deletion costs are positive integers. We show that the resulting search tree has size O(1.82^k) for edit cost k, resulting in the currently fastest parameterized algorithm for this problem. We have implemented and evaluated our approach, and find that it outperforms other parameterized algorithms for the problem.
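To make the problem statement concrete, here is a minimal Python sketch of a search-tree solver for the unit-cost case. It is not the O(1.82^k) branching strategy of the paper: it uses the well-known three-way branching on a conflict triple (u, v, w), where uv and uw are edges but vw is not, which gives a tree of size at most 3^k. The function names `find_conflict_triple` and `cluster_editing` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def find_conflict_triple(vertices, edges):
    """Return (u, v, w) where uv and uw are edges but vw is not, else None."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    for u in vertices:
        for v, w in combinations(sorted(adj[u]), 2):
            if w not in adj[v]:
                return u, v, w
    return None


def cluster_editing(vertices, edges, k):
    """Search for at most k edge edits that yield a disjoint union of cliques.

    Assumes unit edit costs and no self-loops. Returns a frozenset of the
    edited vertex pairs, or None if k edits do not suffice.
    """
    def solve(current, edited, budget):
        triple = find_conflict_triple(vertices, current)
        if triple is None:
            return edited            # no conflict triple: a cluster graph
        if budget == 0:
            return None
        u, v, w = triple
        for pair in (frozenset((u, v)), frozenset((u, w)), frozenset((v, w))):
            if pair in edited:       # never touch the same pair twice
                continue
            # Toggling the pair deletes uv or uw, or inserts vw.
            found = solve(current ^ {pair}, edited | {pair}, budget - 1)
            if found is not None:
                return found
        return None

    start = frozenset(frozenset(e) for e in edges)
    return solve(start, frozenset(), k)


path = [(0, 1), (1, 2), (2, 3)]
print(cluster_editing(range(4), path, 0))  # None: the path is not a cluster graph
print(cluster_editing(range(4), path, 1))  # deleting edge {1, 2} suffices
```

The sketch relies on the fact that a graph is a disjoint union of cliques exactly when it contains no such conflict triple, so every solution must modify at least one of the three pairs.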
Exact algorithms for cluster editing: Evaluation and experiments
 Algorithmica
Cited by 20 (1 self)
We present empirical results for the Cluster Editing problem using exact methods from fixed-parameter algorithmics and linear programming. We investigate parameter-independent data reduction methods and find that effective preprocessing is possible if the number of edge modifications k is smaller than some multiple of |V|. In particular, combining parameter-dependent data reduction with lower and upper bounds, we can effectively reduce graphs satisfying k ≤ 25|V|. In addition to the fastest known fixed-parameter branching strategy for the problem, we investigate an integer linear program (ILP) formulation of the problem using a cutting plane approach. Our results indicate that both approaches are capable of solving large graphs with 1000 vertices and several thousand edge modifications. For the first time, complex and very large graphs such as biological instances allow for an exact solution, using a combination of the above techniques.
Gene Expression Profile Classification: A Review
 Current Bioinformatics, 2006
Cited by 13 (0 self)
In this review, we have discussed the class-prediction and class-discovery methods that are applied to gene expression data, along with the implications of the findings. We attempted to present a unified approach that considers both class prediction and class discovery. We devoted a substantial part of this review to an overview of pattern classification/recognition methods and discussed important issues such as preprocessing of gene expression data, the curse of dimensionality, feature extraction/selection, and measuring or estimating classifier performance. To provide a quick and concise reference, we discussed and summarized important properties of different class-predictor design approaches, such as generalizability (sensitivity to overtraining), built-in feature selection, ability to report prediction strength, and transparency (ease of understanding of the operation). We have also covered in detail the topic of biclustering, an emerging clustering method that processes the entries of the gene expression data matrix in both gene and sample directions simultaneously.
Binary Matrix Factorization with Applications
Cited by 12 (1 self)
An interesting problem in Nonnegative Matrix Factorization (NMF) is to factorize a matrix X that belongs to some specific class, for example a binary matrix. In this paper, we extend the standard NMF to Binary Matrix Factorization (BMF for short): given a binary matrix X, we want to factorize X into two binary matrices W and H (thus conserving the most important integer property of the objective matrix X) satisfying X ≈ WH. Two algorithms are studied and compared. These methods rely on a fundamental boundedness property of NMF, which we propose and prove. This new property also provides a natural normalization scheme that eliminates the bias of factor matrices. Experiments on both synthetic and real-world datasets are conducted to show the competency and effectiveness of BMF.
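As a small illustration of what X ≈ WH means for binary factors, the following Python sketch measures how well a candidate pair (W, H) reconstructs X. The squared Frobenius distance is an assumption here, since the abstract does not name the objective, and the sketch only evaluates a given factorization; it does not implement either of the two algorithms studied in the paper. The helper names are hypothetical.

```python
def matmul(W, H):
    """Plain integer matrix product of W (n x r) and H (r x m)."""
    return [[sum(W[i][t] * H[t][j] for t in range(len(H)))
             for j in range(len(H[0]))]
            for i in range(len(W))]


def reconstruction_error(X, W, H):
    """Squared Frobenius distance between X and the product WH."""
    P = matmul(W, H)
    return sum((x - p) ** 2
               for row_x, row_p in zip(X, P)
               for x, p in zip(row_x, row_p))


# A binary matrix made of two blocks, and an exact rank-2 binary factorization.
X = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
W = [[1, 0],
     [1, 0],
     [0, 1]]
H = [[1, 1, 0],
     [0, 0, 1]]
print(reconstruction_error(X, W, H))      # 0: WH reproduces X exactly

# Assigning row 1 to both factors adds a spurious 1 in the last column.
W_bad = [[1, 0],
         [1, 1],
         [0, 1]]
print(reconstruction_error(X, W_bad, H))  # 1
```

Each column of W can be read as a row-group indicator and each row of H as a column-group indicator, which is why binary factors are easy to interpret as block memberships.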
Graph-Based Data Clustering with Overlaps
 To appear in Discrete Optimization, 2010
Cited by 12 (5 self)
We introduce overlap cluster graph modification problems where, other than in most previous work, the clusters of the target graph may overlap. More precisely, the studied graph problems ask for a minimum number of edge modifications such that the resulting graph consists of clusters (that is, maximal cliques) that may overlap up to a certain amount specified by the overlap number s. In the case of s-vertex-overlap, each vertex may be part of at most s maximal cliques; s-edge-overlap is analogously defined in terms of edges. We provide a complexity dichotomy (polynomial-time solvable versus NP-hard) for the underlying edge modification problems, develop forbidden subgraph characterizations of “cluster graphs with overlaps”, and study the parameterized complexity in terms of the number of allowed edge modifications, achieving fixed-parameter tractability (in case of constant s-values) and parameterized hardness (in case of unbounded s-values).
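The s-vertex-overlap condition defined above can be checked directly on small graphs. The sketch below is only an illustration of the definition, not an algorithm from the paper: it enumerates maximal cliques with a basic Bron-Kerbosch recursion (no pivoting, exponential in the worst case) and counts how many contain each vertex. The graph is assumed to be given as a dictionary from each vertex to its neighbor set, and the function names are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of a graph given as {vertex: neighbor set}."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), frozenset(adj), frozenset())
    return cliques


def max_vertex_overlap(adj):
    """Largest number of maximal cliques that share a single vertex."""
    counts = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            counts[v] += 1
    return max(counts.values(), default=0)


def has_s_vertex_overlap(adj, s):
    """True if every vertex lies in at most s maximal cliques."""
    return max_vertex_overlap(adj) <= s


# Two triangles {0, 1, 2} and {2, 3, 4} that share vertex 2.
bowtie = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(has_s_vertex_overlap(bowtie, 1))  # False: vertex 2 is in two cliques
print(has_s_vertex_overlap(bowtie, 2))  # True
```

For s = 1 the condition holds exactly for disjoint unions of cliques, which is the target class of ordinary Cluster Editing.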
Fixed-Parameter Enumerability of Cluster Editing and Related Problems
 Theory Comp. Systems
Cited by 6 (2 self)
Cluster Editing is transforming a graph by at most k edge insertions or deletions into a disjoint union of cliques. This problem is fixed-parameter tractable (FPT). Here we compute concise enumerations of all minimal solutions in O(2.27^k + k^2 n + m) time. Such enumerations support efficient inference procedures, but also the optimization of further objectives such as minimizing the number of clusters. In an extended problem version, target graphs may have a limited number of overlaps of cliques, measured by the number t of edges that remain when the twin vertices are merged. This problem is still in FPT with respect to the combined parameter k and t. The result is based on a property of twin-free graphs. We also give FPT results for problem versions avoiding certain artificial clusterings. Furthermore, we prove that all solutions with minimal edit sequences differ on a so-called full kernel with at most k^2/4 + O(k) vertices, which can be found in polynomial time. The size bound is tight. We also get a bound for the number of edges in the full kernel, which is optimal up to a (large) constant factor. Numerous open problems are mentioned.
Exploration of High-Dimensional Scalar Function for Nuclear Reactor Safety Analysis and Visualization
Cited by 5 (3 self)
The next generation of methodologies for nuclear reactor Probabilistic Risk Assessment (PRA) explicitly accounts for the time element in modeling the probabilistic system evolution and uses numerical simulation tools to account for possible dependencies between failure events. The Monte Carlo (MC) and the Dynamic Event Tree (DET) approaches belong to this new class of dynamic PRA methodologies. A challenge of dynamic PRA algorithms is the large amount of data they produce, which may be difficult to visualize and analyze in order to extract useful information. We present a software tool that is designed to address these goals. We model a large-scale nuclear simulation dataset as a high-dimensional scalar function defined over a discrete sample of the domain. First, we provide structural analysis of such a function at multiple scales and provide insight into the relationship between the input parameters and the output. Second, we enable exploratory analysis for users, where we help the users to differentiate features from noise through multi-scale analysis on an interactive platform, based on domain knowledge and data characterization. Our analysis is performed by exploiting the topological and geometric properties of the domain, building statistical models based on its topological segmentations and providing interactive visual interfaces to facilitate such explorations. We provide a user’s guide to our software tool by highlighting its analysis and visualization capabilities, along with a use case involving data from a nuclear reactor safety simulation. Key Words: high-dimensional data analysis, computational topology, nuclear reactor safety analysis, visualization
Robust biclustering algorithm (ROBA) for DNA microarray data analysis
 Proceedings of IEEE Workshop on Statistical Signal Processing, 2005
Cited by 3 (0 self)
Recently, biclustering algorithms have been used to extract useful information from large sets of DNA microarray experimental data. They refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. Almost all of the methods proposed in the literature search for one or two types of bicluster among four. Also, most of the proposed methods rely on solving an optimization problem. Such a method therefore depends on the optimality criterion, which is often likely to miss some significant biclusters. In this study, we develop a Robust Biclustering Algorithm to address the two issues mentioned above. The proposed algorithm is simple because it uses basic linear algebra and arithmetic tools, and there is no need to solve an optimization problem.
D: COMPACT: A Comparative Package for Clustering Assessment
 In: Lecture Notes in Computer Science, vol. 3759. Springer-Verlag, 2005
Cited by 3 (2 self)
There exist numerous algorithms that cluster data points from large-scale genomic experiments such as sequencing, gene expression and proteomics. Such algorithms may employ distinct principles, and lead to different performance and results. The appropriate choice of a clustering method is a significant and often overlooked aspect in extracting information from large-scale datasets. Evidently, such a choice may significantly influence the biological interpretation of the data. We present an easy-to-use and intuitive tool that compares some clustering methods within the same framework. The interface is named COMPACT, for Comparative Package for Clustering Assessment. COMPACT first reduces the dataset's dimensionality using the Singular Value Decomposition (SVD) method, and only then employs various clustering techniques. Besides its simplicity and its ability to perform well on high-dimensional data, it provides visualization tools for evaluating the results. COMPACT was tested on a variety of datasets, from classical benchmarks to large-scale gene-expression experiments. COMPACT is configurable and expandable to newly added algorithms.