Results 1-10 of 42
QUBIC: a qualitative biclustering algorithm for analyses of gene expression data, 2009
A New Geometric Biclustering Algorithm based on the Hough Transform for Analysis of Large-Scale Microarray Data, 2008
Cited by 7 (1 self)
Biclustering is an important tool in microarray analysis when only a subset of genes co-regulates in a subset of conditions. Unlike standard clustering analyses, biclustering classifies genes and conditions simultaneously, in both directions of a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on a geometric view of coherent gene expression profiles, in which patterns are identified with the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters even as the noise level and regulatory complexity increase. We also test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.
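The abstract only names the technique. The following is a minimal sketch, not the authors' implementation, of Hough voting in a single column pair. It assumes each gene is a point (x, y), where x and y are its expression values in two conditions. Each point votes for every line rho = x·cos(theta) + y·sin(theta) passing through it. The bin resolution (`n_theta`, `rho_step`) and the `min_votes` threshold are hypothetical parameters.

```python
import math
from collections import defaultdict


def hough_lines(points, n_theta=180, rho_step=0.1, min_votes=3):
    """Accumulate Hough votes for (x, y) points and return well-supported lines.

    points: list of (x, y) pairs, e.g. one gene's expression in condition a and b.
    Returns a dict mapping (theta_index, rho_bin) -> set of point indices that
    voted for that bin, keeping only bins with at least min_votes points.
    """
    acc = defaultdict(set)
    for i, (x, y) in enumerate(points):
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # theta sampled uniformly in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(t, round(rho / rho_step))].add(i)  # rho quantized into bins
    return {key: genes for key, genes in acc.items() if len(genes) >= min_votes}


# Four genes follow y = 2x + 1 across the two conditions; gene 4 does not.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (1, 0)]
for (t, r), genes in hough_lines(points, min_votes=4).items():
    print(t, r, sorted(genes))
```

Genes that share a bin are linearly related in that column pair. A full method would repeat this for many column pairs and combine the resulting gene sets, and that step is not shown here.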
cHawk: An Efficient Biclustering Algorithm based on Bipartite Graph Crossing Minimization
Cited by 5 (0 self)
Biclustering is a very useful data mining technique for gene expression analysis and profiling. It helps identify patterns in which different genes are correlated under a subset of conditions. Bipartite spectral partitioning is a powerful technique for biclustering, but its computational complexity is prohibitive for applications with large input data. We establish a connection between spectral partitioning and crossing minimization, which lends itself to efficient implementations, and we provide a theoretical construction of a biclustering model based on crossing minimization. Based on this model, we develop an efficient biclustering algorithm called cHawk. We have evaluated cHawk on both synthetic and real data sets and show that it identifies constant, coherent, and overlapping biclusters amid noise with good accuracy. Moreover, its execution time grows linearly with input data size.
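The abstract does not spell out cHawk's procedure. As a hedged illustration of crossing minimization on a gene-condition bipartite graph, the sketch below uses the classic barycenter heuristic from layered graph drawing. This is a standard technique and not necessarily the one cHawk uses. Each side is repeatedly re-sorted by the mean position of its neighbors, which tends to pull co-occurring genes and conditions into contiguous blocks. The binary input matrix and the `iterations` parameter are assumptions.

```python
def count_crossings(matrix, row_order, col_order):
    """Count edge crossings when rows and columns are laid out in the given orders.

    Edges (r1, c1) and (r2, c2) cross iff their row positions and column
    positions are in opposite relative order.
    """
    rpos = {r: i for i, r in enumerate(row_order)}
    cpos = {c: j for j, c in enumerate(col_order)}
    edges = [(rpos[r], cpos[c]) for r in range(len(matrix))
             for c in range(len(matrix[0])) if matrix[r][c]]
    crossings = 0
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            (r1, c1), (r2, c2) = edges[a], edges[b]
            if (r1 - r2) * (c1 - c2) < 0:
                crossings += 1
    return crossings


def barycenter_order(matrix, iterations=10):
    """Alternately sort rows and columns by the mean position of their neighbors."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_order, col_order = list(range(n_rows)), list(range(n_cols))
    for _ in range(iterations):
        cpos = {c: j for j, c in enumerate(col_order)}

        def row_bary(r):
            nbrs = [cpos[c] for c in range(n_cols) if matrix[r][c]]
            return sum(nbrs) / len(nbrs) if nbrs else float("inf")  # isolated rows go last

        row_order.sort(key=row_bary)
        rpos = {r: i for i, r in enumerate(row_order)}

        def col_bary(c):
            nbrs = [rpos[r] for r in range(n_rows) if matrix[r][c]]
            return sum(nbrs) / len(nbrs) if nbrs else float("inf")

        col_order.sort(key=col_bary)
    return row_order, col_order


# Two interleaved blocks: genes {0, 2} x conditions {0, 2} and genes {1, 3} x conditions {1, 3}.
m = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
rows, cols = barycenter_order(m)
print(rows, cols, count_crossings(m, rows, cols))
```

On this toy matrix, crossings drop from 8 to 2, which is one unavoidable crossing inside each 2 × 2 block. The reordered rows and columns expose the two blocks as contiguous biclusters.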
Linear Coherent Bicluster Discovery via Line Detection and Sample Majority Voting
Cited by 2 (2 self)
Discovering groups of genes that share common expression profiles is an important problem in DNA microarray analysis. Unfortunately, standard biclustering algorithms often fail to retrieve common expression groups because (1) genes exhibit similar behaviors only over a subset of conditions, and (2) genes may participate in more than one functional process and therefore belong to multiple groups. Many algorithms have been proposed to address these problems in the past decade. However, beyond these challenges, most such algorithms are unable to discover linear coherent biclusters, a strict generalization of the additive and multiplicative biclustering models. In this paper, we propose a novel biclustering algorithm that discovers linear coherent biclusters by first detecting linear correlations between pairs of gene expression profiles and then identifying groups by sample majority voting. Our experimental results on synthetic data and on two real datasets, Saccharomyces cerevisiae and Arabidopsis thaliana, show significant performance improvements over previous methods. One intriguing aspect of our approach is that it can easily be extended to identify biclusters with more complex gene-gene correlations.
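A toy sketch of the two stages named above follows. It is written under our own assumptions and is not the paper's algorithm.

1. **Line detection.** For a seed gene and each other gene, every line through two conditions is tried exhaustively, and the line with the most inlier conditions (residual within `tol`) is kept.
2. **Majority voting.** Genes with at least `min_conditions` inliers join the group. A condition is kept when a strict majority of those genes count it as an inlier.

The names `tol`, `min_conditions`, and `seed` are hypothetical.

```python
from itertools import combinations


def best_line_inliers(xs, ys, tol):
    """Try the line through every pair of conditions; return the largest inlier set."""
    best = set()
    n = len(xs)
    for i, j in combinations(range(n), 2):
        if xs[i] == xs[j]:
            continue  # slope undefined for this pair; skip it
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        inliers = {k for k in range(n) if abs(ys[k] - (a * xs[k] + b)) <= tol}
        if len(inliers) > len(best):
            best = inliers
    return best


def linear_coherent_bicluster(data, seed, tol=1e-6, min_conditions=3):
    """Group genes linearly related to `seed`, then keep majority-voted conditions."""
    genes = [seed]
    votes = [0] * len(data[seed])
    for g in range(len(data)):
        if g == seed:
            continue
        inliers = best_line_inliers(data[seed], data[g], tol)
        if len(inliers) >= min_conditions:
            genes.append(g)
            for c in inliers:
                votes[c] += 1  # each partner gene votes for its inlier conditions
    n_partners = len(genes) - 1
    conditions = [c for c, v in enumerate(votes) if n_partners and 2 * v > n_partners]
    return genes, conditions


data = [
    [1, 2, 3, 4, 5],      # seed gene
    [3, 5, 7, 9, 0],      # 2 * seed + 1 on conditions 0-3
    [9, 8, 7, 6, 100],    # -seed + 10 on conditions 0-3
    [5, 1, 4, 1, 5],      # no three conditions collinear with the seed
]
print(linear_coherent_bicluster(data, seed=0))
```

Because each gene may have its own slope and intercept relative to the seed, the grouped genes need not be additive or multiplicative copies of one another.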
A novel non-overlapping biclustering algorithm for network generation using Living Cell Array data
Cited by 2 (1 self)
Motivation: The living cell array quantifies the contribution of activated transcription factors to the expression levels of their target genes. Direct manipulation of the regulatory mechanisms offers enormous possibilities for deciphering the machinery that activates and controls gene expression. We propose a novel biclustering algorithm that generates non-overlapping clusters of reporter genes and conditions, and we demonstrate how this information can be interpreted to assist in constructing transcription factor interaction networks.
BicAT_Plus: An Automatic Comparative Tool For Bi/Clustering of Gene Expression Data Obtained Using Microarrays
Cited by 2 (1 self)
In the last few years, gene expression microarray technology has become a central tool in functional genomics, where the expression levels of thousands of genes in a biological sample are determined in a single experiment. Several clustering and biclustering methods have been introduced to analyze gene expression data by identifying similar patterns and grouping genes into subsets that share biological significance. However, it is not clear how the different methods compare with each other. This holds both for the biological relevance of their biclusters and clusters and for other characteristics such as robustness and predictability. This research describes the development of an automatic comparative tool called BicAT_Plus. It is designed to help researchers evaluate the results of different bi/clustering methods, compare those results against each other, and view the comparisons in convenient graphical displays. BicAT_Plus incorporates a reasonable biological comparative methodology based on the enrichment of the output bi/clusters with Gene Ontology functional categories. No single algorithm can be considered optimal. Instead, bi/clustering algorithms can be used as integrated techniques to highlight the most enriched biclusters, which help biologists draw biological predictions about unknown genes.
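The abstract does not say which statistic BicAT_Plus uses to score Gene Ontology enrichment. A common choice, assumed here rather than taken from the tool, is the hypergeometric upper-tail p-value. It is the probability of seeing at least `k` genes annotated with a term in a bicluster of `n` genes drawn from `N` genes, `K` of which carry that annotation.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn)."""
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper, so impossible terms vanish
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# A 5-gene bicluster in which all 5 genes carry a term annotated to 5 of 10 genes.
print(enrichment_p_value(N=10, K=5, n=5, k=5))
```

A smaller p-value means the bicluster contains more annotated genes than chance would predict. When many GO terms are tested, a multiple-testing correction would normally be applied before ranking biclusters.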
Mining Subspace Clusters from DNA Microarray Data Using Large Itemset Techniques
Cited by 1 (1 self)
Mining subspace clusters from DNA microarrays could help researchers identify genes that jointly contribute to a disease, where a subspace cluster is a subset of genes whose expression levels are similar under a subset of conditions. In a DNA microarray, the number of genes is far larger than the number of conditions. As a result, previously proposed algorithms that compute the maximum dimension sets (MDSs) for every pair of genes take a long time to mine subspace clusters. In this paper, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for every pair of genes, we construct MDSs only for every pair of conditions. We then transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in subspace clusters with gene sets as large as possible, it is desirable to focus on gene sets that have reasonably large support values in the condition-pair MDSs. Our simulation results show that the proposed algorithm requires less processing time than previously proposed algorithms that construct gene-pair MDSs.
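The abstract does not define an MDS precisely. The sketch below assumes a pCluster-style definition. For a pair of conditions, an MDS is a maximal set of genes whose expression differences between those two conditions span at most `delta`. Sorting the differences and sliding a window over them finds every maximal such set. Treating each condition pair's MDSs as transactions, the "support" of a gene set is then the number of condition pairs with an MDS containing it. The names `delta`, `min_genes`, and `support` are hypothetical, and the itemset-mining step itself is omitted.

```python
from itertools import combinations


def condition_pair_mdss(data, delta, min_genes=2):
    """Map each condition pair (c1, c2) to its maximal gene sets whose
    differences data[g][c1] - data[g][c2] span at most delta."""
    n_conds = len(data[0])
    result = {}
    for c1, c2 in combinations(range(n_conds), 2):
        diffs = sorted((row[c1] - row[c2], g) for g, row in enumerate(data))
        windows = []
        lo = 0
        for hi in range(len(diffs)):
            while diffs[hi][0] - diffs[lo][0] > delta:
                lo += 1  # shrink the window from the left until it fits
            genes = frozenset(g for _, g in diffs[lo:hi + 1])
            if len(genes) >= min_genes:
                windows.append(genes)
        # keep only windows not strictly contained in another window
        result[(c1, c2)] = [w for w in windows if not any(w < o for o in windows)]
    return result


def support(gene_set, mdss):
    """Number of condition pairs having an MDS that contains every gene in gene_set."""
    return sum(1 for sets in mdss.values() if any(gene_set <= s for s in sets))


data = [[1, 2, 3], [2, 3, 4], [5, 0, 9]]   # genes 0 and 1 shift together
mdss = condition_pair_mdss(data, delta=0.5)
print(mdss[(0, 1)], support(frozenset({0, 1}), mdss))
```

With only as many transactions as there are condition pairs, this representation stays small when genes far outnumber conditions, which is the motivation the abstract gives.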
Validation Measures of Bicluster Solutions, 2009
Cited by 1 (0 self)
Biclustering is a method for extracting subsets of objects and features from a dataset that share some characteristic. Traditional clustering algorithms group objects that are similar across the whole feature set; in contrast, biclustering methods find groups of objects that have similar values or patterns in some of the features. In both clustering and biclustering, assessing how informative and reliable a result is remains a very important task. Whereas validation methods for cluster solutions have been studied actively, there are only a few measures for validating bicluster solutions. Furthermore, the existing validation methods for bicluster solutions have critical problems that limit their use in general cases. In this paper, we review several well-known validation measures for cluster and bicluster solutions and discuss their limitations. We then propose several improved validation indices as modified versions of existing ones.
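The abstract does not name the measures it reviews. As a representative example only, here is the widely used mean squared residue (MSR) of Cheng and Church. It averages the squared residual a_ij - a_iJ - a_Ij + a_IJ over a submatrix, where a_iJ, a_Ij, and a_IJ are the row, column, and overall means of the submatrix.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix matrix[rows][cols] (lower is more coherent)."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    return sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
               for i in range(n_r) for j in range(n_c)) / (n_r * n_c)


shifting = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # rows differ by constant offsets
scaling = [[1, 2], [2, 4]]                      # second row is twice the first
print(mean_squared_residue(shifting, [0, 1, 2], [0, 1, 2]))
print(mean_squared_residue(scaling, [0, 1], [0, 1]))
```

MSR is zero for perfectly additive (shifting) biclusters but generally nonzero for scaling patterns. A single score of this kind therefore cannot validate every bicluster type.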
Sparse Learning based Linear Coherent Biclustering
Cited by 1 (0 self)
Clustering algorithms are often limited by the assumption that each data point belongs to a single class, and furthermore that all features of a data point are relevant to class determination. Such assumptions are inappropriate in applications such as gene clustering. Given expression profile data, genes may exhibit similar behaviors only under some, but not all, conditions, and they may participate in more than one functional process and hence belong to multiple groups. Identifying genes that have similar expression patterns in a common subset of conditions is a central problem in gene expression microarray analysis. To overcome the limitations of standard clustering methods for this purpose, biclustering has often been proposed as an alternative approach, in which one seeks groups of observations that exhibit similar patterns over a subset of the features. In this paper, we propose a new biclustering algorithm for identifying linear-coherent biclusters in gene expression data, strictly generalizing the type of bicluster structure considered by other methods. The algorithm is a novel sparse learning based model, SLLB, built on recent sparse learning techniques that have gained significant attention in the machine learning research community. Experiments on both synthetic data and real gene expression data demonstrate that the model is significantly more effective than current biclustering algorithms for these problems. The parameter selection problem and the model's usefulness in other machine learning clustering applications are also discussed. The Appendix of this paper can be found on
Approximation algorithms for biclustering problems
In Proc. 6th WABI, volume 4175 of LNBI, 2006
Cited by 1 (0 self)
One of the main goals in the analysis of microarray data is to identify groups of genes and groups of experimental conditions (including environments, individuals, and tissues) that exhibit similar expression patterns. This is the so-called biclustering problem. In this paper, we consider two variations of the biclustering problem: the consensus submatrix problem and the bottleneck submatrix problem. The input to both problems is an m × n matrix A and integers l and k. The consensus submatrix problem is to find an l × k submatrix, with l < m and k < n, and a consensus vector such that the sum of the distances between the rows of the submatrix and the consensus vector is minimized. The bottleneck submatrix problem is to find an l × k submatrix with l < m and k < n, an integer d, and a center vector such that the distance between every row of the submatrix and the center vector is at most d, with d minimized. We show that both problems are NP-hard and give randomized approximation algorithms for special cases of the two problems. Using standard techniques, we can derandomize the algorithms to obtain polynomial time approximation schemes for the two problems. To the best of our knowledge, this is the first time that approximation algorithms with guaranteed ratios have been presented for microarray data analysis.
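The problem statements leave the distance function abstract. To make the two objectives concrete, the sketch below assumes Hamming distance over the symbols appearing in A, which is our assumption. For a fixed submatrix, the column-wise majority is then an optimal consensus vector. For the bottleneck variant, every candidate center over that alphabet is tried. This exhaustive search is exponential and only illustrates the definitions. It is not the paper's approximation scheme.

```python
from collections import Counter
from itertools import combinations, product


def hamming(u, v):
    """Number of positions where u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def consensus_submatrix(A, l, k):
    """Exhaustively find (cost, rows, cols, consensus) minimizing summed Hamming distance."""
    best = None
    for rows in combinations(range(len(A)), l):
        for cols in combinations(range(len(A[0])), k):
            sub = [[A[r][c] for c in cols] for r in rows]
            # column-wise majority minimizes the sum of Hamming distances
            consensus = [Counter(col).most_common(1)[0][0] for col in zip(*sub)]
            cost = sum(hamming(row, consensus) for row in sub)
            if best is None or cost < best[0]:
                best = (cost, rows, cols, consensus)
    return best


def bottleneck_submatrix(A, l, k):
    """Exhaustively find (d, rows, cols, center) minimizing the maximum row distance d."""
    alphabet = sorted({x for row in A for x in row})
    best = None
    for rows in combinations(range(len(A)), l):
        for cols in combinations(range(len(A[0])), k):
            sub = [[A[r][c] for c in cols] for r in rows]
            for center in product(alphabet, repeat=k):
                d = max(hamming(row, center) for row in sub)
                if best is None or d < best[0]:
                    best = (d, rows, cols, list(center))
    return best


A = [[0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0]]   # m = n = 4
print(consensus_submatrix(A, 3, 3))
print(bottleneck_submatrix(A, 3, 3))
```

In this toy matrix, rows 0, 1, and 3 agree exactly on columns 0 to 2. Both objectives therefore reach 0 with the vector [0, 1, 1].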