Is Cross-Validation Valid for Small-Sample Microarray Classification?
2004
Abstract

Cited by 139 (33 self)
Motivation: Microarray classification typically possesses two striking attributes: (1) classifier design and error estimation are based on remarkably small samples and (2) cross-validation error estimation is employed in the majority of the papers. Thus, it is necessary to have a quantifiable understanding of the behavior of cross-validation in the context of very small samples.
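To make the object of study concrete, here is a minimal sketch of k-fold cross-validation error estimation on a very small sample. The nearest-mean classifier, the one-feature data and the helper names `nearest_mean_predict` and `kfold_cv_error` are illustrative assumptions, not the paper's experimental setup.

```python
import random


def nearest_mean_predict(train, x):
    """Assign x to the class whose training mean is closest (one feature only)."""
    means = {}
    for label in {y for _, y in train}:
        vals = [v for v, y in train if y == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda lab: abs(x - means[lab]))


def kfold_cv_error(data, k, seed=0):
    """k-fold cross-validation error of nearest_mean_predict on (x, label) pairs.

    k == len(data) gives leave-one-out. The seed fixes the random fold split.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        # Train only on samples outside the current fold.
        train = [data[i] for i in idx if i not in held_out]
        for i in fold:
            x, y = data[i]
            if nearest_mean_predict(train, x) != y:
                errors += 1
    return errors / len(data)


# Usage: a tiny, overlapping two-class sample (hypothetical data).
noisy = [(0.0, 0), (0.4, 0), (0.9, 0), (0.5, 1), (1.0, 1), (1.3, 1)]
estimates = [kfold_cv_error(noisy, k=3, seed=s) for s in range(5)]
print(estimates)
```

Comparing `estimates` across seeds is a quick way to see how much a cross-validation estimate computed from only six samples can depend on the random fold split alone, which is one source of the small-sample variability the paper sets out to quantify.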
Spectral biclustering of microarray data: Co-clustering genes and conditions
Genome Research, 2003
A Bayesian missing value estimation method for gene expression profile data
Bioinformatics, 2003
Abstract

Cited by 125 (2 self)
Motivation: Gene expression profile analyses have been used in numerous studies covering a broad range of areas in biology. When unreliable measurements are excluded, missing values are introduced in gene expression profiles. Although existing multivariate analysis methods have difficulty with the treatment of missing values, this problem has received little attention. There are many options for dealing with missing values, each of which can lead to drastically different results. Ignoring missing values is the simplest method and is frequently applied. This approach, however, has its flaws. In this article, we propose an estimation method for missing values, which is based on Bayesian principal component analysis (BPCA). Although the methodology that a probabilistic model and latent variables are estimated simultaneously within the framework of Bayes
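The abstract lists several ways of handling missing values. For comparison only, the sketch below shows a simpler, widely used alternative, k-nearest-neighbour imputation; it is NOT the BPCA method proposed in the paper. The `None` encoding for missing entries, the RMS distance over shared arrays and the name `knn_impute` are assumptions made for illustration.

```python
import math


def knn_impute(matrix, k=2):
    """Return a copy of a genes x arrays matrix with None entries filled in.

    Each missing value of gene g in array j is replaced by the mean of array j
    over the k genes closest to g (RMS difference on the arrays both observed)
    that have array j observed. Entries with no usable neighbour stay None.
    """
    filled = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        neighbours = []
        for h, other in enumerate(matrix):
            if h == g:
                continue
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if shared:
                d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
                neighbours.append((d, h))
        neighbours.sort()  # closest genes first
        for j in missing:
            donors = [matrix[h][j] for _, h in neighbours if matrix[h][j] is not None][:k]
            if donors:
                filled[g][j] = sum(donors) / len(donors)
    return filled


# Hypothetical expression matrix: gene 2 is missing its value in array 2.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, None, 3.9],
    [10.0, 20.0, 30.0, 40.0],
]
completed = knn_impute(expr, k=2)
```

Unlike simply ignoring the gene, this keeps every profile usable downstream, but the result depends on the choice of k and the distance, which illustrates why different missing-value strategies can give different answers.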
Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate
Nucleic Acids Res, 2003
Abstract

Cited by 113 (8 self)
Onto-Tools is a set of four seamlessly integrated databases: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Onto-Express is able to automatically translate lists of genes found to be differentially regulated in a given condition into functional profiles characterizing the impact of the condition studied upon various biological processes and pathways. Onto-Express constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function and chromosome location. Statistical significance values are calculated for each category. Once the initial exploratory analysis has identified a number of relevant biological processes, specific mechanisms of interaction can be hypothesized for the conditions studied. Currently, many commercial arrays are available for the investigation of specific mechanisms. Each such array is characterized by a biological bias determined by the extent to which the genes present on the array represent specific pathways. Onto-Compare is a tool that allows efficient comparisons of any sets of commercial or custom arrays. Using Onto-Compare, a researcher can quickly determine which array, or set of arrays, best covers the hypotheses studied. In many situations, no commercial arrays are available for specific biological mechanisms. Onto-Design is a tool that allows the user to select genes that represent given functional categories. Onto-Translate allows the user to easily translate lists of accession numbers, UniGene clusters and Affymetrix probes into one another. All of these tools are seamlessly integrated. The Onto-Tools are available online at
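The abstract says Onto-Express computes a significance value for each functional category but does not say which test it uses. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below as an assumption; the function name and argument names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes on the array, K: array genes annotated to the category,
    n: differentially regulated genes, k: regulated genes in the category.
    """
    upper = min(K, n)
    # math.comb returns 0 when asked for more items than are available.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 10 genes on the array, 3 in the category,
# 3 regulated genes, all 3 of them in the category.
p = hypergeom_pvalue(N=10, K=3, n=3, k=3)
```

In a full functional profile this p-value would be computed for every category and then corrected for multiple testing; which correction Onto-Express applies is not stated in the abstract.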
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
Bioinformatics, 2003
Abstract

Cited by 101 (6 self)
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
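CLICK starts from a similarity graph over genes. The sketch below shows only a much simpler relative: link genes whose expression profiles have Pearson correlation at or above a threshold and report the connected components. It omits the statistical edge weighting, kernel detection and kernel-expansion heuristics that the abstract attributes to CLICK, and the threshold `min_corr` and the function names are arbitrary assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def threshold_clusters(profiles, min_corr=0.9):
    """Connected components of the graph linking genes with correlation >= min_corr."""
    n = len(profiles)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(profiles[i], profiles[j]) >= min_corr:
                adj[i].append(j)
                adj[j].append(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        clusters.append(sorted(comp))
    return clusters


# Hypothetical profiles: genes 0 and 1 rise together, genes 2 and 3 fall together.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.0],
]
clusters = threshold_clusters(profiles, min_corr=0.9)
```

A single hard threshold like this tends to chain unrelated genes together through intermediate ones, which is one motivation for finding tight kernels first, as CLICK does.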
Missing value estimation for DNA microarray gene expression data: local least squares imputation
Bioinformatics, 2005
Prediction by supervised principal components
Journal of the American Statistical Association, 2006
Abstract

Cited by 98 (9 self)
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer. KEY WORDS: Gene expression; Microarray; Regression; Survival analysis.
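The two stages described in this abstract (screen predictors by their association with the outcome, then run principal components on the survivors) can be sketched in plain Python for a single component and a continuous outcome. Using |Pearson correlation| as the screening score, power iteration for the leading component, and the name `supervised_pc` are assumptions of this sketch; the paper's scoring and threshold selection may differ.

```python
import math


def _center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def supervised_pc(X, y, theta, iters=200):
    """One-component supervised principal components (sketch).

    X: n rows of p predictors (no constant columns); y: n outcomes.
    Returns (indices of selected predictors, fitted values).
    """
    n, p = len(X), len(X[0])
    yc = _center(y)
    cols = [_center([row[j] for row in X]) for j in range(p)]
    y_norm = math.sqrt(sum(v * v for v in yc))

    def abs_corr(c):
        # |Pearson correlation| between a centered predictor and the outcome.
        return abs(sum(a * b for a, b in zip(c, yc))) / (
            math.sqrt(sum(v * v for v in c)) * y_norm)

    # Stage 1: keep only predictors strongly associated with the outcome.
    selected = [j for j in range(p) if abs_corr(cols[j]) >= theta]
    if not selected:
        raise ValueError("no predictor passes the screening threshold")
    Z = [cols[j] for j in selected]
    q = len(Z)
    # Stage 2: leading eigenvector of the q x q cross-product matrix
    # (proportional to the covariance of the selected predictors).
    C = [[sum(a * b for a, b in zip(Z[i], Z[k])) for k in range(q)] for i in range(q)]
    v = [1.0] * q
    for _ in range(iters):
        w = [sum(C[i][k] * v[k] for k in range(q)) for i in range(q)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    u = [sum(Z[i][r] * v[i] for i in range(q)) for r in range(n)]  # component scores
    # Stage 3: least-squares regression of y on the (mean-zero) component.
    beta = sum(a * b for a, b in zip(u, yc)) / sum(a * a for a in u)
    y_mean = sum(y) / n
    return selected, [y_mean + beta * s for s in u]


# Hypothetical data: predictors 0 and 1 track y, predictor 2 is unrelated.
X = [[1, 2, 1], [2, 4, -1], [3, 6, 0], [4, 8, -1], [5, 10, 1]]
y = [1, 2, 3, 4, 5]
selected, fitted = supervised_pc(X, y, theta=0.5)
```

The screening step is what distinguishes this from ordinary principal components regression: predictor 2 never enters the component, so its variance cannot dominate the direction that is extracted.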
Graph Kernels
2007
Abstract

Cited by 94 (9 self)
We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semidefinite.
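For reference, the geometric random walk kernel between two unlabeled graphs can be written as k(G, G') = q^T (I - λW)^{-1} p, where W = A ⊗ A' is the adjacency matrix of the direct product graph. The sketch below solves it with a fixed-point iteration x ← p + λWx, one of the method families the abstract mentions, but it forms W explicitly, so each iteration costs O((nn')^2); the faster methods in the abstract are designed to avoid that cost. Uniform start and stop distributions p and q, the choice of λ, and the function names are assumptions.

```python
def kron(A, B):
    """Kronecker product of two square adjacency matrices (lists of lists)."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def random_walk_kernel(A1, A2, lam, tol=1e-12, max_iter=10_000):
    """Geometric random walk kernel q^T (I - lam*W)^{-1} p with W = A1 (x) A2.

    Solved by iterating x <- p + lam * W x, which converges when lam is
    below 1 / spectral radius of W. Uses uniform p and q.
    """
    W = kron(A1, A2)
    N = len(W)
    p = [1.0 / N] * N
    x = p[:]
    for _ in range(max_iter):
        x_new = [p[i] + lam * sum(W[i][j] * x[j] for j in range(N)) for i in range(N)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            x = x_new
            break
        x = x_new
    else:
        raise RuntimeError("no convergence; lam may exceed 1/spectral radius of W")
    return sum(x) / N  # q^T x with q uniform


# Usage: a single edge compared with a triangle.
edge = [[0, 1], [1, 0]]
tri = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
k_edge_tri = random_walk_kernel(edge, tri, lam=0.1)
```

λ down-weights long walks; choosing it too large relative to the spectral radius of the product graph makes the underlying series diverge, which the `else` branch reports.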
private communication
Abstract

Cited by 88 (6 self)
A rigid interval graph is an interval graph which has only one clique tree. In 2009, Panda and Das showed that all connected unit interval graphs are rigid interval graphs. Generalizing the two classic graph search algorithms, Lexicographic Breadth-First Search (LBFS) and Maximum Cardinality Search (MCS), Corneil and Krueger proposed in 2008 the so-called Maximal Neighborhood Search (MNS) and showed that one sweep of MNS is enough to recognize chordal graphs. We develop the MNS properties of rigid interval graphs and characterize this graph class in several different ways. This allows us to obtain several linear-time multi-sweep MNS algorithms for recognizing rigid interval graphs and unit interval graphs, generalizing a corresponding 3-sweep LBFS algorithm for unit interval graph recognition designed by Corneil in 2004. For unit interval graphs, we even present a new linear-time 2-sweep MNS certifying recognition algorithm.
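The abstract notes that a single MNS sweep suffices to recognize chordal graphs, and LBFS is one of the searches MNS generalizes. The classic one-sweep test can therefore be sketched with LBFS: a graph is chordal exactly when the reverse of an LBFS order is a perfect elimination ordering. This is the well-known LBFS chordality test, not the paper's rigid interval or unit interval graph algorithms, and the simple label-list LBFS below is far from linear time (partition refinement is the usual route to O(n + m)). The function names are illustrative.

```python
def lbfs_order(adj):
    """Lexicographic breadth-first search; returns vertices in visit order.

    adj: dict vertex -> set of neighbours. Labels are lists of decreasing
    integers, compared lexicographically (simple, not linear time).
    """
    labels = {v: [] for v in adj}
    order, visited = [], set()
    n = len(adj)
    for step in range(n):
        # Visit the unvisited vertex with the lexicographically largest label.
        v = max((u for u in adj if u not in visited), key=lambda u: labels[u])
        visited.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                labels[w].append(n - step)  # earlier visits append larger numbers
    return order


def is_chordal(adj):
    """Check that the reverse LBFS order is a perfect elimination ordering."""
    order = lbfs_order(adj)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if not earlier:
            continue
        # The latest-visited earlier neighbour must see all the others.
        parent = max(earlier, key=lambda u: pos[u])
        if any(u != parent and u not in adj[parent] for u in earlier):
            return False
    return True


square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
square_with_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```

The 4-cycle `square` is the smallest non-chordal graph, while adding one chord makes it chordal; checking only the "parent" neighbour instead of every pair keeps the verification cheap.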
Functional interpretation of microarray experiments
OMICS, 2006
Abstract

Cited by 75 (24 self)
Over the past few years, due to the popularisation of high-throughput methodologies such as DNA microarrays, the possibility of obtaining experimental data has increased significantly. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, still remains a challenge. The methods and strategies used for this interpretation are in continuous evolution and new proposals are constantly arising. Initially, a two-step approach was used in which genes of interest were first selected, based on thresholds that consider only experimental values, and then, in a second, independent step, the enrichment of these genes in biologically relevant terms was analysed. For different reasons, these methods are relatively poor in terms of performance, and a new generation of procedures, which draw inspiration from systems biology criteria, is currently under development. Such procedures aim to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes.
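The block-level idea in the last sentence can be illustrated with a minimal one-sided test: score each gene, take the mean score of a functional block, and compare it with random blocks of the same size, with no gene-level threshold applied first. The per-gene scores, the mean as set statistic, gene-label resampling and the name `gene_set_pvalue` are assumptions of this generic sketch; it is not a specific published procedure, and many methods instead permute sample labels so that correlations between genes are preserved.

```python
import random


def gene_set_pvalue(scores, gene_set, n_perm=10_000, seed=0):
    """One-sided test: is the set's mean score higher than for random sets?

    scores: dict gene -> per-gene statistic (e.g. a t-statistic).
    gene_set: genes in the functional block; genes without a score are ignored.
    """
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("no scored genes in the set")
    observed = sum(scores[g] for g in members) / len(members)
    values = list(scores.values())
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(values, len(members))  # random block of equal size
        if sum(sample) / len(sample) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the p-value above zero


# Hypothetical scores: three genes of one block are strongly changed.
scores = {f"g{i}": 0.0 for i in range(20)}
for g in ("g0", "g1", "g2"):
    scores[g] = 5.0
p_block = gene_set_pvalue(scores, {"g0", "g1", "g2"})
```

Because every gene contributes its score, a block whose members are each only modestly changed can still reach significance here, whereas a threshold-then-enrichment analysis would discard those genes in its first step.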