Spectral biclustering of microarray data: Coclustering genes and conditions
- Genome Research
, 2003
Is Cross-Validation Valid for Small-Sample Microarray Classification?
, 2004
Cited by 54 (12 self)
Motivation: Microarray classification typically possesses two striking attributes: (1) classifier design and error estimation are based on remarkably small samples and (2) cross-validation error estimation is employed in the majority of the papers. Thus, it is necessary to have a quantifiable understanding of the behavior of cross-validation in the context of very small samples.
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
- Bioinformatics
, 2003
Cited by 42 (6 self)
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate
- Nucleic Acids Res
, 2003
Cited by 38 (3 self)
Onto-Tools is a set of four seamlessly integrated databases: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Onto-Express is able to automatically translate lists of genes found to be differentially regulated in a given condition into functional profiles characterizing the impact of the condition studied upon various biological processes and pathways. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function and chromosome location. Statistical significance values are calculated for each category. Once the initial exploratory analysis has identified a number of relevant biological processes, specific mechanisms of interactions can be hypothesized for the conditions studied. Currently, many commercial arrays are available for the investigation of specific mechanisms. Each such array is characterized by a biological bias determined by the extent to which the genes present on the array represent specific pathways. Onto-Compare is a tool that allows efficient comparisons of any sets of commercial or custom arrays. Using Onto-Compare, a researcher can quickly determine which array, or set of arrays, best covers the hypotheses studied. In many situations, no commercial arrays are available for specific biological mechanisms. Onto-Design is a tool that allows the user to select genes that represent given functional categories. Onto-Translate allows the user to easily translate lists of accession numbers, UniGene clusters and Affymetrix probes into one another. All tools above are seamlessly integrated. The Onto-Tools are available online at
Prediction by supervised principal components
- Journal of the American Statistical Association
, 2006
Cited by 36 (5 self)
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer. KEY WORDS: Gene expression; Microarray; Regression; Survival analysis.
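The idea in this abstract (keep only the predictors associated with the outcome, then run ordinary principal components on that subset) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `supervised_pc`, the use of absolute correlation as the association score, the fixed `n_keep` cutoff, and the synthetic data are all assumptions made here for illustration.

```python
import numpy as np

def supervised_pc(X, y, n_keep=10):
    """Illustrative sketch: screen predictors by association, then PCA.

    X is (n_samples, n_features); y is (n_samples,).
    Absolute correlation and a fixed n_keep are assumptions of this sketch.
    """
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the outcome
    # Univariate association score: |correlation| of each predictor with y.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    # Keep the n_keep predictors with the strongest association.
    keep = np.argsort(scores)[::-1][:n_keep]
    # First principal component of the reduced matrix, via SVD.
    U, s, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    pc1 = U[:, 0] * s[0]
    # Least-squares slope of the centered outcome on that component.
    slope = (pc1 @ yc) / (pc1 @ pc1)
    return keep, pc1, slope

# Synthetic example with far more predictors than observations.
rng = np.random.default_rng(0)
n, p = 40, 500
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :10] += 3.0 * latent[:, None]   # only the first 10 predictors carry signal
y = latent + 0.1 * rng.normal(size=n)

keep, pc1, slope = supervised_pc(X, y, n_keep=10)
```

In practice the number of retained predictors would be tuned rather than fixed, for example by cross-validation; the fixed cutoff here only keeps the sketch short.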
Functional interpretation of microarray experiments
- OMICS
, 2006
Cited by 22 (11 self)
Over the past few years, due to the popularisation of high-throughput methodologies such as DNA microarrays, the possibility of obtaining experimental data has increased significantly. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, still remains a challenge. The methods and strategies used for this interpretation are in continuous evolution and new proposals are constantly arising. Initially, a two-step approach was used in which genes of interest were first selected, based on thresholds that consider only experimental values, and then, in a second, independent step, the enrichment of these genes in biologically relevant terms was analysed. For different reasons, these methods are relatively poor in terms of performance, and a new generation of procedures, which draw inspiration from systems biology criteria, is currently under development. Such procedures aim to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes.
Feature Selection for Unsupervised and Supervised Inference: the Emergence of Sparsity in a Weighted-Based Approach
Cited by 20 (2 self)
The problem of selecting a subset of relevant features in a potentially overwhelming quantity of data is classic and found in many branches of science; examples in computer vision, text processing and, more recently, bioinformatics are abundant. In this work we present a definition of "relevancy" based on spectral properties of the Affinity (or Laplacian) of the features' measurement matrix. The feature selection process is then based on a continuous ranking of the features defined by a least-squares optimization process. A remarkable property of the feature relevance function is that sparse solutions for the ranking values naturally emerge as a result of a "biased non-negativity" of a key matrix in the process. As a result, a simple least-squares optimization process converges onto a sparse solution, i.e., a selection of a subset of features which forms a local maximum over the relevance function. The feature selection algorithm can be embedded in both unsupervised and supervised inference problems, and empirical evidence shows that the feature selections typically achieve high accuracy even when only a small fraction of the features are relevant.
Practical Approaches to Analyzing Results of Microarray Experiments
Cited by 19 (5 self)
In this article we provide a practically oriented review focusing on methods for analysis of large-scale gene expression data in the research laboratory. We describe the various common clustering methods and outline our approach to using them. We discuss methods for scoring genes for their relevance, focusing on the statistical meaning of microarray results, especially with regard to the problem of multiple testing. We also deal with the problem of adding biologic meaning to the results of microarray experiments and describe advanced tools that represent different but valid directions in providing automated solutions to this problem. The tools and approaches described and discussed here should provide the reader with a preliminary understanding of the analysis of the results of microarray experiments. The practical focus of this review should remove the mystery behind the analysis of microarray experiments, thus leading to more productive and efficient use of the technology. Microarray technology is rapidly becoming a standard technique used in research laboratories all across the world. In essence, all the variants of the technology allow simultaneous profiling of the expression levels of tens of thousands of genes, potentially whole genomes, in a single experiment (1-3). This unique power provides scientists with an opportunity to look at the transcriptional profile of biologic systems, processes, and diseases in an unbiased fashion. The relative ease (despite the prohibitive cost) of performing microarray experiments in molecular laboratory settings, combined with the potential power of the technology, has captured the imagination of scientists in academic and industry research institutes. This combination of ease of use with unforeseen power also appealed to adminis...
Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data
, 2005
Next station in microarray data analysis: GEPAS
- Nucleic Acids Res
, 2006
Cited by 16 (9 self)
The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. GEPAS has been designed to provide an intuitive although powerful web-based interface that offers diverse analysis options from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options), to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and includes different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers in many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at

