A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 27 (15 self)
Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique include two-sample tests (which determine whether two sets of observations arise from the same distribution), covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the testing, estimation, and analysis of probability distributions, where information-theoretic approaches [5, 6] have long been dominant. Recent examples include [7] for constructing graphical models, [8] for feature extraction, and [9] for independent component analysis. By and large, these methods share a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or
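The abstract does not name a specific statistic, but one standard way to compare two distributions through their RKHS embeddings is the maximum mean discrepancy (MMD): the RKHS distance between the kernel mean embeddings of the two samples. The sketch below estimates MMD^2 directly from kernel evaluations, with no density estimate, and turns it into a two-sample test. The Gaussian kernel, the bandwidth sigma = 1 and the permutation null are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Gaussian (RBF) kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of MMD^2 = ||mu_P - mu_Q||^2, the squared RKHS distance
    between the kernel mean embeddings of the two samples."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Exclude the diagonal k(x_i, x_i) terms so the within-sample averages are unbiased
    xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return xx + yy - 2.0 * kxy.mean()

def mmd_permutation_test(x, y, sigma=1.0, n_perm=200, seed=0):
    """Two-sample test: p-value of the observed MMD^2 under random relabelling."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, sigma)
    pooled, m = np.vstack([x, y]), len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], sigma) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 2))
y_same = rng.normal(0.0, 1.0, size=(100, 2))    # same distribution as x
y_shift = rng.normal(1.0, 1.0, size=(100, 2))   # mean shifted by 1 in each coordinate
print(mmd_permutation_test(x, y_same), mmd_permutation_test(x, y_shift))
```

The kernel choice matters in practice: with a characteristic kernel such as the Gaussian, MMD is zero only when the two distributions coincide, which is what makes it usable as a two-sample statistic.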
Multidimensional local false discovery rate for microarray studies
Bioinformatics, 2006
Cited by 4 (1 self)
Motivation: The false discovery rate (fdr) is a key tool for statistical assessment of differential expression (DE) in microarray studies. Overall control of the fdr alone, however, is not sufficient to address the problem of genes with small variance, which generally suffer from a disproportionately high rate of false positives. It is desirable to have an fdr-controlling procedure that automatically accounts for gene variability. Methods: We generalize the local fdr as a function of multiple statistics, combining a common test statistic for assessing differential expression with its standard error information. We use a nonparametric mixture model for DE and non-DE genes to describe the observed multidimensional statistics, and estimate the distribution for non-DE genes via the permutation method. We demonstrate this fdr2d approach on simulated and real microarray data. Results: The fdr2d allows objective assessment of differential expression as a function of gene variability. We also show that the fdr2d performs better than commonly used modified test statistics. Availability: An R package OCplus containing functions for computing fdr2d() and other operating characteristics of microarray data is available at
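As a rough illustration of a two-dimensional local fdr, fdr(z) = pi0 * f0(z) / f(z), the sketch below estimates both densities on a shared 2-D histogram: f from the observed (test statistic, log standard error) pairs and f0 from the same statistics recomputed on permuted data. The histogram estimator, the bin count and the fixed pi0 are assumptions made for brevity; this is not the OCplus fdr2d() implementation, whose smoothing and pi0 estimation may differ.

```python
import numpy as np

def local_fdr_2d(obs, null, pi0=1.0, bins=20):
    """Histogram sketch of the 2-D local fdr, fdr(z) = pi0 * f0(z) / f(z).

    obs:  (G, 2) array, one row per gene, e.g. (test statistic, log standard error)
    null: (B, 2) array of the same two statistics computed on permuted data
    """
    both = np.vstack([obs, null])
    # Shared bin edges so the observed and null densities live on the same grid
    edges = [np.linspace(both[:, j].min(), both[:, j].max(), bins + 1) for j in range(2)]
    f, _, _ = np.histogram2d(obs[:, 0], obs[:, 1], bins=edges, density=True)
    f0, _, _ = np.histogram2d(null[:, 0], null[:, 1], bins=edges, density=True)
    # Bin index of every gene; clip so values equal to the maximum land in the last bin
    ix = np.clip(np.searchsorted(edges[0], obs[:, 0], side="right") - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(edges[1], obs[:, 1], side="right") - 1, 0, bins - 1)
    # Every gene's own bin has f > 0, so the ratio is always defined
    return np.minimum(pi0 * f0[ix, iy] / f[ix, iy], 1.0)

rng = np.random.default_rng(0)
G = 2000
de = np.arange(G) < 200                        # simulated: first 10% of genes are DE
stat = rng.normal(np.where(de, 4.0, 0.0), 1.0)
log_se = rng.normal(0.0, 0.3, G)
obs = np.column_stack([stat, log_se])
# Stand-in for permutation statistics: draws from the non-DE distribution
null = np.column_stack([rng.normal(0.0, 1.0, 4000), rng.normal(0.0, 0.3, 4000)])
fdr = local_fdr_2d(obs, null, pi0=0.9)
print(fdr[de].mean(), fdr[~de].mean())
```

Working in two dimensions is what lets the estimate depend on the standard error as well as on the test statistic, which is the point the abstract makes about small-variance genes.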
Data-adaptive test statistics for microarray data
Bioinformatics, 2005
Cited by 2 (1 self)
Motivation: An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. Results: In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the ‘ground-truth’, but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. Availability: By request to the corresponding author. Contact:
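The abstract does not define its family of test statistics or its reproducibility measure, so the sketch below makes both concrete under stated assumptions: candidate statistics are SAM-style t-statistics indexed by a hypothetical fudge constant s0, and reproducibility is the average overlap of the top-k gene lists selected on two random, disjoint halves of the arrays. Neither measure requires knowing which genes are truly differentially expressed; the statistic with the most reproducible selections is kept.

```python
import numpy as np

def moderated_t(x, y, s0):
    # Two-sample statistic with a fudge constant s0 added to the standard error;
    # s0 indexes a hypothetical family of candidate statistics
    diff = x.mean(axis=1) - y.mean(axis=1)
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1] + y.var(axis=1, ddof=1) / y.shape[1])
    return diff / (se + s0)

def reproducibility(x, y, s0, top_k=50, n_splits=20, seed=0):
    """Mean overlap of the top_k genes selected on two disjoint halves of the arrays."""
    rng = np.random.default_rng(seed)
    hx, hy = x.shape[1] // 2, y.shape[1] // 2
    overlaps = []
    for _ in range(n_splits):
        px, py = rng.permutation(x.shape[1]), rng.permutation(y.shape[1])
        t1 = moderated_t(x[:, px[:hx]], y[:, py[:hy]], s0)
        t2 = moderated_t(x[:, px[hx:]], y[:, py[hy:]], s0)
        top1 = set(np.argsort(-np.abs(t1))[:top_k].tolist())
        top2 = set(np.argsort(-np.abs(t2))[:top_k].tolist())
        overlaps.append(len(top1 & top2) / top_k)
    return float(np.mean(overlaps))

def choose_statistic(x, y, grid, **kwargs):
    # Learn the statistic from data: keep the s0 whose selections reproduce best
    return max(grid, key=lambda s0: reproducibility(x, y, s0, **kwargs))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, (1000, 8))    # group 1: 1000 genes x 8 arrays
y = rng.normal(0.0, 1.0, (1000, 8))    # group 2
x_sig = x.copy()
x_sig[:50] += 3.0                      # simulated: first 50 genes differentially expressed
grid = [0.0, 0.1, 0.5, 1.0]
best = choose_statistic(x_sig, y, grid)
print(best, reproducibility(x_sig, y, best), reproducibility(x, y, best))
```

On pure noise the expected overlap is about top_k / G (here 0.05), so a clearly higher overlap indicates that the selected genes reflect structure in the data rather than chance.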
Modeling Microarray Data: Interpreting and communicating the biological results.
Various statistical models have been proposed for detecting differential gene expression in data from microarray experiments. Given such detection, we are usually interested in describing the differential expression patterns. Because of the large number of genes typically analysed in microarray experiments, possibly more than ten thousand, interpreting and communicating all the corresponding statistical models poses a considerable challenge, except perhaps in the simplest experiment involving only two groups. A further challenge is to find methods to summarize the resulting models. These challenges increase with experimental complexity. Biologists often wish to sort genes into ‘classes’ with similar response profiles/patterns. In this paper, we therefore describe a likelihood approach for assigning genes to these different class patterns for data from a replicated experimental design. The number of potential patterns increases very quickly as the number of combinations in the experimental design increases. In a two-group experimental design, only three patterns are required to describe the mean response: up, down and no difference. For a factorial design with three treatments there are 13 different patterns, and with four levels
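The counts quoted above (3 patterns for two groups, 13 for three treatments) coincide with the number of ways to order group means when ties ("no difference") are allowed, i.e. the ordered Bell (Fubini) numbers. Assuming that is the intended notion of a pattern, the sketch below enumerates the patterns explicitly and checks the count against the Fubini recurrence. The count it gives for four groups follows from that assumption and is not quoted from the paper.

```python
from itertools import product
from math import comb

def mean_patterns(n_groups):
    """All orderings of n_groups means with ties allowed (weak orderings).

    A pattern is a tuple of ranks: equal ranks mean 'no difference',
    a larger rank means a higher mean response.
    """
    patterns = []
    for ranks in product(range(n_groups), repeat=n_groups):
        used = set(ranks)
        # Keep only rank vectors using 0..m-1 without gaps, so each ordering appears once
        if used == set(range(len(used))):
            patterns.append(ranks)
    return patterns

def fubini(n):
    # Ordered Bell (Fubini) numbers: a(0) = 1, a(m) = sum_{k=1..m} C(m, k) * a(m - k)
    a = [1]
    for m in range(1, n + 1):
        a.append(sum(comb(m, k) * a[m - k] for k in range(1, m + 1)))
    return a[n]

for g in (2, 3, 4):
    print(g, len(mean_patterns(g)), fubini(g))   # prints 2 3 3, then 3 13 13, then 4 75 75
print(mean_patterns(2))   # [(0, 0), (0, 1), (1, 0)]: no difference, second higher, first higher
```

The rapid growth of these numbers is the combinatorial reason the abstract gives for needing a systematic (likelihood-based) way of assigning genes to patterns.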
A STATISTICAL APPROACH FOR IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN TIME-COURSE CDNA MICROARRAY EXPERIMENT WITHOUT REPLICATE
Replication of time series in microarray experiments is costly. To analyze time series data with no replicate, many model-specific approaches have been proposed. However, they fail to identify genes whose expression patterns do not fit the predefined models. Moreover, modeling temporal expression patterns is difficult when the dynamics of gene expression in the experiment are poorly understood. We propose a method called PEM (Partial Energy ratio for Microarray) for the analysis of time-course cDNA microarray data. In the PEM method, we assume that gene expression varies smoothly in the temporal domain. This assumption is comparatively weak, so the method is general enough to identify genes expressed in unexpected patterns. To identify differentially expressed genes, we develop a new statistic that compares the energies of two convolved profiles. We further adapt the statistic to microarray analysis by introducing the concept of partial energy. The PEM statistic can easily be incorporated into the SAM framework for significance analysis. We evaluated the PEM method on an artificial dataset and two published time-course cDNA microarray datasets on yeast. The experimental results show the robustness and generality of the PEM method in identifying genes of interest.
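The abstract does not give the PEM formula, so the sketch below only illustrates the underlying idea of comparing the energies of two convolved versions of a profile. As a hypothetical stand-in, it convolves the profile with a moving-average (low-pass) kernel and compares the energy of the smoothed profile with the energy of the residual (high-pass) part. A smooth temporal change scores high and white noise scores low. The kernel, its width and the plain ratio form are our assumptions, not the published partial-energy statistic.

```python
import numpy as np

def energy_ratio(profile, width=3):
    """Hypothetical smoothness score (not the published PEM formula):
    energy of a low-pass-filtered profile over the energy of its residual."""
    x = np.asarray(profile, dtype=float)
    kernel = np.ones(width) / width                 # moving-average (low-pass) kernel, odd width
    smooth = np.convolve(x, kernel, mode="valid")   # smoothed values at window centres
    half = width // 2
    rough = x[half:len(x) - half] - smooth          # high-pass part at the same positions
    return float(np.sum(smooth**2) / (np.sum(rough**2) + 1e-12))

t = np.linspace(0.0, np.pi, 12)                     # 12 unreplicated time points
smooth_profile = 2.0 * np.sin(t)                    # smooth transient response
noise_profile = np.random.default_rng(0).normal(0.0, 1.0, 12)
print(energy_ratio(smooth_profile), energy_ratio(noise_profile))
```

Because the score only rewards smoothness, it does not need a predefined response shape, which mirrors the abstract's argument for a weak smoothness assumption over model-specific fits.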
PEM: A GENERAL STATISTICAL APPROACH FOR IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN TIME-COURSE CDNA MICROARRAY EXPERIMENT WITHOUT REPLICATE
, 1231
Replication of time series in microarray experiments is costly. To analyze time series data with no replicate, many model-specific approaches have been proposed. However, they fail to identify genes whose expression patterns do not fit the predefined models. Moreover, modeling temporal expression patterns is difficult when the dynamics of gene expression in the experiment are poorly understood. We propose a method called PEM (Partial Energy ratio for Microarray) for the analysis of time-course cDNA microarray data. In the PEM method, we assume that gene expression varies smoothly in the temporal domain. This assumption is comparatively weak, so the method is general enough to identify genes expressed in unexpected patterns. To identify differentially expressed genes, we develop a new statistic that compares the energies of two convolved profiles. We further adapt the statistic to microarray analysis by introducing the concept of partial energy. The PEM statistic is incorporated into the permutation-based SAM framework for significance analysis. We evaluated the PEM method on an artificial dataset and two published time-course cDNA microarray datasets on yeast. The experimental results show the robustness and generality of the PEM method. It outperforms the previous versions of SAM and the spline-based EDGE approach in identifying genes of interest that are differentially expressed in various ways. Keywords: Time course, cDNA microarray, differentially expressed gene, PEM.
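Plugging a new score into a permutation-based SAM-style analysis comes down to comparing how many genes exceed a cutoff on the real data with how many exceed it on permuted data. The sketch below is a simplified version of that idea (a fixed cutoff rather than SAM's delta-based calibration) and works for any per-gene score. For unreplicated time courses, one assumption for generating the permuted scores would be to shuffle the time order within each gene and recompute the score; here the permuted scores are simply simulated.

```python
import numpy as np

def sam_fdr(observed, permuted, threshold, pi0=1.0):
    """SAM-style FDR estimate for the genes whose |score| reaches threshold.

    observed: (G,) scores on the real data
    permuted: (B, G) scores recomputed on B permuted data sets
    """
    called = int(np.sum(np.abs(observed) >= threshold))
    if called == 0:
        return 0.0
    # Expected number of false calls: median count above the cutoff across permutations
    false_calls = float(np.median(np.sum(np.abs(permuted) >= threshold, axis=1)))
    return min(1.0, pi0 * false_calls / called)

rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, 1000)
observed[:50] += 6.0                          # simulated: 50 genes with a strong signal
permuted = rng.normal(0.0, 1.0, (100, 1000))  # stand-in for scores on permuted data
for cut in (2.0, 3.0, 4.0):
    print(cut, sam_fdr(observed, permuted, cut))
```

For a nonnegative score such as an energy ratio, taking the absolute value changes nothing, so the same function covers one-sided scores.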
BMC Biology (BioMed Central), 2007
Research article: Gene expression profiling of cuticular proteins across the moult cycle of the crab Portunus pelagicus
A Bayesian Approach to Joint Modeling of Protein-DNA Binding, Gene Expression and Sequence Data
Abstract. Genome-wide DNA-protein binding data, DNA sequence data and gene expression data are complementary means of deciphering global and local transcriptional regulatory circuits. Combining these different types of data can not only improve statistical power but also provide a more comprehensive picture of gene regulation. In this paper, we propose a novel statistical model to augment protein-DNA binding data with gene expression and DNA sequence data when available. We specify a hierarchical Bayes model and use Markov chain Monte Carlo simulations to draw inferences. Both simulation studies and an analysis of an experimental dataset show that the proposed joint modeling method can significantly improve the specificity and sensitivity of identifying target genes compared with conventional approaches relying on a single data source. Keywords: Bayesian Model; ChIP-chip Data; Joint Modeling; Microarray.
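To make "joint modeling with MCMC" concrete, the toy Gibbs sampler below combines two data sources through a shared latent indicator Z_i (target or not) with Z_i ~ Bernoulli(pi) and pi ~ Beta(a, b). Binding and expression scores are assumed conditionally independent normals given Z_i. This is a deliberately small stand-in, not the paper's hierarchical model: the component means are fixed for brevity (a full hierarchical model would also sample them), and sequence data are left out.

```python
import numpy as np

def normal_pdf(x, mu, sd=1.0):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def gibbs_joint(binding, expression, mu=(0.0, 3.0), nu=(0.0, 3.0),
                a=1.0, b=1.0, n_iter=2000, burn=500, seed=0):
    """Toy Gibbs sampler: Z_i ~ Bernoulli(pi), pi ~ Beta(a, b),
    binding_i | Z_i ~ N(mu[Z_i], 1), expression_i | Z_i ~ N(nu[Z_i], 1).
    Returns the posterior probability that each gene is a target."""
    rng = np.random.default_rng(seed)
    n = len(binding)
    # Joint likelihood of both data sources under "non-target" (0) and "target" (1)
    lik0 = normal_pdf(binding, mu[0]) * normal_pdf(expression, nu[0])
    lik1 = normal_pdf(binding, mu[1]) * normal_pdf(expression, nu[1])
    pi = 0.5
    z_sum = np.zeros(n)
    for it in range(n_iter):
        # Sample target indicators given the current mixing proportion
        p1 = pi * lik1 / (pi * lik1 + (1.0 - pi) * lik0)
        z = rng.random(n) < p1
        # Sample the mixing proportion given the indicators (Beta-Bernoulli conjugacy)
        pi = rng.beta(a + z.sum(), b + n - z.sum())
        if it >= burn:
            z_sum += z
    return z_sum / (n_iter - burn)

rng = np.random.default_rng(1)
n = 200
target = rng.random(n) < 0.2                    # simulated true targets
binding = rng.normal(np.where(target, 3.0, 0.0), 1.0)
expression = rng.normal(np.where(target, 3.0, 0.0), 1.0)
post = gibbs_joint(binding, expression)
print(post[target].mean(), post[~target].mean())
```

Multiplying the per-source likelihoods is where the extra power comes from: a gene with moderate evidence in both sources can receive a higher posterior than it would from either source alone.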
Transcriptome Analysis of Shewanella oneidensis MR-1 in Response to Elevated Salt Conditions
2004
Whole-genome expression patterns were examined in Shewanella oneidensis cells exposed to elevated sodium chloride. Genes involved in Na+ extrusion and glutamate biosynthesis were significantly up-regulated, and the majority of chemotaxis/motility-related genes were significantly down-regulated. The data also suggested an important role for metabolic adjustment in salt stress adaptation in S. oneidensis. Shewanella species inhabit diverse environments, including spoiled food (11) and infected animals (35), deep-sea and freshwater lake sediments (8, 45, 54), and oilfield waste sites (44). Shewanella oneidensis MR-1, a facultative, gram-negative bacterium, was isolated from sediments of Lake Oneida in New York (32). The bacterium can anaerobically respire numerous organic compounds, including fumarate and dimethyl sulfoxide (28), as well as reduce metals such as Fe(III), Mn(IV), Cr(VI), and U(VI) (22, 29, 32). Because of its respiratory versatility, which may be exploited for immobilization of environmental pollutants (i.e., chromium and uranium) in soil and groundwater, the metal-reducing capabilities of Shewanella
Differential Expression with the Bioconductor Project
Copyright © 2004 by the authors.

