Ab initio prediction of transcription factor targets using structural knowledge
- PLoS Comput Biol
, 2005
Cited by 19 (1 self)
Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys2His2 transcription factors. By analyzing the predicted targets along with gene annotation and expression data, we infer the function and activity of these proteins. Citation: Kaplan T, Friedman N, Margalit H (2005) Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comp Biol 1(1): e1.
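The genome-wide scan mentioned in this abstract can be pictured with a generic position weight matrix (PWM) scan. The sketch below is only an illustration under stated assumptions: the 4-bp matrix, the uniform background, the threshold, and the names `PWM`, `log_odds`, and `scan` are all hypothetical and are not taken from the paper, whose recognition preferences are learned from structural data.

```python
import math

# Hypothetical per-position nucleotide probabilities for a 4-bp site.
# These values are illustrative only; they are not from the paper.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def log_odds(window, pwm=PWM, background=BACKGROUND):
    """Sum of log2(p / background) over the positions of one window."""
    return sum(math.log2(col[base] / background)
               for col, base in zip(pwm, window))


def scan(sequence, pwm=PWM, threshold=5.0):
    """Return (start, window, score) for each window scoring above threshold.

    Expects an uppercase sequence over A, C, G, T only.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = log_odds(window, pwm)
        if score > threshold:
            hits.append((start, window, score))
    return hits


hits = scan("TTAGCTAACCAGCT")
for start, window, score in hits:
    print(start, window, round(score, 2))
```

With this toy matrix, only windows matching the consensus at every position clear the threshold; a real scan would also have to consider the reverse strand and a calibrated cutoff.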
Multi-class Discriminant Kernel Learning via Convex Programming
Cited by 11 (0 self)
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper. That is, the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case. Furthermore, the decomposition leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
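The phrase "a linear combination of pre-specified kernel matrices" can be made concrete with a small sketch. It only shows how a combined Gram matrix is formed from fixed weights; it does not learn the weights, which is what the SDP, QCQP, and SILP formulations in the abstract address. The helper names, the choice of a linear and an RBF kernel, the weights, and the constraint that weights are nonnegative and sum to 1 are assumptions made for illustration.

```python
import math


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel with an assumed bandwidth parameter gamma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, kernel):
    """Kernel value for every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]


def combine_kernels(gram_matrices, weights):
    """Weighted sum of same-sized Gram matrices.

    Assumes nonnegative weights that sum to 1, a common MKL convention
    that the abstract itself does not spell out.
    """
    if len(gram_matrices) != len(weights):
        raise ValueError("need one weight per kernel matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(gram_matrices[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, gram_matrices))
             for j in range(n)]
            for i in range(n)]


points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
K_lin = gram_matrix(points, linear_kernel)
K_rbf = gram_matrix(points, rbf_kernel)
K = combine_kernels([K_lin, K_rbf], [0.3, 0.7])
```

Because each base Gram matrix is positive semidefinite and the weights are nonnegative, the combined matrix `K` is again a valid kernel matrix.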
Multi-label Multiple Kernel Learning
Cited by 11 (3 self)
We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instance-label correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.
REDfly: a regulatory element database for Drosophila
- Bioinformatics
, 2006
Cited by 9 (0 self)
Summary: Bioinformatics studies of transcriptional regulation in the metazoa are significantly hindered by the absence of readily available data on large numbers of transcriptional cis-regulatory modules (CRMs). Even the richly annotated Drosophila melanogaster genome lacks extensive CRM information. We therefore present here a database of Drosophila CRMs curated from the literature complete with both DNA sequence and a searchable description of the gene expression pattern regulated by each CRM. This resource should greatly facilitate the development of computational approaches to CRM discovery as well as bioinformatics analyses of regulatory sequence properties and evolution.
Clustering gene expression patterns of fly embryos
- Proc. ISBI 2006
, 2006
Cited by 6 (4 self)
The spatio-temporal patterning of gene expression in early embryos is an important source of information for understanding the functions of genes involved in development. Most analyses to date rely on biologists' visual inspection of microscope images, which for large-scale datasets becomes impractical and subjective. In this paper, we introduce a new method for clustering 2D images of gene expression patterns in Drosophila melanogaster (fruit fly) embryos. These patterns, typically generated from in situ hybridization of mRNA probes, reveal when, where and how abundantly a target gene is expressed. Our method involves two steps. First, we use an eigen-embryo model to reduce noise and generate feature vectors that form a better basis for capturing the salient aspects of quantized embryo images. Second, we cluster these feature vectors by an efficient minimum-spanning-tree partition algorithm. We investigate this approach on fly embryo datasets that span the entire course of embryogenesis. The experimental results show that our clustering algorithm produces superior pattern clusters. We also find previously unobserved clusters of genes that share biologically interesting patterns of gene expression.
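The second step of this abstract, partitioning a minimum spanning tree, can be sketched generically. The abstract does not say which partition rule the authors use, so the version below is an assumption: it builds the tree with Prim's algorithm over Euclidean distances and cuts the k-1 heaviest edges. The function names and the toy feature vectors are hypothetical, and the eigen-embryo feature extraction step is not shown.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph; returns (weight, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 heaviest MST edges and label the connected components."""
    edges = sorted(minimum_spanning_tree(points))
    kept = edges[:max(len(edges) - (k - 1), 0)]
    root = list(range(len(points)))  # union-find parent pointers

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel))
            for i in range(len(points))]


# Toy feature vectors standing in for eigen-embryo coefficients.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_clusters(features, k=2)
```

Cutting the heaviest edges is the simplest partition rule; it is sensitive to outliers, which is one reason published methods often use more elaborate criteria.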
Constructing a quantitative spatio-temporal atlas of gene expression in Drosophila blastoderm
- Cell (accepted, to appear)
, 2008
Cited by 6 (1 self)
To fully understand animal transcription networks, it is essential to accurately measure the spatial and temporal expression patterns of transcription factors and their targets. We describe a registration technique that takes image-based data from hundreds of Drosophila blastoderm embryos, each co-stained for a reference gene and one of a set of genes of interest, and builds a model VirtualEmbryo. This model captures in a common framework the average expression patterns for many genes in spite of significant variation in morphology and expression between individual embryos. We establish the method's accuracy by showing that relationships between a pair of genes' expression inferred from the model are nearly identical to those measured in embryos co-stained for the pair. We present a VirtualEmbryo containing data for 95 genes at six time cohorts. We show that known regulatory interactions within the network can be recovered from this dataset and predict hundreds of new interactions.
A Least Squares Formulation for Canonical Correlation Analysis
Cited by 5 (4 self)
Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multi-dimensional variables. It projects both sets of variables into a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction, in which one of the multi-dimensional variables is derived from the class label. It has been shown that CCA can be formulated as a least squares problem in the binary-class case. However, their relationship in the more general setting remains unclear. In this paper, we show that, under a mild condition which tends to hold for high-dimensional data, CCA in multi-label classifications can be formulated as a least squares problem. Based on this equivalence relationship, we propose several CCA extensions including sparse CCA using 1-norm regularization. Experiments on multi-label data sets confirm the established equivalence relationship. Results also demonstrate the effectiveness of the proposed CCA extensions.
Analyzing in situ Gene Expression in the Mouse Brain with Image Registration, Feature Extraction and Block Clustering
Cited by 2 (0 self)
Background: Many important high throughput projects use in situ hybridization and may require the analysis of images of spatial cross sections of organisms taken with cellular level resolution. Projects creating gene expression atlases at unprecedented scales for the embryonic fruit fly as well as the embryonic and adult mouse already involve the analysis of hundreds of thousands of high resolution experimental images mapping mRNA expression patterns. Challenges include accurate registration of highly deformed tissues, associating cells with known anatomical regions, and identifying groups of genes whose expression is coordinately regulated with respect to both concentration and spatial location. Solutions to these and other challenges will lead to a richer understanding of the complex system aspects of gene regulation in heterogeneous tissue. Results: We present an end-to-end approach for processing raw in situ expression imagery and performing subsequent analysis. We use a non-linear, information theoretic based image registration technique specifically adapted for mapping expression images to anatomical annotations and a method for extracting expression information within an anatomical region. Our method consists of coarse registration, fine registration, and expression feature extraction steps. From this we obtain a matrix for expression characteristics with rows corresponding to genes and columns corresponding to anatomical sub-structures. We perform matrix block cluster analysis using a novel
Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning
Cited by 2 (1 self)
The Berkeley Drosophila Genome Project (BDGP) has produced a large number of gene expression patterns, many of which have been annotated textually with anatomical and developmental terms. These terms spatially correspond to local regions of the images; however, they are attached collectively to groups of images, such that it is unknown which term is assigned to which region of which image in the group. This poses a challenge to the development of a computational method to automate the textual description of expression patterns contained in each image. In this paper, we show that the underlying nature of this task matches well with a new machine learning framework, Multi-Instance Multi-Label learning (MIML). We propose a new MIML support vector machine to solve the problems that beset the annotation task. Empirical study shows that the proposed method outperforms the state-of-the-art Drosophila gene expression pattern annotation methods.

