Genetic Network Inference: From Co-Expression Clustering To Reverse Engineering
, 2000
Cited by 156 (0 self)
Motivation: Advances in molecular biological, analytical and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review data mining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e. who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting and bioengineering.
A Bayesian System Integrating Expression Data with Sequence Patterns for Localizing Proteins: Comprehensive Application to the Yeast Genome
Cited by 53 (18 self)
proteins (on the yeast proteins with known localization, 92% versus 74%). Our training and testing also highlight which of the 30 features are informative and which are redundant (19 being particularly useful). After developing our system, we apply it to the 4700 yeast proteins with currently unknown localization and estimate the relative population of the various compartments in the entire yeast genome. An unbiased prior is essential to this extrapolated estimate; for this, we use the MIPS localization catalogue, and adapt recent results on the localization of yeast proteins obtained by Snyder and colleagues using a minitransposon system. Our final localizations for all ~6000 proteins in the yeast genome are available over the web at: http://bioinfo.mbb.yale.edu/genome/localize © 2000 Academic Press. Keywords: proteomics; bioinformatics; machine learning; cDNA microarray analysis; subcellular localization
Computational Methods for the Identification of Differential and Coordinated Gene Expression
- Human Molecular Genetics
, 1999
Cited by 34 (0 self)
In this article, I review the theoretical and computational approaches used to: (i) identify genes differentially expressed (across cell types, developmental stages, pathological conditions, etc.); (ii) identify genes expressed in a coordinated manner across a set of conditions; and (iii) delineate clusters of genes sharing coherent expression features, eventually defining global biological pathways.
Predicting gene function in Saccharomyces cerevisiae
- Bioinformatics
, 2003
Cited by 26 (2 self)
Motivation: S. cerevisiae is one of the most important model organisms, and has been the focus of over a century of study. In spite of these efforts, 40% of its open reading frames (ORFs) remain classified as having unknown function (MIPS: Munich Information Center for Protein Sequences). We wished to make predictions for the function of these ORFs using data mining, as we have previously successfully done for the genomes of M. tuberculosis and E. coli. Applying this approach to the larger and eukaryotic S. cerevisiae genome involves modifying the machine learning and data mining algorithms, as this is a larger organism with more data available, and a more challenging functional classification. Results: Novel extensions to the machine learning and data mining algorithms have been devised in order to deal with the challenges. Accurate rules have been learned and predictions have been made for many of the ORFs whose function is currently unknown. The rules are informative, agree with known biology and allow for scientific discovery. Availability: All predictions are freely available from
Genome-Wide Analysis Relating Expression Level With Protein Subcellular Localization
, 2000
Cited by 24 (16 self)
combined' dataset derived from averaging many different experiments. Note that further detail on this figure is shown in Figure 2, which gives the numbers of different transcripts for each compartment. Middle, fluctuation of expression levels during the time course of the yeast cell cycle with alpha-factor arrest 6. To calculate the fluctuation, we start with the logarithm of the expression ratio for gene i at time t (Eqn 1): $R(i,t) = \log_2 \left( \frac{r(i,t) - r_{\mathrm{background}}(i,t)}{g(i,t) - g_{\mathrm{background}}(i,t)} \right)$ where r(i,t) and g(i,t) represent the red and green fluorescent signals at a particular time point. It is usual to analyse the logarithm of the expression ratio, rather than the expression ratio itself, because the logarithm is generally distributed symmetrically around zero. Note that because of the structure of microarray experiments, all gene-to-gene comparisons have to be done using expression ratios rather than absolute measurements. Then we calculate the standard deviation "0 in th
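As a rough illustration of the log-ratio calculation described in this abstract, the following Python sketch computes the background-corrected log2 ratio of red to green signal and a standard deviation over a time course. The function names `log_ratio` and `fluctuation` are hypothetical, and the choice of the population standard deviation is an assumption, since the excerpt is cut off before it specifies which estimator is used.

```python
import math
from statistics import pstdev


def log_ratio(red, green, red_bg, green_bg):
    """Return log2 of the background-corrected red/green signal ratio.

    Names and argument order are illustrative, not taken from the paper.
    """
    numerator = red - red_bg
    denominator = green - green_bg
    if numerator <= 0 or denominator <= 0:
        # The logarithm is undefined unless both corrected signals are positive.
        raise ValueError("background-corrected signals must be positive")
    return math.log2(numerator / denominator)


def fluctuation(log_ratios):
    """Return the population standard deviation of a gene's log ratios.

    Assumption: population (not sample) standard deviation.
    """
    return pstdev(log_ratios)
```

A four-fold excess of red over green and a four-fold excess of green over red give log ratios of equal magnitude and opposite sign, which is the symmetry around zero that the abstract cites as the reason for working with logarithms.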
Revisiting the Codon Adaptation Index From a Whole-Genome Perspective: Analyzing the Relationship Between Gene Expression and Codon Occurrence in Yeast Using a Variety of Models
Cited by 17 (0 self)
Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias, in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, use this bias to predict the expression level of genes. When these indices were first introduced, they were based on fairly simple assumptions about which genes are most highly expressed: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. Given the recent advent of genome-wide expression data, we should be able to improve on these assumptions. Here, we measure, in yeast, the degree to which consideration of the current genome-wide expression data sets improves the performance of both numerical indices. Indeed, we find that by changing the parameterization of each model its correlation with actual expression levels can be somewhat improved, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias amongst highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This has some similarity with the CAI formalism and yields an alternative model for the prediction of expression levels based on the coding sequences of genes. More information is available at http://bioinfo.mbb.yale.edu/expression/codons.
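For readers unfamiliar with the CAI, the sketch below shows its usual form: the geometric mean of per-codon relative adaptiveness weights across a gene. It is a minimal Python illustration only. The function name `codon_adaptation_index` and the toy weight table are hypothetical and are not the parameterizations examined in the paper; skipping codons without a weight is also an assumption of this sketch.

```python
import math


def codon_adaptation_index(codons, weights):
    """Geometric mean of the relative adaptiveness weights of a gene's codons.

    `weights` maps a codon to a value in (0, 1], where 1.0 marks the
    preferred codon among its synonyms in some reference gene set.
    Codons without a weight are skipped (an assumption of this sketch).
    """
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon in the sequence has a weight")
    return math.exp(sum(logs) / len(logs))


# Toy weights for two alanine codons; the values are made up for illustration.
toy_weights = {"GCT": 1.0, "GCC": 0.5}
```

With these toy weights, a gene that uses only the preferred codon scores 1.0, and any use of the less-preferred codon lowers the score. Re-parameterizing the index, as the abstract describes, amounts to deriving the weight table from a different reference set.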
A question of size: the eukaryotic proteome and the problems in defining it
- Nucl. Acids. Res
, 2002
Cited by 16 (9 self)
We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to a fifth of the ‘current’ proteome. We discuss these in context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences (‘the orfome’). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate the range in the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes (‘dead’ genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at
Methods in comparative genomics: Genome correspondence, gene identification, and regulatory motif discovery
- Journal of Computational Biology
, 2004
Cited by 14 (4 self)
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genome-wide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously
A Public Database for Gene Expression in Human Cancers
, 2002
Cited by 13 (1 self)
A public database, SAGEmap, was created as a component of the Cancer Genome Anatomy Project to provide a central location for depositing, retrieving, and analyzing human gene expression data. This database uses serial analysis of gene expression to quantify transcript levels in both malignant and normal human tissues. By accessing SAGEmap (http://www.ncbi.nlm.nih.gov/SAGE) the user can compare transcript populations between any of the posted libraries. As an initial demonstration of the database's utility, gene expression in human glioblastomas was compared with that of normal brain white matter. Of the 47,174 unique transcripts expressed in these two tissues, 471 (1.0%) were differentially expressed by more than 5-fold (P < 0.001). Classification of these genes revealed functions consistent with the biological properties of glioblastomas, in particular: angiogenesis, transcription, and cell cycle related genes.
Snapshots of Systems -- Metabolic Control Analysis . . .
, 1999
Cited by 2 (0 self)
... what may be a large impact of this on the control structure. In the absence of compartmentation or channelling, such cycles also serve to connect segments of metabolism usually considered rather distant from each other. Simplified (`top-down') methods in which the system structure is assumed a priori often will not work to give unequivocal answers for complex systems where the combinatorial explosion of possible interactions requires much more sophisticated methods for system identification. Dual-inhibitor titrations can reveal unsuspected direct kinetic interactions between individual catalytic activities in appropriate cases, but these are cleanly apparent only in the regime of large changes (such that experimental studies in which only small perturbations are revealed will cause (or allow) them to be missed). No example exists in which one can extrapolate the conventional control coefficients to provide reliable and quantitative predictions a pri

