Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
Cited by 72 (2 self)
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature, and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
PatternHunter II: Highly Sensitive and Fast Homology Search
, 2003
Cited by 71 (12 self)
Extending the single optimized spaced seed of PatternHunter [20] to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman, for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search technology back to a full circle.
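A minimal sketch of what a spaced-seed hit is, offered only as an illustration of the idea named in this abstract and not as PatternHunter's implementation. A seed is written as a string of `1` (position must match) and `0` (position is ignored). The function name `spaced_seed_hits` and the toy seed `1101` are hypothetical. Using multiple seeds, as the abstract describes, would amount to collecting the hits of several such seeds.

```python
def spaced_seed_hits(query, target, seed="1101"):
    """Return (i, j) pairs where query[i:] and target[j:] agree on every
    '1' position of the seed. The seed string here is a toy example."""
    care = [k for k, c in enumerate(seed) if c == "1"]  # positions that must match
    span = len(seed)

    # Index every window of the target by the letters at the "care" positions.
    index = {}
    for j in range(len(target) - span + 1):
        key = "".join(target[j + k] for k in care)
        index.setdefault(key, []).append(j)

    # Look up every window of the query in that index.
    hits = []
    for i in range(len(query) - span + 1):
        key = "".join(query[i + k] for k in care)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits
```

With the toy seed `1101`, the sequences `ACGT` and `ACTT` produce a hit at offset (0, 0) because the mismatch falls on the ignored position, whereas the contiguous seed `111` finds no hit in the same pair.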
Duplication models for biological networks
- Journal of Computational Biology
, 2003
Cited by 45 (4 self)
Are biological networks different from other large complex networks? Both large biological and nonbiological networks exhibit power-law graphs (number of nodes with degree k, $N(k) \propto k^{-\beta}$), yet the exponents, $\beta$, fall into different ranges. This may be because duplication of the information in the genome is a dominant evolutionary force in shaping biological networks (like gene regulatory networks and protein–protein interaction networks) and is fundamentally different from the mechanisms thought to dominate the growth of most nonbiological networks (such as the Internet). The preferential choice models used for nonbiological networks like web graphs can only produce power-law graphs with exponents greater than 2. We use combinatorial probabilistic methods to examine the evolution of graphs by node duplication processes and derive exact analytical relationships between the exponent of the power law and the parameters of the model. Both full duplication of nodes (with all their connections) as well as partial duplication (with only some connections) are analyzed. We demonstrate that partial duplication can produce power-law graphs with exponents less than 2, consistent with current data on biological networks. The power-law exponent for large graphs depends only on the growth process, not on the starting graph.
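A minimal sketch of graph growth by node duplication, written under assumptions that are not taken from the paper itself: at each step one existing node is chosen uniformly at random, a new node is added, and each edge of the chosen node is copied to the new node independently with probability `p` (`p = 1.0` for full duplication, `p < 1.0` for partial duplication). The names `grow_by_duplication` and `degree_counts`, and the integer node labels, are hypothetical.

```python
import random

def grow_by_duplication(adj, steps, p, rng=None):
    """Grow an undirected graph in place by repeated node duplication.

    adj: dict mapping node -> set of neighbours; nodes are assumed to be
         labelled 0..n-1 so that len(adj) is always an unused label.
    p:   probability of copying each edge of the duplicated node.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(steps):
        source = rng.choice(list(adj))   # node to duplicate (assumed uniform)
        new = len(adj)                   # next unused integer label
        kept = {w for w in adj[source] if rng.random() < p}
        adj[new] = kept
        for w in kept:                   # keep the graph undirected
            adj[w].add(new)
    return adj

def degree_counts(adj):
    """Map each degree k to the number of nodes with that degree."""
    counts = {}
    for neighbours in adj.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return counts
```

Running the process for many steps and plotting `degree_counts` on log-log axes is one way to inspect the degree distribution empirically; the sketch makes no claim about the exponent it yields.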
What is bioinformatics? A proposed definition and overview of the field
Cited by 24 (2 self)
BACKGROUND: The recent flood of data from genome sequencing and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. OBJECTIVES: Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. METHODS: Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large scale. RESULTS & CONCLUSIONS: Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g., expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational topics including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches that integrate a variety of computational techniques and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the w...
LGL: creating a map of protein function with an algorithm for visualizing very large biological networks
- Journal of Molecular Biology
, 2004
Cited by 23 (0 self)
Supplementary data associated with this article can be found at doi: 10.1016/j.jmb.2004.04.047
The Haplotyping Problem: An Overview of Computational Models and Solutions
- Journal of Computer Science and Technology
, 2003
Cited by 22 (5 self)
The investigation of genetic differences among humans has given evidence that mutations in DNA sequences are responsible for some genetic diseases. The most common mutation is the one that involves only a single nucleotide of the DNA sequence, which is called a single nucleotide polymorphism (SNP). As a consequence, computing a complete map of all SNPs occurring in the human populations is one of the primary goals of recent studies in human genomics. The construction of such a map requires determining the DNA sequences that form all chromosomes. In diploid organisms like humans, each chromosome consists of two sequences called haplotypes.
Prediction of human protein function from post-translational modifications and localization features
- J Mol Biol
, 2002
Cited by 22 (2 self)
Out of the 35,000 to 50,000 genes believed to be present in the human genome, no more than 40–60% can be assigned a functional role based on homology to proteins with known function. 1,2
Ovcharenko I: rVista 2.0: evolutionary analysis of transcription factor binding sites
- Nucleic Acids Res 2004, 32(Web Server
Cited by 21 (2 self)
Identifying and characterizing the transcription factor binding site (TFBS) patterns of cis-regulatory elements represents a challenge, but holds promise to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under positive selection and, therefore, are often conserved between related species. Using this evolutionary principle, we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. Our ability to experimentally identify functional noncoding sequences is extremely limited; therefore, rVISTA attempts to fill this great gap in genomic analysis by offering a powerful approach for eliminating TFBSs least likely to be biologically relevant. The rVISTA tool combines TFBS predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are evolutionarily conserved and present in a specific configuration within genomic sequences. Here, we present the newly developed version 2.0 of the rVISTA tool, which can process alignments generated by both the zPicture and blastz alignment programs or use pre-computed pairwise alignments of several vertebrate genomes available from the ECR Browser and GALA database. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users to either search for matrices present in the TRANSFAC library collection or search for user-defined consensus sequences. The rVISTA tool is publicly available at
Pattern Recognition Techniques in Microarray Data Analysis: A Survey
- Annals of the New York Academy of Sciences, Techniques in Bioinformatics and Medical Informatics
, 2002
Cited by 21 (0 self)
Recent development of technologies (e.g. microarray technology) that are capable of producing massive amounts of genetic data has highlighted the need for new pattern recognition techniques that can mine and discover “biologically meaningful” knowledge in large data sets. Many researchers have begun an endeavor in this direction to devise such data-mining techniques. As such, there is a need for survey articles that periodically review and summarize the work that has been done in the area. This article presents one such survey. The first portion of the paper is meant to provide the basic biology (mostly for non-biologists) that is required in such a project. This part is only meant to be a starting point for those experts in the technical fields who wish to embark on this new area of bioinformatics. The second portion of the paper is a survey of various data mining techniques that have been used in mining microarray data for biological knowledge and information (such as sequence information). This survey is not meant to be treated as complete in any form, as the area is currently one of the most active, and the body of research is very large. Furthermore, the applications of the techniques mentioned here are not meant to be taken as the most significant applications of the techniques, but simply as some examples among many.
Tests for Gene Clustering
, 2002
Cited by 20 (3 self)
Comparing chromosomal gene order in two or more related species is an important approach to studying the forces that guide genome organization and evolution. Linked clusters of similar genes found in related genomes are often used to support arguments of evolutionary relatedness or functional selection. However, as the gene order and the gene complement of sister genomes diverge progressively due to large-scale rearrangements, horizontal gene transfer, gene duplication and gene loss, it becomes increasingly difficult to determine whether observed similarities in local genomic structure are indeed remnants of common ancestral gene order, or are merely coincidences.

