Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
Cited by 75 (2 self)
The past decade has seen tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature, and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
Associating Genes with Gene Ontology Codes Using a Maximum Entropy Analysis of Biomedical Literature
, 2002
Cited by 58 (3 self)
this paper but has been provided elsewhere (Ratnaparkhi 1997; Manning and Schutze 1999)
PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria
, 2003
Predicting Subcellular Localization of Proteins using Machine-Learned Classifiers
- Bioinformatics
, 2004
Cited by 52 (4 self)
Motivation: Identifying the destination or localization of proteins is key to understanding their function and facilitating their purification. A number of existing computational prediction methods are based on sequence analysis. However, these methods are limited in scope, accuracy and most particularly breadth of coverage. Rather than using sequence information alone, we have explored the use of database text annotations from homologs and machine learning to substantially improve the prediction of subcellular location. Results: We have constructed five machine-learning classifiers for predicting subcellular localization of proteins from animals, plants, fungi, Gram-negative bacteria and Gram-positive bacteria, which are 81% accurate for fungi and 92–94% accurate for the other four categories. These are the most accurate subcellular predictors across the widest set of organisms ever published. Our predictors are part of the Proteome Analyst web-service.
Sequence Conserved for Subcellular Localization
, 2002
Cited by 29 (5 self)
The more proteins diverge in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we present the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization.
Multiclass multiple kernel learning
- In ICML. ACM
Cited by 26 (3 self)
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets.
Target space for structural genomics revisited
, 2002
Cited by 24 (8 self)
Motivation: Structural genomics eventually aims at determining structures for all proteins. However, in the beginning experimentalists are likely to focus on globular proteins to achieve a rapid basic coverage of protein sequence space. How many proteins will structural genomics have to target? How many proteins will be excluded since we already have structural information for these or since they are not globular? We have to answer these questions in the context of our target selection for the North-East Structural Genomics Consortium (NESG). Results: We estimated that structural information is available for about 6–38% of all proteins; 6% if we require high accuracy in comparative modelling, 38% if we are satisfied with having a rough idea about the fold. Excluding all regions that are not globular, we found that structural genomics may have to target about 48% of all proteins. This corresponded to a similar percentage of residues of the entire proteomes (52%). We explored a number of different strategies to cluster protein space in order to find the number of families representing these 48% of structurally unknown proteins. For the subset of all entirely sequenced eukaryotes, we found over 18 000 fragment clusters each of which may be a suitable target for structural genomics. Availability: All data are available from the authors, most results are summarized at:
Predicting subcellular localization of proteins in a hybridization space
- Bioinformatics
, 2004
Cited by 17 (1 self)
Motivation: The localization of a protein in a cell is closely correlated with its biological function. With the number of sequences entering databanks rapidly increasing, the importance of developing a powerful high-throughput tool to determine protein subcellular location has become self-evident. In view of this, the Nearest Neighbour Algorithm was developed for predicting protein subcellular location by hybridizing the information derived from recent developments in gene ontology with that from the functional domain composition

