Results 1 - 10
of
10
Large scale multiple kernel learning
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We s ..."
Abstract
-
Cited by 129 (13 self)
- Add to MetaCart
While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm works for hundred thousands of examples or hundreds of kernels to be combined, and helps for automatic model selection, improving the interpretability of the learning result. In a second part we discuss general speed up mechanism for SVMs, especially when used with sparse feature maps as appear for string kernels, allowing us to train a string kernel SVM on a 10 million real-world splice data set from computational biology. We integrated multiple kernel learning in our machine learning toolbox SHOGUN for which the source code is publicly available at
BioMed Central
, 2006
"... A novel approach to phylogenetic tree construction using stochastic optimization and clustering ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
A novel approach to phylogenetic tree construction using stochastic optimization and clustering
Rätsch G: PALMA: mRNA to genome alignments using large margin algorithms
- Bioinformatics
"... Motivation: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. Results: We present a novel approach based on large margin lea ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Motivation: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. Results: We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm – called PALMA – tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from C. elegans and human it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels. Availability: Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and python are available at
TRANSCRIPT NORMALIZATION AND SEGMENTATION OF TILING ARRAY DATA
"... For the analysis of transcriptional tiling arrays we have developed two methods based on stateof-the-art machine learning algorithms. First, we present a novel transcript normalization technique to alleviate the effect of oligonucleotide probe sequences on hybridization intensity. It is specifically ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
For the analysis of transcriptional tiling arrays we have developed two methods based on stateof-the-art machine learning algorithms. First, we present a novel transcript normalization technique to alleviate the effect of oligonucleotide probe sequences on hybridization intensity. It is specifically designed to decrease the variability observed for individual probes complementary to the same transcript. Applying this normalization technique to Arabidopsis tiling arrays, we are able to reduce sequence biases and also significantly improve separation in signal intensity between exonic and intronic/intergenic probes. Our second contribution is a method for transcript mapping. It extends an algorithm proposed for yeast tiling arrays to the more challenging task of spliced transcript identification. When evaluated on raw versus normalized intensities our method achieves highest prediction accuracy when segmentation is performed on transcriptnormalized tiling array data. Datasets, software and the appendix are available for download at
Exploring Alternative Splicing Features using Support Vector Machines
"... Alternative splicing is a mechanism for generating different gene transcripts (called isoforms) from the same genomic sequence. Finding alternative splicing events experimentally is both expensive and time consuming. Computational methods, in general, and machine learning algorithms, in particular, ..."
Abstract
- Add to MetaCart
Alternative splicing is a mechanism for generating different gene transcripts (called isoforms) from the same genomic sequence. Finding alternative splicing events experimentally is both expensive and time consuming. Computational methods, in general, and machine learning algorithms, in particular, can be used to complement experimental methods in the process of identifying alternative splicing events. In this paper, we explore the predictive power of a rich set of features that have been experimentally shown to affect alternative splicing. We use these features to build support vector machine (SVM) classifiers for distinguishing between alternatively spliced exons and constitutive exons. Our results show that simple linear SVM classifiers built from a rich set of features give results comparable to those of more sophisticated SVM classifiers that use more basic sequence features. Furthermore, we use feature selection methods to identify computationally the most informative features for the prediction problem considered. 1.
The Feature Importance Ranking Measure
"... Abstract. Most accurate predictions are typically obtained by learning machines with complex feature spaces (as e.g. induced by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often r ..."
Abstract
- Add to MetaCart
Abstract. Most accurate predictions are typically obtained by learning machines with complex feature spaces (as e.g. induced by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often resorts to linear models in combination with variable selection, thereby sacrificing some predictive power for presumptive interpretability. Here, we introduce the Feature Importance Ranking Measure (FIRM), which by retrospective analysis of arbitrary learning machines allows to achieve both excellent predictive performance and superior interpretation. In contrast to standard raw feature weighting, FIRM takes the underlying correlation structure of the features into account. Thereby, it is able to discover the most relevant features, even if their appearance in the training data is entirely prevented by noise. The desirable properties of FIRM are investigated analytically and illustrated in simulations. 1
METHOD Open Access
"... SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data ..."
Abstract
- Add to MetaCart
SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data
Gunnar Rätsch
, 2009
"... Most accurate predictions are typically obtained by learning machines with complex feature spaces (as e.g. induced by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often resorts to ..."
Abstract
- Add to MetaCart
Most accurate predictions are typically obtained by learning machines with complex feature spaces (as e.g. induced by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often resorts to linear models in combination with variable selection, thereby sacrificing some predictive power for presumptive interpretability. Here, we introduce the Feature Importance Ranking Measure (FIRM), which by retrospective analysis of arbitrary learning machines allows to achieve both excellent predictive performance and superior interpretation. In contrast to standard raw feature weighting, FIRM takes the underlying correlation structure of the features into account. Thereby, it is able to discover the most relevant features, even if their appearance in the training data is entirely prevented by noise. The desirable properties of FIRM are investigated analytically and illustrated in simulations. 1

