The group Lasso for logistic regression
 Journal of the Royal Statistical Society, Series B
, 2008
Cited by 276 (11 self)
Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences.
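To make the idea of a groupwise penalty on logistic regression concrete, here is a minimal sketch. It is not the algorithm from the paper: it uses plain proximal gradient descent, chosen only because it is short. The function names, the absence of an intercept, and the square-root-of-group-size weighting of the penalty are all assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(v, threshold):
    """Shrink the whole vector v toward zero; return zeros if its norm <= threshold."""
    norm = np.linalg.norm(v)
    if norm <= threshold:
        return np.zeros_like(v)
    return (1.0 - threshold / norm) * v

def fit_group_lasso_logistic(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent (illustrative, not the paper's method) for
    mean logistic loss + lam * sum_g sqrt(p_g) * ||beta_g||_2.

    X: (n, p) array; y: (n,) array of 0/1 labels;
    groups: list of index lists that partition the p columns.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L = ||X||_2^2 / (4 n) bounds the curvature of the loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (prob - y) / n
        z = beta - step * grad
        # The penalty is separable across groups, so its proximal map
        # is one group soft-thresholding per group.
        for idx in groups:
            weight = np.sqrt(len(idx))  # assumed group-size weighting
            beta[idx] = group_soft_threshold(z[idx], step * lam * weight)
    return beta
```

Because each group is shrunk as a unit, a group's coefficients are either all zero or all nonzero, which is what yields selection at the level of predefined groups.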
The Group-Lasso for generalized linear models: uniqueness of solutions and efficient
, 2008
Cited by 70 (0 self)
The Group-Lasso method for finding important explanatory factors suffers from the potential non-uniqueness of solutions and also from high computational costs. We formulate conditions for the uniqueness of Group-Lasso solutions which lead to an easily implementable test procedure that allows us to identify all potentially active groups. These results are used to derive an efficient algorithm that can deal with input dimensions in the millions and can approximate the solution path efficiently. The derived methods are applied to large-scale learning problems where they exhibit excellent performance and where the testing procedure helps to avoid misinterpretations of the solutions.
Human Splicing Finder: an online bioinformatics tool to predict splicing signals
 Nucleic Acids Res
, 2009
Gene prediction with conditional random fields
, 2005
Cited by 34 (0 self)
Given a sequence of DNA nucleotide bases, the task of gene prediction is to find subsequences of bases that encode proteins. Reasonable performance on this task has been achieved using generatively trained sequence models, such as hidden Markov models. We propose instead the use of a discriminatively trained sequence model, the conditional random field (CRF). CRFs can naturally incorporate arbitrary, non-independent features of the input without making conditional independence assumptions among the features. This can be particularly important for gene finding, where including evidence from protein databases, EST data, or tiling arrays may improve accuracy. We evaluate our model on human genomic data, and show that CRFs perform better than HMM-based models at incorporating homology evidence from protein databases, achieving a 10% reduction in base-level errors.
Statistical analysis strategies for association studies involving rare variants
 Nature Reviews Genetics
, 2010
Conserved RNA secondary structures promote alternative splicing
 RNA
, 2008
Cited by 27 (3 self)
Intrinsic differences between authentic and cryptic 5′ splice sites
 Nucleic Acids Res
, 2003
Cited by 23 (4 self)
Cryptic splice sites are used only when use of a natural splice site is disrupted by mutation. To determine the features that distinguish authentic from cryptic 5′ splice sites (5′ss), we systematically analyzed a set of 76 cryptic 5′ss derived from 46 human genes. These cryptic 5′ss have a similar frequency distribution in exons and introns, and are usually located close to the authentic 5′ss. Statistical analysis of the strengths of the 5′ss using the Shapiro and Senapathy matrix revealed that authentic 5′ss have significantly higher score values than cryptic 5′ss, which in turn have higher values than the mutant ones. β-Globin provides an interesting exception to this rule, so we chose it for detailed experimental analysis in vitro. We found that the sequences of the β-globin authentic and cryptic 5′ss, but not their surrounding context, determine the correct 5′ss choice, although their respective scores do not reflect this functional difference. Our analysis provides a statistical basis to explain the competitive advantage of authentic over cryptic 5′ss in most cases, and should facilitate the development of tools to reliably predict the effect of disease-associated 5′ss-disrupting mutations at the mRNA level.
HOLLYWOOD: a comparative relational database of alternative splicing
 Nucleic Acids Res
, 2006
Online Learning for Group Lasso
Cited by 16 (2 self)
We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Unlike traditional batch-mode group lasso algorithms, which suffer from inefficiency and poor scalability, our proposed algorithm performs in an online mode and scales well: at each iteration one can update the weight vector according to a closed-form solution based on the average of previous subgradients. Therefore, the proposed online algorithm can be very efficient and scalable. This is guaranteed by its low worst-case time complexity and memory cost, both in the order of O(d), where d is the number of dimensions. Moreover, in order to achieve more sparsity at both the group level and the individual feature level, we successively extend our online system to efficiently solve a number of variants of sparse group lasso models. We also show that the online system is applicable to other group lasso models, such as the group lasso with overlap and graph lasso. Finally, we demonstrate the merits of our algorithm by experimenting with both synthetic and real-world datasets.
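The closed-form update from averaged subgradients can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes a regularized dual averaging form with a quadratic term scaled by gamma / sqrt(t), a logistic loss, and non-overlapping groups. The function names and the parameters lam and gamma are hypothetical.

```python
import numpy as np

def dual_averaging_group_update(avg_grad, groups, lam, gamma, t):
    """Closed-form minimizer over w (assumed objective, for illustration) of
    <avg_grad, w> + lam * sum_g ||w_g||_2 + (gamma / (2 * sqrt(t))) * ||w||_2^2.
    """
    w = np.zeros_like(avg_grad)
    for idx in groups:
        g = avg_grad[idx]
        norm = np.linalg.norm(g)
        # A group stays exactly zero unless its averaged subgradient is large.
        if norm > lam:
            w[idx] = -(np.sqrt(t) / gamma) * (1.0 - lam / norm) * g
    return w

def online_group_lasso(stream, dim, groups, lam=0.01, gamma=1.0):
    """Process (x, y) pairs one at a time, with labels y in {-1, +1}."""
    w = np.zeros(dim)
    avg_grad = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        # Gradient of the logistic loss log(1 + exp(-y * x.w)) with respect to w.
        grad = -y * x / (1.0 + np.exp(y * (x @ w)))
        # Running mean of all gradients so far: O(d) time and memory per step.
        avg_grad += (grad - avg_grad) / t
        w = dual_averaging_group_update(avg_grad, groups, lam, gamma, t)
    return w
```

Only the running average and the current weights are stored, which is consistent with the O(d) time and memory cost claimed in the abstract.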
SpliceMachine: Predicting splice sites from high-dimensional local context representations
 Bioinformatics
, 2005
Cited by 15 (2 self)
Motivation: In this age of complete genome sequencing, finding the location and structure of genes is crucial for further molecular research. The accurate prediction of intron boundaries largely facilitates the correct prediction of gene structure in nuclear genomes. Many tools for localizing these boundaries on DNA sequences have been developed and are available to researchers through the internet. Nevertheless, these tools still make many false positive predictions. Results: This manuscript presents a novel publicly available splice site prediction tool named SpliceMachine that (i) shows state-of-the-art prediction performance on Arabidopsis thaliana and human sequences, (ii) performs a computationally fast annotation, and (iii) can be trained by users on their own data.