## Sparsity and smoothness via the fused lasso (2005)

Venue: Journal of the Royal Statistical Society, Series B

Citations: 144 (12 self)

### BibTeX

```bibtex
@article{Tibshirani05sparsityand,
  author  = {Robert Tibshirani and Michael Saunders and Saharon Rosset and Ji Zhu and Keith Knight},
  title   = {Sparsity and smoothness via the fused lasso},
  journal = {Journal of the Royal Statistical Society, Series B},
  year    = {2005},
  pages   = {91--108}
}
```


### Abstract

The lasso (Tibshirani 1996) penalizes a least squares regression by the sum of the absolute values (L1 norm) of the coefficients. The form of this penalty encourages sparse solutions, that is, solutions with many coefficients equal to zero. Here we propose the “fused lasso”, a generalization of the lasso designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes both the L1 norm of the coefficients and the L1 norm of their successive differences. Thus it encourages both sparsity of the coefficients and sparsity of their differences, i.e. local constancy of the coefficient profile.
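
The objective described in the abstract combines a least squares loss with two L1 penalty terms. As a concrete illustration (the function name and toy data below are mine, not from the paper), a minimal numpy sketch of the fused lasso objective:

```python
import numpy as np

def fused_lasso_objective(X, y, beta, lam1, lam2):
    """Least-squares loss plus the two fused lasso penalty terms:
    lam1 * sum |beta_j|          (sparsity of the coefficients)
    lam2 * sum |beta_j - beta_{j-1}|  (sparsity of successive differences)."""
    rss = np.sum((y - X @ beta) ** 2)
    sparsity = lam1 * np.sum(np.abs(beta))
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))
    return rss + sparsity + fusion

# A piecewise-constant coefficient profile pays the fusion penalty
# only at its jumps:
beta = np.array([0.0, 0.0, 2.0, 2.0, 2.0])
X = np.eye(5)
y = X @ beta
# rss = 0, L1 term = lam1 * 6, fusion term = lam2 * 2 (one jump of size 2)
print(fused_lasso_objective(X, y, beta, 1.0, 1.0))  # 8.0
```

With `lam2` large, minimizing this objective pushes neighboring coefficients to share a common value, which is the "local constancy" the abstract refers to.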

### Citations

9539 | Statistical Learning Theory
- Vapnik
- 1998

Citation Context: ...ed. The full table appears in the technical report version of this paper. 10. Hinge loss. For two-class problems the maximum margin approach used in the support vector classifier (Boser et al. 1992), (Vapnik 1996) is an attractive alternative to least squares. The maximum margin method can be expressed in terms of the “hinge” loss function (see e.g. Hastie et al. (2001), chapter 11). We minimize J(β0, β, ξ) = ...
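
The snippet above refers to the hinge loss underlying the maximum margin classifier. A minimal numpy sketch of that loss (the function name and toy scores are mine):

```python
import numpy as np

def hinge_loss(y, f):
    """Mean hinge loss max(0, 1 - y*f) for labels y in {-1, +1}
    and real-valued classifier scores f. The loss is zero once an
    example's margin y*f reaches 1, and grows linearly below that."""
    return np.mean(np.maximum(0.0, 1.0 - y * f))

y = np.array([1.0, -1.0, 1.0])
f = np.array([2.0, -0.5, 0.2])  # margins y*f: 2.0, 0.5, 0.2
# per-example losses: 0.0, 0.5, 0.8
loss = hinge_loss(y, f)
```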

1761 | Atomic Decomposition by Basis Pursuit - Chen, Donoho, et al. - 2001 |

1373 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
- 1992

Citation Context: ...guous blocks delineated. The full table appears in the technical report version of this paper. 10. Hinge loss. For two-class problems the maximum margin approach used in the support vector classifier (Boser et al. 1992), (Vapnik 1996) is an attractive alternative to least squares. The maximum margin method can be expressed in terms of the “hinge” loss function (see e.g. Hastie et al. (2001), chapter 11). We minimize...

1289 | Molecular classification of cancer: class discovery and class prediction by gene-expression monitoring, Science 286 - Golub, Slonim, et al. - 1999 |

947 | The Elements of Statistical Learning - Hastie, Tibshirani, et al. - 2009 |

799 | Least angle regression
- Efron, Hastie, et al.

Citation Context: ...t. a more restricted search is necessary. We first exploit the fact that the complete sequence of lasso and fusion problems can be solved efficiently using the LAR (least angle regression) procedure (Efron et al. 2002). The fusion problem is solved by first transforming X to Z = XL^{-1} with θ = Lβ, applying LAR and then transforming back. For a given problem, only some values of the bounds (s1, s2) will be attainable...
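
The transformation in this snippet can be checked numerically: with L the first-difference matrix, L^{-1} is the lower-triangular cumulative-sum matrix, so Xβ = (XL^{-1})(Lβ), and a lasso fit in θ = Lβ penalizes the successive differences of β. A small numpy sketch (variable names and toy data are mine):

```python
import numpy as np

p = 5
L = np.eye(p) - np.eye(p, k=-1)   # first-difference matrix: theta_j = beta_j - beta_{j-1}
Linv = np.tril(np.ones((p, p)))   # its inverse: cumulative sums, beta_j = sum_{k<=j} theta_k

rng = np.random.default_rng(0)
X = rng.standard_normal((10, p))
beta = rng.standard_normal(p)

Z = X @ Linv
theta = L @ beta
# The fits agree, so a lasso on (Z, y) in theta solves the fusion problem,
# and transforming back gives beta = Linv @ theta_hat.
print(np.allclose(X @ beta, Z @ theta))  # True
```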

537 | Ridge Regression: Biased Estimation for Nonorthogonal Problems
- Hoerl, Kennard
- 1970

Citation Context: ...p may be larger than N, and typically is much larger than N in the applications that we consider. Many methods have been proposed for regularized or penalized regression, including ridge regression (Hoerl & Kennard 1970), partial least squares (Wold 1975) and principal components regression. Subset selection is more discrete, either including or excluding predictors from the model. The lasso (Tibshirani 1996) is sim...

377 | Diagnosis of multiple cancer types by shrunken centroids of gene expression
- Tibshirani, Hastie, et al.
- 2002

Citation Context: ...run the lasso on the full set of sites, and it produced error rates about the same as those reported for lasso here.] The results of various methods are shown in Table 2. Nearest shrunken centroids (Tibshirani et al. 2001) is essentially equivalent, in this two-class setting, to soft-thresholding of the univariate regression coefficients...
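
The soft-thresholding operation mentioned in this snippet has a one-line closed form; a minimal numpy sketch (the function name is mine):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0):
    shrink every value toward zero by lam, and set values within lam of
    zero exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
shrunk = soft_threshold(z, 1.0)  # large values shrink by 1, small ones vanish
```

This is the same operator that gives the lasso solution in the orthonormal-design case, which is why the two-class comparison in the snippet goes through.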

347 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996

Citation Context: ...n (Hoerl & Kennard 1970), partial least squares (Wold 1975) and principal components regression. Subset selection is more discrete, either including or excluding predictors from the model. The lasso (Tibshirani 1996) is similar to ridge regression, but uses the absolute values of the coefficients rather than their squares. The lasso finds the coefficients β̂ = (β̂1, β̂2, ..., β̂p) satisfying β̂ = argmin Σ...

305 | Estimation of the mean of a multivariate normal distribution - Stein - 1981 |

149 | Asymptotics for lasso-type estimators
- Knight, Fu
- 2000

Citation Context: ...Table 1: Timings for typical runs of fused lasso program. 4. Asymptotic properties. In this section we derive results for the fused lasso, analogous to those for the lasso (Knight & Fu 2000). The penalized least squares criterion

Σ_{i=1}^{N} (y_i − x_i^T β)^2 + λ_N^{(1)} Σ_{j=1}^{p} |β_j| + λ_N^{(2)} Σ_{j=2}^{p} |β_j − β_{j−1}|   (6)

with β = (β1, β2, ..., βp)^T, x_i = (x_i1, x_i2, ..., x_ip)^T, and the Lagrange...

116 | Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359 - Petricoin, Ardekani, et al. - 2002 |

71 | Boosting as a regularized path to a maximum margin classifier
- Rosset, Zhu, et al.
- 2004

Citation Context: ...er mild (“non-redundancy”) conditions. This property extends to any convex loss function with a lasso penalty. It is proven explicitly, and the required non-redundancy conditions are spelled out, in (Rosset et al. 2004), Appendix A. The fused lasso turns out to have a similar sparsity property. Instead of applying to the number of non-zero coefficients, however, the sparsity property applies to the number of sequen...

60 | Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy - Adam, Qu, et al. - 2002 |

12 | On the asymptotics of convex stochastic optimization
- Geyer
- 1996

Citation Context: ...Thus V_N(u) →_d V(u) (as defined above), with the finite-dimensional convergence holding trivially. Since V_N is convex and V has a unique minimum, it follows (Geyer 1996) that argmin(V_N) = √N(β̂_N − β) →_d argmin(V). □ As a simple example, suppose that β1 = β2 ≠ 0. Then the joint limiting distribution of (√N(β̂_1N − β1), √N(β̂_2N − β2)) will have probability...

11 | User’s guide for SQOPT 5.3: A Fortran package for large-scale linear and quadratic programming - Gill, Murray, et al. - 1997 |

5 | Variable fusion: a new method of adaptive signal regression - Land, Friedman - 1996 |

4 | Adaptable, efficient and robust methods for regression and classification via piecewise linear regularized coefficient paths
- Rosset, Zhu
- 2003

Citation Context: ...and the set of active coefficients changes in a predictable way. One can show that the fused lasso solutions are piecewise linear functions as we move in a straight line in the (λ1, λ2) plane (see Rosset & Zhu 2003). Here (λ1, λ2) are the Lagrange multipliers corresponding to the bounds s1, s2. Hence it might be possible to develop a LAR-style algorithm for quickly solving the fused lasso problem along these st...