Results 1–10 of 56
Greedy Function Approximation: A Gradient Boosting Machine
 Annals of Statistics
, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additi ..."
Abstract

Cited by 951 (12 self)
 Add to MetaCart
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire 1996, and Frie...
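The stagewise procedure this abstract describes, specialized to least-squares loss with regression stumps, can be sketched as follows: each stump is fit to the negative gradient of the loss at the current fit, which for squared error is just the residuals. Function names and the toy step-function data are illustrative, not from the paper.

```python
def fit_stump(x, r):
    """Best single-threshold split of 1-D inputs x against residuals r
    by squared error; returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Stagewise additive fit: each stump approximates the negative
    gradient of squared loss at the current fit, i.e. the residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]      # negative gradient
        t, lm, rm = fit_stump(x, r)
        stumps.append((t, lm, rm))
        pred = [pi + lr * (lm if xi <= t else rm) for xi, pi in zip(x, pred)]
    return f0, lr, stumps

def predict(model, xi):
    f0, lr, stumps = model
    return f0 + sum(lr * (lm if xi <= t else rm) for t, lm, rm in stumps)

x = [0, 1, 2, 3, 4, 5, 6, 7]    # toy inputs
y = [0, 0, 0, 0, 1, 1, 1, 1]    # toy step-function targets
model = gradient_boost(x, y)
```

The shrinkage factor `lr` is the same device the paper's TreeBoost algorithms use to slow the stagewise fit for better generalization.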
Optimally tuned iterative reconstruction algorithms for compressed sensing
 Selected Topics in Signal Processing
"... Abstract — We conducted an extensive computational experiment, lasting multiple CPUyears, to optimally select parameters for two important classes of algorithms for finding sparse solutions of underdetermined systems of linear equations. We make the optimally tuned implementations available at spar ..."
Abstract

Cited by 66 (4 self)
 Add to MetaCart
(Show Context)
Abstract — We conducted an extensive computational experiment, lasting multiple CPU-years, to optimally select parameters for two important classes of algorithms for finding sparse solutions of underdetermined systems of linear equations. We make the optimally tuned implementations available at sparselab.stanford.edu; they run ‘out of the box’ with no user tuning: it is not necessary to select thresholds or know the likely degree of sparsity. Our class of algorithms includes iterative hard and soft thresholding with or without relaxation, as well as CoSaMP, subspace pursuit and some natural extensions. As a result, our optimally tuned algorithms dominate such proposals. Our notion of optimality is defined in terms of phase transitions, i.e. we maximize the number of nonzeros at which the algorithm can successfully operate. We show that the phase transition is a well-defined quantity with our suite of random underdetermined linear systems. Our tuning gives the highest transition possible within each class of algorithms. We verify by extensive computation the robustness of our recommendations to the amplitude distribution of the nonzero coefficients as well as the matrix ensemble defining the underdetermined system. Our findings include: (a) For all algorithms, the worst amplitude distribution for nonzeros is generally the constant-amplitude, random-sign distribution, where all nonzeros have the same amplitude. (b) Various random matrix ensembles give the same phase transitions; random partial isometries may give different transitions and require different tuning. (c) Optimally tuned subspace pursuit dominates optimally tuned CoSaMP, particularly so when the system is almost square.
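One of the algorithm classes tuned here, iterative hard thresholding, iterates x ← H_k(x + Aᵀ(y − Ax)), where H_k keeps the k largest-magnitude entries. A minimal sketch follows; the 4×8 partial-Hadamard matrix and the 1-sparse target are toy choices for illustration, not the paper's random ensembles.

```python
import math

def hadamard_entry(i, j):
    """Sylvester-Hadamard sign: (-1)^popcount(i AND j)."""
    return -1.0 if bin(i & j).count("1") % 2 else 1.0

# 4 orthonormal rows of the order-8 Hadamard matrix: spectral norm 1,
# so the unit gradient step below is safe.
ROWS, N = [0, 1, 2, 4], 8
A = [[hadamard_entry(i, j) / math.sqrt(N) for j in range(N)] for i in ROWS]

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def hard_threshold(x, k):
    keep = sorted(range(len(x)), key=lambda j: -abs(x[j]))[:k]
    return [xj if j in keep else 0.0 for j, xj in enumerate(x)]

def iht(A, y, k, iters=50):
    """Iterative hard thresholding with unit step size."""
    At = transpose(A)
    x = [0.0] * len(At)
    for _ in range(iters):
        r = [yi - ai for yi, ai in zip(y, matvec(A, x))]   # residual
        x = hard_threshold([xj + gj for xj, gj in
                            zip(x, matvec(At, r))], k)
    return x

x_true = [3.0, 0, 0, 0, 0, 0, 0, 0]   # toy 1-sparse signal
y = matvec(A, x_true)
x_hat = iht(A, y, k=1)
```

The paper's tuning question is precisely which variants of this loop (relaxation, step rules, k schedules) push the recoverable sparsity as high as possible.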
Dictionary Learning for Sparse Approximations with the Majorization Method
"... Abstract—In order to find sparse approximations of signals, an appropriate generative model for the signal class has to be known. If the model is unknown, it can be adapted using a set of training samples. This paper presents a novel method for dictionary learning and extends the learning problem by ..."
Abstract

Cited by 50 (10 self)
 Add to MetaCart
(Show Context)
Abstract—In order to find sparse approximations of signals, an appropriate generative model for the signal class has to be known. If the model is unknown, it can be adapted using a set of training samples. This paper presents a novel method for dictionary learning and extends the learning problem by introducing different constraints on the dictionary. The convergence of the proposed method to a fixed point is guaranteed, unless the accumulation points form a continuum. This holds for different sparsity measures. The majorization method is an optimization method that substitutes the original objective function with a surrogate function that is updated in each optimization step. This method has been used successfully in sparse approximation and statistical estimation (e.g. Expectation Maximization (EM)) problems. This paper shows that the majorization method can be used for the dictionary learning problem too. The proposed method is compared with other methods on both synthetic and real data and different constraints on the dictionary are compared. Simulations show the advantages of the proposed method over other currently available dictionary learning methods not only in terms of average performance but also in terms of computation time.
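The majorization idea can be seen in a familiar sparse-approximation instance, ISTA for the lasso objective: the quadratic data term 0.5‖x − Dz‖² is majorized at the current iterate by a separable surrogate with curvature L ≥ ‖D‖², and minimizing the surrogate yields a soft-thresholding update whose objective value never increases. This sketch illustrates only that surrogate-descent property, not the paper's dictionary update; D, x, and λ below are toy values.

```python
import math

def soft(v, t):
    """Componentwise soft threshold: the surrogate minimizer."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def objective(D, x, z, lam):
    """Lasso objective 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    m, n = len(D), len(D[0])
    resid = [x[i] - sum(D[i][j] * z[j] for j in range(n)) for i in range(m)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(zj) for zj in z)

def ista(D, x, lam, L, iters=100):
    """Majorize-minimize: with L >= ||D||^2 the surrogate touches the
    objective at the current z, so each step cannot increase it."""
    m, n = len(D), len(D[0])
    z = [0.0] * n
    history = []
    for _ in range(iters):
        resid = [x[i] - sum(D[i][j] * z[j] for j in range(n))
                 for i in range(m)]
        grad = [sum(D[i][j] * resid[i] for i in range(m)) for j in range(n)]
        z = soft([z[j] + grad[j] / L for j in range(n)], lam / L)
        history.append(objective(D, x, z, lam))
    return z, history

D = [[1.0, 0.0, 0.7071], [0.0, 1.0, 0.7071]]   # toy overcomplete dictionary
x = [1.0, 0.5]                                  # toy signal
z_hat, history = ista(D, x, lam=0.1, L=2.0)
```

The monotone decrease of `history` is the fixed-point behavior the abstract's convergence guarantee builds on, extended in the paper to the joint dictionary-and-coefficients problem.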
The Analysis Of Foreign Exchange Data Using Waveform Dictionaries
 Journal of Empirical Finance
, 1995
"... . This paper uses waveform dictionaries to decompose the signals contained within three foreign exchange rates using tickbytick observations obtained world wide. The three exchange rates examined are the Japanese Yen and the German Deutsche Mark against the U.S. dollar and the Deutsche Mark agains ..."
Abstract

Cited by 32 (3 self)
 Add to MetaCart
(Show Context)
This paper uses waveform dictionaries to decompose the signals contained within three foreign exchange rates using tick-by-tick observations obtained worldwide. The three exchange rates examined are the Japanese Yen and the German Deutsche Mark against the U.S. dollar and the Deutsche Mark against the Yen. The data were provided by Olsen Associates. A waveform dictionary is a class of transforms that generalizes both windowed Fourier transforms and wavelets. Each waveform is parameterized by location, frequency, and scale. Such transforms can analyze signals that have highly localized structures in either time or frequency space as well as broad band structures; that is, waveforms can, in principle, detect everything from shocks represented by Dirac Delta functions, to "chirps", short bursts of energy within a narrow band of frequencies, to the presence of frequencies that occur sporadically, and finally to the presence of frequencies that hold over the entire observed period. Wave...
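A waveform atom in this sense, parameterized by location u, frequency ξ, and scale s, can be sampled as a modulated Gaussian window. The discretization and normalization conventions below are illustrative choices, not the paper's exact definition.

```python
import math

def gabor_atom(n, u, xi, s):
    """Length-n sampled real Gabor atom: Gaussian envelope of scale s
    centered at location u, modulated to frequency xi, normalized to
    unit energy."""
    atom = [math.exp(-math.pi * ((t - u) / s) ** 2) * math.cos(xi * t)
            for t in range(n)]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

# Toy atom: localized near sample 128, oscillating at 0.5 rad/sample.
atom = gabor_atom(n=256, u=128.0, xi=0.5, s=32.0)
```

Large s with xi fixed recovers a windowed-Fourier-like atom, while scaling xi together with 1/s gives wavelet-like behavior, which is the sense in which the dictionary generalizes both transforms.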
Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise
, 2010
"... Recent results in compressed sensing show that a sparse or compressible signal can be reconstructed from a few incoherent measurements. Since noise is always present in practical data acquisition systems, sensing, and reconstruction methods are developed assuming a Gaussian (lighttailed) model for ..."
Abstract

Cited by 21 (2 self)
 Add to MetaCart
(Show Context)
Recent results in compressed sensing show that a sparse or compressible signal can be reconstructed from a few incoherent measurements. Since noise is always present in practical data acquisition systems, sensing and reconstruction methods are developed assuming a Gaussian (light-tailed) model for the corrupting noise. However, when the underlying signal and/or the measurements are corrupted by impulsive noise, commonly employed linear sampling operators, coupled with current reconstruction algorithms, fail to recover a close approximation of the signal. In this paper, we propose robust methods for sampling and reconstructing sparse signals in the presence of impulsive noise. To address impulsive noise embedded in the underlying signal prior to the measurement process, we propose a robust nonlinear measurement operator based on the weighted myriad estimator. In addition, we introduce a geometric optimization problem based on ℓ1 minimization employing a Lorentzian norm constraint on the residual error to recover sparse signals from noisy measurements. Analysis of the proposed methods shows that in impulsive environments, even when the noise possesses infinite variance, the reconstruction error remains finite, and these methods yield successful reconstruction of the desired signal. Simulations demonstrate that the proposed methods significantly outperform commonly employed compressed sensing sampling and reconstruction techniques in impulsive environments, while providing comparable performance in less demanding, light-tailed environments.
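The Lorentzian norm used here, ‖u‖ = Σᵢ log(1 + uᵢ²/γ²), grows only logarithmically in each residual entry, so a single impulsive outlier cannot dominate the penalty the way it does under squared error. A small illustration with toy residual vectors:

```python
import math

def lorentzian(u, gamma=1.0):
    """Lorentzian (LL2) norm: sum of log(1 + (u_i / gamma)^2).
    Log growth bounds the influence of any single gross outlier."""
    return sum(math.log1p((ui / gamma) ** 2) for ui in u)

def squared(u):
    """Ordinary squared-error penalty, for comparison."""
    return sum(ui * ui for ui in u)

clean = [0.1, -0.2, 0.05, 0.1]      # toy residuals, no impulse
spiky = [0.1, -0.2, 0.05, 100.0]    # same residuals plus one outlier
```

Constraining the residual in this norm rather than in ℓ2 is what lets the paper's ℓ1-minimization formulation tolerate heavy-tailed, even infinite-variance, measurement noise.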
Some theory for generalized boosting algorithms
 J. Machine Learning Research
, 2006
"... We give a review of various aspects of boosting, clarifying the issues through a few simple results, and relate our work and that of others to the minimax paradigm of statistics. We consider the population version of the boosting algorithm and prove its convergence to the Bayes classifier as a corol ..."
Abstract

Cited by 19 (1 self)
 Add to MetaCart
We give a review of various aspects of boosting, clarifying the issues through a few simple results, and relate our work and that of others to the minimax paradigm of statistics. We consider the population version of the boosting algorithm and prove its convergence to the Bayes classifier as a corollary of a general result about Gauss-Southwell optimization in Hilbert space. We then investigate the algorithmic convergence of the sample version, and give bounds on the time until perfect separation of the sample. We conclude with some results on the statistical optimality of L2 boosting.
Parametric Dictionary Design for Sparse Coding
 IEEE Trans. on Signal Processing
, 2009
"... Abstract—This paper introduces a new dictionary design method for sparse coding of a class of signals. It has been shown that one can sparsely approximate some natural signals using an overcomplete set of parametric functions, e.g. [1], [2]. A problem in using these parametric dictionaries is how to ..."
Abstract

Cited by 19 (5 self)
 Add to MetaCart
(Show Context)
Abstract—This paper introduces a new dictionary design method for sparse coding of a class of signals. It has been shown that one can sparsely approximate some natural signals using an overcomplete set of parametric functions, e.g. [1], [2]. A problem in using these parametric dictionaries is how to choose the parameters. In practice these parameters have been chosen by an expert or through a set of experiments. In the sparse approximation context, it has been shown that an incoherent dictionary is appropriate for the sparse approximation methods. In this paper we first characterize the dictionary design problem, subject to a constraint on the dictionary. Then we briefly explain that equiangular tight frames have minimum coherence. The complexity of the problem does not allow it to be solved exactly. We introduce a practical method to approximately solve it. Some experiments show the advantages one gets by using these dictionaries.
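Mutual coherence, the quantity these incoherent designs target, is the largest absolute inner product between distinct unit-norm atoms; an equiangular tight frame makes all such inner products equal and attains the Welch lower bound √((n − m)/(m(n − 1))). The Mercedes-Benz frame below is a standard small example, not one of the paper's dictionaries.

```python
import math

def coherence(D):
    """Largest |<d_j, d_k>| over distinct unit-normalized columns of D,
    where D is given as a list of rows."""
    m, n = len(D), len(D[0])
    cols = [[D[i][j] for i in range(m)] for j in range(n)]
    cols = [[c / math.sqrt(sum(v * v for v in col)) for c in col]
            for col in cols]
    return max(abs(sum(a * b for a, b in zip(cols[j], cols[k])))
               for j in range(n) for k in range(j + 1, n))

# Mercedes-Benz frame: three unit vectors in R^2 at 120-degree spacing,
# the smallest nontrivial equiangular tight frame.
angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
mb = [[math.cos(a) for a in angles], [math.sin(a) for a in angles]]
welch = math.sqrt((3 - 2) / (2 * (3 - 1)))   # Welch bound for m=2, n=3
```

For this frame every pairwise inner product has magnitude 0.5, exactly the Welch bound, which is the sense in which ETFs have minimum coherence.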
Multiscale Document Page Segmentation Using Soft Decision Integration
, 1997
"... A new algorithm for layout independent document image segmentation is suggested. Text, image and graphics regions in a document image are treated as three different "texture" classes. Feature vectors based on multiscale wavelet packet representation are used for local classification. Segm ..."
Abstract

Cited by 14 (8 self)
 Add to MetaCart
A new algorithm for layout independent document image segmentation is suggested. Text, image and graphics regions in a document image are treated as three different "texture" classes. Feature vectors based on multiscale wavelet packet representation are used for local classification. Segmentation is performed by propagating soft local decisions made on small windows across neighboring blocks and integrating them to reduce their "ambiguities" and increase their "confidence" as more contextual evidence is obtained from the image data. Local votes propagate in a neighborhood, within and across scales, and majorities of weighted votes give the final decisions. The method has been tested on document page decomposition tasks, and the results of these tests are presented. The algorithm is general, can be applied to other segmentation and classification tasks, is based on parallel, distributed and independent computations and has low complexity. Keywords: Document Processing, Multiscale Anal...