Results 1–10 of 91
Greedy Function Approximation: A Gradient Boosting Machine
Annals of Statistics, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additi ..."
Abstract

Cited by 951 (12 self)
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire 1996, and Frie...
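As a rough sketch of the gradient-boosting idea for the least-squares case (not the paper's TreeBoost procedure itself), each stage below fits a shallow regression tree to the current residuals, i.e. the negative gradient of the squared-error loss; the stage count, learning rate, and tree depth are illustrative assumptions.

```python
# Minimal sketch of least-squares gradient boosting with shallow regression
# trees (illustrative only; hyperparameters are arbitrary assumptions).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_ls(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    """Builds the additive expansion F(x) = f0 + lr * sum_m h_m(x)."""
    y = np.asarray(y, dtype=float)
    f0 = y.mean()                         # constant initial approximation
    pred = np.full_like(y, f0)
    trees = []
    for _ in range(n_stages):
        residuals = y - pred              # negative gradient of 0.5*(y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)            # fit base learner to pseudo-residuals
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(f0, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], f0, dtype=float)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```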
Text Categorization Based on Regularized Linear Classification Methods
Information Retrieval, 2000
"... A number of linear classification methods such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVM's) have been applied to text categorization problems. These methods share the similarity by finding hyperplanes that approximately separate a class of doc ..."
Abstract

Cited by 113 (3 self)
A number of linear classification methods such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVM's) have been applied to text categorization problems. These methods share the property of finding hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines are so far considered special in that they have been demonstrated to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design, or if it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods as well as some variants in the framework of regularized linear systems. We will discuss the statistical and numerical properties of these algorithms, with a focus on text categorization. We will also provide some numerical experiments to illustrate these algorithms on a number of datasets.
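To illustrate the regularized-linear-system view (not the paper's exact formulations or experiments), the sketch below trains an L2-regularized logistic regression classifier on TF-IDF bag-of-words features; the toy documents, labels, and regularization strength C are assumptions.

```python
# Sketch: a regularized linear classifier for text categorization
# (illustrative; corpus, labels, and the value of C are assumptions).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["grain prices rise", "new graphics card released", "wheat exports fall"]
labels = ["commodities", "hardware", "commodities"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term matrix

# The L2 penalty corresponds to a ridge-style regularized linear system.
clf = LogisticRegression(penalty="l2", C=1.0)
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["corn harvest report"])))
```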
Classifier technology and the illusion of progress
Statist. Sci., 2006
"... Abstract. A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to ..."
Abstract

Cited by 83 (2 self)
A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.
On clustering of fMRI time series
1997
"... Introduction. The spatiotemporal fMRI signal is a combination of several interacting components: The locally correlated hemodynamic response, the network of neuronal activations, and global components such as the cardiac cycle, breathing etc. A priori this implies that the signal is correlated in t ..."
Abstract

Cited by 69 (3 self)
Introduction. The spatiotemporal fMRI signal is a combination of several interacting components: the locally correlated hemodynamic response, the network of neuronal activations, and global components such as the cardiac cycle, breathing, etc. A priori this implies that the signal is correlated in time and space, and that these correlations have both short- and long-range components. Clustering is a classical nonparametric approach to exploratory data analysis. By clustering we can group signals according to a given objective function. Clustering of waveforms has already been used in fMRI signal analysis, see e.g. (1). Clustering of stochastic data, however, is a hard optimization problem with many potential pitfalls. The "optimal" cluster configuration depends on the particular choice of clustering scheme (e.g. k-means, k-medians, hierarchical clustering), of which examples are legion (2), but just as importantly on the choice of distance metric ...
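A bare-bones version of the clustering step described above could look like the following, with each voxel time series as one row and k-means grouping rows by Euclidean distance; the synthetic data, the number of clusters, and the implicit metric are assumptions, and the abstract's point is precisely that such choices can change the result.

```python
# Sketch: k-means clustering of voxel time series (illustrative only;
# the number of clusters and the Euclidean metric implied by k-means
# are assumptions, not the paper's recommendation).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_voxels, n_timepoints = 500, 120
time_series = rng.standard_normal((n_voxels, n_timepoints))  # placeholder fMRI data

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(time_series)     # one cluster label per voxel
print(np.bincount(labels))                   # cluster sizes
```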
A Framework For Computational Anatomy
2002
"... The rapid collection of brain images from healthy and diseased subjects has stimulated the development of powerful mathematical algorithms to compare, pool and average brain data across whole populations. Brain structure is so complex and variable that new approaches in computer vision, partial diff ..."
Abstract

Cited by 47 (16 self)
The rapid collection of brain images from healthy and diseased subjects has stimulated the development of powerful mathematical algorithms to compare, pool and average brain data across whole populations. Brain structure is so complex and variable that new approaches in computer vision, partial differential equations, and statistical field theory are being formulated to detect and visualize disease-specific patterns. We present some novel mathematical strategies for computational anatomy, focusing on the creation of population-based brain atlases. These atlases describe how the brain varies with age, gender, genetics, and over time. We review applications in Alzheimer's disease, schizophrenia and brain development, outlining some current challenges in the field.
Hierarchical Clustering of Self-Organizing Maps for Cloud Classification
Neurocomputing, 2000
"... This paper presents a new method for segmenting multispectral satellite images. The proposed method is unsupervised and consists of two steps. During the rst step the pixels of a learning set are summarized by a set of codebook vectors using a Probabilistic SelfOrganizing Map (PSOM, [9]) In a secon ..."
Abstract

Cited by 20 (1 self)
This paper presents a new method for segmenting multispectral satellite images. The proposed method is unsupervised and consists of two steps. During the first step the pixels of a learning set are summarized by a set of codebook vectors using a Probabilistic Self-Organizing Map (PSOM, [9]). In a second step the codebook vectors of the map are clustered using Agglomerative Hierarchical Clustering (AHC, [7]). Each pixel takes the label of its nearest codebook vector. A practical application to Meteosat images illustrates the relevance of our approach.
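The two-step scheme can be sketched as follows, with a k-means codebook standing in for the probabilistic SOM of the paper; the codebook size, the final number of clusters, and the placeholder pixel data are assumptions.

```python
# Sketch of the two-step scheme: codebook vectors are clustered hierarchically
# and each pixel inherits the label of its nearest codebook vector.
# (A k-means codebook stands in for the PSOM; all sizes are assumptions.)
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
pixels = rng.random((10000, 4))              # placeholder multispectral pixels

# Step 1: summarize the pixels by a small codebook (the SOM step in the paper).
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(pixels).cluster_centers_

# Step 2: agglomerative hierarchical clustering of the codebook vectors.
codebook_labels = AgglomerativeClustering(n_clusters=6, linkage="ward").fit_predict(codebook)

# Each pixel takes the label of its nearest codebook vector.
nearest = np.argmin(((pixels[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
pixel_labels = codebook_labels[nearest]
```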
Neural Minimal Distance Methods
Proc. 3rd Conf. on Neural Networks and Their Applications, 1997
"... Minimal distance methods are simple and in some circumstances highly accurate. In this paper relations between neural and minimal distance methods are investigated. Neural realization facilitates new versions of minimal distance methods. Parametrization of distance functions, distancebased weighti ..."
Abstract

Cited by 14 (13 self)
Minimal distance methods are simple and in some circumstances highly accurate. In this paper relations between neural and minimal distance methods are investigated. Neural realization facilitates new versions of minimal distance methods. Parametrization of distance functions, distance-based weighting of neighbors, active selection of reference vectors from the training set, and relations to case-based reasoning are discussed.
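A minimal-distance classifier of the kind discussed here can be sketched as a distance-weighted k-nearest-neighbour rule with a parametrized Minkowski distance; k, the exponent p, and the 1/d weighting are illustrative assumptions rather than the paper's specific choices.

```python
# Sketch: distance-weighted k-nearest-neighbour classification with a
# parametrized Minkowski distance (k, p, and the weighting are assumptions).
import numpy as np

def knn_predict(X_train, y_train, x, k=5, p=2.0, eps=1e-12):
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    idx = np.argsort(dists)[:k]               # k nearest reference vectors
    weights = 1.0 / (dists[idx] + eps)        # closer neighbours count more
    classes = np.unique(y_train)
    votes = [weights[y_train[idx] == c].sum() for c in classes]
    return classes[int(np.argmax(votes))]
```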
Comparing methods of analyzing fMRI statistical parametric maps
NeuroImage, 2004
"... Approaches for the analysis of statistical parametric maps (SPMs) can be crudely grouped into three main categories in which different philosophies are applied to delineate activated regions. These being type I error control thresholding, false discovery rate (FDR) control thresholding and posterior ..."
Abstract

Cited by 11 (0 self)
Approaches for the analysis of statistical parametric maps (SPMs) can be crudely grouped into three main categories in which different philosophies are applied to delineate activated regions: type I error control thresholding, false discovery rate (FDR) control thresholding, and posterior probability thresholding. To better understand the properties of these main approaches, we carried out a simulation study to compare the approaches as they would be used on real data sets. Using default settings, we find that posterior probability thresholding is the most powerful approach, and type I error control thresholding provides the lowest levels of type I error. False discovery rate control thresholding performs in between the other approaches for both these criteria, although for some parameter settings this approach can approximate the performance of posterior probability thresholding. Based on these results, we discuss the relative merits of the three approaches in an attempt to decide upon an optimal approach. We conclude that viewing the problem of delineating areas of activation as a classification problem provides a highly interpretable framework for comparing the methods. Within this framework, we highlight the role of the loss function, which explicitly penalizes the types of errors that may occur in a given analysis.
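Of the three philosophies compared, FDR control is perhaps the easiest to sketch; the following is a plain Benjamini-Hochberg step-up threshold over voxel-wise p-values, with the level q as an assumed parameter (the paper's simulation settings are not reproduced here).

```python
# Sketch: Benjamini-Hochberg FDR thresholding of voxel-wise p-values,
# one of the three thresholding philosophies compared above (q is assumed).
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Return a boolean mask of voxels declared active at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    below = ranked <= (np.arange(1, m + 1) / m) * q   # BH step-up criterion
    mask = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.max(np.where(below)[0])           # largest rank passing
        mask[order[:cutoff + 1]] = True
    return mask
```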
The Centrality of
1992
"... This Article is brought to you for free and open access by the Biochemistry, Department of at DigitalCommons@University of Nebraska Lincoln. It ..."
Abstract

Cited by 10 (4 self)
This article is brought to you for free and open access by the Department of Biochemistry at DigitalCommons@University of Nebraska-Lincoln. It ...