Results 11 - 20 of 417
Machine learning for sequential data: A review
 Structural, Syntactic, and Statistical Pattern Recognition, 2002
Cited by 84 (1 self)
Abstract: Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues.
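The sliding-window method named above reduces sequential supervised learning to ordinary classification: each position in a sequence becomes a fixed-length feature vector of its surrounding observations. A minimal sketch (window size and padding value are illustrative assumptions):

```python
def sliding_windows(sequence, half_width=1, pad=None):
    """Return one (2*half_width + 1)-length window per sequence position.

    Positions near the edges are padded so every window has the same
    length and any ordinary classifier can consume them.
    """
    padded = [pad] * half_width + list(sequence) + [pad] * half_width
    return [padded[t:t + 2 * half_width + 1] for t in range(len(sequence))]

windows = sliding_windows(["a", "b", "c"], half_width=1)
# windows[0] == [None, "a", "b"]; windows[2] == ["b", "c", None]
```

A recurrent sliding window would additionally feed each position's predicted label into the next window's features.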
Text Categorization Based on Regularized Linear Classification Methods
 Information Retrieval, 2000
Cited by 81 (2 self)
Abstract: A number of linear classification methods, such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVMs), have been applied to text categorization problems. These methods share the property of finding hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines are so far considered special in that they have been demonstrated to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design, or if it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods, as well as some variants, in the framework of regularized linear systems. We discuss the statistical and numerical properties of these algorithms, with a focus on text categorization, and provide numerical experiments illustrating these algorithms on a number of datasets.
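As a hedged illustration of the regularized-linear-systems framework the abstract describes, here is a ridge-penalized linear least squares fit in closed form; the toy data, labels coded as +1/-1, and penalty strength are made up for the example:

```python
import numpy as np

def llsf_ridge(X, y, lam=1.0):
    """Solve min_w ||Xw - y||^2 + lam ||w||^2 in closed form.

    The normal equations (X^T X + lam I) w = X^T y give the ridge-
    regularized linear least squares fit directly.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Illustrative linearly separable data with +1 / -1 labels.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = llsf_ridge(X, y, lam=0.1)
# sign(X @ w) recovers the labels: the hyperplane w separates the classes.
```

Logistic regression and SVMs fit into the same template with different loss functions in place of the squared error.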
Adaptive Sparseness for Supervised Learning
 IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 81 (4 self)
Abstract: The goal of supervised learning is to infer a functional mapping based on a set of training examples. To achieve good generalization, it is necessary to control the "complexity" of the learned function. In Bayesian approaches, this is done by adopting a prior for the parameters of the function being learned. We propose a Bayesian approach to supervised learning which leads to sparse solutions, that is, in which irrelevant parameters are automatically set exactly to zero. Other ways to obtain sparse classifiers (such as Laplacian priors or support vector machines) involve (hyper)parameters which control the degree of sparseness of the resulting classifiers; these parameters have to be somehow adjusted or estimated from the training data. In contrast, our approach does not involve any (hyper)parameters to be adjusted or estimated. This is achieved by a hierarchical-Bayes interpretation of the Laplacian prior, which is then modified by the adoption of a Jeffreys noninformative hyperprior. Implementation is carried out by an expectation-maximization (EM) algorithm. Experiments with several benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms SVMs and performs competitively with the best alternative techniques, although it involves no tuning or adjustment of sparseness-controlling hyperparameters.
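One way to see the sparsity that a Laplacian prior induces, in the simplified orthogonal-design case (this is background, not the paper's parameter-free method): MAP estimation reduces to soft thresholding, which sets small coefficients exactly to zero, while the threshold itself is the hyperparameter the paper's hierarchical-Bayes construction removes.

```python
def soft_threshold(v, lam):
    """MAP estimate under a Laplacian prior with an orthogonal design:
    coefficients with magnitude below lam are set exactly to zero,
    larger ones are shrunk toward zero by lam."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

# soft_threshold(0.3, 0.5) == 0.0  -> irrelevant parameter zeroed out
# soft_threshold(2.0, 0.5) == 1.5  -> relevant parameter kept, shrunk
```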
Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space
 Journal of Machine Learning Research, 2003
Cited by 79 (2 self)
Abstract: We present a novel and flexible approach to the problem of feature selection, called grafting. Rather than considering feature selection as separate from learning, grafting treats the selection of suitable features as an integral part of learning a predictor in a regularized learning framework. To make this regularized learning process sufficiently fast for large-scale problems, grafting operates in an incremental, iterative fashion, gradually building up a feature set while training a predictor model using gradient descent. At each iteration, a fast gradient-based heuristic is used to quickly assess which feature is most likely to improve the existing model; that feature is then added to the model, and the model is incrementally optimized using gradient descent. The algorithm scales linearly with the number of data points and at most quadratically with the number of features. Grafting can be used with a variety of predictor model classes, both linear and nonlinear, and can be used for both classification and regression. Experiments are reported here on a variant of grafting for classification, using both linear and nonlinear models, and using a logistic-regression-inspired loss function. Results on a variety of synthetic and real-world data sets are presented. Finally, the relationship between grafting, stagewise additive modelling, and boosting is explored.
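A schematic, not the paper's implementation, of the grafting loop: at each step the gradient of the loss with respect to every inactive feature's weight is the selection heuristic, the feature with the largest magnitude gradient is activated, and the model is refit on the active set. Squared loss and an exact refit stand in for the paper's logistic-regression-inspired loss and gradient descent; all names are assumptions.

```python
import numpy as np

def graft(X, y, n_steps):
    """Incrementally select n_steps features by a gradient heuristic."""
    d = X.shape[1]
    active, w = [], np.zeros(d)
    for _ in range(n_steps):
        residual = X @ w - y
        grad = X.T @ residual              # loss gradient over all features
        grad[active] = 0.0                 # only consider inactive features
        active.append(int(np.argmax(np.abs(grad))))
        Xa = X[:, active]                  # refit using the active set only
        w[:] = 0.0
        w[active] = np.linalg.lstsq(Xa, y, rcond=None)[0]
    return active, w

# Illustrative data where only feature index 1 carries signal.
X = np.array([[1.0, 3.0, 0.0], [0.0, 2.0, 1.0], [1.0, 1.0, 0.0]])
y = 2.0 * X[:, 1]
active, w = graft(X, y, n_steps=1)
# The informative feature (index 1) is selected first, with weight 2.
```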
On a Kernel-based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion, 1997
Cited by 77 (25 self)
Abstract: We present a kernel-based framework for pattern recognition, regression estimation, function approximation, and multiple operator inversion. Previous approaches such as ridge regression, Support Vector methods, and regression by smoothing kernels are included as special cases. We show connections between the cost function and some properties up to now believed to apply to Support Vector Machines only. The optimal solution of all the problems described above can be found by solving a simple quadratic programming problem. The paper closes with a proof of the equivalence between Support Vector kernels and Green's functions of regularization operators.
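The ridge-regression special case of such a kernel framework can be sketched in a few lines; the RBF kernel, the toy data, and solving the dual by a plain linear system (rather than a general quadratic program) are illustrative choices, not the paper's formulation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row-vector sets A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_predict(X_train, y_train, X_test, lam=1e-3):
    """Kernel ridge regression: dual coefficients from (K + lam I) a = y."""
    K = rbf_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train) @ alpha

# Illustrative 1-D data; for small lam the fit nearly interpolates y.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
```

Support Vector regression replaces the squared loss with an epsilon-insensitive one, which is where the quadratic program comes in.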
The composite absolute penalties family for grouped and hierarchical variable selection
 Ann. Statist.
Cited by 69 (3 self)
Abstract: Extracting useful information from high-dimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms, including L1, to form an intelligent penalty in order to add side information to the fitting of a regression or classification model and obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached
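As a concrete (assumed) member of the CAP family: an L1 combination across groups of L2 norms within groups, i.e. the group-lasso penalty, under which nonoverlapping groups of predictors enter or leave the model together. The grouping below is made up for the example.

```python
import math

def cap_l1_l2(weights, groups):
    """Group-lasso penalty: sum over groups of the within-group L2 norm.

    `groups` maps a group name to the list of indices into `weights`
    belonging to that group (assumed nonoverlapping).
    """
    return sum(
        math.sqrt(sum(weights[i] ** 2 for i in idx))
        for idx in groups.values()
    )

w = [3.0, 4.0, 0.0, 0.0]
penalty = cap_l1_l2(w, {"g1": [0, 1], "g2": [2, 3]})
# penalty == 5.0: g1 contributes its L2 norm sqrt(9+16), g2 contributes 0.
```

Other exponent combinations at the two levels give other CAP members with different grouping behavior.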
Tree induction vs. logistic regression: A learning-curve analysis
 CEDER WORKING PAPER #IS-01-02, STERN SCHOOL OF BUSINESS, 2001
Cited by 64 (16 self)
Abstract: Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio.
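The learning-curve methodology can be sketched generically; the `fit`/`predict` callables below are placeholders standing in for tree induction and logistic regression, not any particular library's API:

```python
def learning_curve(fit, predict, X_train, y_train, X_test, y_test, sizes):
    """Fit a learner on nested training subsets of growing size and
    record test accuracy, producing one (size, accuracy) point each.

    Comparing two such curves reveals whether and where they cross.
    """
    curve = []
    for n in sizes:
        model = fit(X_train[:n], y_train[:n])      # nested subsets
        preds = predict(model, X_test)
        acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
        curve.append((n, acc))
    return curve
```

Running it once per learner and plotting both curves against training-set size reproduces the analysis the abstract describes.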
Multi-Weight Enveloping: Least-Squares Approximation Techniques for Skin Animation, 2002
Cited by 63 (0 self)
Abstract: We present a process called multi-weight enveloping for deforming the skin geometry of the body of a digital creature around its skeleton. It is based on a deformation equation whose coefficients we compute using a statistical fit to an input training exercise. In this input, the skeleton and the skin move together, by arbitrary external means, through a range of motion representative of what the creature is expected to achieve in practice. The input can also come from existing pieces of hand-crafted skin animation. Using a modified least-squares fitting technique, we compute the coefficients, or “weights”, of the deformation equation. The result is that the equation generalizes the skin movement so that it applies well to other sequences of animation. The multi-weight deformation equation is computationally efficient to evaluate; once the training process is complete, even creatures with high levels of geometric detail can move at interactive frame rates with a look that approximates that of anatomical, physically based models. We demonstrate the technique in a feature-film production environment, on a human model whose input poses are sculpted by hand and an animal model whose input poses come from the output of an anatomically based dynamic simulation.
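The core fitting step for one vertex coordinate might look like the following sketch; the names, shapes, and use of an ordinary (rather than the paper's modified) least-squares solve are assumptions for illustration:

```python
import numpy as np

def fit_vertex_weights(transform_terms, observed_positions):
    """Solve for one vertex coordinate's deformation weights.

    transform_terms: (n_frames, n_terms) matrix, one row of joint-
        transform terms per training frame.
    observed_positions: (n_frames,) vector of that coordinate's
        observed values in the training animation.
    Returns the least-squares weights; a new frame's deformed value
    is then transform_terms_new @ weights.
    """
    w, *_ = np.linalg.lstsq(transform_terms, observed_positions, rcond=None)
    return w

# Illustrative fit: three training frames, two transform terms.
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p = B @ np.array([2.0, 3.0])       # synthetic "observed" positions
weights = fit_vertex_weights(B, p)  # recovers [2.0, 3.0]
```

With more training frames than terms, the fit generalizes the training motion rather than memorizing it, which is the point of the approach.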
A semidefinite framework for trust region subproblems with applications to large scale minimization
 Math. Programming, 1997
Cited by 60 (8 self)
Abstract: This is an abbreviated revision of the University of Waterloo research report CORR 94-32.
Pedestrian Detection for Driving Assistance Systems: Single-frame Classification and System-Level Performance
 IN PROCEEDINGS OF IEEE INTELLIGENT VEHICLES SYMPOSIUM, 2004
Cited by 60 (2 self)
Abstract: We describe the functional and architectural breakdown of a monocular pedestrian detection system. We describe in detail our approach for single-frame classification, based on a novel scheme of breaking down the class variability by repeatedly training a set of relatively simple classifiers on clusters of the training set. Single-frame classification performance results and system-level performance figures for daytime conditions are presented, with a discussion of the remaining gap to meet a daytime, normal-weather-condition production system.
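A schematic of the cluster-wise classification scheme the abstract outlines (the clusters and the single-feature threshold classifiers below are stand-ins, not the paper's detectors): one simple classifier is trained per cluster of the training set, and a candidate window is accepted if any cluster's classifier fires.

```python
def classify(window_features, cluster_classifiers):
    """Accept a window if any per-cluster classifier fires on it."""
    return any(clf(window_features) for clf in cluster_classifiers)

# Two hypothetical threshold classifiers, one per assumed cluster.
cluster_classifiers = [
    lambda f: f[0] > 0.5,   # e.g. a cluster of upright pedestrians
    lambda f: f[1] > 0.5,   # e.g. a cluster of walking pedestrians
]
```

Splitting the class this way lets each classifier stay simple because it only has to model one mode of the class variability.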