Results 1–10 of 25
Sparse multinomial logistic regression: fast algorithms and generalization bounds
 IEEE Trans. on Pattern Analysis and Machine Intelligence
Abstract

Cited by 113 (1 self)
Abstract—Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
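The abstract names the objective (multinomial logistic regression with a sparsity-promoting prior) but not the bound-optimization updates themselves. As a rough illustration of that objective, the sketch below fits an L1-penalized multinomial logistic regression by proximal gradient descent; the step size, penalty weight, and toy data are illustrative assumptions, and this is a generic stand-in, not the paper's algorithm.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_sparse_mlr(X, y, n_classes, lam=0.05, lr=0.1, iters=1000):
    """L1-penalized multinomial logistic regression via proximal gradient.

    Each step takes a gradient step on the mean log-loss, then
    soft-thresholds the weights, driving many of them exactly to zero
    (the sparsity the abstract describes).
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]                       # one-hot targets
    for _ in range(iters):
        P = softmax(X @ W)
        G = X.T @ (P - Y) / n                      # gradient of mean log-loss
        W = W - lr * G
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)  # soft-threshold
    return W
```

On three well-separated Gaussian classes this recovers an accurate classifier while the soft-threshold step keeps the weight matrix sparse.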
An efficient algorithm for local distance metric learning
 in Proceedings of AAAI
, 2006
Abstract

Cited by 30 (9 self)
Learning application-specific distance metrics from labeled data is critical for both statistical classification and information retrieval. Most of the earlier work in this area has focused on finding metrics that simultaneously optimize compactness and separability in a global sense. Specifically, such distance metrics attempt to keep all of the data points in each class close together while ensuring that data points from different classes are separated. However, particularly when classes exhibit multimodal data distributions, these goals conflict and thus cannot be simultaneously satisfied. This paper proposes a Local Distance Metric (LDM) that aims to optimize local compactness and local separability. We present an efficient algorithm that employs eigenvector analysis and bound optimization to learn the LDM from training data in a probabilistic framework. We demonstrate that LDM achieves significant improvements in both classification and retrieval accuracy compared to global distance learning and kernel-based kNN.
A General Model for Multiple View Unsupervised Learning
, 2008
Abstract

Cited by 15 (1 self)
Multiple view data, which have multiple representations from different feature spaces or graph spaces, arise in various data mining applications such as information retrieval, bioinformatics, and social network analysis. Since different representations can have very different statistical properties, how to learn a consensus pattern from multiple representations is a challenging problem. In this paper, we propose a general model for multiple view unsupervised learning. The proposed model introduces the concept of a mapping function to make the patterns from different pattern spaces comparable, so that an optimal pattern can be learned from the multiple patterns of multiple representations. Under this model, we formulate two specific models for
A Probabilistic Approach for Optimizing Spectral Clustering
 In Advances in Neural Information Processing Systems 18
, 2005
Abstract

Cited by 13 (6 self)
Spectral clustering has enjoyed success in both data clustering and semi-supervised learning. However, most spectral clustering algorithms cannot handle multiclass clustering problems directly; additional strategies are needed to extend them to the multiclass setting. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is prone to being trapped in local optima. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm.
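Soft Cut's bound-optimization updates are not given in this abstract, but the hard-membership baseline it relaxes, normalized-cut spectral clustering, can be sketched as follows. The affinity matrix and the simple farthest-point k-means step are illustrative choices, not details from the paper.

```python
import numpy as np

def spectral_cluster(W, k, iters=50):
    """Normalized-cut spectral clustering: the hard-membership baseline
    that Soft Cut relaxes with soft cluster assignments.
    W is a symmetric nonnegative affinity matrix."""
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - Dinv @ W @ Dinv           # normalized Laplacian
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                                # k smallest eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # deterministic farthest-point initialization for a simple k-means
    idx = [0]
    for _ in range(1, k):
        d2 = ((U[:, None, :] - U[idx][None]) ** 2).sum(-1).min(1)
        idx.append(int(d2.argmax()))
    C = U[idx].copy()
    for _ in range(iters):
        lab = ((U[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = U[lab == j].mean(0)
    return lab
```

On a two-block affinity matrix with weak cross-links, the two smallest eigenvectors separate the blocks cleanly and the k-means step recovers the partition.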
Distributed Latent Variable Models of Lexical Co-occurrences
 in Proceedings of the International Workshop on Artificial Intelligence and Statistics
, 2005
Abstract

Cited by 11 (1 self)
Low-dimensional representations for lexical co-occurrence data have become increasingly important in alleviating the sparse data problem inherent in natural language processing tasks. This work presents a distributed latent variable model for inducing these low-dimensional representations. The model takes
Fast Global Kernel Density Mode Seeking with Application to Localisation and Tracking
 In IEEE International Conference on Computer Vision
, 2005
Abstract

Cited by 10 (1 self)
We address the problem of seeking the global mode of a density function using the mean shift algorithm. Mean shift, like other gradient ascent optimisation methods, is susceptible to local maxima, and hence often fails to find the desired global maximum. In this work, we propose a multi-bandwidth mean shift procedure that avoids this problem, which we term annealed mean shift, as it shares similarities with the annealed importance sampling procedure. The bandwidth of the algorithm plays the same role as the temperature in annealing. We observe that the oversmoothed density function with a sufficiently large bandwidth is unimodal. Using a continuation principle, the influence of the global peak in the density function is introduced gradually. In this way the global maximum is more reliably located. Generally, the price of this annealing-like procedure is that more iterations are required. Since it is imperative that computational complexity be minimal in real-time applications such as visual tracking, we also propose an accelerated version of the mean shift algorithm. Compared with the conventional mean shift algorithm, annealed mean shift can significantly decrease the number of iterations required for convergence. The proposed algorithm is applied to the problems of visual tracking and object localisation. We empirically show on various data sets that the proposed algorithm can reliably find the true object location when the starting position of mean shift is far away from the global maximum, in contrast with the conventional mean shift algorithm, which will usually get trapped in a spurious local maximum.
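The multi-bandwidth procedure described above can be sketched as a mean-shift loop run over a decreasing bandwidth schedule, reusing each converged mode as the next starting point. The Gaussian kernel and the particular schedule are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def mean_shift_step(x, data, h):
    """One Gaussian-kernel mean-shift step toward the nearest KDE mode."""
    w = np.exp(-((data - x) ** 2).sum(-1) / (2.0 * h * h))
    return (w[:, None] * data).sum(0) / w.sum()

def annealed_mean_shift(data, x0, bandwidths, iters=100, tol=1e-6):
    """Multi-bandwidth ("annealed") mean shift: run mean shift to
    convergence at each bandwidth in a decreasing schedule. A large
    bandwidth oversmooths the density into a unimodal one, so the early
    runs escape spurious local maxima; smaller bandwidths refine the mode."""
    x = np.asarray(x0, dtype=float)
    for h in bandwidths:                 # e.g. [6.0, 3.0, 1.5, 0.7, 0.3]
        for _ in range(iters):
            x_new = mean_shift_step(x, data, h)
            if np.linalg.norm(x_new - x) < tol:
                x = x_new
                break
            x = x_new
    return x
```

Started inside a small spurious cluster, plain mean shift with a small bandwidth would stay there; the annealed schedule instead drifts to the dominant mode before refining.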
Relational clustering by symmetric convex coding
 In ICML
, 2007
Abstract

Cited by 4 (0 self)
Relational data appear frequently in many machine learning applications. Relational data consist of the pairwise relations (similarities or dissimilarities) between pairs of implicit objects; they are usually stored in relation matrices, and typically no other knowledge is available. Although relational clustering can be formulated as graph partitioning in some applications, this formulation is not adequate for general relational data. In this paper, we propose a general model for relational clustering based on symmetric convex coding. The model is applicable to all types of relational data and unifies the existing graph partitioning formulation. Under this model, we derive two alternative bound optimization algorithms to solve the symmetric convex coding under two popular distance functions, Euclidean distance and generalized I-divergence. Experimental evaluation and theoretical analysis show the effectiveness and great potential of the proposed model and algorithms.
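Under the Euclidean distance, symmetric convex coding approximates the relation matrix A by a nonnegative factorization of the form B C Bᵀ. The paper's bound-optimization updates are not reproduced here; the sketch below minimizes the same objective with a simple projected-gradient stand-in, with an illustrative learning rate and initialization.

```python
import numpy as np

def scc_euclidean(A, k, lr=0.005, iters=500, seed=0):
    """Fit A ~ B C B^T (B, C nonnegative) under squared Euclidean distance
    by projected gradient descent -- a stand-in for the paper's
    bound-optimization algorithm, not a reproduction of it."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    B = rng.uniform(0.0, 1.0, (n, k))
    C = np.eye(k)
    for _ in range(iters):
        R = A - B @ C @ B.T                              # residual
        B = np.maximum(B + lr * 4 * R @ B @ C, 0.0)      # step on B, project >= 0
        R = A - B @ C @ B.T
        C = np.maximum(C + lr * 2 * B.T @ R @ B, 0.0)    # step on C, project >= 0
    return B, C
```

On a two-block relation matrix the reconstruction error drops well below that of the random initialization, with the rows of B acting as soft cluster memberships.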
Training Conditional Random Fields by Periodic Step Size Adaptation for Large-Scale Text Mining
Abstract

Cited by 4 (2 self)
For applications with consecutive incoming training examples, online learning has the potential to achieve a likelihood as high as offline learning without scanning all available training examples, and it usually has a much smaller memory footprint. To train CRFs online, this paper presents the Periodic Step size Adaptation (PSA) method to dynamically adjust the learning rates in stochastic gradient descent. We applied our method to three large-scale text mining tasks. Experimental results show that, in terms of the number of passes required over the training data set, PSA outperforms the best offline algorithm, L-BFGS, by hundreds of times, and outperforms the best online algorithm, SMD, by an order of magnitude.
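The PSA update rule itself is not given in this snippet. As a generic illustration of the idea of periodically adapting the step size during stochastic gradient descent, the sketch below trains a logistic regression (rather than a CRF) and halves the learning rate whenever the average loss over a period stops improving; that adaptation rule is a hypothetical stand-in, not the paper's.

```python
import numpy as np

def sgd_periodic_eta(X, y, eta0=0.5, period=20, epochs=5, seed=0):
    """Online logistic regression via SGD with a periodically adapted
    step size (a generic stand-in for PSA's adaptation rule)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    eta, best, step = eta0, np.inf, 0
    recent = []
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w -= eta * (p - y[i]) * X[i]            # stochastic gradient step
            recent.append(-(y[i] * np.log(p + 1e-12)
                            + (1 - y[i]) * np.log(1 - p + 1e-12)))
            step += 1
            if step % period == 0:                  # periodic adaptation
                avg = float(np.mean(recent[-period:]))
                if avg >= best:
                    eta *= 0.5                      # halve rate when progress stalls
                best = min(best, avg)
    return w
```

The periodic check lets the optimizer take large steps early and automatically damp them once the loss plateaus, which is the general behaviour PSA exploits at scale.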
Flexible and efficient implementations of bayesian independent component analysis
, 2005
Abstract

Cited by 4 (0 self)
In this paper we present an empirical Bayes method for flexible and efficient Independent Component Analysis (ICA). The method is flexible with respect to choice of source prior, dimensionality and positivity of the mixing matrix, and structure of the noise covariance matrix. The efficiency is ensured using parameter optimizers which are more advanced than the expectation maximization (EM) algorithm, but still easy to implement. These optimizers are the over-relaxed adaptive EM algorithm and the easy gradient recipe. The required expectations over the source posterior are estimated with accurate mean field methods: variational and the expectation consistent framework. We demonstrate the usefulness of
Triple jump acceleration for the EM algorithm
 In ICDM ’05: Proceedings of the Fifth IEEE International Conference on Data Mining
, 2005
Abstract

Cited by 3 (0 self)
This paper presents the triple jump framework for accelerating the EM algorithm and other bound optimization methods. The idea is to extrapolate the third search point based on the previous two search points found by regular EM. As the convergence rate of regular EM becomes slower, the distance of the triple jump becomes longer, thus providing greater speedup for data sets where EM converges slowly. Experimental results show that the triple jump framework significantly outperforms EM and other acceleration methods of EM for a variety of probabilistic models, especially when the data set is sparse. The results also show that the triple jump framework is particularly effective for Cluster Models.
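The extrapolation idea can be sketched for any fixed-point map F (one EM update is such a map): estimate the local convergence rate from two consecutive updates, then jump ahead proportionally. This Aitken-style version is an illustration in the spirit of the framework, not the paper's exact extrapolation rule.

```python
import numpy as np

def triple_jump(F, x0, rounds=3):
    """Accelerate a fixed-point iteration x <- F(x) by extrapolating from
    two consecutive updates: the slower the estimated rate r, the longer
    the jump toward the extrapolated limit."""
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        x1 = F(x)
        x2 = F(x1)
        d0, d1 = x1 - x, x2 - x1
        r = np.linalg.norm(d1) / (np.linalg.norm(d0) + 1e-12)  # rate estimate
        if r < 1.0:
            x = x2 + (r / (1.0 - r)) * d1     # jump to the extrapolated limit
        else:
            x = x2                            # no contraction: plain update
    return x
```

For a linear contraction the jump lands essentially on the fixed point after a single round, which mirrors the paper's observation that slowly converging EM runs benefit the most.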