Least Squares Linear Discriminant Analysis
Cited by 20 (6 self)
Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. LDA in the binary-class case has been shown to be equivalent to linear regression with the class label as the output. This implies that LDA for binary-class classification can be formulated as a least squares problem. Previous studies have shown a certain relationship between multivariate linear regression and LDA for the multi-class case. Many of these studies show that multivariate linear regression with a specific class indicator matrix as the output can be applied as a preprocessing step for LDA. However, directly casting LDA as a least squares problem is challenging for the multi-class case. In this paper, a novel formulation for multivariate linear regression is proposed. The equivalence relationship between the proposed least squares formulation and LDA for multi-class classification is rigorously established under a mild condition, which is shown empirically to hold in many applications involving high-dimensional data. Several LDA extensions based on the equivalence relationship are discussed.
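The binary-class equivalence mentioned in this abstract can be checked numerically. The following is a minimal sketch, not the paper's multi-class formulation: it assumes a hypothetical seven-point 2-D data set and a +1/-1 label coding (both chosen here purely for illustration) and compares the Fisher/LDA direction with the least squares regression weights computed on centered inputs.

```python
# Toy check: binary LDA direction vs. least squares weights (2-D, pure Python).
# The data and the +1/-1 label coding are illustrative assumptions.

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, center):
    """2x2 scatter matrix: sum of (x - center)(x - center)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def solve2(a, b):
    """Solve the 2x2 linear system a w = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0)]

# LDA: w is proportional to S_w^{-1} (m_a - m_b).
m_a, m_b = mean(class_a), mean(class_b)
s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
s_w = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
w_lda = solve2(s_w, [m_a[0] - m_b[0], m_a[1] - m_b[1]])

# Least squares on centered inputs: (Xc^T Xc) w = Xc^T y.
points = class_a + class_b
labels = [1.0] * len(class_a) + [-1.0] * len(class_b)
m = mean(points)
s_t = scatter(points, m)
rhs = [sum((p[j] - m[j]) * y for p, y in zip(points, labels)) for j in range(2)]
w_ls = solve2(s_t, rhs)

# Cosine of the angle between the two directions.
dot = w_lda[0] * w_ls[0] + w_lda[1] * w_ls[1]
norm = (sum(v * v for v in w_lda) * sum(v * v for v in w_ls)) ** 0.5
cosine = dot / norm
print(cosine)
```

The two weight vectors should differ only in scale, so the printed cosine is expected to equal 1 up to floating-point rounding.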
Multi-class Discriminant Kernel Learning via Convex Programming
Cited by 11 (0 self)
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least square problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper. That is, the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case. Furthermore, it leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
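The idea of forming a kernel matrix as a linear combination of pre-specified kernel matrices can be illustrated as follows. This is only a sketch under stated assumptions: the two base kernels (linear and RBF), the three sample points, and the hand-picked weights are hypothetical. In the paper the weights are obtained by solving the SDP, QCQP, or SILP formulations, which are not reproduced here.

```python
import math

def linear(x, z):
    return sum(a * b for a, b in zip(x, z))

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def gram(points, kernel):
    """Gram matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(x, z) for z in points] for x in points]

def combine(grams, weights):
    """Weighted sum of equally sized Gram matrices."""
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams))
             for j in range(n)] for i in range(n)]

points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]   # hypothetical data
base_grams = [gram(points, linear), gram(points, rbf)]
weights = [0.25, 0.75]                           # fixed by hand, not learned
k_combined = combine(base_grams, weights)
```

With non-negative weights, the combined matrix stays symmetric positive semidefinite whenever each base Gram matrix is.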
Reduced support vector machines: A statistical theory
IEEE Trans. Neural Netw., 2007
Cited by 10 (1 self)
In dealing with large datasets, the reduced support vector machine (RSVM) was proposed with the practical objective of overcoming the computational difficulties as well as reducing the model complexity. In this article, we study the RSVM from the viewpoint of robust design for model building and consider the nonlinear separating surface as a mixture of kernels. The RSVM uses a reduced model representation instead of a full one. Our main results center on two major themes. One is on the robustness of the random subset mixture model. The robustness is judged by a few criteria: (1) model variation measure, (2) model bias (deviation) between the reduced model and the full model and (3) test power in distinguishing the reduced model from the full one. The other is on the spectral analysis of the reduced kernel. We compare the eigen-structures of the full kernel matrix and the approximation kernel matrix. The approximation kernels are generated by uniform random subsets. The small discrepancies between them indicate that the approximation kernels can retain most of the relevant information for learning tasks in the full kernel. We focus on some statistical theory of the reduced set method mainly in the context of the RSVM. The use of a uniform random subset is not limited to the RSVM. This approach can act as a supplemental algorithm on top of a basic optimization algorithm, wherein the actual optimization takes place on the subset-approximated data. The statistical properties discussed in this paper are still valid. Key words and phrases: canonical angles, kernel methods, maximinity, minimaxity, model complexity, reduced set, Monte-Carlo sampling, Nyström ap-
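The reduced representation built from a uniform random subset can be sketched as below. This is an illustrative sketch only: the RBF kernel, its `gamma` value, the six sample points, and the helper name `reduced_kernel` are assumptions made here, not details taken from the article. It builds the rectangular kernel matrix between all points and a randomly drawn subset, in place of the full square kernel matrix.

```python
import math
import random

def rbf(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def reduced_kernel(data, subset_size, seed=0):
    """Return (subset indices, n x subset_size kernel matrix).

    The subset is drawn uniformly at random without replacement.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), subset_size)
    subset = [data[i] for i in idx]
    return idx, [[rbf(x, z) for z in subset] for x in data]

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
        (1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]      # hypothetical data
subset_idx, k_reduced = reduced_kernel(data, subset_size=3)
```

Here the full kernel would be 6 x 6, while the reduced one is 6 x 3; the saving grows with the ratio of data size to subset size.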
Optimising Kernel Parameters and Regularisation Coefficients for Non-Linear Discriminant Analysis
Journal of Machine Learning Research, 2006
Cited by 4 (0 self)
In this paper we consider a novel Bayesian interpretation of Fisher's discriminant analysis. We relate Rayleigh's coefficient to a noise model that minimises a cost based on the most probable class centres and that abandons the 'regression to the labels' assumption used by other algorithms. Optimisation of the noise model yields a direction of discrimination equivalent to Fisher's discriminant, and with the incorporation of a prior we can apply Bayes' rule to infer the posterior distribution of the direction of discrimination. Nonetheless, we argue that an additional constraining distribution has to be included if sensible results are to be obtained. Going further, with the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher's discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through the optimisation of the marginal log-likelihood of the data. An added bonus of the new formulation is that it enables us to link the regularisation coefficient with the generalisation error.
On Relevant Dimensions in Kernel Feature Spaces
J. Machine Learning Research, 2008
Cited by 4 (3 self)
We show that the relevant information of a supervised learning problem is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem in the sense that it can asymptotically represent the function to be learned and is sufficiently smooth. Thus, kernels do not only transform data sets such that good generalization can be achieved using only linear discriminant functions, but this transformation is also performed in a manner which makes economical use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data for supervised learning problems. Practically, we propose an algorithm which enables us to recover the number of leading kernel PCA components relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to aid in model selection, and (3) to denoise in feature space in order to yield better classification results.
A hybrid HMM-based speech recognizer using kernel-based discriminants as acoustic models
In 18th International Conference on Pattern Recognition (ICPR 2006), 20-24 August 2006, Hong Kong, 2006
Cited by 3 (0 self)
In this paper we propose a novel order-recursive training algorithm for kernel-based discriminants which is computationally efficient. We integrate this method in a hybrid HMM-based speech recognition system by translating the outputs of the kernel-based classifier into class-conditional probabilities and using them instead of Gaussian mixtures as production probabilities of an HMM-based decoder for speech recognition. The performance of the described hybrid structure is demonstrated on the DARPA Resource Management (RM1) corpus.
Updates for nonlinear discriminants
In 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), 2007
Cited by 3 (1 self)
A novel training algorithm for nonlinear discriminants for classification and regression in Reproducing Kernel Hilbert Spaces (RKHSs) is presented. It is shown how the overdetermined linear least-squares problem in the corresponding RKHS may be solved within a greedy forward selection scheme by updating the pseudoinverse in an order-recursive way. The described construction of the pseudoinverse gives rise to an update of the orthogonal decomposition of the reduced Gram matrix in linear time. Regularization in the spirit of Ridge regression may then easily be applied in the orthogonal space. Various experiments for both classification and regression are performed to show the competitiveness of the proposed method.
Acoustic modelling using kernel-based discriminants
University of Patras, 2005
Cited by 2 (0 self)
In this paper we use kernel-based Fisher Discriminants (KFD) for classification by integrating this method in an HMM-based speech recognition system. We translate the outputs of the KFD-classifier into conditional probabilities and use them as production probabilities of an HMM-based decoder for speech recognition. To obtain good performance also in terms of computational complexity, the Recursive Least Squares Algorithm (RLS-Algorithm) is employed. We train and test the described hybrid structure on the Resource Management Corpus (RM1).
Limited training data robust speech recognition using kernel-based acoustic models
In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006
Cited by 2 (0 self)
Contemporary automatic speech recognition uses Hidden-Markov-Models (HMMs) to model the temporal structure of speech where one HMM is used for each phonetic unit. The states of the HMMs are associated with state-conditional probability density functions (PDFs) which are typically realized using mixtures of Gaussian PDFs (GMMs). Training of GMMs is error-prone especially if training data size is limited. This paper evaluates two new methods of modeling state-conditional PDFs using probabilistically interpreted Support Vector Machines and Kernel Fisher Discriminants. Extensive experiments on the RM1 [1] corpus yield substantially improved recognition rates compared to traditional GMMs. Due to their generalization ability, our new methods reduce the word error rate by up to 13 % using the complete training set and up to 33 % when the training set size is reduced.
Kernel Fisher discriminants as acoustic models in HMM-based speech recognition
In 10th International Conference on Speech and Computer, 2005
Cited by 2 (1 self)
While the temporal dynamics of speech can be handled very efficiently by Hidden Markov Models (HMMs), the classification of the single speech units (phonemes) is usually done with Gaussian probability density functions, which are not discriminative. In this paper we use the Kernel Fisher Discriminant (KFD) for classification by integrating this method in an HMM-based speech recognition system. In this structure we translate the outputs of the KFD into class-conditional probabilities and use them as production probabilities in an HMM-based speech decoder. The KFD has already shown good classification results in other fields (e.g. pattern recognition). To obtain good performance also in terms of computational complexity, the KFD is implemented iteratively with a sparse greedy approach. We train and test the described hybrid structure on the Resource Management (RM1) task.

