Results 1–10 of 84
Markov Chain Monte Carlo methods and the label switching problem in Bayesian mixture modelling
 Statistical Science
Cited by 113 (5 self)
Abstract. In the past ten years there has been a dramatic increase of interest in the Bayesian analysis of finite mixture models. This is primarily because of the emergence of Markov chain Monte Carlo (MCMC) methods. While MCMC provides a convenient way to draw inference from complicated statistical models, there are many, perhaps underappreciated, problems associated with the MCMC analysis of mixtures. The problems are mainly caused by the nonidentifiability of the components under symmetric priors, which leads to so-called label switching in the MCMC output. This means that ergodic averages of component-specific quantities will be identical and thus useless for inference. We review the solutions to the label switching problem, such as artificial identifiability constraints, relabelling algorithms and label-invariant loss functions. We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior specification.
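The label switching phenomenon and the artificial identifiability constraint can be illustrated with a minimal simulation. This is a hypothetical two-component example in Python/NumPy, not code from the paper; the simulated "MCMC output" is just Gaussian draws with labels randomly swapped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MCMC output for a two-component mixture with true component
# means -1 and +1; in roughly half of the draws the labels are switched.
draws = rng.normal([-1.0, 1.0], 0.1, size=(1000, 2))
swap = rng.random(1000) < 0.5
draws[swap] = draws[swap][:, ::-1]

# Raw ergodic averages: label switching makes both columns average to ~0,
# so they are useless for component-specific inference.
raw_means = draws.mean(axis=0)

# Artificial identifiability constraint: relabel each draw so that mu_1 < mu_2.
constrained_means = np.sort(draws, axis=1).mean(axis=0)

print(raw_means)          # both entries near 0
print(constrained_means)  # near [-1, 1]
```

Sorting within each draw is the simplest identifiability constraint; the relabelling algorithms reviewed in the paper replace it with data-driven permutations.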
Efficient and Robust Feature Extraction by Maximum Margin Criterion
 In Advances in Neural Information Processing Systems 16
, 2003
Cited by 107 (4 self)
In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminatory information. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are the two most popular linear dimensionality reduction methods. However, PCA is not very effective for the extraction of the most discriminant features, and LDA is not stable due to the small sample size problem. In this paper, we propose some new (linear and nonlinear) feature extractors based on the maximum margin criterion (MMC). Geometrically, feature extractors based on MMC maximize the (average) margin between classes after dimensionality reduction. It is shown that MMC can represent class separability better than PCA. As a connection to LDA, we may also derive LDA from MMC by incorporating some constraints. By using other constraints, we establish a new linear feature extractor that does not suffer from the small sample size problem, which is known to cause serious stability problems for LDA. The kernelized (nonlinear) counterpart of this linear feature extractor is also established in the paper. Our extensive experiments demonstrate that the new feature extractors are effective, stable, and efficient.
Multiclass Linear Dimension Reduction by Weighted Pairwise Fisher Criteria
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
Cited by 101 (7 self)
We derive a class of computationally inexpensive linear dimension reduction criteria by introducing a weighted variant of the well-known K-class Fisher criterion associated with linear discriminant analysis (LDA). It can be seen that LDA weights the contributions of individual class pairs according to the Euclidean distance between the respective class means. We generalize upon LDA by introducing a different weighting function.
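The pairwise decomposition underlying this family of criteria can be sketched as follows. The weighting function below is an illustrative placeholder (the paper's specific choice, motivated by approximate pairwise Bayes error, is not reproduced); a constant weight recovers, up to scaling, the classical K-class Fisher between-class scatter.

```python
import numpy as np

def weighted_pairwise_scatter(means, priors, weight_fn):
    """Weighted between-class scatter: sum over class pairs (i, j) of
    p_i * p_j * w(d_ij) * (m_i - m_j)(m_i - m_j)^T, where d_ij is the
    Euclidean distance between the class means.  weight_fn = lambda d: 1.0
    recovers (up to scaling) the classical K-class Fisher scatter."""
    D = means.shape[1]
    S = np.zeros((D, D))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = (means[i] - means[j])[:, None]
            d = np.linalg.norm(diff)
            S += priors[i] * priors[j] * weight_fn(d) * (diff @ diff.T)
    return S

means = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
priors = np.array([1 / 3, 1 / 3, 1 / 3])

# Down-weighting distant pairs (here w(d) = 1/d^2, chosen for illustration)
# stops the far-away class from dominating the criterion, unlike the
# unweighted Fisher scatter.
S_plain = weighted_pairwise_scatter(means, priors, lambda d: 1.0)
S_damped = weighted_pairwise_scatter(means, priors, lambda d: 1.0 / d**2)
print(S_plain[0, 0], S_damped[0, 0])
```

With the unweighted criterion the (0, 0) entry is dominated by the distant class (182/9), while the damped weighting treats all three pairs equally (1/3).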
Linear Dimensionality Reduction via a Heteroscedastic Extension of LDA: The Chernoff Criterion
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Cited by 90 (0 self)
Abstract—We propose an eigenvector-based heteroscedastic linear dimension reduction (LDR) technique for multiclass data. The technique is based on a heteroscedastic two-class technique which utilizes the so-called Chernoff criterion, and successfully extends the well-known linear discriminant analysis (LDA). The latter, which is based on the Fisher criterion, is incapable of dealing with heteroscedastic data in a proper way. For the two-class case, the between-class scatter is generalized so as to capture differences in (co)variances. It is shown that the classical notion of between-class scatter can be associated with Euclidean distances between class means. From this viewpoint, the between-class scatter is generalized by employing the Chernoff distance measure, leading to our proposed heteroscedastic measure. Finally, using the results from the two-class case, a multiclass extension of the Chernoff criterion is proposed. This criterion combines separation information present in the class means as well as the class covariance matrices. Extensive experiments and a comparison with similar dimension reduction techniques are presented. Index Terms—Linear dimension reduction, linear discriminant analysis, Fisher criterion, Chernoff distance, Chernoff criterion.
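A minimal sketch of the two-class building block, assuming the standard closed form of the Chernoff distance between Gaussians (s = 0.5 gives the Bhattacharyya distance); this is not the paper's code, just an illustration of why the criterion sees covariance differences that the Fisher criterion cannot.

```python
import numpy as np

def chernoff_distance(m1, S1, m2, S2, s=0.5):
    """Chernoff distance between two Gaussians; s = 0.5 gives the
    Bhattacharyya distance.  Unlike the Fisher criterion, it is nonzero
    when the means coincide but the covariances differ."""
    Ss = s * S1 + (1 - s) * S2
    diff = m1 - m2
    quad = 0.5 * s * (1 - s) * diff @ np.linalg.solve(Ss, diff)
    logdet = 0.5 * (np.log(np.linalg.det(Ss))
                    - s * np.log(np.linalg.det(S1))
                    - (1 - s) * np.log(np.linalg.det(S2)))
    return quad + logdet

# Equal means: the Fisher between-class scatter vanishes, but the
# covariance term of the Chernoff distance still separates the classes.
m = np.zeros(2)
S1 = np.eye(2)
S2 = np.diag([4.0, 0.25])
dist = chernoff_distance(m, S1, m, S2)
print(dist)  # positive despite identical means
```

For identical Gaussians the distance is zero, as expected; here the quadratic (mean) term vanishes and the log-determinant (covariance) term carries all the separation.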
Geometric mean for subspace selection
, 2009
Cited by 52 (11 self)
Abstract—Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c − 1, the projection tends to merge classes that are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a potential discriminative subspace selection method which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and several of its representative extensions. Index Terms—Arithmetic mean, Fisher's linear discriminant analysis (FLDA), geometric mean, Kullback-Leibler (KL) divergence, machine learning, subspace selection (or dimensionality reduction), visualization.
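The arithmetic-versus-geometric-mean distinction can be shown on a toy homoscedastic example (an illustrative sketch under the equal-covariance Gaussian assumption, not the paper's code): projecting onto one candidate axis separates all pairs moderately, while the other merges two close classes but keeps a distant one far away.

```python
import numpy as np

def pairwise_kl_1d(means, w, sigma2=1.0):
    """Pairwise KL divergences between equal-covariance Gaussian classes
    after projecting the class means onto direction w; in this
    homoscedastic case KL(i||j) = (w^T (m_i - m_j))^2 / (2 * sigma2)."""
    proj = means @ w
    return np.array([(proj[i] - proj[j]) ** 2 / (2 * sigma2)
                     for i in range(len(proj)) for j in range(i + 1, len(proj))])

means = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 8.0]])
kl_e1 = pairwise_kl_1d(means, np.array([1.0, 0.0]))  # separates every pair a little
kl_e2 = pairwise_kl_1d(means, np.array([0.0, 1.0]))  # merges the first two classes

arith_e1, arith_e2 = kl_e1.mean(), kl_e2.mean()
geo_e1 = np.exp(np.log(kl_e1 + 1e-12).mean())
geo_e2 = np.exp(np.log(kl_e2 + 1e-12).mean())

# The arithmetic mean (FLDA's implicit criterion under these assumptions)
# prefers e2, dominated by the distant class; the geometric mean collapses
# whenever any pair merges, so it prefers e1.
print(arith_e1, arith_e2)
print(geo_e1, geo_e2)
```

This is exactly the class separation problem described above: the mean-of-KL criterion happily sacrifices a close pair, while the geometric mean penalizes any merged pair severely.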
Classifiers in Almost Empty Spaces
 In 15th International Conference on Pattern Recognition
, 2000
Cited by 31 (7 self)
Recent developments in defining and training statistical classifiers make it possible to build reliable classifiers in very small sample size problems. Using these techniques, advanced problems may be tackled, such as pixel-based image recognition and dissimilarity-based object classification. It will be explained and illustrated how recognition systems based on support vector machines and subspace classifiers circumvent the curse of dimensionality, and may even find nonlinear decision boundaries for small training sets represented in Hilbert space.
Classification of High Dimensional Data With Limited Training Samples
, 1998
Cited by 27 (9 self)
A Least-Squares Framework for Component Analysis
, 2009
Cited by 25 (1 self)
... (SC) have been extensively used as a feature extraction step for modeling, clustering, classification, and visualization. CA techniques are appealing because many can be formulated as eigenproblems, offering great potential for learning linear and nonlinear representations of data in closed form. However, the eigen-formulation often conceals important analytic and computational drawbacks of CA techniques, such as solving generalized eigenproblems with rank-deficient matrices (e.g., the small sample size problem), lacking an intuitive interpretation of normalization factors, and understanding commonalities and differences between CA methods. This paper proposes a unified least-squares framework to formulate many CA methods. We show how PCA, LDA, CCA, LE, SC, and their kernel and regularized extensions correspond to particular instances of least-squares weighted kernel reduced rank regression (LS-WKRRR). The LS-WKRRR formulation of CA methods has several benefits: (1) it provides a clean connection between many CA techniques and an intuitive framework to understand normalization factors; (2) it yields efficient numerical schemes to solve CA techniques; (3) it overcomes the small sample size problem; (4) it provides a framework to easily extend CA methods. We derive new weighted generalizations of PCA, LDA, CCA and SC, and several novel CA techniques.
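The simplest instance of this least-squares view is PCA as a rank-constrained self-regression (a sketch of that one special case only, not the full weighted-kernel LS-WKRRR formulation): minimizing ||Xc − Xc B||_F² over rank-d matrices B is solved by projecting onto the top-d right singular vectors of the centered data.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
X[:, 0] *= 10.0                      # one dominant variance direction
Xc = X - X.mean(axis=0)

# PCA as rank-constrained least-squares self-regression:
# min over rank-d B of ||Xc - Xc @ B||_F^2 is solved by B = V_d V_d^T,
# where V_d holds the top-d right singular vectors of Xc.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
d = 1
B = Vt[:d].T @ Vt[:d]
residual = np.linalg.norm(Xc - Xc @ B) ** 2

# The residual equals the energy in the discarded singular values, and the
# retained direction is the dominant principal component (axis 0 here).
print(residual, (s[d:] ** 2).sum())
print(abs(Vt[0, 0]))
```

The same least-squares template, with weights, kernels, and a second data block, is what the framework above generalizes to cover LDA, CCA, and the other CA methods.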
Information discriminant analysis: Feature extraction with an information-theoretic objective
 IEEE Trans. Pattern Anal. Mach. Intell
, 2007
Cited by 22 (7 self)
Abstract—Using elementary information-theoretic tools, we develop a novel technique for linear transformation from the space of observations into a low-dimensional (feature) subspace for the purpose of classification. The technique is based on a numerical optimization of an information-theoretic objective function, which can be computed analytically. The advantages of the proposed method over several other techniques are discussed and the conditions under which the method reduces to linear discriminant analysis are given. We show that the novel objective function enjoys many of the properties of the mutual information and the Bayes error, and we give sufficient conditions for the method to be Bayes-optimal. Since the objective function is maximized numerically, we show how the calculations can be accelerated to yield feasible solutions. The performance of the method compares favorably to other linear discriminant-based feature extraction methods on a number of simulated and real-world data sets.
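The general recipe — numerically maximize an analytically computable divergence over projection directions — can be sketched with a stand-in objective. The paper's actual objective function is not reproduced here; the 1-D Bhattacharyya distance between projected class-conditional Gaussians serves only as a simple analytic surrogate, and the grid search stands in for a proper numerical optimizer.

```python
import numpy as np

def projected_bhattacharyya(w, m1, S1, m2, S2):
    """Analytic 1-D Bhattacharyya distance between two Gaussian classes
    after projection onto unit direction w -- an illustrative stand-in
    objective, not the objective function proposed in the paper."""
    mu1, mu2 = w @ m1, w @ m2
    v1, v2 = w @ S1 @ w, w @ S2 @ w
    return (0.25 * (mu1 - mu2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))

# Equal means, unequal covariances: the Fisher criterion is identically
# zero, but numerically maximizing the divergence over directions recovers
# the axis along which the class variances differ.
m1 = m2 = np.zeros(2)
S1, S2 = np.eye(2), np.diag([1.0, 16.0])
thetas = np.linspace(0.0, np.pi, 181)
scores = [projected_bhattacharyya(np.array([np.cos(t), np.sin(t)]), m1, S1, m2, S2)
          for t in thetas]
best = thetas[int(np.argmax(scores))]
print(best)  # close to pi/2
```

This is the situation where such methods outperform LDA: all the discriminative information sits in the covariance structure, which a mean-based criterion cannot see.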