## Discriminant Analysis by Gaussian Mixtures (1996)

Venue: | Journal of the Royal Statistical Society, Series B |

Citations: | 158 - 10 self |

### BibTeX

    @ARTICLE{Hastie96discriminantanalysis,
        author = {Trevor Hastie and Robert Tibshirani},
        title = {Discriminant Analysis by Gaussian Mixtures},
        journal = {Journal of the Royal Statistical Society, Series B},
        year = {1996},
        volume = {58},
        pages = {155--176}
    }

### Abstract

Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class. In this paper, we fit Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered. Low dimensional views are an important by-product of LDA---our new techniques inherit this feature. We are able to control the within-class spread of the subclass centers relative to the between-class spread. Our technique for fitting these models permits a natural blend with nonparametric versions of LDA.

Keywords: Classification, Pattern Recognition, Clustering, Nonparametric, Penalized.

1 Introduction

In the generic classification or discrimination problem, the outcome of interest G falls into J unordered classes, which for convenience we denote by the set J = {1, 2, 3, ..., J}. We wish to build a rule for pred...
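
The idea in the abstract, fitting a Gaussian mixture to each class and classifying by the largest posterior, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `fit_mda` and `predict_mda`, the farthest-first initialisation, the fixed number of subclasses per class, and the small ridge term are all assumptions made for illustration. The sketch also assumes that every subclass shares one covariance matrix, so the Gaussian normalising constants cancel when comparing classes.

```python
import numpy as np

def _farthest_first(Xc, k):
    # Assumed initialisation (not from the paper): spread k starting
    # centers over the class by repeatedly taking the farthest point.
    idx = [int(np.argmax(((Xc - Xc.mean(axis=0)) ** 2).sum(axis=1)))]
    while len(idx) < k:
        d = ((Xc[:, None, :] - Xc[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    return Xc[idx].copy()

def fit_mda(X, y, n_sub=2, n_iter=50, ridge=1e-6):
    """EM for per-class Gaussian mixtures with one shared covariance."""
    classes = np.unique(y)
    n, p = X.shape
    mus = {c: _farthest_first(X[y == c], n_sub) for c in classes}
    pis = {c: np.full(n_sub, 1.0 / n_sub) for c in classes}
    sigma = np.cov(X, rowvar=False) + ridge * np.eye(p)
    for _ in range(n_iter):
        prec = np.linalg.inv(sigma)
        scatter = np.zeros((p, p))
        for c in classes:
            Xc = X[y == c]
            # E-step: responsibility of each subclass for each point of class c.
            diff = Xc[:, None, :] - mus[c][None, :, :]
            d2 = np.einsum("nrp,pq,nrq->nr", diff, prec, diff)
            logw = np.log(pis[c]) - 0.5 * d2
            w = np.exp(logw - logw.max(axis=1, keepdims=True)) + 1e-12
            w /= w.sum(axis=1, keepdims=True)
            # M-step: mixing proportions and subclass centers.
            pis[c] = w.mean(axis=0)
            mus[c] = (w.T @ Xc) / w.sum(axis=0)[:, None]
            diff = Xc[:, None, :] - mus[c][None, :, :]
            scatter += np.einsum("nr,nrp,nrq->pq", w, diff, diff)
        # M-step: pooled within-subclass covariance.
        sigma = scatter / n + ridge * np.eye(p)
    priors = {c: float(np.mean(y == c)) for c in classes}
    return classes, priors, pis, mus, sigma

def predict_mda(model, X):
    """Assign each row of X to the class with the largest posterior score."""
    classes, priors, pis, mus, sigma = model
    prec = np.linalg.inv(sigma)
    scores = []
    for c in classes:
        diff = X[:, None, :] - mus[c][None, :, :]
        d2 = np.einsum("nrp,pq,nrq->nr", diff, prec, diff)
        a = np.log(pis[c]) - 0.5 * d2
        m = a.max(axis=1)
        # log(prior) + log-sum-exp over the subclasses of class c.
        scores.append(np.log(priors[c]) + m + np.log(np.exp(a - m[:, None]).sum(axis=1)))
    return classes[np.argmax(np.column_stack(scores), axis=1)]
```

A clustered class, for example one whose points fall in two separate groups on either side of another class, is the kind of setting the abstract describes: a single Gaussian per class cannot represent it, while two subclass centers can.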

### Citations

4358 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |
Citation Context: ...21-space, thereby forming vertices of a triangle, and each class is represented as a convex combination of a pair of vertices, and hence lie on an edge. The Bayes risk for this problem is about 0.14 (Breiman et al. 1984); MDA comes close to the optimal rate, which is not surprising since the structure of the MDA model is similar to the generating model. 5 MDA by optimal scoring One can use "optimal scoring"--- multi... |

1904 | Introduction to the Theory of Neural Computation - Hertz, Krogh, et al. - 1991 |

1611 | Generalized Additive Models - Hastie, Tibshirani - 1990 |

1010 | Multivariate analysis - Mardia, Kent, et al. - 1979 |

696 | Statistical Analysis of Finite Mixture Distributions - Titterington, Smith, et al. |

420 | Discriminant Analysis and Statistical Pattern Recognition - McLachlan - 1992 |
Citation Context: ...herefore to generalize LDA by assuming that each observed class is in fact a mixture of unobserved normally distributed subclasses. This approach is sometimes mentioned in the statistical literature (McLachlan 1992, Cheng & Titterington 1994, for example) and pattern recognition literature (Taxt, Hjort & Eikvil 1991), but does not seem to have generated much attention. In this paper we develop the mixture appro... |

255 | Multivariate Adaptive Regression Splines (with discussion) - Friedman - 1991 |

196 | Handwritten digit recognition with a backpropagation network - Le Cun, Boser, et al. - 1990 |

150 | Penalized discriminant analysis - Hastie, Buja, et al. - 1995 |

146 | Neural networks and related methods for classification - Ripley - 1994 |

116 | Flexible discriminant analysis by optimal scoring - Hastie, Tibshirani, et al. - 1994 |

95 | Self-organization and associative memory: 3rd edition - Kohonen - 1989 |
Citation Context: ...o one less than the number of subclasses in the mixture representation. A technique known as Learning Vector Quantization or LVQ has received a lot of attention in the pattern recognition literature (Kohonen 1989); MDA can be viewed as a smooth version of LVQ. LVQ finds a set of cluster centers for each class; classification is performed by finding the closest center, and assigning the associated class. The o... |

73 | Generalized clustering networks and Kohonen’s self-organizing scheme - Pal, Bezdek, et al. - 1993 |

50 | Canonical Correlation Analysis When the Data Are Curves - Leurgans, Moyeed, et al. - 1993 |

41 | Introduction to the Theory of Neural Computing - Krogh, Palmer - 1991 |

19 | Canonical variate analysis – a general formulation - Campbell - 1984 |

13 | Penalized discriminant analysis. The Annals of Statistics - Hastie, Buja, et al. - 1995 |

7 | Nonlinear discriminant analysis via scaling and ACE - Breiman, Ihaka - 1984 |

7 | Comparison of multivariate discriminant techniques for clinical data - application to the thyroid functional state - Coomans, Broeckaert, et al. |

5 | Canonical variate analysis of high-dimensional spectral data - Kiiveri - 1992 |

3 | Discriminant analysis by mixture estimation - Hastie, Tibshirani - 1995 |

2 | Classification and clustering in spatial and image data, To appear, proc. of '15 Jahrestagung von Gesellschaft fur Klassification - Ripley - 1992 |
Citation Context: ...exible discriminant analysis, as described in section 5. 5. Neural network --- a single layer perceptron with sigmoidal outputs, crossentropy cost function, weight decay and variable metric optimizer (Ripley 1992). 6. LVQ2 --- a version of Kohonen's learning vector quantization, as described in Hertz, Krogh & Palmer (1991). 21 Table 3: Results for simulation 1. The values are averages over 10 simulations, wit... |

1 | Neural networks and statistical perspectives - Cheng - 1994 |

1 | Statistical classification using a linear mixture of multinormal probability densities - Taxt, Hjort - 1991 |

1 | Neural networks and statistical perspectives, submitted - Cheng - 1993 |