## Discrete principal component analysis (2005)

Citations: 6 (0 self)

### BibTeX

@TECHREPORT{Buntine05discreteprincipal,
  author      = {Wray Buntine and Aleks Jakulin},
  title       = {Discrete principal component analysis},
  institution = {},
  year        = {2005}
}


### Abstract

This article presents a unified theory for analysis of components in discrete data, and compares the methods with techniques such as independent component analysis (ICA), non-negative matrix factorisation (NMF) and latent Dirichlet allocation (LDA). The main families of algorithms discussed are mean field, Gibbs sampling, and Rao-Blackwellised Gibbs sampling. Applications are presented for voting records from the United States Senate for 2003, and the use of components in subsequent classification.
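
The unifying model behind the DPCA family can be illustrated with a minimal generative sketch of its multinomial (LDA-style) member; the sizes and hyperparameters below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of the multinomial/LDA-style member of the DPCA family:
# component proportions m ~ Dirichlet(alpha), then a bag of words drawn
# from the mixed multinomial with word probabilities m @ Theta.
rng = np.random.default_rng(0)

K, J = 5, 100                               # components, vocabulary size (assumed)
alpha = np.full(K, 0.5)                     # Dirichlet prior over proportions (assumed)
Theta = rng.dirichlet(np.ones(J), size=K)   # K rows of word probabilities

def sample_document(n_words):
    m = rng.dirichlet(alpha)                # per-document component proportions
    return rng.multinomial(n_words, m @ Theta)

w = sample_document(50)                     # vector of J word counts summing to 50
```

The bag-of-words vector `w` is exactly the "large sparse discrete" data the paper analyses: non-negative integer counts dominated by zeros.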

### Citations

2366 | Latent Dirichlet Allocation
- Blei, Ng, et al.
Citation Context ...of membership (GOM) [31], probabilistic latent semantic indexing (PLSI) [17], non-negative matrix factorisation (NMF) [23], genotype inference using admixtures [29], latent Dirichlet allocation (LDA) [5], multinomial PCA (MPCA) [6], multiple aspect modelling [26], and Gamma-Poisson models (GaP) [9]. We refer to these methods jointly as Discrete PCA (DPCA), and this article provides a unifying model f...

2362 | Modern Information Retrieval
- Baeza-Yates, Ribeiro-Neto
- 1999
Citation Context ...a matrix made up of rows of such vectors of non-negative integers dominated by zeros, it is called here a large sparse discrete matrix. Bag of words is a basic representation in information retrieval [2]. The alternative is sequence of words. In DPCA, either representation can be used and the models act the same, up to any word order effects introduced by incremental algorithms. This detail is made p...

1493 | Topographic independent component analysis
- Hyvarinen, Hoyer, et al.
Citation Context ...enate for 2003, and the use of components in subsequent classification. 1 Introduction: Principal component analysis (PCA), latent semantic indexing (LSI), and independent component analysis (ICA, see [19]) are key methods in the statistical engineering toolbox. They have a long history, are used in many different ways, and under different names. They were primarily developed in the engineering communi...

1441 | Making Large-Scale SVM Learning Practical
- Joachims
- 1999
Citation Context ...ommon use for PCA and ICA, and as a classification tool. For this, we used the 20 newsgroups collection described previously as well as the Reuters-21578 collection. We employed the SVMlight V5.0 [22] classifier with default settings. For classification, we added the class as a distinct multinomial (cf. Section 4.2) for the training data and left it empty for the test data, and then predicted the ...

1363 | Generalized linear models
- McCullagh, Nelder
- 1990
Citation Context ...e mean. Our formulation, then, can also be interpreted as letting w be exponential family with dual parameter given by (Θl). Our formulation then generalises PCA in the same way that linear models [25] generalise linear regression. Note, an alternative has also been presented [13] where w has an exponential family distribution with natural parameters given by (Θl). For the Bernoulli with probab...

1240 | Bayesian Data Analysis
- Gelman, Carlin, et al.
- 1995
Citation Context ...ith this expectation relationship, the dimension of l can now be less than the dimension of w, and thus Θ would be a rectangular matrix. For p(w | Θ) in the so-called exponential family distributions [15], the expected value of w is referred to as the dual parameter, and it is usually the parameter we know best. For the Bernoulli with probability p, the dual parameter is p, for the Poisson with rate λ...

1039 | Bayesian Theory
- Bernardo, Smith
- 1994
Citation Context ...ribution above can then be represented as w ∼ Multinomial(Θm, L). Note also that marginalising out Σ_k l_k convolves a Poisson and a gamma distribution to produce a Poisson-Gamma distribution for L [3]. (Footnote: the algorithm for MPCA borrowed the techniques of LDA directly.) Thus, we have proven the following. Lemma 1. Given a Gamma-Poisson model of Section 3.3 where the β hyperparamete...

989 | Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context ...emingly similar approaches for discrete data that appears under many names: grade of membership (GOM) [31], probabilistic latent semantic indexing (PLSI) [17], non-negative matrix factorisation (NMF) [23], genotype inference using admixtures [29], latent Dirichlet allocation (LDA) [5], multinomial PCA (MPCA) [6], multiple aspect modelling [26], and Gamma-Poisson models (GaP) [9]. We refer to these met...

785 | Probabilistic latent semantic indexing
- Hofmann
- 1999
Citation Context ...cal computing community has become aware of seemingly similar approaches for discrete data that appears under many names: grade of membership (GOM) [31], probabilistic latent semantic indexing (PLSI) [17], non-negative matrix factorisation (NMF) [23], genotype inference using admixtures [29], latent Dirichlet allocation (LDA) [5], multinomial PCA (MPCA) [6], multiple aspect modelling [26], and Gamma-P...

734 | Algorithms for non-negative matrix factorization
- Lee, Seung
- 2001
Citation Context ...d the document word data can be stored on disk and streamed, thus the main memory complexity is O(2JK). Correspondence with NMF: A precursor to the GaP model is non-negative matrix factorisation (NMF) [24], which is based on the matrix approximation paradigm using Kullback-Leibler divergence. The algorithm itself, converted to the notation used here, is as follows: l_{k,(i)} ← (l_{k,(i)} / Σ_j θ_{j,k}) Σ_j θ_{j,k} w...
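
The multiplicative updates this context refers to are the standard Lee-Seung algorithm for NMF under Kullback-Leibler divergence; a minimal sketch, using the usual V ≈ WH notation rather than the paper's (Θ, l) notation:

```python
import numpy as np

def nmf_kl(V, K, iters=200, seed=0):
    """Lee-Seung multiplicative updates minimising the (generalised)
    Kullback-Leibler divergence between V and the product W @ H."""
    rng = np.random.default_rng(seed)
    I, J = V.shape
    W = rng.random((I, K)) + 1e-3            # positive init keeps updates positive
    H = rng.random((K, J)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]  # update H with W fixed
        W *= ((V / (W @ H)) @ H.T) / H.sum(axis=1)[None, :]  # update W with H fixed
    return W, H

V = np.random.default_rng(1).poisson(3.0, size=(20, 30)) + 1.0  # toy count matrix
W, H = nmf_kl(V, K=4)
```

Each update is guaranteed not to increase the KL divergence, which is why the scheme converges without any step-size parameter.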

625 | Finding scientific topics
- Griffiths, Steyvers
- 2004
Citation Context ...argest body of applied work in this area (using citation indexes) is in genotype inference due to the Structure program [29]. A growing body of work is in text classification and topic modelling (see [16, 8]), and language modelling in information retrieval (see [1, 7, 9]). Here we present in Section 3 a unified theory for analysis of components in discrete data, and compare the methods with related tech...

548 | Distributional clustering of English words
- Pereira, Tishby, et al.
- 1993
Citation Context ...al documents” in any usual sense. This applies to many other kinds of sparse discrete data: low intensity images (such as astronomical images) and verb-noun data used in language models introduced by [27], for instance. DPCA, then, places constraints on the approximating matrices in Figure 1(b) so that they are also non-negative. Also, there are fundamentally two different kinds of large sample approx...

479 | E.: Independent component analysis: Algorithms and applications
- Hyvarinen, Oja
- 2000
Citation Context ...used if counts are small. DPCA then avoids Gaussian modelling of the data. 2.2 Independent Components: Independent component analysis (ICA) was also developed as an alternative to PCA. Hyvärinen and Oja [20] argue that PCA methods merely decorrelate data, finding uncorrelated components. ICA then was developed as a way of representing multivariate data with truly independent components. The basic formula...

430 | Introduction to Probability Models - Ross - 2000

410 | Inference of population structure using multilocus genotype data
- Pritchard, Stephens, et al.
- 2000
Citation Context ...ta that appears under many names: grade of membership (GOM) [31], probabilistic latent semantic indexing (PLSI) [17], non-negative matrix factorisation (NMF) [23], genotype inference using admixtures [29], latent Dirichlet allocation (LDA) [5], multinomial PCA (MPCA) [6], multiple aspect modelling [26], and Gamma-Poisson models (GaP) [9]. We refer to these methods jointly as Discrete PCA (DPCA), and t...

197 | Pairwise data clustering by deterministic annealing - Hofmann, Buhmann - 1997

150 | Rao-Blackwellisation of Sampling Schemes
- Casella, Robert
- 1996
Citation Context ...field algorithms, excepting that sampling is done instead of maximisation or expectation. Rao-Blackwellised Gibbs sampling: The Griffiths-Steyvers style Gibbs sampling is a Rao-Blackwellisation (see [11]) of direct Gibbs sampling. It takes the likelihood versions of Equations (7) and (10) multiplied up for each document together with a prior on Θ, and marginalises out Θ. As before, denote the hidden ...
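
A compact sketch of what such Rao-Blackwellised (collapsed) Gibbs sampling looks like for the Dirichlet-multinomial model: Θ is marginalised out and only the per-word component assignments are resampled. The toy corpus and hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 1, 2], [2, 3, 3, 4], [0, 4, 4, 1]]  # toy corpus: word ids per document
K, J, alpha, beta = 2, 5, 0.5, 0.1                 # components, vocab, priors (assumed)

z = [[rng.integers(K) for _ in doc] for doc in docs]  # component of each word
ndk = np.zeros((len(docs), K))   # document-component counts
nkw = np.zeros((K, J))           # component-word counts
nk = np.zeros(K)                 # component totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(100):             # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]          # remove this word's current assignment
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # conditional p(z = k | rest) with Theta marginalised out
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + J * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k          # record the new assignment
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
```

Because the continuous parameters are integrated out, only the count arrays need to be maintained, which is what makes this sampler both simple and low-variance.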

110 | Expectation-propagation for the generative aspect model
- Minka, Lafferty
- 2002
Citation Context ...exing (PLSI) [17], non-negative matrix factorisation (NMF) [23], genotype inference using admixtures [29], latent Dirichlet allocation (LDA) [5], multinomial PCA (MPCA) [6], multiple aspect modelling [26], and Gamma-Poisson models (GaP) [9]. We refer to these methods jointly as Discrete PCA (DPCA), and this article provides a unifying model for them. Note also, that it is possible these methods existe...

108 | A generalization of principal component analysis to the exponential family
- Collins, Dasgupta, et al.
- 2001
Citation Context ...l family with dual parameter given by (Θl). Our formulation then generalises PCA in the same way that linear models [25] generalise linear regression. Note, an alternative has also been presented [13] where w has an exponential family distribution with natural parameters given by (Θl). For the Bernoulli with probability p, the natural parameter is log p/(1 − p), for the Poisson with rate λ, the na...

80 | Variational extensions to EM and multinomial PCA
- Buntine
- 2002
Citation Context ... if it takes the same functional form as the prior probabilities for the hidden variables. The functional form of the approximation can be derived by inspection of the recursive functional forms (see [6] Equation (4)): q(l) ←→ (1/Z_l) exp(E_{q(v)}{log p(l, v, w | Θ, α, β, K)}), q(v) ←→ (1/Z_v) exp(E_{q(l)}{log p(l, v, w | Θ, α, β, K)}) (11), where Z_l and Z_v are normalising constants. An important...

54 | Applying discrete PCA in data analysis
- Buntine, Jakulin
- 2004
Citation Context ...s) is in genotype inference due to the Structure program [29]. A growing body of work is in text classification and topic modelling (see [16, 8]), and language modelling in information retrieval (see [1, 7, 9]). Here we present in Section 3 a unified theory for analysis of components in discrete data, and compare the methods with related techniques. The main families of algorithms discussed are mean field,...

47 | GaP: A factor model for discrete data
- Canny
- 2004
Citation Context ...x factorisation (NMF) [23], genotype inference using admixtures [29], latent Dirichlet allocation (LDA) [5], multinomial PCA (MPCA) [6], multiple aspect modelling [26], and Gamma-Poisson models (GaP) [9]. We refer to these methods jointly as Discrete PCA (DPCA), and this article provides a unifying model for them. Note also, that it is possible these methods existed in reduced form in other statistic...

30 | Analyzing attribute dependencies
- Jakulin, Bratko
- 2003
Citation Context ...lude that components may help with tightly coupled categories that require conjunctions of words (20 newsgroups), but not with the keyword-identifiable categories (Reuters). Judging from the ideas in [21], the components help in two cases: a) when the co-appearance of two words is more informative than the sum of the informativeness of individual appearances of either word, and b) when the appearance of one wo...

25 | Non-parametric unfolding of binary choice data. Political Analysis 8:211–32. http://voteview.uh.edu/apsa2.pdf
- Poole
- 2000
Citation Context ...ints ‘explain’ the positive correlations between the senators’ votes. The ideal points for each senator can be obtained either by optimization, for instance, with the optimal classification algorithm [28], or through Bayesian modelling [12]. Unlike the spatial models, the DPCA interprets the correlations between votes through membership of the senators in similar blocs. Blocs correspond to hidden comp...

20 | Investigating the Relationship between Language Model Perplexity and IR Precision-Recall Measures
- Azzopardi, Girolami, et al.
Citation Context ...s) is in genotype inference due to the Structure program [29]. A growing body of work is in text classification and topic modelling (see [16, 8]), and language modelling in information retrieval (see [1, 7, 9]). Here we present in Section 3 a unified theory for analysis of components in discrete data, and compare the methods with related techniques. The main families of algorithms discussed are mean field,...

17 | Topic identification in dynamical text by complexity pursuit
- Bingham, Kaban, et al.
- 2003
Citation Context ...tly zeros: the equation can only hold if l and Θ are discrete as well and thus the gradient-based algorithms for ICA cannot be justified. To get around this in practice, when applying ICA to documents [4], word counts are sometimes first turned into TF-IDF scores [2]. To arrive at a formulation more suited to discrete data, we can relax the equality in ICA (i.e., w = Θl) to be an expectation: Exp p(w|...
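
The relaxation described in this context, replacing ICA's identity w = Θl with the expectation Exp[w] = Θl, can be realised by giving w a Poisson distribution with mean Θl, as in the Gamma-Poisson (GaP) model; the sizes and gamma hyperparameters below are illustrative assumptions.

```python
import numpy as np

# Relax w = Theta @ l to Exp[w] = Theta @ l: draw non-negative component
# scores l from a gamma prior, then integer counts w from Poisson(Theta @ l).
rng = np.random.default_rng(0)
K, J = 4, 10
Theta = rng.dirichlet(np.ones(J), size=K).T  # J x K, each column a component
l = rng.gamma(shape=2.0, scale=5.0, size=K)  # non-negative component scores
w = rng.poisson(Theta @ l)                   # discrete data with mean Theta @ l
```

Unlike the ICA identity, this version produces genuinely integer-valued, mostly-zero data while keeping the linear component structure in expectation.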

15 | Dirichlet enhanced latent semantic analysis
- Yu, Yu, et al.
- 2005
Citation Context ...easily support the discovery of several hundred components. Dirichlet processes have been developed as an alternative to the K-dimensional component priors in the Dirichlet-multinomial/discrete model [32], although in implementation the effect is to use K-dimensional Dirichlets for a large K and delete low performing components. 6 Applications: This section briefly discusses two applications of the m...

12 |
The statistical analysis of roll call voting: A unified approach
- Clinton, Jackman, et al.
- 2004
Citation Context ...oll call data in political science, and they often postulate a model of rational decision making. Each senator is modelled as a position or an ideal point in a continuous spatial model of preferences [12]. For example, the first dimension often delineates the liberal-conservative preference, and the second region or social issues preference. The proximities between ideal points ‘explain’ the positive ...

9 | Using discrete PCA on web pages - Buntine, Perttu, et al. - 2004

7 |
Bayesian Model choice via MCMC
- Carlin, Chib
- 1995
Citation Context ...e number of components K: A simple scheme exists within importance sampling in Gibbs to estimate the evidence term for a DPCA model, proposed by [7], first proposed in the general sampling context by [10]. One would like to find the value of K with the highest posterior probability (or, for instance, cross validation score). In popular terms, this could be used to find the “right” number of components...

6 |
A new procedure for analysis of medical classification
- Woodbury, Manton
- 1982
Citation Context ...data summarization. Relatively recently the statistical computing community has become aware of seemingly similar approaches for discrete data that appears under many names: grade of membership (GOM) [31], probabilistic latent semantic indexing (PLSI) [17], non-negative matrix factorisation (NMF) [23], genotype inference using admixtures [29], latent Dirichlet allocation (LDA) [5], multinomial PCA (MP...

4 |
Principal component analysis of binary data by iterated singular value decomposition
- Leeuw
- 2006
Citation Context ... in blocs. Each senator is represented with a vertical bar of 5 squares that indicate his or her membership in blocs. We have arranged the senators from left to right using the binary PCA approach of [14]. This ordering attempts to sort senators from the most extreme to the most moderate and to the most extreme again. Figure 5 shows the Democrat senators and Figure 6 the Republicans. ...