## Is multinomial PCA multi-faceted clustering or dimensionality reduction (2003)

Venue: | in Proceedings of the Ninth International Workshop on Artificial Intelligence and |

Citations: | 16 - 0 self |

### BibTeX

@INPROCEEDINGS{Buntine03ismultinomial,
    author = {Wray Buntine and Sami Perttu},
    title = {Is multinomial PCA multi-faceted clustering or dimensionality reduction},
    booktitle = {in Proceedings of the Ninth International Workshop on Artificial Intelligence and},
    year = {2003},
    pages = {300--307}
}

### Abstract

Discrete analogues to Principal Components Analysis (PCA) are intended to handle discrete or positive-only data, for instance sets of documents. The class of methods is appropriately called multinomial PCA because it replaces the Gaussian in the probabilistic formulation of PCA with a multinomial. Experiments to date, however, have been on small data sets, for instance, from early information retrieval collections. This paper demonstrates the method on two large data sets and considers two extremes of behaviour: (1) dimensionality reduction where the feature set (i.e., bag of words) is considerably reduced, and (2) multi-faceted clustering (or aspect modelling) where clustering is done but items can now belong in several clusters at once.
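The abstract's central idea, a multinomial in place of the Gaussian, can be illustrated with a small generative sketch. This is not the authors' code: it assumes the common formulation in which each document draws component proportions `m` from a Dirichlet and then draws its bag of words from a multinomial whose word probabilities are the convex combination of per-component word distributions. The names `alpha`, `omega`, `sample_dirichlet`, and `sample_document` are hypothetical, and the toy values are made up for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_document(alpha, omega, length, rng):
    """Sample component proportions and word counts for one document.

    alpha  -- Dirichlet parameters, one per component (assumed prior)
    omega  -- omega[k][j] is the probability of word j under component k
    length -- total number of words in the document
    """
    m = sample_dirichlet(alpha, rng)
    n_words = len(omega[0])
    # Word probabilities are a convex combination of the component rows.
    word_probs = [
        sum(m[k] * omega[k][j] for k in range(len(m))) for j in range(n_words)
    ]
    # Multinomial draw: sample `length` word indices and tally them.
    counts = [0] * n_words
    for j in rng.choices(range(n_words), weights=word_probs, k=length):
        counts[j] += 1
    return m, counts


rng = random.Random(0)
omega = [
    [0.7, 0.2, 0.1, 0.0],  # toy component 0 favours the first words
    [0.0, 0.1, 0.2, 0.7],  # toy component 1 favours the last words
]
alpha = [1.0, 1.0]
m, counts = sample_document(alpha, omega, length=50, rng=rng)
```

Under these assumptions, the two extremes in the abstract correspond to how `m` is read: used as a low-dimensional representation of the word counts it acts as dimensionality reduction, while interpreting its entries as graded memberships gives multi-faceted clustering. Restricting `m` to a single non-zero entry would recover an ordinary one-cluster-per-document mixture.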

### Citations

2775 | Normalized cuts and image segmentation
- SHI, MALIK
- 1997
Citation Context ...ta such as documents is clustering or unsupervised learning. A rich variety of methods exist borrowing theory and algorithms from a broad spectrum of computer science: spectral (eigenvector) methods (Shi & Malik, 2000), kd-trees (Moore, 1998), using existing high-performance graph partitioning algorithms from CAD (Han, Karypis, Kumar, & Mobasher, 1997), clustering to be informative about an auxiliary variable (Per...

2666 | Modern Information Retrieval
- Baeza-Yates, Ribeiro-Neto
- 1999
Citation Context ...s a similar purpose (Kanth, Agrawal, Abbadi, & Singh, 1999). Ad hoc dimensionality reduction techniques range from greedy feature selection schemes to ordering by information or independence metrics (Baeza-Yates & Ribeiro-Neto, 1999). The state of the art here is Principal Components Analysis (PCA), even in discrete applications. In text applications it is a PCA variant called latent semantic indexing LSI (Baeza-Yates & Ribeiro-...

2634 | Latent dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context ...ecently proposed discrete analogues to PCA. Methods include non-negative matrix factorization (Lee & Seung, 1999), probabilistic latent semantic analysis (Hofmann, 1999), latent Dirichlet allocation (Blei, Ng, & Jordan, 2002), and generative aspect models (Minka & Lafferty, 2002). A good discussion of the motivation for these techniques can be found in (Hofmann, 1999), a more sophisticated statistical analysis is (Minka ...

1094 | Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context ...tributional clustering (Baker & McCallum, 1998). As a substitute to PCA on discrete data, authors have recently proposed discrete analogues to PCA. Methods include non-negative matrix factorization (Lee & Seung, 1999), probabilistic latent semantic analysis (Hofmann, 1999), latent Dirichlet allocation (Blei, Ng, & Jordan, 2002), and generative aspect models (Minka & Lafferty, 2002). A good discussion of the motiv...

874 | Probabilistic latent semantic indexing
- Hofmann
- 1999
Citation Context ...s. In text applications it is a PCA variant called latent semantic indexing LSI (Baeza-Yates & Ribeiro-Neto, 1999). A rich body of practical experience indicates LSI is not ideal for the task (e.g., (Hofmann, 1999; Chakrabarti & Mehrotra, 2000)), and theoretical justifications use unrealistic assumptions (Papadimitriou, Raghavan, Tamaki, & Vempala, 1998). The use of cluster centers as vectors, a heuristic method with no formal basis, perfo...

568 | Distributional Clustering of English Words
- Pereira, Tishby, et al.
- 1993
Citation Context ...000), kd-trees (Moore, 1998), using existing high-performance graph partitioning algorithms from CAD (Han, Karypis, Kumar, & Mobasher, 1997), clustering to be informative about an auxiliary variable (Pereira, Tishby, & Lee, 1993; Tishby, Pereira, & Bialek, 1999), hierarchical algorithms (Vaithyanathan & Dom, 2000) and data merging algorithms (Bradley, Fayyad, & Reina, 1998). All these methods, however, have one significant d...

450 | The information bottleneck method
- Tishby, Pereira, et al.
- 1999
Citation Context ...using existing high-performance graph partitioning algorithms from CAD (Han, Karypis, Kumar, & Mobasher, 1997), clustering to be informative about an auxiliary variable (Pereira, Tishby, & Lee, 1993; Tishby, Pereira, & Bialek, 1999), hierarchical algorithms (Vaithyanathan & Dom, 2000) and data merging algorithms (Bradley, Fayyad, & Reina, 1998). All these methods, however, have one significant drawback for typical application i...

422 | Mixtures of probabilistic principal component analyzers
- Tipping, Bishop
- 1999
Citation Context ...is in (Buntine, 2002). We refer to the method as multinomial PCA (mPCA) because it is a precise multinomial analogue to Tipping et al.’s elegant formulation of PCA as a Gaussian mixture of Gaussians (Tipping & Bishop, 1999). 1.2 OVERVIEW This paper describes our experiments intended to understand mPCA and whether it should be called a multi-faceted clustering algorithm or a dimensionality reduction algorithm. Note that...

259 | Scaling clustering algorithms to large databases
- Bradley, Fayyad, et al.
- 1998
Citation Context ...stering to be informative about an auxiliary variable (Pereira, Tishby, & Lee, 1993; Tishby, Pereira, & Bialek, 1999), hierarchical algorithms (Vaithyanathan & Dom, 2000) and data merging algorithms (Bradley, Fayyad, & Reina, 1998). All these methods, however, have one significant drawback for typical application in areas such as document or image analysis: each item/document is to be classified exclusively to one class. Their...

252 | Distributional clustering of words for text classification
- Baker, McCallum
- 1998
Citation Context ...mpala, 1998). The use of cluster centers as vectors, a heuristic method with no formal basis, performs better than LSI (Karypis & Han, 2000), as does distributional clustering (Baker & McCallum, 1998). As a substitute to PCA on discrete data, authors have recently proposed discrete analogues to PCA. Methods include non-negative matrix factorization (Lee & Seung, 1999), probabilistic latent semant...

110 | Local dimensionality reduction: A new approach to indexing high dimensional spaces
- Chakrabarti, Mehrotra
- 2000
Citation Context ...ications it is a PCA variant called latent semantic indexing LSI (Baeza-Yates & Ribeiro-Neto, 1999). A rich body of practical experience indicates LSI is not ideal for the task (e.g., (Hofmann, 1999; Chakrabarti & Mehrotra, 2000)), and theoretical justifications use unrealistic assumptions (Papadimitriou, Raghavan, Tamaki, & Vempala, 1998). The use of cluster centers as vectors, a heuristic method with no formal basis, perfo...

103 | Dimensionality reduction for similarity searching in dynamic databases
- Kanth, Agrawal, et al.
- 1998
Citation Context ...austive or complex algorithm such as nearest neighbor or logistic regression. To perform search in high dimensional spaces, such as image databases, dimensionality reduction serves a similar purpose (Kanth, Agrawal, Abbadi, & Singh, 1999). Ad hoc dimensionality reduction techniques range from greedy feature selection schemes to ordering by information or independence metrics (Baeza-Yates & Ribeiro-Neto, 1999). The state of the art he...

96 | Very fast EM-based mixture model clustering using multiresolution kd-trees
- Moore
- 1999
Citation Context ...ring or unsupervised learning. A rich variety of methods exist borrowing theory and algorithms from a broad spectrum of computer science: spectral (eigenvector) methods (Shi & Malik, 2000), kd-trees (Moore, 1998), using existing high-performance graph partitioning algorithms from CAD (Han, Karypis, Kumar, & Mobasher, 1997), clustering to be informative about an auxiliary variable (Pereira, Tishby, & Lee, 199...

93 | Clustering based on association rule hypergraphs (position paper)
- Han, Karypis, et al.
- 1997
Citation Context ...gorithms from a broad spectrum of computer science: spectral (eigenvector) methods (Shi & Malik, 2000), kd-trees (Moore, 1998), using existing high-performance graph partitioning algorithms from CAD (Han, Karypis, Kumar, & Mobasher, 1997), clustering to be informative about an auxiliary variable (Pereira, Tishby, & Lee, 1993; Tishby, Pereira, & Bialek, 1999), hierarchical algorithms (Vaithyanathan & Dom, 2000) and data merging algori...

82 | Variational extensions to EM and multinomial PCA
- Buntine
- 2002
Citation Context ...2002). A good discussion of the motivation for these techniques can be found in (Hofmann, 1999), a more sophisticated statistical analysis is (Minka & Lafferty, 2002), and a unifying treatment is in (Buntine, 2002). We refer to the method as multinomial PCA (mPCA) because it is a precise multinomial analogue to Tipping et al.’s elegant formulation of PCA as a Gaussian mixture of Gaussians (Tipping & Bishop, 19...

75 | Concept Indexing: a Fast Dimensionality Reduction Algorithm with Applications to Document Retrieval and Categorization
- Karypis, Han
- 2000
Citation Context ...ustifications use unrealistic assumptions (Papadimitriou, Raghavan, Tamaki, & Vempala, 1998). The use of cluster centers as vectors, a heuristic method with no formal basis, performs better than LSI (Karypis & Han, 2000), as does distributional clustering (Baker & McCallum, 1998). As a substitute to PCA on discrete data, authors have recently proposed discrete analogues to PCA. Methods include...

61 | A new family of online algorithms for category ranking
- Crammer, Singer
Citation Context ...by a document can be assigned using a convex combination to a number of clusters rather than uniquely to one cluster. This is an unsupervised version of the so-called multi-class classification task (Crammer & Singer, 2002). A body of techniques with completely different goals is known as dimensionality reduction: they seek to reduce the dimensions of an item/document. Suppose each is represented by a vector of 5000 di...

7 | Latent semantic indexing: A probabilistic analysis
- Papadimitriou, Raghavan, et al.
- 1998
Citation Context ..., 1999). A rich body of practical experience indicates LSI is not ideal for the task (e.g., (Hofmann, 1999; Chakrabarti & Mehrotra, 2000)), and theoretical justifications use unrealistic assumptions (Papadimitriou, Raghavan, Tamaki, & Vempala, 1998). The use of cluster centers as vectors, a heuristic method with no formal basis, performs better than LSI (Karypis & Han, 2000), as does distributional clustering (Baker & McC...

1 | Estimating a Dirichlet distribution. CMU. (Course notes)
- Minka
- 2000
Citation Context ...nd a Dirichlet sampling respectively. The last rewrite rule (4) for α gives its dual parameters (according to exponential family convention), which can be immediately inverted using Minka’s methods (Minka, 2000). Note that Formula (5) can be updated while the rest of the major calculations are being done, and thus the probability bounds can be efficiently maintained. Be warned, however, that these are bound...
