Results 1-10 of 120
Probabilistic Latent Semantic Indexing
, 1999
Abstract

Cited by 1207 (11 self)
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
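The aspect model behind PLSI, P(d, w) = sum_z P(z) P(d|z) P(w|z), can be fitted by EM as the abstract describes. The following is a minimal illustrative sketch of that EM loop, not the paper's implementation; the function name and the toy data are assumptions of this example:

```python
import math
import random

def plsa_em(counts, K, iters=30, seed=0):
    """Fit the aspect model P(d, w) = sum_z P(z) P(d|z) P(w|z) by EM.

    counts[d][w] is the raw count n(d, w). Returns the fitted
    distributions and the per-iteration log-likelihoods.
    """
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])

    def norm(v):
        s = sum(v)
        return [x / s for x in v]

    # Random positive initialisation of all three distributions.
    pz = norm([rng.random() + 0.1 for _ in range(K)])
    pd_z = [norm([rng.random() + 0.1 for _ in range(D)]) for _ in range(K)]
    pw_z = [norm([rng.random() + 0.1 for _ in range(W)]) for _ in range(K)]

    logliks = []
    for _ in range(iters):
        nz = [0.0] * K                      # expected counts for P(z)
        nd = [[0.0] * D for _ in range(K)]  # expected counts for P(d|z)
        nw = [[0.0] * W for _ in range(K)]  # expected counts for P(w|z)
        ll = 0.0
        for d in range(D):
            for w in range(W):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) over latent classes.
                joint = [pz[z] * pd_z[z][d] * pw_z[z][w] for z in range(K)]
                s = sum(joint)
                ll += n * math.log(s)
                for z in range(K):
                    r = n * joint[z] / s
                    nz[z] += r
                    nd[z][d] += r
                    nw[z][w] += r
        # M-step: re-normalise the expected counts.
        pz = norm(nz)
        pd_z = [norm(nd[z]) for z in range(K)]
        pw_z = [norm(nw[z]) for z in range(K)]
        logliks.append(ll)
    return pz, pd_z, pw_z, logliks
```

By the standard EM argument, the log-likelihood sequence is non-decreasing, which is a useful sanity check on any implementation of this model.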
Probabilistic Latent Semantic Analysis
 In Proc. of Uncertainty in Artificial Intelligence, UAI’99
, 1999
Abstract

Cited by 760 (9 self)
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
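Tempered EM, as described here, modifies only the E-step: the class posteriors are computed from the joint weights raised to a power beta in (0, 1], which smooths the posterior and counteracts overfitting. A minimal sketch of that single step (the function name and argument layout are illustrative, not taken from the paper):

```python
def tempered_posterior(pz, pd_z_d, pw_z_w, beta):
    """Tempered E-step of pLSA: P(z | d, w) is proportional to
    (P(z) * P(d|z) * P(w|z)) ** beta.

    beta = 1 recovers ordinary EM; smaller beta flattens the posterior,
    which is the regularising effect tempering relies on.
    """
    weights = [(pz[z] * pd_z_d[z] * pw_z_w[z]) ** beta
               for z in range(len(pz))]
    s = sum(weights)
    return [w / s for w in weights]
```

In tempered EM, beta is typically decreased gradually while monitoring held-out performance, stopping when the held-out score no longer improves.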
Learning probabilistic relational models
 In IJCAI
, 1999
Abstract

Cited by 619 (31 self)
A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much of the relational structure present in our database. This paper builds on the recent work on probabilistic relational models (PRMs), and describes how to learn them from databases. PRMs allow the properties of an object to depend probabilistically both on other properties of that object and on properties of related objects. Although PRMs are significantly more expressive than standard models, such as Bayesian networks, we show how to extend well-known statistical methods for learning Bayesian networks to learn these models. We describe both parameter estimation and structure learning, the automatic induction of the dependency structure in a model. Moreover, we show how the learning procedure can exploit standard database retrieval techniques for efficient learning from large datasets. We present experimental results on both real and synthetic relational databases.
Unsupervised Learning by Probabilistic Latent Semantic Analysis
 Machine Learning
, 2001
Abstract

Cited by 612 (4 self)
This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to make use of a temperature-controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and in related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
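The perplexity reported in such evaluations is the standard definition: the exponentiated negative average log-likelihood per observed token, so that a uniform model over V symbols scores exactly V. A small illustrative helper (not from the paper's code):

```python
import math

def perplexity(model_probs, counts):
    """Perplexity on held-out data:
        exp( - (sum_w n(w) * log P(w)) / sum_w n(w) ).

    model_probs[w] is the model's probability of symbol w and
    counts[w] its held-out count. Lower is better.
    """
    total = sum(counts)
    ll = sum(n * math.log(p) for n, p in zip(counts, model_probs))
    return math.exp(-ll / total)
```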
Latent Class Models for Collaborative Filtering
 IN PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 1999
Abstract

Cited by 208 (4 self)
This paper presents a statistical approach to collaborative filtering and investigates the use of latent class models for predicting individual choices and preferences based on observed preference behavior. Two models are discussed and compared: the aspect model, a probabilistic latent space model which models individual preferences as a convex combination of preference factors, and the two-sided clustering model, which simultaneously partitions persons and objects into clusters. We present EM algorithms for different variants of the aspect model and derive an approximate EM algorithm based on a variational principle for the two-sided clustering model. The benefits of the different models are experimentally investigated on a large movie data set.
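In the aspect model for collaborative filtering, a user's predicted item distribution is exactly the convex combination the abstract mentions: P(item | user) = sum_z P(z | user) P(item | z). A sketch under that definition (the function name and the toy numbers are hypothetical):

```python
def predict_items(pz_given_user, pitem_given_z):
    """Aspect-model prediction: mix the factor-specific item
    distributions with the user's factor weights,
        P(item | user) = sum_z P(z | user) * P(item | z).

    pz_given_user: list of K factor weights for one user.
    pitem_given_z: K lists, each a distribution over items.
    """
    n_items = len(pitem_given_z[0])
    return [
        sum(pzu * piz[i] for pzu, piz in zip(pz_given_user, pitem_given_z))
        for i in range(n_items)
    ]
```

Because each ingredient is a probability distribution, the output is automatically a valid distribution over items, which can then be ranked for recommendation.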
A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation
 In KDD
, 2004
Abstract

Cited by 133 (29 self)
Co-clustering is a powerful data mining technique with varied applications such as text clustering, microarray analysis and recommender systems. Recently, an information-theoretic co-clustering approach applicable to empirical joint probability distributions was proposed. In many situations, co-clustering of more general matrices is desired. In this paper, we present a substantially generalized co-clustering framework wherein any Bregman divergence can be used in the objective function, and various conditional expectation based constraints can be considered based on the statistics that need to be preserved. Analysis of the co-clustering problem leads to the minimum Bregman information principle, which generalizes the maximum entropy principle and yields an elegant meta-algorithm that is guaranteed to achieve local optimality. Our methodology yields new algorithms and also encompasses several previously known clustering and co-clustering algorithms based on alternate minimization.
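For a scalar convex function phi, the Bregman divergence is d_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y); different choices of phi recover familiar losses, which is what makes the framework above so general. A small illustration (my own example, not the paper's code):

```python
import math

def bregman(phi, dphi, x, y):
    """Scalar Bregman divergence:
        d_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y).
    """
    return phi(x) - phi(y) - dphi(y) * (x - y)

# phi(t) = t^2 recovers the squared Euclidean distance (x - y)^2.
sq = lambda t: t * t
dsq = lambda t: 2 * t

# phi(t) = t * log(t) recovers the generalised I-divergence
#   x * log(x / y) - x + y   (for x, y > 0).
ent = lambda t: t * math.log(t)
dent = lambda t: math.log(t) + 1.0
```

Plugging either pair into a co-clustering objective yields, respectively, a squared-error variant and the information-theoretic variant the abstract refers to.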
Inducing a Semantically Annotated Lexicon via EM-Based Clustering
, 1999
Abstract

Cited by 113 (5 self)
We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally to frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries.
Discriminative Methods for Multi-Labeled Classification
 In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
, 2004
Abstract

Cited by 101 (0 self)
In this paper we present methods of enhancing existing discriminative classifiers for multi-labeled predictions. Discriminative methods like support vector machines perform very well for uni-labeled text classification tasks. Multi-labeled classification is a harder task that has received relatively less attention. In the multi-labeled setting, classes are often related to each other or are part of an is-a hierarchy. We present a new technique for combining text features with features indicating relationships between classes, which can be used with any discriminative algorithm.
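One simple way to realise "features indicating relationships between classes" (this is purely an illustrative sketch, not the paper's construction) is to append, to the text features, indicators for the labels a first-stage classifier predicts together with their is-a ancestors:

```python
def ancestors(label, parent):
    """All ancestors of `label` in an is-a hierarchy given as a
    child -> parent mapping."""
    out = []
    while label in parent:
        label = parent[label]
        out.append(label)
    return out

def augment_features(text_features, first_stage_labels, parent, all_labels):
    """Concatenate text features with binary class-relationship features:
    1.0 for every label that a first-stage classifier predicted or that
    is an ancestor of a predicted label, 0.0 otherwise. The combined
    vector can be fed to any discriminative learner."""
    active = set(first_stage_labels)
    for lab in first_stage_labels:
        active.update(ancestors(lab, parent))
    return list(text_features) + [1.0 if lab in active else 0.0
                                  for lab in all_labels]
```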
Rich Probabilistic Models for Gene Expression
, 2001
Abstract

Cited by 88 (8 self)
Clustering is commonly used for analyzing gene expression data. Despite their successes, clustering methods suffer from a number of limitations. First, these methods reveal similarities that exist over all of the measurements, while obscuring relationships that exist over only a subset of the data. Second, clustering methods cannot readily incorporate additional types of information, such as clinical data or known attributes of genes. To circumvent these shortcomings, we propose the use of a single coherent probabilistic model that encompasses much of the rich structure in the genomic expression data, while incorporating additional information such as experiment type, putative binding sites, or functional information. We show how this model can be learned from the data, allowing us to discover patterns in the data and dependencies between the gene expression patterns and additional attributes. The learned model reveals context-specific relationships that exist only over a subset of the experiments in the dataset. We demonstrate the power of our approach on synthetic data and on two real-world gene expression data sets for yeast. For example, we demonstrate a novel functionality that falls naturally out of our framework: predicting the "cluster" of the array resulting from a gene mutation based only on the gene's expression pattern in the context of other mutations.
Clustering Based on Conditional Distributions in an Auxiliary Space
 Neural Computation
, 2001
Abstract

Cited by 84 (23 self)
We study the problem of learning groups or categories that are local