Results 1 - 6 of 6
Probabilistic Latent Semantic Indexing, 1999
Cited by 874 (8 self)
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
Probabilistic Latent Semantic Analysis
In Proc. of Uncertainty in Artificial Intelligence, UAI’99, 1999
Cited by 573 (7 self)
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
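The mixture decomposition and tempered EM fitting mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly cited symmetric parameterization P(d, w) = sum over z of P(z) P(d|z) P(w|z), with an E-step in which the likelihood term is raised to a power beta (beta = 1 gives ordinary EM). The abstract itself does not state these formulas, so the function name `plsa_tempered_em`, its parameters, and the toy count matrix are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit a small PLSA-style latent class model to a doc-by-word count matrix.

    Hypothetical helper for illustration; beta < 1 dampens the E-step
    posteriors (tempering), beta = 1 is plain EM.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero

    # Random initialization of P(z), P(d|z), P(w|z); each row sums to 1.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, n_docs)) + eps
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)) + eps
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Tempered E-step: P(z|d,w) proportional to P(z) * [P(d|z) P(w|z)]^beta.
        joint = p_z[:, None, None] * (p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
        post = joint / np.maximum(joint.sum(axis=0, keepdims=True), eps)

        # M-step: re-estimate parameters from count-weighted posteriors.
        weighted = counts[None, :, :] * post  # shape (topics, docs, words)
        p_w_z = weighted.sum(axis=1)
        p_d_z = weighted.sum(axis=2)
        p_z = weighted.sum(axis=(1, 2))
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), eps)
        p_d_z /= np.maximum(p_d_z.sum(axis=1, keepdims=True), eps)
        p_z /= np.maximum(p_z.sum(), eps)

    return p_z, p_d_z, p_w_z


# Toy data (made up): two groups of documents using mostly disjoint words.
toy_counts = np.array([
    [4, 3, 0, 0],
    [5, 2, 0, 1],
    [0, 0, 3, 4],
    [0, 1, 2, 5],
])
p_z, p_d_z, p_w_z = plsa_tempered_em(toy_counts, n_topics=2, beta=0.9, n_iter=30)
```

In practice beta would typically be chosen by monitoring held-out data; the fixed value above is only a placeholder.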
Compared to standard
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas.
Using Probabilistic Latent Semantic Analysis to Identify Web User Segments
Web usage mining techniques are often used to identify user access patterns. However, to understand the factors that lead to common navigational patterns, it is necessary to develop techniques that can automatically characterize the underlying tasks and to map these tasks to user segments. Latent Semantic Analysis (LSA) has been successfully applied in a variety of applications such as information retrieval and filtering, text learning, and co-citation analysis. In this paper, we use a probabilistic variant of LSA to analyze Web usage data and identify the latent factors which explain visitor segments based on their navigational behavior. Our experiments, performed on real usage data, show that this approach can successfully distinguish between different types of Web user segments according to the types of tasks performed by these users.
Latent Semantic Indexing Based on Factor Analysis
The main purpose of this paper is to propose a novel statistical approach to latent semantic indexing (LSI) that simultaneously maps documents and terms into a latent semantic space. This approach can index documents more effectively than the vector space model (VSM). Latent semantic indexing (LSI), which is based on singular value decomposition (SVD), and probabilistic latent semantic indexing (PLSI) have already been proposed to overcome problems in document indexing, but critical problems remain. In contrast to LSI and PLSI, our method uses a more meaningful, robust statistical model based on factor analysis and information theory. As a result, this model can solve the remaining critical problems in LSI and PLSI. Experimental results with a test collection showed that our method is superior to LSI and PLSI from the viewpoints of information retrieval and classification. We also propose a new term weighting method based on entropy.