Results 1  10
of
668
Itembased Collaborative Filtering Recommendation Algorithms
 PROC. 10TH INTERNATIONAL CONFERENCE ON THE WORLD WIDE WEB
, 2001
"... ..."
Tensor Decompositions and Applications
 SIAM REVIEW
, 2009
"... This survey provides an overview of higherorder tensor decompositions, their applications, and available software. A tensor is a multidimensional or N way array. Decompositions of higherorder tensors (i.e., N way arrays with N â¥ 3) have applications in psychometrics, chemometrics, signal proce ..."
Abstract

Cited by 705 (17 self)
 Add to MetaCart
(Show Context)
This survey provides an overview of higherorder tensor decompositions, their applications, and available software. A tensor is a multidimensional or N way array. Decompositions of higherorder tensors (i.e., N way arrays with N â¥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, etc. Two particular tensor decompositions can be considered to be higherorder extensions of the matrix singular value decompo
sition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rankone tensors, and the Tucker decomposition is a higherorder form of principal components analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The Nway Toolbox and Tensor Toolbox, both for MATLAB, and the Multilinear Engine are examples of software packages for working with tensors.
Unsupervised Learning by Probabilistic Latent Semantic Analysis
 Machine Learning
, 2001
"... Abstract. This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method which stems from linear algebra and performs a Singular Value Decomposition of cooccurren ..."
Abstract

Cited by 612 (4 self)
 Add to MetaCart
Abstract. This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method which stems from linear algebra and performs a Singular Value Decomposition of cooccurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to make use of a temperature controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and in related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
Coclustering documents and words using Bipartite Spectral Graph Partitioning
, 2001
"... ..."
(Show Context)
Concept Decompositions for Large Sparse Text Data using Clustering
 Machine Learning
, 2000
"... . Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as highdimensional and sparse vectorsa few thousand dimensions and a sparsity of 95 to 99 ..."
Abstract

Cited by 401 (27 self)
 Add to MetaCart
(Show Context)
. Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as highdimensional and sparse vectorsa few thousand dimensions and a sparsity of 95 to 99% is typical. In this paper, we study a certain spherical kmeans algorithm for clustering such document vectors. The algorithm outputs k disjoint clusters each with a concept vector that is the centroid of the cluster normalized to have unit Euclidean norm. As our first contribution, we empirically demonstrate that, owing to the highdimensionality and sparsity of the text data, the clusters produced by the algorithm have a certain "fractallike" and "selfsimilar" behavior. As our second contribution, we introduce concept decompositions to approximate the matrix of document vectors; these decompositions are obtained by taking the leastsquares approximation onto the linear subspace spanned...
Survey of clustering data mining techniques
, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract

Cited by 400 (0 self)
 Add to MetaCart
(Show Context)
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Latent semantic indexing: A probabilistic analysis
, 1998
"... Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the termdocument matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underl ..."
Abstract

Cited by 315 (7 self)
 Add to MetaCart
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the termdocument matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
Dependencybased construction of semantic space models
 Computational Linguistics
, 2007
"... Traditionally, vectorbased semantic space models use word cooccurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of m ..."
Abstract

Cited by 229 (14 self)
 Add to MetaCart
Traditionally, vectorbased semantic space models use word cooccurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of models, which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection, and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art. 1.
Recovering DocumentationtoSourceCode Traceability Links using Latent Semantic Indexing
"... An information retrieval technique, latent semantic indexing, is used to automatically identi traceability links from system documentation to program source code. The results of two experiments to identi links in existing software systems (i.e., the LEDA library, and Albergate) are presented. These ..."
Abstract

Cited by 228 (12 self)
 Add to MetaCart
An information retrieval technique, latent semantic indexing, is used to automatically identi traceability links from system documentation to program source code. The results of two experiments to identi links in existing software systems (i.e., the LEDA library, and Albergate) are presented. These results are compared with other similar type experimental results of traceability link identification using different types of information retrieval techniques. The method presented proves to give good results by comparison and additionally it is a low cost, highly flexible method to apply with regards to preprocessing and/or parsing of the source code and documentation.