Results 1-10 of 231
Indexing by latent semantic analysis
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1990
Cited by 2703 (32 self)
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
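The retrieval pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration assuming NumPy and a small dense term-by-document matrix; the function names (`lsa_fit`, `fold_in_query`, `retrieve`), the default of 2 factors, and the 0.8 threshold are hypothetical choices for the example, not taken from the paper.

```python
import numpy as np

def lsa_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k factors."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Term factors, singular values, and one k-dimensional vector per document.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(query_terms, U_k, s_k):
    """Map a vector of term weights into factor space as a pseudo-document."""
    return (query_terms @ U_k) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(term_doc, query_terms, k=2, threshold=0.8):
    """Return (document index, cosine) pairs whose cosine exceeds the threshold."""
    U_k, s_k, docs_k = lsa_fit(term_doc, k)
    q = fold_in_query(query_terms, U_k, s_k)
    scores = [cosine(q, d) for d in docs_k]
    return [(j, sc) for j, sc in enumerate(scores) if sc > threshold]
```

With this fold-in, a query whose term weights equal a column of the matrix lands exactly on that document's factor vector, so it scores a cosine of 1 against it.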
Tensor Decompositions and Applications
SIAM REVIEW, 2009
Cited by 228 (14 self)
This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, etc. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal components analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox and Tensor Toolbox, both for MATLAB, and the Multilinear Engine are examples of software packages for working with tensors.
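As a rough illustration of the two decompositions named in this abstract, the sketch below reconstructs a three-way array from CP factors and from Tucker factors. It assumes NumPy; `cp_to_tensor` and `tucker_to_tensor` are hypothetical helper names, not functions from any of the toolboxes mentioned.

```python
import numpy as np

def cp_to_tensor(A, B, C):
    """Sum of R rank-one tensors; column r of A, B, C gives the r-th term."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def tucker_to_tensor(G, A, B, C):
    """Core tensor G multiplied by one factor matrix along each of its modes."""
    return np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)
```

In this sketch CP is the special case of Tucker whose core is superdiagonal: a core with ones at positions (r, r, r) and zeros elsewhere makes the two functions return the same array.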
From frequency to meaning: Vector space models of semantics
Journal of Artificial Intelligence Research, 2010
Cited by 116 (2 self)
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
Nonnegative tensor factorization with applications to statistics and computer vision
In Proceedings of the International Conference on Machine Learning (ICML), 2005
Cited by 78 (5 self)
We derive algorithms for finding a nonnegative n-dimensional tensor factorization (n-NTF) which includes the nonnegative matrix factorization (NMF) as a particular case when n = 2. We motivate the use of n-NTF in three areas of data analysis: (i) connection to latent class models in statistics, (ii) sparse image coding in computer vision, and (iii) model selection problems. We derive a “direct” positive-preserving gradient descent algorithm and an alternating scheme based on repeated multiple rank-1 problems.
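For orientation, the n = 2 special case (NMF) can be sketched with the widely used Lee-Seung multiplicative updates. This is a swapped-in, well-known technique and not the algorithm derived in the paper; the function name, iteration count, and `eps` guard are assumptions made for the example, which uses NumPy.

```python
import numpy as np

def nmf_multiplicative(V, rank, iters=200, seed=0, eps=1e-12):
    """Approximate nonnegative V by W @ H with nonnegative W and H.

    Lee-Seung multiplicative updates for the Frobenius error; eps only
    guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    errors = []
    for _ in range(iters):
        # Each update multiplies by a nonnegative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors
```

Because every step is an elementwise multiplication by a nonnegative quantity, the factors stay nonnegative without any projection step.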
TENSOR RANK AND THE ILL-POSEDNESS OF THE BEST LOW-RANK APPROXIMATION PROBLEM
Cited by 75 (10 self)
There has been continued interest in seeking a theorem describing optimal low-rank approximations to tensors of order 3 or higher, that parallels the Eckart–Young theorem for matrices. In this paper, we argue that the naive approach to this problem is doomed to failure because, unlike matrices, tensors of order 3 or higher can fail to have best rank-r approximations. The phenomenon is much more widespread than one might suspect: examples of this failure can be constructed over a wide range of dimensions, orders and ranks, regardless of the choice of norm (or even Brègman divergence). Moreover, we show that in many instances these counterexamples have positive volume: they cannot be regarded as isolated phenomena. In one extreme case, we exhibit a tensor space in which no rank-3 tensor has an optimal rank-2 approximation. The notable exceptions to this misbehavior are rank-1 tensors and order-2 tensors (i.e. matrices). In a more positive spirit, we propose a natural way of overcoming the ill-posedness of the low-rank approximation problem, by using weak solutions when true solutions do not exist. For this to work, it is necessary to characterize the set of weak solutions, and we do this in the case of rank 2, order 3 (in arbitrary dimensions). In our work we emphasize the importance of closely studying concrete low-dimensional examples as a first step towards more general results. To this end, we present a detailed analysis of equivalence classes of 2 × 2 × 2 tensors, and we develop methods for extending results upwards to higher orders and dimensions. Finally, we link our work to existing studies of tensors from an algebraic geometric point of view. The rank of a tensor can in theory be given a semialgebraic description; in other words, can be determined by a system of polynomial inequalities. We study some of these polynomials in cases of interest to us; in particular we make extensive use of the hyperdeterminant ∆ on R^{2×2×2}.
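The failure of best rank-2 approximation can be seen numerically on a standard 2 × 2 × 2 example. The sketch below assumes NumPy and uses the tensor a⊗a⊗b + a⊗b⊗a + b⊗a⊗a, which is known to have rank 3; the code does not verify that rank, it only shows a sequence of rank-2 tensors approaching the target. The helper names are hypothetical.

```python
import numpy as np

def outer3(x, y, z):
    """Rank-one three-way tensor x (outer) y (outer) z."""
    return np.einsum("i,j,k->ijk", x, y, z)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Target tensor of rank 3 (stated, not checked here).
target = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

def rank2_approx(n):
    """Difference of two rank-one tensors, hence rank at most 2."""
    u = a + b / n
    return n * outer3(u, u, u) - n * outer3(a, a, a)

def gaps(ns=(1, 10, 100, 1000)):
    """Frobenius distance from the target for each n."""
    return [float(np.linalg.norm(target - rank2_approx(n))) for n in ns]
```

Expanding the cube shows the distance equals sqrt(3/n² + 1/n⁴), so it shrinks toward zero as n grows while the two rank-one terms individually blow up. The infimum over rank-2 tensors is therefore 0 but is never attained.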
Blind PARAFAC receivers for DS-CDMA systems
IEEE TRANS. SIGNAL PROCESSING, 2000
Cited by 71 (14 self)
This paper links the direct-sequence code-division multiple access (DS-CDMA) multiuser separation-equalization-detection problem to the parallel factor (PARAFAC) model, which is an analysis tool rooted in psychometrics and chemometrics. Exploiting this link, it derives a deterministic blind PARAFAC DS-CDMA receiver with performance close to non-blind minimum mean-squared error (MMSE). The proposed PARAFAC receiver capitalizes on code, spatial, and temporal diversity-combining, thereby supporting small sample sizes, more users than sensors, and/or less spreading than users. Interestingly, PARAFAC does not require knowledge of spreading codes, the specifics of multipath (interchip interference), DOA-calibration information, finite alphabet/constant modulus, or statistical independence/whiteness to recover the information-bearing signals. Instead, PARAFAC relies on a fundamental result regarding the uniqueness of low-rank three-way array decomposition due to Kruskal (and generalized herein to the complex-valued case) that guarantees identifiability of all relevant signals and propagation parameters. These and other issues are also demonstrated in pertinent simulation experiments.
Beyond streams and graphs: Dynamic tensor analysis
In KDD, 2006
Cited by 70 (11 self)
How do we find patterns in author-keyword associations, evolving over time? Or in DataCubes, with product-branch-customer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks and many more. However, they have only two orders, like author and keyword, in the above example. We propose to envision such higher order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method, and its variants. DTA provides a compact summary for high-order and high-dimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past) and (c) fully automatic with no need for user defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA. We implemented all our methods, and applied them in two real settings, namely, anomaly detection and multi-way latent semantic indexing. We used two real, large datasets, one on network flow data (100GB over 1 month) and one from DBLP (200MB over 25 years). Our experiments show that our methods are fast, accurate and that they find interesting patterns and outliers on the real datasets.
Parallel Factor Analysis in Sensor Array Processing
IEEE TRANS. SIGNAL PROCESSING, 2000
Cited by 69 (15 self)
This paper links multiple invariance sensor array processing (MI-SAP) to parallel factor (PARAFAC) analysis, which is a tool rooted in psychometrics and chemometrics. PARAFAC is a common name for low-rank decomposition of three- and higher-way arrays. This link facilitates the derivation of powerful identifiability results for MI-SAP, shows that the uniqueness of single- and multiple-invariance ESPRIT stems from uniqueness of low-rank decomposition of three-way arrays, and allows tapping on the available expertise for fitting the PARAFAC model. The results are applicable to both data-domain and subspace MI-SAP formulations. The paper also includes a constructive uniqueness proof for a special PARAFAC model.
Symmetric tensors and symmetric tensor rank
Scientific Computing and Computational Mathematics (SCCM), 2006
Cited by 46 (19 self)
A symmetric tensor is a higher order generalization of a symmetric matrix. In this paper, we study various properties of symmetric tensors in relation to a decomposition into a symmetric sum of outer product of vectors. A rank-1 order-k tensor is the outer product of k nonzero vectors. Any symmetric tensor can be decomposed into a linear combination of rank-1 tensors, each of them being symmetric or not. The rank of a symmetric tensor is the minimal number of rank-1 tensors that is necessary to reconstruct it. The symmetric rank is obtained when the constituting rank-1 tensors are imposed to be themselves symmetric. It is shown that rank and symmetric rank are equal in a number of cases, and that they always exist in an algebraically closed field. We will discuss the notion of the generic symmetric rank, which, due to the work of Alexander and Hirschowitz, is now known for any values of dimension and order. We will also show that the set of symmetric tensors of symmetric rank at most r is not closed, unless r = 1. Key words. Tensors, multi-way arrays, outer product decomposition, symmetric outer product decomposition, candecomp, parafac, tensor rank, symmetric rank, symmetric tensor rank, generic symmetric rank, maximal symmetric rank, quantics. AMS subject classifications. 15A03, 15A21, 15A72, 15A69, 15A18.
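To make the definitions concrete, the sketch below builds a tensor as a weighted sum of symmetric rank-1 terms (each the repeated outer product of one vector with itself) and checks invariance under every permutation of the indices. It assumes NumPy; `symmetric_cp` and `is_symmetric` are hypothetical names for this example.

```python
import itertools
import numpy as np

def symmetric_cp(weights, V, order=3):
    """Weighted sum of symmetric rank-1 tensors built from the columns of V."""
    n, R = V.shape
    T = np.zeros((n,) * order)
    for r in range(R):
        term = np.array(weights[r])
        for _ in range(order):
            # Append one more mode by taking the outer product with v_r.
            term = np.multiply.outer(term, V[:, r])
        T += term
    return T

def is_symmetric(T, tol=1e-12):
    """True if T is unchanged by every permutation of its axes."""
    return all(np.allclose(T, np.transpose(T, p), atol=tol)
               for p in itertools.permutations(range(T.ndim)))
```

A tensor built this way has symmetric rank at most the number of columns of `V`; whether the ordinary rank is the same is exactly the question the abstract raises.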
Efficient MATLAB computations with sparse and factored tensors
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2007
Cited by 45 (13 self)
In this paper, the term tensor refers simply to a multidimensional or $N$-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: A Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB.
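The idea of working only with components can be sketched in a few lines. The example below uses Python with NumPy rather than MATLAB, and the function names are hypothetical; it is not the Tensor Toolbox API. A sparse tensor is held in coordinate format as an array of subscripts (one row per nonzero) plus an array of values, and a Kruskal tensor as a weight vector plus one factor matrix per mode.

```python
import numpy as np

def kruskal_norm(weights, factors):
    """Frobenius norm of a Kruskal tensor without forming the full array."""
    gram = np.outer(weights, weights)
    for F in factors:
        # Hadamard product of the R-by-R Gram matrices of each mode.
        gram = gram * (F.T @ F)
    # abs() guards against a tiny negative sum caused by rounding.
    return float(np.sqrt(abs(gram.sum())))

def coo_inner_kruskal(subs, vals, weights, factors):
    """Inner product of a coordinate-format sparse tensor with a Kruskal tensor."""
    rows = np.ones((len(vals), len(weights)))
    for mode, F in enumerate(factors):
        # Pick the factor row matching each nonzero's subscript in this mode.
        rows *= F[subs[:, mode], :]
    return float(vals @ (rows @ weights))
```

Both routines touch only the factor matrices and the stored nonzeros, so their cost scales with the number of components and nonzeros rather than with the size of the full array.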