Results 1  10
of
271
Indexing by latent semantic analysis
 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
, 1990
"... A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higherorder structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The p ..."
Abstract

Cited by 2703 (32 self)
 Add to MetaCart
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higherorder structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singularvalue decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudodocument vectors formed from weighted combinations of terms, and documents with suprathreshold cosine values are returned. initial tests find this completely automatic method for retrieval to be promising.
Attention, similarity, and the identificationCategorization Relationship
, 1986
"... A unified quantitative approach to modeling subjects ' identification and categorization of multidimensional perceptual stimuli is proposed and tested. Two subjects identified and categorized the same set of perceptually confusable stimuli varying on separable dimensions. The identification data wer ..."
Abstract

Cited by 416 (27 self)
 Add to MetaCart
A unified quantitative approach to modeling subjects ' identification and categorization of multidimensional perceptual stimuli is proposed and tested. Two subjects identified and categorized the same set of perceptually confusable stimuli varying on separable dimensions. The identification data were modeled using Sbepard's (1957) multidimensional scalingchoice framework. This framework was then extended to model the subjects ' categorization performance. The categorization model, which generalizes the context theory of classification developed by Medin and Schaffer (1978), assumes that subjects store category exemplars in memory. Classification decisions are based on the similarity of stimuli to the stored exemplars. It is assumed that the same multidimensional perceptual representation underlies performance in both the identification and Categorization paradigms. However, because of the influence of selective attention, similarity relationships change systematically across the two paradigms. Some support was gained for the hypothesis that subjects distribute attention among component dimensions so as to optimize categorization performance. Evidence was also obtained that subjects may have augmented their category representations with inferred exemplars. Implications of the results for theories of multidimensional scaling and categorization are discussed.
Tensor Decompositions and Applications
 SIAM REVIEW
, 2009
"... This survey provides an overview of higherorder tensor decompositions, their applications, and available software. A tensor is a multidimensional or N way array. Decompositions of higherorder tensors (i.e., N way arrays with N â¥ 3) have applications in psychometrics, chemometrics, signal proce ..."
Abstract

Cited by 228 (14 self)
 Add to MetaCart
This survey provides an overview of higherorder tensor decompositions, their applications, and available software. A tensor is a multidimensional or N way array. Decompositions of higherorder tensors (i.e., N way arrays with N â¥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, etc. Two particular tensor decompositions can be considered to be higherorder extensions of the matrix singular value decompo
sition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rankone tensors, and the Tucker decomposition is a higherorder form of principal components analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The Nway Toolbox and Tensor Toolbox, both for MATLAB, and the Multilinear Engine are examples of software packages for working with tensors.
From frequency to meaning : Vector space models of semantics
 Journal of Artificial Intelligence Research
, 2010
"... Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are begi ..."
Abstract

Cited by 116 (2 self)
 Add to MetaCart
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field. 1.
A Framework for Robust Subspace Learning
 International Journal of Computer Vision
, 2003
"... Many computer vision, signal processing and statistical problems can be posed as problems of learning low dimensional linear or multilinear models. These models have been widely used for the representation of shape, appearance, motion, etc, in computer vision applications. ..."
Abstract

Cited by 94 (6 self)
 Add to MetaCart
Many computer vision, signal processing and statistical problems can be posed as problems of learning low dimensional linear or multilinear models. These models have been widely used for the representation of shape, appearance, motion, etc, in computer vision applications.
Toward a unified theory of similarity and recognition
 Psychological Review
, 1988
"... A new theory of similarity, rooted in the detection and recognition literatures, is developed. The general recognition theory assumes that the perceptual effect of a stimulus is random but that on any single trial it can be represented as a point in a multidimensional space. Similarity is a function ..."
Abstract

Cited by 75 (6 self)
 Add to MetaCart
A new theory of similarity, rooted in the detection and recognition literatures, is developed. The general recognition theory assumes that the perceptual effect of a stimulus is random but that on any single trial it can be represented as a point in a multidimensional space. Similarity is a function of the overlap of perceptual distributions. It is shown that the general recognition theory contains Euclidean distance models of similarity as a special case but that unlike them, it is not constrained by any distance axioms. Three experiments are reported that test the empirical validity of the theory. In these experiments the general recognition theory accounts for similarity data as well as the currently popular similarity theories do, and it accounts for identification data as well as the longstanding "champion " identification model does. The concept of similarity is of fundamental importance in psychology. Not only is there a vast literature concerned directly with the interpretation of subjective similarity judgments (e.g., as in multidimensional scaling) but the concept also plays a crucial but less direct role in the modeling of many psychophysical tasks. This is particularly true in the case of pattern and form recognition. It is frequently assumed that the greater the similarity between a pair of stimuli, the more likely one will be confused with the other in a recognition task (e.g., Luce, 1963; Shepard, 1964; Tversky & Gati, 1982). Yet despite the potentially close relationship between the two, there have been only a few attempts at developing theories that unify the similarity and recognition literatures. Most attempts to link the two have used a distancebased similarity measure to predict the confusions in recognition ex
Blind PARAFAC receivers for DSCDMA systems
 IEEE TRANS. SIGNAL PROCESSING
, 2000
"... This paper links the directsequence codedivision multiple access (DSCDMA) multiuser separationequalizationdetection problem to the parallel factor (PARAFAC) model, which is an analysis tool rooted in psychometrics and chemometrics. Exploiting this link, it derives a deterministic blind PARAFAC ..."
Abstract

Cited by 71 (14 self)
 Add to MetaCart
This paper links the directsequence codedivision multiple access (DSCDMA) multiuser separationequalizationdetection problem to the parallel factor (PARAFAC) model, which is an analysis tool rooted in psychometrics and chemometrics. Exploiting this link, it derives a deterministic blind PARAFAC DSCDMA receiver with performance close to nonblind minimum meansquared error (MMSE). The proposed PARAFAC receiver capitalizes on code, spatial, and temporal diversitycombining, thereby supporting small sample sizes, more users than sensors, and/or less spreading than users. Interestingly, PARAFAC does not require knowledge of spreading codes, the specifics of multipath (interchip interference), DOAcalibration information, finite alphabet/constant modulus, or statistical independence/whiteness to recover the informationbearing signals. Instead, PARAFAC relies on a fundamental result regarding the uniqueness of lowrank threeway array decomposition due to Kruskal (and generalized herein to the complexvalued case) that guarantees identifiability of all relevant signals and propagation parameters. These and other issues are also demonstrated in pertinent simulation experiments.
Beyond streams and graphs: Dynamic tensor analysis
 In KDD
, 2006
"... How do we find patterns in authorkeyword associations, evolving over time? Or in DataCubes, with productbranchcustomer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule ..."
Abstract

Cited by 70 (11 self)
 Add to MetaCart
How do we find patterns in authorkeyword associations, evolving over time? Or in DataCubes, with productbranchcustomer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks and many more. However, they have only two orders, like author and keyword, in the above example. We propose to envision such higher order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semiinfinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method, and its variants. DTA provides a compact summary for highorder and highdimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past) and (c) fully automatic with no need for user defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA. We implemented all our methods, and applied them in two real settings, namely, anomaly detection and multiway latent semantic indexing. We used two real, large datasets, one on network flow data (100GB over 1 month) and one from DBLP (200MB over 25 years). Our experiments show that our methods are fast, accurate and that they find interesting patterns and outliers on the real datasets. 1.
ON THE DANGERS OF AVERAGING ACROSS SUBJECTS WHEN USING MULTIDIMENSIONAL SCALING OR THE SIMILARITYCHOICE MODEL
 PSYCHOLOGICAL SCIENCE
, 1994
"... When ratings of judged similarity or frequencies of stimulus identification are averaged across subjects, the psychological structure ofthe data is fundamentally changed. Regardless of the structure of the individualsubject data, the averaged similarity data will likely be well fit by a standard mu ..."
Abstract

Cited by 62 (31 self)
 Add to MetaCart
When ratings of judged similarity or frequencies of stimulus identification are averaged across subjects, the psychological structure ofthe data is fundamentally changed. Regardless of the structure of the individualsubject data, the averaged similarity data will likely be well fit by a standard multidimensional scaling model, and the averaged identification data will likely be well fit by the similaritychoice model. In fact, both models often provide excellent fits to averaged data, even if they fail to fit the data of each individual subject. Thus, a good fit of either model to averaged data cannot be taken as evidence that the model describes the psychological structure that characterizes individual subjects. We hypothesize that these effects are due to the increased symmetry that is a mathematical consequence of the averaging operation. It is common practice to average across subjects when analyzing
Symmetric tensors and symmetric tensor rank
 Scientific Computing and Computational Mathematics (SCCM
, 2006
"... Abstract. A symmetric tensor is a higher order generalization of a symmetric matrix. In this paper, we study various properties of symmetric tensors in relation to a decomposition into a symmetric sum of outer product of vectors. A rank1 orderk tensor is the outer product of k nonzero vectors. An ..."
Abstract

Cited by 46 (19 self)
 Add to MetaCart
Abstract. A symmetric tensor is a higher order generalization of a symmetric matrix. In this paper, we study various properties of symmetric tensors in relation to a decomposition into a symmetric sum of outer product of vectors. A rank1 orderk tensor is the outer product of k nonzero vectors. Any symmetric tensor can be decomposed into a linear combination of rank1 tensors, each of them being symmetric or not. The rank of a symmetric tensor is the minimal number of rank1 tensors that is necessary to reconstruct it. The symmetric rank is obtained when the constituting rank1 tensors are imposed to be themselves symmetric. It is shown that rank and symmetric rank are equal in a number of cases, and that they always exist in an algebraically closed field. We will discuss the notion of the generic symmetric rank, which, due to the work of Alexander and Hirschowitz, is now known for any values of dimension and order. We will also show that the set of symmetric tensors of symmetric rank at most r is not closed, unless r = 1. Key words. Tensors, multiway arrays, outer product decomposition, symmetric outer product decomposition, candecomp, parafac, tensor rank, symmetric rank, symmetric tensor rank, generic symmetric rank, maximal symmetric rank, quantics AMS subject classifications. 15A03, 15A21, 15A72, 15A69, 15A18 1. Introduction. We