Indexing by latent semantic analysis
- JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
, 1990
Cited by 2168 (30 self)
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
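The pipeline this abstract outlines (truncated SVD, queries folded in as pseudo-documents, cosine thresholding) can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: the toy matrix, the function names, the choice of `k = 2` factors, the 0.5 threshold, and the scaling of both vectors by the singular values before taking cosines are all assumptions made for the example.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k factors."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of doc_vecs are the k-dimensional factor weights of each document.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(term_weights, U_k, s_k):
    """Map a vector of term weights to a pseudo-document in factor space."""
    return (term_weights @ U_k) / s_k

def cosine_scores(query_vec, doc_vecs, s_k):
    """Cosine between the query and every document (assumed scaling by s_k)."""
    q = query_vec * s_k
    D = doc_vecs * s_k
    denom = np.linalg.norm(D, axis=1) * np.linalg.norm(q)
    return (D @ q) / np.where(denom == 0, 1.0, denom)

# Hypothetical toy collection: 5 terms (rows) by 4 documents (columns).
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

U_k, s_k, doc_vecs = lsi_fit(A, k=2)

# A query containing only term 1, which occurs literally in document 0 alone.
query = np.array([0, 1, 0, 0, 0], dtype=float)
scores = cosine_scores(fold_in(query, U_k, s_k), doc_vecs, s_k)
hits = np.flatnonzero(scores > 0.5)  # documents above the cosine threshold
print(hits, np.round(scores, 3))
```

In this toy collection the query term appears only in document 0, yet document 1 is also returned because it shares a factor with document 0 through their common term. That is the kind of indirect association the abstract calls semantic structure.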
Tensor Decompositions and Applications
- SIAM REVIEW
, 2009
Cited by 95 (13 self)
This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, etc. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal components analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox and Tensor Toolbox, both for MATLAB, and the Multilinear Engine are examples of software packages for working with tensors.
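To make "a sum of rank-one tensors" concrete, the sketch below assembles a three-way array from CP factor matrices with NumPy. It shows only the reconstruction side of CP, not a fitting algorithm, and it does not use any of the toolboxes named in the abstract. The function names, array sizes, and random factors are assumptions chosen for illustration.

```python
import numpy as np

def rank_one(a, b, c):
    """Outer product of three vectors: a three-way rank-one tensor."""
    return np.einsum("i,j,k->ijk", a, b, c)

def cp_reconstruct(A, B, C):
    """Sum of R rank-one tensors; column r of A, B, C defines component r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Hypothetical factor matrices for a 4 x 3 x 2 tensor with R = 2 components.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2))

X = cp_reconstruct(A, B, C)

# The same tensor written out explicitly as a sum of rank-one terms.
X_explicit = sum(rank_one(A[:, r], B[:, r], C[:, r]) for r in range(A.shape[1]))
print(X.shape, np.allclose(X, X_explicit))
```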
Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue
Cited by 9 (5 self)
It is a consensus in microarray analysis that identifying potential local patterns, characterized by coherent groups of genes and conditions, may shed light on the discovery of previously undetectable biological cellular processes of genes, as well as macroscopic phenotypes of related samples. In order to simultaneously cluster genes and conditions, we have previously developed a fast coclustering algorithm, Minimum Sum-Squared Residue Coclustering (MSSRCC), which employs an alternating minimization scheme and generates what we call coclusters in a “checkerboard” structure. In this paper, we propose specific strategies that enable MSSRCC to escape poor local minima and resolve the degeneracy problem in partitional clustering algorithms. The strategies include binormalization, deterministic spectral initialization, and incremental local search. We assess the effects of various strategies on both synthetic gene expression data sets and real human cancer microarrays and provide empirical evidence that MSSRCC with the proposed strategies performs better than existing coclustering and clustering algorithms. In particular, the combination of all three strategies leads to the best performance. Furthermore, we illustrate the coherence of the resulting coclusters in a checkerboard structure, where genes in a cocluster manifest the phenotype structure of corresponding specific samples, and evaluate the enrichment of functional annotations in Gene Ontology (GO). Index Terms: Microarray analysis, coclustering, binormalization, deterministic spectral initialization, local search, gene ontology.
Nonnegative approximations of nonnegative tensors
- Jour. Chemometrics
, 2009
Cited by 8 (5 self)
We study the decomposition of a nonnegative tensor into a minimal sum of outer products of nonnegative vectors and the associated parsimonious naïve Bayes probabilistic model. We show that the corresponding approximation problem, which is central to nonnegative parafac, will always have optimal solutions. The result holds for any choice of norms and, under a mild assumption, even Brègman divergences.
hal-00410056, version 1 - 16 Aug 2009
1. Dedication
This article is dedicated to the memory of our late colleague Richard Allan Harshman. It is loosely organized around two of Harshman’s best known works, parafac [19] and lsi [13], and answers two questions that he posed. We target this article to a technometrics readership. In Section 4, we discussed a few aspects of nonnegative tensor factorization and Hofmann’s plsi, a variant of the lsi model co-proposed by Harshman [13]. In Section 5, we answered a question of Harshman on why the apparently unrelated construction of Bini, Capovani, Lotti, and Romani in [1] should be regarded as the first example of what he called ‘parafac degeneracy’ [27]. Finally in Section 6, we showed that such parafac degeneracy will not happen for nonnegative approximations of nonnegative tensors, answering another question of his.
Toward Large-Scale Information Retrieval Using Latent Semantic Indexing
- Department of Computer Science, University of Tennessee
, 1996
Cited by 5 (1 self)
As the amount of electronic information increases, traditional lexical (or Boolean) information retrieval techniques will become less useful. Large, heterogeneous collections will be difficult to search since the sheer volume of unranked documents returned in response to a query will overwhelm the user. Vector-space approaches to information retrieval, on the other hand, allow the user to search for concepts rather than specific words and rank the results of the search according to their relative similarity to the query. One vector-space approach, Latent Semantic Indexing (LSI), has achieved up to 30% better retrieval performance than lexical searching techniques by employing a reduced-rank model of the term-document space. However, the original implementation of LSI lacked the execution efficiency required to make LSI useful for large data sets. A new implementation of LSI, LSI++, seeks to make LSI efficient, extensible, portable, and maintainable. The LSI++ Application Programming In...
Generic and typical ranks of threeway arrays
- Research Report ISRN I3S/RR-2006-29-FR, I3S, Sophia-Antipolis
Cited by 3 (3 self)
The concept of tensor rank, introduced in the 1920s, was popularized at the beginning of the 1970s. This made it possible to carry out factor analysis on arrays with more than two indices. The generic rank may be seen as an upper bound on the number of factors that can be extracted from a given tensor. We explain in this short paper how to compute the generic rank of tensors of arbitrary dimensions numerically, and compare it with the few algebraic results already known at order three. In particular, we examine the cases of symmetric tensors, tensors with symmetric matrix slices, and tensors with free entries.
Experiments on Routing, Filtering and Chinese Text Retrieval in TREC-5
- In Text REtrieval Conference (TREC-5)
, 1996
Cited by 2 (1 self)
We describe our experiments in routing, filtering, and Chinese text retrieval. We based our routing and filtering experiments on our discriminant projection algorithm. The algorithm sequentially constructs a series of orthogonal axes from the training documents using the Gram-Schmidt procedure. It then rotates the resulting subspace using principal component analysis so that the axes are ordered by their importance. For Chinese text retrieval, we experimented with both an automatic method and a manual method. For the automatic method, we use all phrases in the description field and compute the aggregate scores using the simple tf:idf formula. We then manually construct boolean phrase queries which are thought to improve the results.
1 Introduction
This paper describes the routing, filtering and Chinese text retrieval experiments performed by the Information Technology Institute for TREC-5. We based our routing and filtering experiments on our discriminant project algorithm as described...
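The two steps named in this abstract, Gram-Schmidt orthogonalization of the training documents followed by a principal-component rotation that orders the axes by importance, can be illustrated generically as follows. The abstract does not give the details of the discriminant projection algorithm, so this is only a sketch of the two standard building blocks. The function names, the toy document vectors, and the use of variance as the measure of "importance" are assumptions.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Sequentially orthonormalize row vectors, skipping dependent ones."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for q in basis:
            w -= (w @ q) * q  # remove the component along each earlier axis
        norm = np.linalg.norm(w)
        if norm > tol:
            basis.append(w / norm)
    return np.array(basis)

def pca_rotate(docs, basis):
    """Rotate the basis so its axes are ordered by decreasing variance."""
    coords = docs @ basis.T                   # documents in subspace coordinates
    centered = coords - coords.mean(axis=0)
    cov = centered.T @ centered / max(len(docs) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # re-order to descending
    return eigvecs[:, order].T @ basis, eigvals[order]

# Hypothetical training documents: 3 documents over 4 indexed terms.
docs = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0, 1.0],
])

basis = gram_schmidt(docs)
axes, variances = pca_rotate(docs, basis)
print(axes.shape, np.round(variances, 3))
```

The rotated axes remain orthonormal because they are an orthogonal recombination of the Gram-Schmidt basis; only their order and orientation within the subspace change.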
Document Routing by Discriminant Projection : TREC-4
- In D. K. Harman, The 4th Text REtrieval Conference (TREC-4). Gaithersburg, MD: NIST SP
, 1996
Cited by 1 (0 self)
We present document routing as a standard problem in discriminant analysis. The standard solution involves the inversion of a large matrix whose dimension is the number of indexed terms. Typically, the solution does not exist because the number of training documents is much smaller than the number of terms. We show that one can project this raw document space into a lower dimensional space where a solution is possible. Our projection algorithm exploits the characteristics of the empty space, using only the training documents for efficient coding of the relevance information. Its complexity is linear with respect to the number of terms, and second order with respect to the number of training documents. We can therefore fully exploit the power of discriminant analysis without imposing severe computational and storage constraints.
1 Introduction
This paper describes the experiments performed by the Information Technology Institute for TREC-4. We are participating in TREC for the fi...
The Hungarian palatal stop: Phonological considerations and phonetic data
This study examines the trajectories of dorsal tongue movements during symmetrical /VCa/-sequences, where /V/ was one of the Hungarian long or short vowels /i, a, u/ and C was either the voiceless palatal or the voiceless velar stop consonant. The general aims of this study were to deliver a data-driven account of (a) the evidence for the division between dorsality and coronality and (b) the potential role coarticulatory factors could play in the relative frequency of velar palatalization processes in genetically unrelated languages. Results suggest a clear-cut demarcation between the behaviour of purely dorsal velars and the coronal palatals. Moreover, factors arising from a general movement economy might contribute to the palatalization processes mentioned.
Indexing by Latent Semantic Analysis
- Journal of the American Society for Information Science
, 2001
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising. Deerwester - 1 - 1.

