## Indexing by latent semantic analysis (1990)

Venue: Journal of the American Society for Information Science

Citations: 3704 (35 self)

### Citations

3943 |
Introduction to Modern Information Retrieval
- Salton, McGill
- 1983
Citation Context ...ally only n parameters for n objects). Empirically, clustering improves the computational efficiency of search; whether or not it improves retrieval success is unclear (Jardin & van Rijsbergen, 1971; =-=Salton & McGill, 1983-=-; Voorhees, 1985). [JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, September 1990, p. 393] Previously tried factor analytic approaches have taken a square symmetric matrix of similarities between p... |

588 |
Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young"
- Carroll, Chang
- 1970
Citation Context ...terms and documents would appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode factor analysis (Harshman, 1970; Harshman & Lundy, 1984a; =-=Carroll & Chang, 1970-=-; Kruskal, 1978), in which terms and documents would again be represented as points in a space, but similarity is given by the inner product between points. A final candidate is unfolding in trees (Fu... |

565 | A statistical interpretation of term specificity and its application in retrieval
- Jones
- 1972
Citation Context ...dditional search terms. The drawback for fully automatic methods is that some added terms may have different meaning from that intended (the polysemy effect) leading to rapid degradation of precision =-=[6]-=- . It is worth noting in passing that experiments with small interactive data bases have shown monotonic improvements in recall rate without overall loss of precision as more indexing terms, either ta... |

549 | The Vocabulary Problem in Human-System Communication - Furnas, K, et al. - 1987 |

539 |
Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA working papers in phonetics 16
- Harshman
- 1970
Citation Context ... DeSarbo & Carroll, 1985), in which both terms and documents would appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode factor analysis (=-=Harshman, 1970-=-; Harshman & Lundy, 1984a; Carroll & Chang, 1970; Kruskal, 1978), in which terms and documents would again be represented as points in a space, but similarity is given by the inner product between poi... |

305 |
Automatic Information Organization and Retrieval
- Salton
- 1968
Citation Context ...ering latent proximity structure has at least two lines of precedence in the literature. Hierarchical classification analyses are frequently used for term and document clustering (Sparck Jones, 1971; =-=Salton, 1968-=-; Jardin & van Rijsbergen, 1971). Latent class analysis (Baker, 1962) and factor analysis (Atherton & Borko, 1965; Borko & Bernick, 1963; Ossorio, 1966) have also been explored before for automatic doc... |

234 |
A theory of data
- Coombs
- 1964
Citation Context ...roximity methods (Carroll and Arabie, 1980), that start with a rectangular matrix and construct explicit representations of both row and column objects. One such method is multidimensional unfolding (=-=Coombs, 1964-=-; Heiser, 1981; DeSarbo & Carroll, 1985), in which both terms and documents would appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode fa... |

158 |
The use of hierarchical clustering in information retrieval
- Jardine, Rijsbergen
- 1971
Citation Context ...l by discovering latent proximity structure has at least two lines of precedence in the literature. Hierarchical classification analyses are frequently used for term and document clustering [11] [12] =-=[13]-=- . Latent class analysis [14] and factor analysis [15] [16] [17] have also been explored before for automatic document indexing and retrieval. In document clustering, for example, a notion of distance... |

143 |
Automatic keyword classification for information retrieval. Butterworths
- Jones, K
- 1971
Citation Context ...n retrieval by discovering latent proximity structure has at least two lines of precedence in the literature. Hierarchical classification analyses are frequently used for term and document clustering =-=[11]-=- [12] [13] . Latent class analysis [14] and factor analysis [15] [16] [17] have also been explored before for automatic document indexing and retrieval. In document clustering, for example, a notion o... |

140 |
A Theoretical Basis for the Use of Co-occurrence Data in Information Retrieval
- Rijsbergen
- 1977
Citation Context ...to do with the way in which current automatic indexing and retrieval systems actually work. In such systems each word type is treated as independent of any other (see, for example, van Rijsbergen =-=[9]-=- ). Thus matching (or not) both of two terms that almost always occur together is counted as heavily as matching two that are rarely found in the same document. Thus the scoring of success, in either ... |

112 |
A critical analysis of vector space model for information retrieval
- Raghavan, Wong
- 1986
Citation Context ...ation have been arbitrary and after the fact. An exception to this is a proposal by Koll [20] in which both terms and documents are represented in the same space of concepts (see also Raghavan & Wong =-=[21]-=- ). While Koll’s approach is quite close in spirit to the one we propose, his concept space was of very low dimensionality (only seven underlying dimensions), and the dimensions were hand-chosen and n... |

103 | Subject access in online catalogs: a design model
- Bates
- 1986
Citation Context ...en reported in studies of interindexer consistency (Tarr & Borko, 1974) and in the generation of search terms by either expert intermediaries (Fidel, 1985) or less experienced searchers (Liley, 1954; =-=Bates, 1986-=-). The prevalence of synonyms tends to decrease the “recall” performance of retrieval systems. By polysemy we refer to the general fact that most words have more than one distinct meaning (homography)... |

82 |
The cluster hypothesis revisited
- Voorhees
- 1985
Citation Context ...for n objects). Empirically, clustering improves the computational efficiency of search; whether or not it improves retrieval success is unclear (Jardin & van Rijsbergen, 1971; Salton & McGill, 1983; =-=Voorhees, 1985-=-). Previously tried factor analytic approaches have taken a square symmetric matrix of similarities between pairs of document... |

73 |
Statistical semantics: Analysis of the potential performance of keyword information systems
- Furnas, Landauer, et al.
- 1983
Citation Context ...ree of variability in descriptive term usage is much greater than is commonly suspected. For example, two people choose the same main key word for a single well-known object less than 20% of the time =-=[1]-=- . Comparably poor agreement has been reported in studies of inter-indexer consistency [2] and in the generation of search terms by either expert intermediaries [3] or less experienced searchers [4] [... |

65 |
Multidimensional Scaling
- CARROLL, ARABIE
- 1980
Citation Context ... have positions in the structure. Then a query can be placed at the centroid of its term points. Thus for both elegance and retrieval mechanisms, we needed what are called two-mode proximity methods (=-=Carroll and Arabie, 1980-=-), that start with a rectangular matrix and construct explicit representations of both row and column objects. One such method is multidimensional unfolding (Coombs, 1964; Heiser, 1981; Desarbo & Carr... |

62 |
Disambiguation by short contexts
- Choueka, Lusignan
- 1985
Citation Context ...r term has several distinct meanings and to subcategorize it and place it in several points in the space. We have not yet found a satisfactory way to do that (but see Amsler [36] ; Choueka & Lusignan =-=[37]-=- ; Lesk [38] ). The latent semantic indexing methods that we have discussed, and in particular the singular-value decomposition technique that we have tested, are capable of improving the way in which ... |

60 |
The PARAFAC model for three-way factor analysis and multidimensional scaling, in Research Methods for Multimode Data Analysis
- Harshman, Lundy
- 1984
Citation Context ...ll, 1985), in which both terms and documents would appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode factor analysis (Harshman, 1970; =-=Harshman & Lundy, 1984-=-a; Carroll & Chang, 1970; Kruskal, 1978), in which terms and documents would again be represented as points in a space, but similarity is given by the inner product between points. A final candidate i... |

49 | Relevance assessments and retrieval system evaluation - Lesk, Salton - 1968 |

40 |
Automatic document classification
- Borko, Bernick
- 1963
Citation Context ...re frequently used for term and document clustering (Sparck Jones, 1971; Salton, 1968; Jardin & van Rijsbergen, 1971). Latent class analysis (Baker, 1962) and factor analysis (Atherton & Borko, 1965; =-=Borko & Bernick, 1963-=-; Ossorio, 1966) have also been explored before for automatic document indexing and retrieval. In document clustering, for example, a notion of distance is defined such that two documents are consider... |

38 |
Data preprocessing and the extended Parafac model, in Research Methods for Multimode Data Analysis
- Harshman, Lundy
- 1984
Citation Context ...ll, 1985), in which both terms and documents would appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode factor analysis (Harshman, 1970; =-=Harshman & Lundy, 1984-=-a; Carroll & Chang, 1970; Kruskal, 1978), in which terms and documents would again be represented as points in a space, but similarity is given by the inner product between points. A final candidate i... |

37 |
A block Lanczos method for computing the singular values and corresponding singular vectors of a matrix
- Golub, Luk, et al.
- 1981
Citation Context ...31] ), a program for the iterative numerical solution of multi-mode factor-analysis problems, was used for the studies reported below. (Other programs for more standard SVD are also available - e.g., =-=[33]-=- [34] .) "Documents" consist of the full text of the title and abstract. Each document is indexed automatically; all terms occurring in more than one document and not on a stop list of 439 common word... |

31 |
Experience with an adaptive indexing scheme
- Furnas
- 1985
Citation Context ...s have shown monotonic improvements in recall rate without overall loss of precision as more indexing terms, either taken from the documents or from large samples of actual users’ words are added [7] =-=[8]-=- . Whether this "unlimited aliasing" method, which we have described elsewhere, will be effective in very large data bases remains to be determined. Not only is there a potential issue of ambiguity an... |

19 |
Machine-readable dictionaries
- Amsler
- 1984
Citation Context ...er that a particular term has several distinct meanings and to subcategorize it and place it in several points in the space. We have not yet found a satisfactory way to do that (but see Amsler =-=[36]-=- ; Choueka & Lusignan [37] ; Lesk [38] ). The latent semantic indexing methods that we have discussed, and in particular the singular-value decomposition technique that we have tested, are capable of i... |

14 |
Unfolding Analysis of Proximity Data
- Heiser
- 1981
Citation Context ...ds (Carroll and Arabie, 1980), that start with a rectangular matrix and construct explicit representations of both row and column objects. One such method is multidimensional unfolding (Coombs, 1964; =-=Heiser, 1981-=-; DeSarbo & Carroll, 1985), in which both terms and documents would appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode factor analysis ... |

12 |
Classification Space - A Multivariate Procedure for Automatic Document Indexing and Retrieval
- Ossorio
- 1965
Citation Context ... term and document clustering (Sparck Jones, 1971; Salton, 1968; Jardin & van Rijsbergen, 1971). Latent class analysis (Baker, 1962) and factor analysis (Atherton & Borko, 1965; Borko & Bernick, 1963; =-=Ossorio, 1966-=-) have also been explored before for automatic document indexing and retrieval. In document clustering, for example, a notion of distance is defined such that two documents are considered close to the... |

11 |
A Lanczos algorithm for computing singular values and vectors of large matrices
- Cullum, Willoughby, et al.
- 1983
Citation Context ...ative numerical solution of multi-mode factor-analysis problems, was used for the studies reported below. (Other programs for more standard SVD are also available-e.g., Golub, Luk, and Overton, 1981; =-=Cullum, Willoughby, and Lake, 1983-=-.) “Documents” consist of the full text of the title and abstract. Each document is indexed automatically; all terms occurring in more than one document and not on a stop list of 439 common words used... |

10 |
Factors influencing inter-indexer consistency
- Tarr, Borko
- 1974
Citation Context ...e main key word for a single well-known object less than 20% of the time (Furnas, Landauer, Gomez, & Dumais, 1987). Comparably poor agreement has been reported in studies of interindexer consistency (=-=Tarr & Borko, 1974-=-) and in the generation of search terms by either expert intermediaries (Fidel, 1985) or less experienced searchers (Liley, 1954; Bates, 1986). The prevalence of synonyms tends to decrease the “recall... |

10 |
People can retrieve more objects with enriched key-word vocabularies. But is there a human performance cost
- M, Lochbaum
Citation Context ...bases have shown monotonic improvements in recall rate without overall loss of precision as more indexing terms, either taken from the documents or from large samples of actual users’ words are added =-=[7]-=- [8] . Whether this "unlimited aliasing" method, which we have described elsewhere, will be effective in very large data bases remains to be determined. Not only is there a potential issue of ambiguit... |

10 |
Three-way metric unfolding via alternating weighted least squares. Psychometrika
- DeSarbo, Carroll
- 1985
Citation Context ...methods (Carroll and Arabie [10] ), that start with a rectangular matrix and construct explicit representations of both row and column objects. One such method is multidimensional unfolding [22] [23] =-=[24]-=- , in which both terms and documents would appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode factor analysis [25] [26] [27] [28] , in ... |

8 |
Individual variability in online searching behavior
- Fidel
- 1985
Citation Context ...wn object less than 20% of the time [1] . Comparably poor agreement has been reported in studies of inter-indexer consistency [2] and in the generation of search terms by either expert intermediaries =-=[3]-=- or less experienced searchers [4] [5] . The prevalence of synonyms tends to decrease the "recall" performance of retrieval systems. By polysemy we refer to the general fact that most words have more ... |

8 |
WEIRD: An approach to concept-based information retrieval
- Koll
- 1979
Citation Context ...., either term clustering or document clustering). Any attempts to put the ignored entity back in the representation have been arbitrary and after the fact. An exception to this is a proposal by Koll =-=[20]-=- in which both terms and documents are represented in the same space of concepts (see also Raghavan & Wong [21] ). While Koll’s approach is quite close in spirit to the one we propose, his concept spa... |

7 |
Computer Methods for Mathematical Computations (Chapter 9: Least squares and the singular value decomposition). Englewood Cliffs, NJ
- Forsythe, Malcolm, et al.
- 1977
Citation Context ...nfolding, too computationally expensive. Two-mode factor analysis is a generalization of the familiar factor analytic model based on singular value decomposition (SVD). (See Forsythe, Malcolm & Moler =-=[30]-=- , Chapter 9, for an introduction to SVD and its applications.) SVD represents both terms and documents as vectors in a space of choosable dimensionality, and the dot product or cosine between points ... |

5 |
A Test of the Factor-Analytically Derived Automated Classification Method Applied to Descriptions of Work and
- Atherton, Borko
- 1965
Citation Context ...lassification analyses are frequently used for term and document clustering (Sparck Jones, 1971; Salton, 1968; Jardin & van Rijsbergen, 1971). Latent class analysis (Baker, 1962) and factor analysis (=-=Atherton & Borko, 1965-=-; Borko & Bernick, 1963; Ossorio, 1966) have also been explored before for automatic document indexing and retrieval. In document clustering, for example, a notion of distance is defined such that two ... |

4 |
Information retrieval based on latent class analysis
- Baker
Citation Context ... in the literature. Hierarchical classification analyses are frequently used for term and document clustering (Sparck Jones, 1971; Salton, 1968; Jardin & van Rijsbergen, 1971). Latent class analysis (=-=Baker, 1962-=-) and factor analysis (Atherton & Borko, 1965; Borko & Bernick, 1963; Ossorio, 1966) have also been explored before for automatic document indexing and retrieval. In document clustering, for example, a... |

4 | Pictures of relevance - Jones, Furnas - 1987 |

3 |
Evaluation of the subject catalog
- Liley
Citation Context ...eement has been reported in studies of interindexer consistency (Tarr & Borko, 1974) and in the generation of search terms by either expert intermediaries (Fidel, 1985) or less experienced searchers (=-=Liley, 1954-=-; Bates, 1986). The prevalence of synonyms tends to decrease the “recall” performance of retrieval systems. By polysemy we refer to the general fact that most words have more than one distinct meaning... |

3 |
Objects and Their Features: The Metric Representation of Two Class Data. Unpublished doctoral dissertation
- Furnas
- 1980
Citation Context ...25] [26] [27] [28] , in which terms and documents would again be represented as points in a space, but similarity is given by the inner product between points. A final candidate is unfolding in trees =-=[29]-=- , in which both terms and documents would appear as leaves on a tree, and path length distance through the tree would give the similarity (one version of this is equivalent to simultaneous hierarchic... |

2 |
How to tell a pine cone from an ice cream cone
- Lesk
- 1986
Citation Context ...everal distinct meanings and to subcategorize it and place it in several points in the space. We have not yet found a satisfactory way to do that (but see Amsler [36] ; Choueka & Lusignan [37] ; Lesk =-=[38]-=- ). The latent semantic indexing methods that we have discussed, and in particular the singular-value decomposition technique that we have tested, are capable of improving the way in which we deal with... |

1 |
Factor analysis and principal components: Bilinear methods
- Kruskal
- 1978
Citation Context ...ld appear as points in a single space with similarity related monotonically to Euclidean distance. Another is two-mode factor analysis (Harshman, 1970; Harshman & Lundy, 1984a; Carroll & Chang, 1970; =-=Kruskal, 1978-=-), in which terms and documents would again be represented as points in a space, but similarity is given by the inner product between points. A final candidate is unfolding in trees (Furnas, 1980), in ... |

1 |
two-mode factor analysis
- or
Citation Context ...e [1] . Comparably poor agreement has been reported in studies of inter-indexer consistency [2] and in the generation of search terms by either expert intermediaries [3] or less experienced searchers =-=[4]-=- [5] . The prevalence of synonyms tends to decrease the "recall" performance of retrieval systems. By polysemy we refer to the general fact that most words have more than one distinct meaning ... |

1 | Individual variability in online searching behavior - unknown authors - 1985 |

1 | Objects and their features: The metric representation of two-class data - unknown authors - 1980 |

1 |
Experience with an adaptive indexing scheme
- Furnas
- 1985
Citation Context ...in recall rate without overall loss of precision as more indexing terms, either taken from the documents or from large samples of actual users’ words are added (Gomez, Lochbaum, & Landauer, in press; =-=Furnas, 1985-=-). Whether this “unlimited aliasing” method, which we have described elsewhere, will be effective in very large data bases remains to be determined. Not only is there a potential issue of ambiguity an... |