Dimensions of Meaning (1992) [102 citations — 3 self]
http://www.parc.xerox.com/istl/groups/qca/papers/S
http://www2.parc.com/istl/groups/qca/papers/Schuet
Abstract:
The representation of documents and queries as vectors in a high-dimensional space is well-established in information retrieval [1]. This paper proposes to represent the semantics of words and contexts in a text as vectors. The dimensions of the space are words, and the initial vectors are determined by the words occurring close to the entity to be represented, which implies that the space has several thousand dimensions (words). This makes the vector representations (which are dense) too cumbersome to use directly. Therefore, dimensionality reduction by means of a singular value decomposition is employed. The paper analyzes the structure of the vector representations and applies them to word sense disambiguation and thesaurus induction.
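The pipeline the abstract describes (word-by-word co-occurrence vectors, reduction by singular value decomposition, then cosine similarity in the reduced space) can be illustrated with a small sketch. It is not the paper's implementation. The toy corpus, the window size of 2, k = 2, the power-iteration routine used in place of a library SVD, representing a context as the sum of its word vectors, and every function name below are assumptions made for illustration.

```python
import math
import random


def cooccurrence_matrix(sentences, vocab, window=2):
    """Row i counts how often each vocabulary word occurs within `window`
    tokens of vocab[i] in the same sentence: the dimensions are words."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(vocab) for _ in vocab]
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            if word not in index:
                continue
            lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
            for j in range(lo, hi):
                if j != pos and tokens[j] in index:
                    counts[index[word]][index[tokens[j]]] += 1.0
    return counts


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else v


def top_right_singular_vectors(a, k, iters=100, seed=0):
    """Top-k right singular vectors of `a` via power iteration with deflation
    on the Gram matrix a^T a (a small stand-in for a library SVD routine)."""
    columns = list(zip(*a))
    gram = [[dot(ci, cj) for cj in columns] for ci in columns]
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in columns])
        for _step in range(iters):
            v = [dot(row, v) for row in gram]
            for u in found:  # project out directions already extracted
                p = dot(u, v)
                v = [x - p * y for x, y in zip(v, u)]
            v = normalize(v)
        found.append(v)
    return found


def reduce_words(a, vocab, directions):
    """k-dimensional word vectors: each co-occurrence row projected onto
    the top singular directions."""
    return {w: [dot(a[i], d) for d in directions] for i, w in enumerate(vocab)}


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def context_vector(tokens, reduced):
    """Assumption: a context is the sum of the reduced vectors of its words."""
    total = [0.0] * len(next(iter(reduced.values())))
    for t in tokens:
        if t in reduced:
            total = [x + y for x, y in zip(total, reduced[t])]
    return total


def nearest_neighbors(word, reduced, n=3):
    """Thesaurus-style neighbours: other words ranked by cosine similarity."""
    others = [w for w in reduced if w != word]
    return sorted(others, key=lambda w: cosine(reduced[word], reduced[w]),
                  reverse=True)[:n]


# Hypothetical toy corpus with two unrelated topics.
sentences = [s.split() for s in [
    "cat chased mouse", "dog chased cat", "dog chased mouse",
    "fund bought stock", "bank bought fund", "bank bought stock",
]]
vocab = sorted({w for s in sentences for w in s})
a = cooccurrence_matrix(sentences, vocab)
reduced = reduce_words(a, vocab, top_right_singular_vectors(a, k=2))
print(nearest_neighbors("cat", reduced))
print(cosine(context_vector(["dog", "chased"], reduced), reduced["mouse"]))
```

On this corpus the two topics end up in orthogonal directions of the reduced space. As a result, the neighbours of `cat` are `dog`, `mouse` and `chased` (in some order), and the context `dog chased` has a cosine near 1 with `mouse` and near 0 with `stock`. Power iteration keeps the sketch dependency-free. With a vocabulary of several thousand words, a sparse truncated SVD routine would be the practical choice.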
Citations (works cited by this paper)
| Times cited | Title | Authors | Year |
|---|---|---|---|
| 2329 | Introduction to modern information retrieval | Salton | 1983 |
| 1636 | Indexing by latent semantic analysis | Deerwester, Dumais, et al. | 1990 |
| 970 | Principal Component Analysis | Jolliffe | 1986 |
| 464 | Word association norms, mutual information, and lexicography | Church, Hanks | 1989 |
| 430 | Scatter/gather: a cluster-based approach to browsing large document collections | Cutting, Karger, et al. | 1992 |
| 351 | Building Large Knowledge-Based Systems | Lenat, Guha | 1990 |
| 228 | Word sense disambiguation using statistical models of Roget's categories trained on large corpora | Yarowsky | 1992 |
| 210 | AutoClass: a Bayesian classification system | Cheeseman, Kelly, et al. | 1988 |
| 146 | Word-sense disambiguation using statistical methods | Brown, Della Pietra, et al. | 1991 |
| 63 | Methods for Statistical Data Analysis of Multivariate Observations | Gnanadesikan | 1977 |
| 57 | Using Bilingual Materials to Develop Word Sense Disambiguation Methods | Gale, Church, et al. | 1992 |

