Results 1 - 4 of 4
Detecting Patterns in the LSI Term-Term Matrix
In Proceedings ICDM’02 Workshop on Foundations of Data Mining and Discovery, 2002
Cited by 8 (1 self)
... applications use techniques that explicitly or implicitly employ a limited degree of transitivity in the co-occurrence relation. In this work we show the use of higher orders of co-occurrence in the Singular Value Decomposition (SVD) algorithm and, by inference, in the systems that rely on SVD, such as LSI. Our empirical and mathematical studies prove that term co-occurrence plays a crucial role in LSI.
A Mathematical View of Latent Semantic Indexing: Tracing Term Co-Occurrences
2002
Cited by 4 (2 self)
Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of information retrieval systems. We propose the development of a theoretical foundation for understanding the values produced in the reduced form of the term-term matrix. We assert that LSI's use of higher orders of co-occurrence is a critical component of this study. In this work we present experiments that precisely determine the degree of co-occurrence used in LSI. We empirically demonstrate that LSI uses up to fifth-order term co-occurrence. We also prove mathematically that a connectivity path exists for every nonzero element in the truncated term-term matrix computed by LSI. A complete understanding of this term transitivity is key to understanding LSI.
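The relationship between the truncated term-term matrix and co-occurrence paths can be illustrated with a toy example. The sketch below is not the authors' code: the three-term, two-document matrix `A` and the helper names `truncated_term_term` and `cooccurrence_order` are assumptions made for illustration. It computes the rank-k term-term matrix from the SVD of a term-document matrix and finds the shortest co-occurrence path between two terms.

```python
from collections import deque

import numpy as np


def truncated_term_term(A, k):
    """Rank-k term-term matrix A_k A_k^T = U_k S_k^2 U_k^T (hypothetical helper)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    return (Uk * sk**2) @ Uk.T


def cooccurrence_order(A, i, j):
    """Length of the shortest path from term i to term j in the first-order
    co-occurrence graph (1 = the terms share a document); None if no path exists."""
    adj = (A @ A.T) > 0
    np.fill_diagonal(adj, False)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        t = queue.popleft()
        if t == j:
            return dist[t]
        for n in np.flatnonzero(adj[t]):
            n = int(n)
            if n not in dist:
                dist[n] = dist[t] + 1
                queue.append(n)
    return None


# Toy data (assumed): rows are terms t0..t2, columns are documents d0, d1.
# t0 and t1 share d0; t1 and t2 share d1; t0 and t2 never share a document.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

T = A @ A.T                      # full term-term matrix
Tk = truncated_term_term(A, 1)   # rank-1 truncation

print(T[0, 2])                   # 0.0: no first-order co-occurrence
print(round(Tk[0, 2], 6))        # 0.5: nonzero after truncation
print(cooccurrence_order(A, 0, 2))  # 2: second-order path via t1
```

In this toy case the nonzero truncated entry for (t0, t2) coincides with a second-order connectivity path through t1, which is the kind of correspondence the abstract describes; it illustrates the claim and does not prove it.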
Improving Retrieval Performance with Positive and Negative Equivalence Classes of Terms
2002
Cited by 3 (1 self)
One of the most pressing problems facing application developers in the area of information retrieval (IR) is the lack of sound mathematical, theoretical frameworks for understanding IR [SIGIR2000]. Although many such frameworks have been proposed, in the final analysis none has been sufficiently well-grounded to attain widespread acceptance in the field. In addition, there is all too often a lack of empirically sound evaluation of such frameworks in an actual application. For this reason we have forayed into the theoretical domain of IR, while at the same time grounded our work in an application of widespread importance, search and retrieval. One need only glance at the statistics of the hit counts of the latest search engines to realize just how important search and retrieval has become. In this paper we present a novel approach to term clustering and its application in improving the performance of search and retrieval. Our approach is firmly grounded in a theoretical framework that we have developed.
Tracking Morphological and Semantic Co-occurrences in Spontaneous Dialogues
... e seen as aspects of topic tracking. The classical mechanism for lexical prediction is the use of N-gram statistics for the surface forms of the relevant lexical items. For the purposes of speech recognition and disambiguation in spontaneous language, however, this technique is unsatisfactory in two respects. First, the range of predictions is too short, as predictions are usually made over a distance of no more than five words [Church, 1990]. To support bottom-up recognition and analysis of noisy material containing gaps and fragments, longer-range predictions are needed as well. Long-range pre- ...
Tracking Morphological and Semantic Co-occurrences in Spontaneous Dialogues. Mark Seligman, Université Joseph Fourier, GETA, CLIPS, IMAG-campus, BP 53, 385, rue de la Bibliothèque, 38041 Grenoble Cedex 9, France, seligman@cerfnet.com. Jan Alexandersson, German Research Institute of Computer Science, DFKI GmbH, Stuhlsatzenausweg 3, 66123 Saarbrücken, Ger

