Indexing by latent semantic analysis
- JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
, 1990
Cited by 2168 (30 self)
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100-item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
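The pipeline this abstract describes (truncated SVD, documents as factor-weight vectors, queries folded in as pseudo-documents, cosine thresholding) can be sketched roughly as follows. This is a minimal illustration using NumPy, not the authors' implementation: the function names, the toy 5-term by 4-document matrix, the choice of 2 factors (instead of ca. 100), and the 0.9 threshold are all assumptions made for the example.

```python
import numpy as np

def lsa_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping the k largest factors."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # One row per document: its weights on the k factors (equals term_doc.T @ U_k).
    doc_vectors = Vt_k.T * s_k
    return U_k, s_k, doc_vectors

def fold_in_query(query_counts, U_k):
    """Project a query term vector into factor space as a pseudo-document."""
    return query_counts @ U_k

def retrieve(query_counts, U_k, doc_vectors, threshold):
    """Return (doc index, cosine) for documents at or above the cosine threshold."""
    q = fold_in_query(query_counts, U_k)
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    cosines = (doc_vectors @ q) / np.where(norms == 0, 1.0, norms)
    return [(i, float(c)) for i, c in enumerate(cosines) if c >= threshold]

# Hypothetical toy data. Rows: human, computer, interface, graph, tree.
term_doc = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

U_k, s_k, doc_vectors = lsa_fit(term_doc, k=2)
query = np.array([0, 1, 0, 0, 0], dtype=float)  # the single term "computer"
hits = retrieve(query, U_k, doc_vectors, threshold=0.9)
```

In this toy case document 1 never contains the query term "computer", yet it is returned alongside document 0 because both load on the same factor. That is the kind of match a purely lexical method would miss.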
The Vocabulary Problem in Human-System Communication
- COMMUNICATIONS OF THE ACM
, 1987
Cited by 353 (6 self)
In almost all computer applications, users must enter correct words for the desired objects or actions. For success without extensive training, or in first-tries for new targets, the system must recognize terms that will be chosen spontaneously. We studied spontaneous word choice for objects in five application-related domains, and found the variability to be surprisingly large. In every case two people favored the same term with probability <0.20. Simulations show how this fundamental property of language limits the success of various design methodologies for vocabulary-driven interaction. For example, the popular approach in which access is via one designer's favorite single word will result in 80-90 percent failure rates in many common situations. An optimal strategy, unlimited aliasing, is derived and shown to be capable of several-fold improvements.
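The arithmetic behind these figures can be illustrated with a small sketch. It assumes that two people choose their terms independently from the same distribution, in which case the chance that they agree is the sum of squared term probabilities. The distribution below is hypothetical and not taken from the study, and the function names are introduced only for this example.

```python
def agreement_probability(term_probs):
    """Chance that two independent people pick the same term for an object."""
    return sum(p * p for p in term_probs)

def hit_rate_with_aliases(term_probs, n_aliases):
    """Chance that a user's first word is accepted when the system
    recognizes the n_aliases most popular terms for the object."""
    return sum(sorted(term_probs, reverse=True)[:n_aliases])

# Hypothetical spread of spontaneous word choices for one object (sums to 1).
term_probs = [0.2, 0.1, 0.1] + [0.05] * 12

same_term = agreement_probability(term_probs)      # about 0.09
single_word = hit_rate_with_aliases(term_probs, 1)  # about 0.20, i.e. 80% failure
five_aliases = hit_rate_with_aliases(term_probs, 5) # about 0.50
```

Under these assumed numbers, a single designer-chosen word fails four times out of five, while accepting more aliases raises the hit rate steadily, which is the direction of the improvement the abstract attributes to unlimited aliasing.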
Paradox of the Active User
, 1987
Cited by 84 (5 self)
One of the most sweeping changes ever in the ecology of human cognition may be taking place today. People are beginning to learn and use very powerful and sophisticated information processing technology as a matter of daily life. From the perspective of human history, this could be a transitional point dividing a period when machines merely helped us do things from a period when machines will seriously help us think about things. But if this is so, we are indeed still very much within the transition. For most people, computers have more possibility than they have real practical utility.
Using Latent Semantic Analysis To Improve Access To Textual Information
- SIGCHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS
, 1988
Cited by 84 (1 self)
This paper describes a new approach for dealing with the vocabulary problem in human-computer interaction. Most approaches to retrieving textual materials depend on a lexical match between words in users' requests and those in or assigned to database objects. Because of the tremendous diversity in the words people use to describe the same object, lexical matching methods are necessarily incomplete and imprecise [5]. The latent semantic indexing approach tries to overcome these problems by automatically organizing text objects into a semantic structure more appropriate for matching user requests. This is done by taking advantage of implicit higher-order structure in the association of terms with text objects. The particular technique used is singular-value decomposition, in which a large term by text-object matrix is decomposed into a set of about 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination. Terms and objects are represented by 50 to 150 dimensional vectors and matched against user queries in this “semantic” space. Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
Experimental comparison of navigation in a Galois lattice with conventional information retrieval methods
- International Journal of Man-machine Studies
, 1998
Cited by 44 (5 self)
A controlled experiment was conducted comparing information retrieval using a Galois lattice structure with two more conventional retrieval methods: navigating in a manually built hierarchical classification and Boolean querying with index terms. No significant performance difference was found between Boolean querying and the Galois lattice retrieval method for subject searching with the three measures used for the experiment: user searching time, recall and precision. However, hierarchical classification retrieval did show significantly lower recall compared to the two other methods. This experiment suggests that retrieval using a Galois lattice structure may be an attractive alternative since it combines a good performance for subject searching along with browsing potential. 1. Introduction Information retrieval is concerned with the representation, storage, organization, and accessing of information items (Salton & McGill, 1983). As opposed to the traditional f...
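For readers unfamiliar with the structure being evaluated, a Galois (concept) lattice over a document collection is made of pairs (set of documents, set of index terms) in which each set exactly determines the other. The sketch below enumerates those pairs by brute force for a tiny hypothetical index. It is only an illustration of the definition: the document names and terms are invented, the enumeration is exponential in the number of terms, and nothing here reflects how the system in the experiment was built.

```python
from itertools import combinations

def extent(terms, index):
    """Documents that contain every term in `terms`."""
    return frozenset(d for d, ts in index.items() if terms <= ts)

def intent(docs, index, all_terms):
    """Terms shared by every document in `docs`."""
    shared = set(all_terms)
    for d in docs:
        shared &= index[d]
    return frozenset(shared)

def formal_concepts(index):
    """All (documents, terms) pairs closed under extent/intent."""
    all_terms = frozenset().union(*index.values())
    concepts = set()
    for r in range(len(all_terms) + 1):
        for combo in combinations(sorted(all_terms), r):
            docs = extent(frozenset(combo), index)
            concepts.add((docs, intent(docs, index, all_terms)))
    return concepts

# Hypothetical index: document -> set of index terms.
index = {
    "d1": {"lattice", "retrieval"},
    "d2": {"retrieval", "boolean"},
    "d3": {"lattice", "retrieval", "browsing"},
}
concepts = formal_concepts(index)
```

Each concept is a node a user could navigate to: moving down the lattice adds terms and narrows the document set, which is what gives the structure its browsing potential alongside query-like subject searching.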
Human-Computer Interaction: Psychology as a Science of Design
- Annual Review of Psychology
, 2001
Cited by 37 (0 self)
In this paper, I review the history of HCI as steps toward a science of design. My touchstone is Simon's (1969) provocative book The Sciences of the Artificial. The book pre-dates HCI, and many of its specific characterizations and claims about design are no longer authoritative (see Ehn, 1988). Nevertheless, two of Simon's themes echo through the history of HCI, and still provide guidance for charting its continuing development.
From frequency to meaning: Vector space models of semantics
- Journal of Artificial Intelligence Research
, 2010
Cited by 34 (0 self)
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
Optimizing Ranking Functions: A Connectionist Approach to Adaptive Information Retrieval
- DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, THE UNIVERSITY OF CALIFORNIA, SAN DIEGO
, 1994
Cited by 26 (5 self)
This dissertation examines the use of adaptive methods to automatically improve the performance of ranked text retrieval systems. The goal of a ranked retrieval system is to manage a large collection of text documents and to order documents for a user based on the estimated relevance of the documents to the user's information need (or query). The ordering enables the user to quickly find documents of interest. Ranked retrieval is a difficult problem because of the ambiguity of natural language, the large size of the collections, and because of the varying needs of users and varying collection characteristics. We propose and empirically validate general adaptive methods which improve the ability of a large class of retrieval systems to rank documents effectively. Our main adaptive method is to numerically optimize free parameters in a retrieval system by minimizing a non-metric criterion function. The criterion measures how well the system is ranking documents relative to a target ordering, defined by a set of training queries which include the users' desired document orderings. Thus, the system learns parameter settings which better enable it to rank relevant documents before irrelevant. The non-metric approach is interesting because it is a general adaptive method, an alternative to supervised methods for training neural networks in domains in which rank order or prioritization is important. A second adaptive method is also examined, which is applicable to a restricted class of retrieval systems but which permits an analytic solution. The adaptive methods are applied to a number of problems in text retrieval to validate their utility and practical efficiency. The applications include: A dimensionality reduction of vector-based document representations to a vector spa...
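To make the idea of scoring a system against a target ordering concrete, the sketch below computes a simple rank-based measure: the fraction of document pairs that the system's scores order differently from the users' desired ordering. This is a deliberately simplified stand-in and not the dissertation's criterion function, which the abstract does not specify. The function name, the document labels and the scores are hypothetical.

```python
from itertools import combinations

def pairwise_disagreement(scores, target_rank):
    """Fraction of document pairs ordered differently by `scores`
    than by `target_rank` (rank 1 = most relevant). Ties in the
    scores count as errors; pairs tied in the target are skipped."""
    pairs = [
        (a, b)
        for a, b in combinations(list(target_rank), 2)
        if target_rank[a] != target_rank[b]
    ]
    if not pairs:
        return 0.0
    wrong = 0
    for a, b in pairs:
        preferred, other = (a, b) if target_rank[a] < target_rank[b] else (b, a)
        if scores[preferred] <= scores[other]:
            wrong += 1
    return wrong / len(pairs)

# Hypothetical target ordering for one training query.
target_rank = {"a": 1, "b": 2, "c": 3}

perfect = pairwise_disagreement({"a": 0.9, "b": 0.5, "c": 0.1}, target_rank)
reversed_order = pairwise_disagreement({"a": 0.1, "b": 0.5, "c": 0.9}, target_rank)
one_swap = pairwise_disagreement({"a": 0.9, "b": 0.1, "c": 0.5}, target_rank)
```

A measure like this depends only on the order the scores induce, not on their magnitudes. A count of misordered pairs is piecewise constant in the system's parameters, so numerical optimization would in practice need a smooth surrogate of it.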
A News Story Categorization System
, 1988
Cited by 21 (0 self)
This paper describes a pilot version of a commercial application of natural language processing techniques to the problem of categorizing news stories into broad topic categories. The system does not perform a complete semantic or syntactic analysis of the input stories. Its categorizations are dependent on fragmentary recognition using pattern-matching techniques. The fragments it looks for are determined by a set of knowledge-based rules. The accuracy of the system is only slightly lower than that of human categorizers.
Weight functions impact on LSA performance
- EuroConference RANLP'2001 (Recent Advances in NLP)
, 2001
Cited by 13 (3 self)
This paper presents experimental results of using LSA for analysis of English literature texts. Several preliminary transformations of the frequency text-document matrix with different weight functions are tested on the basis of control subsets. Additional clustering based on correlation matrix is applied in order to reveal the latent structure. The algorithm creates a shaded form matrix via singular values and vectors. The results are interpreted as a quality of the transformations and compared to the control set tests.
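As an example of the kind of weight function applied to a frequency matrix before the SVD, the sketch below implements log-entropy weighting, a scheme commonly used with LSA. The abstract does not say which weight functions the paper tests, so treat this as one plausible instance rather than the authors' choice. The function name and the toy counts are assumptions.

```python
import numpy as np

def log_entropy_weight(counts):
    """Apply log-entropy weighting to a term-by-document count matrix.

    Local weight:  log(1 + tf_ij)
    Global weight: 1 + sum_j p_ij * log(p_ij) / log(n_docs),
                   with p_ij = tf_ij / (total count of term i)
    Requires at least two documents (log(n_docs) must be non-zero).
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    row_totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)
    # Treat 0 * log(0) as 0 without triggering a log(0) warning.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_weight = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * global_weight[:, None]

# Hypothetical counts: term 0 is spread evenly, term 1 occurs in one document only.
counts = np.array([
    [1, 1, 1, 1],
    [4, 0, 0, 0],
])
weighted = log_entropy_weight(counts)
```

A term spread evenly across all documents receives a global weight of 0 and drops out, while a term concentrated in a single document keeps its full local weight. Different weight functions shift this balance, which is why the choice can change what the SVD recovers.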

