Term-weighting approaches in automatic text retrieval (1988) [916 citations — 9 self]
Abstract:
The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other more elaborate content analysis procedures can be compared.

1. AUTOMATIC TEXT ANALYSIS

In the late 1950s, Luhn [1] first suggested that automatic text retrieval systems could be designed based on a comparison of content identifiers attached both to the stored texts and to the users' information queries. Typically, certain words extracted from the texts of documents and queries would be used for content identification; alternatively, the content representations could be chosen manually by trained indexers familiar with the subject areas under consideration and with the contents of the document collections. In either case, the documents would be represented by term vectors of the form

$D = (t_i, t_j, \ldots, t_p)$ (1)

where each $t_k$ identifies a content term assigned to some sample document $D$. Analogously, the information requests, or queries, would be represented either in vector form, or in the form of Boolean statements. Thus, a typical query $Q$ might be formulated as

$Q = (q_a, q_b, \ldots, q_r)$ (2)
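The term-vector representation described in the excerpt can be illustrated with a minimal Python sketch. The excerpt does not give a weighting formula, so the choices here are assumptions made for illustration only: raw term frequency multiplied by an inverse document frequency of log(N / n_k), cosine similarity for the document-query comparison, a whitespace tokenizer, and a three-document toy collection. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_vector(text):
    """Represent a text as a mapping from each term to its raw frequency."""
    return Counter(tokenize(text))


def idf_weights(doc_vectors):
    """Assumed weighting: log(N / n_k), with n_k the number of documents containing term k."""
    n_docs = len(doc_vectors)
    doc_freq = Counter()
    for vec in doc_vectors:
        doc_freq.update(vec.keys())
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def weighted(vec, idf):
    """Scale term frequencies by idf; terms unseen in the collection get weight 0."""
    return {term: tf * idf.get(term, 0.0) for term, tf in vec.items()}


def cosine(a, b):
    """Cosine similarity between two sparse weighted term vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy collection and query (illustrative data, not from the paper).
docs = [
    "term weighting improves text retrieval",
    "boolean queries combine index terms",
    "weather report for the weekend",
]
query = "term weighting for retrieval"

doc_vecs = [term_vector(d) for d in docs]
idf = idf_weights(doc_vecs)
query_vec = weighted(term_vector(query), idf)
scores = [cosine(query_vec, weighted(v, idf)) for v in doc_vecs]

# Document indices ordered from best to worst match.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 4) for s in scores])
```

In this toy run the first document shares three terms with the query and ranks first, while the second shares none ("terms" and "term" are distinct tokens because no stemming is applied) and scores zero.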
Citations
| 2331 | Introduction to Modern Information Retrieval – Salton, McGill - 1983 |
| 76 | A statistical approach to mechanized encoding and searching of literary information – Luhn - 1957 |
| 68 | The SMART Retrieval System Experiments – Salton - 1971 |
| 2 | The logical structure of coordinate indexing – Taube, Wachtel - 1952 |
| 1 | A new method of recording and searching information. American Documentation 4:1 – Luhn - 1955 |
| 1 | Information analysis for machine searching – Perry - 1950 |

