WEBSOM - Self-Organizing Maps of Document Collections
- Neurocomputing, 1997
Cited by 121 (14 self)
Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of miscellaneous document collections. The WEBSOM method organizes a document collection on a map display that provides an overview of the collection and facilitates interactive browsing. Interesting documents can be retrieved by a content addressable search of interesting map locations. The interesting locations could also be marked as filters for collecting interesting new documents.
Data Exploration Using Self-Organizing Maps
- Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series No. 82, 1997
Cited by 93 (4 self)
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
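As a rough illustration of the SOM algorithm named in this abstract, the following Python sketch trains a small one-dimensional map (a chain of units) on toy vectors. It is not taken from the cited work: the function names, the parameters (`n_units`, `alpha0`, `sigma0`), the linear decay schedules, and the toy data are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the map unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(data, n_units=5, epochs=50, alpha0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D SOM on a list of equal-length vectors (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1); assumes the data are scaled to [0, 1].
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        alpha = alpha0 * (1.0 - frac)            # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # one shuffled pass over the data
            c = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood around the winning unit on the chain.
                h = math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += alpha * h * (x[d] - w[d])
    return weights


# Toy data: two loose groups of 2-D points in the unit square.
data = [[0.1, 0.2], [0.15, 0.1], [0.2, 0.25], [0.8, 0.9], [0.9, 0.85], [0.85, 0.8]]
som = train_som(data)
for x in data:
    print(x, "-> unit", best_matching_unit(som, x))
```

Each update moves the winning unit and its neighbours on the chain toward the input, which is what lets nearby units end up representing similar data items. Real applications typically use two-dimensional grids and much larger data sets.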
Word Space
- Advances in Neural Information Processing Systems 5, 1993
Cited by 53 (0 self)
Representations for semantic information about words are necessary for many applications of neural networks in natural language processing. This paper describes an efficient, corpus-based method for inducing distributed semantic representations for a large number of words (50,000) from lexical cooccurrence statistics by means of a large-scale linear regression. The representations are successfully applied to word sense disambiguation using a nearest neighbor method.
WEBSOM for Textual Data Mining
- Artificial Intelligence Review, 1999
Cited by 22 (4 self)
New methods that are user-friendly and efficient are needed for guidance among the masses of textual information available in the Internet and the World Wide Web. We have developed a method and a tool called the WEBSOM which utilizes the self-organizing map algorithm (SOM) for organizing large collections of text documents onto visual document maps. The approach to processing text is statistically oriented, computationally feasible, and scalable: over a million text documents have been ordered on a single map. In the article we consider different kinds of information needs and tasks regarding organizing, visualizing, searching, categorizing and filtering textual data. Furthermore, we discuss and illustrate with examples how document maps can aid in these situations. An example is presented where a document map is utilized as a tool for visualizing and filtering a stream of incoming electronic mail messages.
Automatic Indexing: An Approach Using an Index Term Corpus and Combining Linguistic and Statistical Methods
- 2000
Cited by 6 (0 self)
This thesis discusses the problems and the methods of finding relevant information in large collections of documents. The contribution of this thesis to this problem is to develop better content analysis methods which can be used to describe document content with index terms. Index terms can be used as meta-information that describes documents, and that is used for seeking information. The main point of this thesis is to illustrate the process of developing an automatic indexer which analyses the content of documents by combining evidence from word frequencies and evidence from linguistic analysis provided by a syntactic parser. The indexer weights the expressions of a text according to their estimated importance for describing the content of a given document on the basis of the content analysis. The typical linguistic features of index terms were explored using a linguistically analysed text collection where the index terms are manually marked up. This text collection is referred to as an index term corpus. Specific features of the index terms provided the basis for a linguistic term-weighting scheme, which was then combined with a frequency-based term-weighting scheme. The use of an index term corpus like this as training material is a new method of developing an automatic indexer. The results of the experiments were promising.
Relevance judgments for assessing recall
- Information Processing and Management, 1996
Cited by 5 (0 self)
Recall and Precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognized as problematic. To compare performance of different systems, standard collections of documents, queries, and relevance judgments have been used. Unfortunately the standard collections, such as SMART and TREC, have locked in a particular approach to relevance that is suitable for assessing precision but not recall. The problem is demonstrated by comparing two information retrieval methods over several queries, and showing how a new method of forming relevance judgments that is suitable for assessing recall gives different results. Recall is an interesting and practical issue, but current test procedures are inadequate for measuring it.
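The standard set-based definitions of precision and recall referred to in this abstract can be written down in a few lines of Python. The sketch below is only an illustration: the function name, the document identifiers, and the convention of returning 0.0 for an empty denominator are assumptions, and the toy example of incomplete judgments is not the paper's experiment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Assumed convention: an empty denominator yields 0.0 rather than an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["d1", "d2", "d3", "d4"]

# Hypothetical case 1: every relevant document in the collection was judged.
complete_judgments = ["d2", "d4", "d7", "d9"]
print(precision_recall(retrieved, complete_judgments))

# Hypothetical case 2: only part of the collection was judged, so d7 and d9
# are missing from the relevance set.
partial_judgments = ["d2", "d4"]
print(precision_recall(retrieved, partial_judgments))
```

In this toy case precision is the same under both sets of judgments, while recall is higher when relevant documents are missing from the judged set. That asymmetry is one way to read the abstract's concern that a collection can support precision measurements without supporting recall measurements.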
Relevance Judgements for Assessing Recall
- 1995
Cited by 4 (0 self)
Recall and Precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognised as problematic. To compare performance of different systems, standard collections of documents, queries, and relevance judgements have been used. Unfortunately the standard collections, such as SMART and TREC, have locked in a particular approach to relevance and this has affected subsequent research. Two styles of information need are distinguished, high precision and high recall, and a method of forming relevance judgements suitable for each is described. The issues are illustrated by comparing two retrieval systems, keyword retrieval and semantic signatures, on different sets of relevance judgements. 2 Introduction Four decades of testing informatio...
Data analysis of conceptual similarities of Finnish verbs
- 2002
Cited by 4 (1 self)
The study of the conceptual representations that underlie the use of language is a problem motivated from both a cognitive research point of view and that of construing language models for various language processing tasks. In this work, we organized 600 Finnish verbs using the SOM algorithm. Three experiments were conducted using different features to encode the verbs: morphosyntactic properties, individual nouns, and noun categories in the context of the verb. In general, the morphosyntactic properties seem to draw attention to semantic roles, whereas nouns as features seem to highlight clusters formed on grounds of topics in the text.
Statistical aspects of the WEBSOM system in organizing document collections
- Interface Foundation of North, 1998
Cited by 3 (0 self)
WEBSOM is a novel method for organizing document collections onto map displays to enhance the interactive browsing and retrieval of the documents. The map is organized automatically according to the contents of the full-text documents by the Self-Organizing Map algorithm. The map display provides a visual overview of the whole document collection. The overview, the map display, aids in the exploration since similar documents are located close to each other. In this paper we describe the WEBSOM system in a statistically oriented fashion and discuss its relations to other methods. Particular emphasis is put on how effective the methods are in treating large document collections. The two-phase architecture of the WEBSOM system makes it possible to build contextual information about the relations of words off-line into a word category representation, which can then be utilized rapidly on-line, when the documents are being encoded. The construction of large map displays from the encoded document representations is a computationally intensive operation when done in a straightforward manner. There exist, however, several effective computational shortcuts.
Efficient Preprocessing for Information Retrieval with Neural Networks
- EUFIT ‘99. 7th European Congress on Intelligent Techniques and Soft Computing, 1999
Cited by 1 (1 self)
Neural networks are well suited for Information Retrieval (IR) from large text or multimedia databases. Their capacity for tolerant and intuitive processing offers new perspectives in IR where the vague nature of human relevance judgements has confronted theory and systems with considerable problems. Most models use the keyword representation vector as input or output. However, full-text indexing brings forth large vectors which are difficult to handle for neural networks. This article discusses methods for dimensionality reduction used in IR and applies one of them, Latent Semantic Indexing (LSI), to information retrieval using a neural backpropagation network. The transformation between two representation schemes is enabled through preprocessing by LSI which is based on Singular Value Decomposition (SVD).

