WEBSOM - Self-Organizing Maps of Document Collections
- Neurocomputing
, 1997
Cited by 121 (14 self)
Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of miscellaneous document collections. The WEBSOM method organizes a document collection on a map display that provides an overview of the collection and facilitates interactive browsing. Interesting documents can be retrieved by a content addressable search of interesting map locations. The interesting locations could also be marked as filters for collecting interesting new documents.
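The content-addressable search mentioned in this abstract can be pictured as a nearest-unit lookup on the map. The following is only a minimal sketch, not the WEBSOM implementation: the function names (`nearest_unit`, `retrieve`, `passes_filter`), the toy two-dimensional vectors, and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_unit(unit_weights, vector):
    """Index of the map unit whose model vector is closest to `vector`."""
    return min(range(len(unit_weights)),
               key=lambda i: squared_distance(unit_weights[i], vector))


def retrieve(unit_weights, unit_docs, query):
    """Return the documents stored at the map unit nearest to the query."""
    return unit_docs.get(nearest_unit(unit_weights, query), [])


def passes_filter(unit_weights, marked_units, new_doc):
    """True if a new document lands on a unit the user marked as interesting."""
    return nearest_unit(unit_weights, new_doc) in marked_units


# Hypothetical three-unit map and the documents assigned to each unit.
unit_weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
unit_docs = {0: ["doc-a"], 2: ["doc-b", "doc-c"]}
```

Under these assumptions, `retrieve(unit_weights, unit_docs, [0.1, 0.9])` looks up unit 2, and `passes_filter(unit_weights, {2}, new_doc)` plays the role of a marked map location that collects incoming documents.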
Newsgroup Exploration with WEBSOM Method and Browsing Interface
, 1996
Cited by 66 (20 self)
The current availability of large collections of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. Especially in the rapidly growing World Wide Web it is important to have methods for exploring miscellaneous document collections automatically. In the report, we introduce the WEBSOM method for this task. Self-Organizing Maps (SOMs) are used to position encoded documents onto a map that provides a general view into the text collection. The general view visualizes similarity relations between the documents on a map display, which can be utilized in exploring the material rather than having to rely on traditional search expressions. Similar documents become mapped close to each other. The potential of the WEBSOM method is demonstrated in a case study where articles from the Usenet newsgroup "comp.ai.neural-nets" are organized. The map is available for exploration at the WWW address http://websom.hut.fi/websom/
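To make the idea of positioning encoded documents on a map concrete, here is a minimal self-organizing map sketch in plain Python. It illustrates the generic SOM update rule only. The grid layout, Gaussian neighborhood, linear decay schedules, and all names and default parameters (`som_update`, `train_som`, `lr0`, `sigma0`) are assumptions, not details taken from the report.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: squared_distance(weights[i], x))


def som_update(weights, positions, x, lr, sigma):
    """One SOM step: pull the winner and its grid neighbors toward x (in place)."""
    winner = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        grid_d2 = squared_distance(positions[i], positions[winner])
        h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
        weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return winner


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols map on `data`; returns (weights, grid positions)."""
    rng = random.Random(seed)
    dim = len(data[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in positions]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 toward 0
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.3)      # keep the neighborhood non-degenerate
        for x in data:
            som_update(weights, positions, x, lr, sigma)
    return weights, positions
```

After training, each encoded document would be placed at `positions[best_matching_unit(weights, doc_vector)]`. Because neighboring units are updated together, documents with similar vectors tend to end up at nearby grid positions.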
Self-Organizing Maps of Document Collections: A New Approach to Interactive Exploration
, 1996
Cited by 57 (15 self)
Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, WEBSOM, for automatic organization of full-text document collections using the self-organizing map (SOM) algorithm. The document collection is ordered onto a map in an unsupervised manner utilizing statistical information of short word contexts. The resulting ordered map where similar documents lie near each other thus presents a general view of the document space. With the aid of a suitable (WWW-based) interface, documents in interesting areas of the map can be browsed. The browsing can also be interactively extended to related topics, which appear in nearby areas on the map. Along with the method we present a case study of its use. Keywords: data visualization, document organization, full-text analysis, interactive exploration, self-organizing map. Introduction Finding relev...
Very Large Two-Level SOM for the Browsing of Newsgroups
, 1996
Cited by 45 (13 self)
On January 19, 1996 we published in the Internet a demo of how to use Self-Organizing Maps (SOMs) for the organization of large collections of full-text files. Later we added other newsgroups to the demo. It can be found at the address http://websom.hut.fi/websom/. In the present paper we describe the main features of this system, called the WEBSOM, as well as some newer developments of it. 1 Introduction When organizing large collections of free-form full-text document files that contain no keywords, e.g. the newsgroups in the Internet, it is difficult to base their analysis on traditional search expressions. The main information one can resort to in the classification of such documents is statistical. SOMs of document collections have previously been constructed on the basis of their word histograms (published works are [5], [6], [7], [10], [11], [12]). Thereby, however, the size of the selected vocabulary cannot be large. In other studies (cf., e.g., [1], [2], [8], [9], [11], and...
Exploration of Text Collections with Hierarchical Feature Maps
, 1997
Cited by 37 (14 self)
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight in the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and gets specialized to a subset of the input data. As a consequence, the choice of this particular artificial neural network model enables the true establishment of a document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. In particular, the hierarchical representation of document collections is appealing because it is the underlying organizational principle in use by librarians providing the necessary familiarity...
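The statement that each map in the hierarchy specializes on a subset of the input data can be sketched as a routing step: documents are grouped by their nearest unit on an upper-layer map, and each group would then serve as the training set for one lower-layer map. This is a simplified illustration under assumptions; the abstract does not give the actual training procedure, and the names `nearest_unit` and `partition_by_unit` are hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_unit(unit_weights, vector):
    """Index of the map unit whose model vector is closest to `vector`."""
    return min(range(len(unit_weights)),
               key=lambda i: squared_distance(unit_weights[i], vector))


def partition_by_unit(top_weights, documents):
    """Group document vectors by their nearest unit on the top-level map.

    Each resulting subset is what a child map for that unit would be
    trained on, independently of the other child maps.
    """
    subsets = defaultdict(list)
    for doc in documents:
        subsets[nearest_unit(top_weights, doc)].append(doc)
    return dict(subsets)
```

Since every child map sees only its own subset, the per-map training sets are smaller than the full collection, which is consistent with the reduced training time the abstract reports.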
Creating an Order in Digital Libraries with Self-Organizing Maps
, 1996
Cited by 36 (15 self)
Formulation of suitable search expressions for information retrieval from large full-text databases may currently require considerable efforts. Changing the scope of the search when, e.g., too many or too few hits have been obtained, requires re-formulation of the search expression. For an alternative scheme we suggest an explorative full-text information retrieval method, where the Self-Organizing Map (SOM) algorithm is used to order documents based on their full textual contents. The visualized order can then be utilized for an explorative search or exploration of novel knowledge areas, whereby the scope can be changed interactively. The ordering of the documents is achieved by a two-level analysis: First, word categories are extracted from the text by a "semantic" SOM. Second, the textual context of the documents is encoded on the basis of the histograms of words formed on the word category map. 1 Introduction The information age is characterized by an uncontrolled flood of miscel...
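The second level of the two-level analysis, encoding a document as a histogram over word categories, can be sketched as follows. The dictionary `word_to_category` is a hypothetical stand-in for the output of the trained word category map (the "semantic" SOM). The function name and the normalization to relative frequencies are likewise assumptions for illustration.

```python
from collections import Counter


def category_histogram(tokens, word_to_category, n_categories):
    """Encode a document as a normalized histogram over word categories.

    Words that have no entry in `word_to_category` are ignored.
    """
    counts = Counter(word_to_category[t] for t in tokens if t in word_to_category)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_categories
    return [counts.get(c, 0) / total for c in range(n_categories)]


# Hypothetical word categories: 0 groups "neural"/"network", 1 holds "map".
word_to_category = {"neural": 0, "network": 0, "map": 1}
histogram = category_histogram(["neural", "network", "map", "the"],
                               word_to_category, n_categories=2)
```

The resulting vector has one entry per word category rather than one per vocabulary word, so its length does not grow with the vocabulary. A vector of this kind would then be the input to the document-level map.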
WEBSOM for Textual Data Mining
- Artificial Intelligence Review
, 1999
Cited by 22 (4 self)
New methods that are user-friendly and efficient are needed for guidance among the masses of textual information available in the Internet and the World Wide Web. We have developed a method and a tool called the WEBSOM which utilizes the self-organizing map algorithm (SOM) for organizing large collections of text documents onto visual document maps. The approach to processing text is statistically oriented, computationally feasible, and scalable: over a million text documents have been ordered on a single map. In the article we consider different kinds of information needs and tasks regarding organizing, visualizing, searching, categorizing and filtering textual data. Furthermore, we discuss and illustrate with examples how document maps can aid in these situations. An example is presented where a document map is utilized as a tool for visualizing and filtering a stream of incoming electronic mail messages.
Learning Shallow Context-Free Languages under Simple Distributions
, 1999
"... this paper I present the EMILE 3.0 algorithm ..."
Exploration of Full-Text Databases with Self-Organizing Maps
- In Proceedings of the ICNN96, International Conference on Neural Networks, volume I
, 1996
Cited by 17 (9 self)
Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on a map that provides an insightful view of the text collection. This view visualizes similarity relations between the documents, and the display can be utilized for orderly exploration of the material rather than having to rely on traditional search expressions. The complete WEBSOM method involves a two-level SOM architecture comprising a word category map and a document map, and means for interactive exploration of the data base. 1. Introduction Full-text classification may be based on the assumption that the elementary textual features of documents that deal with similar topics are statist...
Text Data Mining
- In A Handbook of Natural Language Processing: Techniques and Applications for the Processing of Language as Text
, 1998
Cited by 9 (3 self)
Classification is one of the central issues in any system dealing with text data. The need for effective approaches is dramatically increased nowadays due to the advent of massive digital libraries containing freeform documents. What we are looking for are powerful methods for the exploration of such libraries whereby the discovery of similarities between groups of text documents is the overall goal. In other words, methods that may be used to gain insight in the inherent structure of the various items contained in a text archive are needed. In this paper we demonstrate the applicability of unsupervised neural networks for the task of text document clustering. Specifically, we describe the results from using self-organizing maps for the exploration of document archives. We further argue in favor of paying more attention to the fact that text archives lend themselves naturally to a hierarchical structure. We take advantage of this fact by using a hierarchically organized network built u...

