Results 1 - 10 of 23
Exploration of Text Collections with Hierarchical Feature Maps
, 1997
Cited by 37 (14 self)
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight in the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and gets specialized to a subset of the input data. As a consequence, the choice of this particular artificial neural network model enables the true establishment of a document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. In particular, the hierarchical representation of document collections is appealing because it is the underlying organizational principle in use by librarians providing the necessary familiarity...
Content-Based Software Classification by Self-Organization
, 1995
Cited by 25 (11 self)
This paper is concerned with a case study in content-based classification of textual documents. In particular we compare the application of two prominent self-organizing neural networks to the same problem domain, namely the organization of software libraries. The two models are Adaptive Resonance Theory and Self-Organizing Maps. As a result we are able to show that both models successfully arrange software components according to their semantic similarity. 1. Introduction Software reuse is concerned with the technological and organizational issues of using already existing software components to build new applications. This is believed to be one of the most promising proposals to overcome the frequently discussed software crisis which may be described as the lack of productivity in the software area as well as the inability of the software suppliers to satisfy the needs of their customers. However, to make software reuse operational the software developers need to be provided with la...
Exploration of Document Collections with Self-Organizing Maps: A Novel Approach to Similarity Representation
- In Proceedings of the European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD'97
, 1997
Cited by 20 (9 self)
Classification is one of the central issues in any system dealing with text data. The need for effective approaches is dramatically increased nowadays due to the advent of massive digital libraries containing free-form documents. What we are looking for are powerful methods for the exploration of such libraries whereby the detection of similarities between the various text documents is the overall goal. In other words, methods that may be used to gain insight in the inherent structure of the various items contained in a text archive are needed. In this paper we demonstrate the applicability of self-organizing maps, a neural network model adhering to the unsupervised learning paradigm, for the task of text document clustering. In order to improve the representation of the result we present an extension to the basic learning rule that captures the movement of the various weight vectors in a two-dimensional output space for convenient visual inspection. The result of the extended traini...
Automatic Labeling of Self-Organizing Maps for Information Retrieval
, 2001
Cited by 18 (8 self)
The self-organizing map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as in information retrieval applications. However, the interpretation of the map requires much manual effort, especially as far as the analysis of the learned features and the characteristics of identified clusters is concerned. In this paper we present the LabelSOM method which, based on the features learned by the map, automatically selects the most descriptive features of the input patterns mapped onto a particular unit of the map, thus making the characteristics of the various clusters within the map explicit. We demonstrate the benefits of this approach on an example from text classification using a real-world document archive. In this particular case, the features correspond to keywords describing the contents of a document. The benefit of this approach is that the various document clusters are characterized in terms of shared keywords, thus making it easy for the user to explore the contents of an unknown document archive.
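The label selection described in this abstract can be illustrated with a minimal sketch. The abstract only says that the most descriptive features of the inputs mapped onto a unit are selected; the concrete criterion below (prefer keywords whose values deviate least from the unit's weight vector, and skip keywords with a near-zero weight) is an assumption made for illustration, as are the names `label_unit`, `n_labels`, and `min_weight`.

```python
def label_unit(weight, mapped_vectors, vocabulary, n_labels=3, min_weight=0.1):
    """Pick keywords that describe the documents mapped onto one map unit.

    Assumed criterion (not taken from the abstract): a keyword is a good
    label if the mapped documents agree closely with the unit's weight for
    that keyword, and the weight itself is not negligible.
    """
    if not mapped_vectors:
        return []
    scored = []
    for i, term in enumerate(vocabulary):
        if weight[i] < min_weight:
            continue  # keyword is essentially absent from this cluster
        # Mean squared deviation of the mapped documents from the weight.
        deviation = sum((v[i] - weight[i]) ** 2 for v in mapped_vectors) / len(mapped_vectors)
        # Sort by low deviation first, then by high weight.
        scored.append((deviation, -weight[i], term))
    scored.sort()
    return [term for _, _, term in scored[:n_labels]]


vocabulary = ["network", "map", "football", "library"]
weight = [0.8, 0.7, 0.05, 0.4]
mapped = [[0.8, 0.7, 0.0, 0.1], [0.8, 0.7, 0.1, 0.7]]
labels = label_unit(weight, mapped, vocabulary, n_labels=2)
```

Here `labels` holds the two keywords on which the mapped documents agree most strongly; "football" is skipped because its weight is below the assumed threshold.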
Creating an Order in Distributed Digital Libraries by Integrating Independent Self-Organizing Maps
- In Proc. Int'l Conf. on Artificial Neural Networks (ICANN'98
, 1998
Cited by 18 (12 self)
Digital document libraries are an almost perfect application arena for unsupervised neural networks. This because many of the operations computers have to perform on text documents are classification tasks based on "noisy" input patterns. The "noise" arises because of the known inaccuracy of mapping natural language to an indexing vocabulary representing the contents of the documents. A growing number of papers is dedicated to the usage of self-organizing maps to organize the contents of such digital libraries. These papers assume the central availability of the data; an assumption that is questionable given the massive amount of available information. In this paper we describe an approach for organizing distributed digital libraries based on a system of independent self-organizing maps each of which representing just a portion of the complete digital library. Furthermore, we argue in favor of integrating these independent maps in a hierarchical fashion, again by means of self-organizi...
Soft Information retrieval: applications of fuzzy set theory and neural networks
- Neuro-fuzzy Techniques for Intelligent Information Systems
, 1999
Cited by 13 (3 self)
Abstract. This paper presents a short survey of fuzzy and neural approaches to Information Retrieval. The goal of such approaches is to define flexible Information Retrieval Systems able to deal with the inherent vagueness and uncertainty of the retrieval process. In this survey we address if and how some approaches met their goal.
CIA's view of the world and what neural networks learn from it: A comparison of geographical document space representation metaphors
, 1998
Cited by 10 (6 self)
Text collections may be regarded as an almost perfect application arena for unsupervised neural networks. This because many operations computers have to perform on text documents are classification tasks based on noisy patterns. In particular we rely on self-organizing maps which produce a map of the document space after their training process. From geography, however, it is known that maps are not always the best way to represent information spaces. For most applications it is better to provide a hierarchical view of the underlying data collection in form of an atlas where starting from a map representing the complete data collection different regions are shown at finer levels of granularity. Using an atlas, the user can easily "zoom" into regions of particular interest while still having general maps for overall orientation. We show that a similar display can be obtained by using hierarchical feature maps to represent the contents of a document archive. These neural networks have a...
Text Data Mining
- In A Handbook of Natural Language Processing: Techniques and Applications for the Processing of Language as Text
, 1998
Cited by 9 (3 self)
Classification is one of the central issues in any system dealing with text data. The need for effective approaches is dramatically increased nowadays due to the advent of massive digital libraries containing freeform documents. What we are looking for are powerful methods for the exploration of such libraries whereby the discovery of similarities between groups of text documents is the overall goal. In other words, methods that may be used to gain insight in the inherent structure of the various items contained in a text archive are needed. In this paper we demonstrate the applicability of unsupervised neural networks for the task of text document clustering. Specifically, we describe the results from using self-organizing maps for the exploration of document archives. We further argue in favor of paying more attention to the fact that text archives lend themselves naturally to a hierarchical structure. We take advantage of this fact by using a hierarchically organized network built u...
Document Classification with Self-Organizing Maps
, 1999
Cited by 7 (0 self)
this paper we argue in favor of establishing a hierarchical organization of the document space based on an unsupervised neural network. In much the same way as we are showing the world on different pages of an atlas, where each page contains a map showing some portion of the world at some specific resolution, we suggest to use a kind of atlas for document space representation [15,16]. A page of this atlas of the document space shows a portion of the library at some resolution while omitting other parts of the library. As long as general maps that provide an overview of the whole library are available, the user can find his or her way along the library by choosing maps that provide a sufficiently detailed view of the area of particular interest. More precisely, we show the effects of using the hierarchical feature map [18] for document archive organization. The distinguished feature of this model is its layered architecture where each layer consists of a number of independent self-organizing maps. The training process results in a hierarchical arrangement of the document collection where self-organizing maps from higher layers of the hierarchy are used to represent the overall organizational principles of the document archive. Maps from lower layers of the hierarchy are used to provide fine-grained distinction between individual documents. Such an organization comes close to what we would usually expect from conventional libraries.
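The layered arrangement described in this abstract might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that each unit of a higher-layer map hands the documents mapped onto it to its own independently trained lower-layer map, and the names `build_hierarchy`, `leaf_documents`, `shape`, and `depth` are hypothetical. A compact SOM trainer is included only so that the block runs on its own.

```python
import math
import random


def bmu(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(vectors, rows, cols, epochs=30, seed=0):
    """Compact online SOM trainer with linearly decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    steps = epochs * len(vectors)
    for step in range(steps):
        x = vectors[rng.randrange(len(vectors))]
        decay = 1.0 - step / steps
        lr = 0.5 * decay
        radius = max(max(rows, cols) / 2.0 * decay, 0.5)
        win = bmu(weights, x)
        for (r, c), w in weights.items():
            d2 = (r - win[0]) ** 2 + (c - win[1]) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights


def build_hierarchy(docs, shape=(2, 2), depth=2, seed=0):
    """docs is a list of (doc_id, vector) pairs; returns a nested dict of maps."""
    weights = train_som([v for _, v in docs], shape[0], shape[1], seed=seed)
    buckets = {}
    for doc_id, v in docs:
        buckets.setdefault(bmu(weights, v), []).append((doc_id, v))
    node = {"weights": weights, "units": {}}
    for unit, members in buckets.items():
        if depth > 1 and len(members) > 1:
            # Assumption: each unit gets its own independent lower-layer map.
            node["units"][unit] = build_hierarchy(members, shape, depth - 1, seed + 1)
        else:
            node["units"][unit] = [doc_id for doc_id, _ in members]
    return node


def leaf_documents(node):
    """Collect every document id stored anywhere below this map."""
    ids = []
    for child in node["units"].values():
        ids.extend(leaf_documents(child) if isinstance(child, dict) else child)
    return ids


docs = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0]),
        ("d", [0.1, 0.9]), ("e", [0.5, 0.5])]
tree = build_hierarchy(docs, shape=(2, 2), depth=2, seed=3)
```

Because every lower-layer map sees only the documents of one parent unit, each map is trained on a small subset of the collection, which matches the "atlas" picture of coarse overview maps on top and detailed maps below.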
Lessons Learned in Text Document Classification
- Proc. of Workshop on Self-Organizing Maps 1997 (WSOM’97), Helsinki University of Technology, Neural Networks Research
, 1997
Cited by 6 (1 self)
Text archives may be regarded as an almost optimal application arena for unsupervised neural networks. This because many of the operations computers have to perform on text documents are classification tasks based on noisy patterns. As a natural result, an ever increasing number of research reports concerned with that type of application appeared in literature. In this paper we argue in favor of paying more attention to the fact that text archives lend themselves naturally to a hierarchical structure. We take advantage of this fact by using a hierarchically organized network built up from independent self-organizing maps in order to enable the true establishment of a document taxonomy. 1 Introduction The self-organizing map is a neural network model capable of arranging high-dimensional input data within its (usually) two-dimensional output space in such a way that the similarity of the input data is mirrored as faithfully as possible. The utilization of this model is thus especially ...
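For readers unfamiliar with the model, the basic self-organizing map mentioned in this abstract can be sketched in a few lines. This is a generic textbook-style version, not code from the paper; the linear decay schedules, the Gaussian neighborhood, and the parameter names (`lr0`, `radius0`, `epochs`) are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the unit whose weight vector is nearest to input x."""
    return min(weights, key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # One weight vector per map unit, initialised at random in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays to 0
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
            win = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - win[0]) ** 2 + (unit[1] - win[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius * radius))
                # Move the unit toward x, more strongly near the winner.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


data = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [1.0, 0.9, 0.0], [0.9, 1.0, 0.1]]
weights = train_som(data, rows=2, cols=2, epochs=10, seed=1)
positions = [best_matching_unit(weights, x) for x in data]
```

After training, `positions` gives the map unit assigned to each input; inputs that are similar in the high-dimensional space tend to land on the same or neighboring units of the two-dimensional grid.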

