Exploration of Text Collections with Hierarchical Feature Maps
, 1997
Cited by 37 (14 self)
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight into the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and becomes specialized to a subset of the input data. As a consequence, the choice of this particular artificial neural network model enables the true establishment of a document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. In particular, the hierarchical representation of document collections is appealing because it is the underlying organizational principle in use by librarians providing the necessary familiarity...
Exploratory Analysis of Concept and Document Spaces with Connectionist Networks
- Artificial Intelligence and Law
, 1999
Cited by 2 (0 self)
Exploratory analysis is an area of increasing interest in the computational linguistics arena. Pragmatically speaking, exploratory analysis may be paraphrased as natural language processing by means of analyzing large corpora of text. Appropriate means for the analysis are statistics, on the one hand, and artificial neural networks, on the other. As a challenging application area for exploratory analysis of text corpora we may certainly identify text databases, be they information retrieval or information filtering systems. In this paper we present recent findings of exploratory analysis based on both statistical and neural models applied to legal text corpora. Concerning the artificial neural networks, we rely on a model adhering to the unsupervised learning paradigm. This choice arises naturally when taking into account the specific properties of large text corpora where one is faced with the fact that input-output mappings as required by supervised learning models ca...
Data Mining in Large Free Text Document Archives
- Proc. of the Int. Symposium on Cooperative Database Systems for Advanced Applications
, 1996
Cited by 2 (1 self)
Document classification may be regarded as one of the central issues in information retrieval research during the last decades. The challenge of classification is to uncover the similarities between groups of data in order to improve the retrieval effectiveness of the overall system. From an exploratory data analysis point of view, the same process of classification may be used to gain insight into the structure of the various data items and may thus be referred to as data mining in text archives. In this paper we show the results from applying a neural network model, the hierarchical feature map, to such a data mining task. The neural network is carefully designed to impose a hierarchical structure on the underlying document collection, which leads to a straightforward representation of data similarities. Apart from the benefit for text data mining, we are able to demonstrate that the hierarchical feature map leads to a tremendous speed-up of the training process as compared to more tradit...
Document Classification with Unsupervised Artificial Neural Networks
- In F. Crestani & G. Pasi (Eds.), Soft Computing in Information Retrieval (pp. 102–121). Wurzburg (Wien): Physica-Verlag
, 2000
Cited by 2 (0 self)
Text collections may be regarded as an almost perfect application arena for unsupervised neural networks. This is because many operations computers have to perform on text documents are classification tasks based on noisy patterns. In particular we rely on self-organizing maps, which produce a map of the document space after their training process. From geography, however, it is known that maps are not always the best way to represent information spaces. For most applications it is better to provide a hierarchical view of the underlying data collection in the form of an atlas where, starting from a map representing the complete data collection, different regions are shown at finer levels of granularity. Using an atlas, the user can easily "zoom" into regions of particular interest while still having general maps for overall orientation. We show that a similar display can be obtained by using hierarchical feature maps to represent the contents of a document archive. These neural networks have a layered architecture where each layer consists of a number of individual self-organizing maps. In this way, the contents of the text archive may be represented at arbitrary detail while still having the general maps available for global orientation.

