Exploration of Text Collections with Hierarchical Feature Maps
1997
Cited by 37 (14 self)
Abstract
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents; in other words, classification techniques are used to gain insight into the structure of the various data items contained in a text archive. In this paper we show the results of using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and becomes specialized to a subset of the input data. As a consequence, this particular artificial neural network model makes it possible to establish a genuine document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. In particular, the hierarchical representation of document collections is appealing because it is the underlying organizational principle used by librarians, providing the necessary familiarity...
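A minimal sketch of the idea described above, assuming a simple two-layer hierarchy: a small root map is trained on all input vectors, and each root unit that attracts enough inputs gets its own independently trained child map for just that subset. The map sizes, the linear decay schedules, the Gaussian neighborhood, and the names `train_som`, `best_matching_unit`, and `train_hierarchy` are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular self-organizing map (assumed schedules)."""
    rng = random.Random(seed)
    # Initialize each unit with a copy of a randomly chosen input vector.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):      # visit inputs in random order
            br, bc = divmod(best_matching_unit(weights, x), cols)
            for u in range(rows * cols):
                r, c = divmod(u, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
    return weights


def train_hierarchy(data, rows=2, cols=2, depth=2, min_size=2, seed=0):
    """Train one map per node; each unit's mapped subset trains an independent child map."""
    weights = train_som(data, rows, cols, seed=seed)
    node = {"weights": weights, "children": {}}
    if depth > 1:
        buckets = {}
        for x in data:
            buckets.setdefault(best_matching_unit(weights, x), []).append(x)
        for unit, subset in buckets.items():
            if len(subset) >= min_size:  # hypothetical cut-off for expanding a unit
                node["children"][unit] = train_hierarchy(
                    subset, rows, cols, depth - 1, min_size, seed + unit + 1)
    return node
```

Because each child map only sees the vectors mapped onto its parent unit, the individual maps stay small, which fits the abstract's point about reduced training time.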
Document Clustering and Text Summarization
2000
Cited by 24 (2 self)
Abstract
This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their counterparts in "conventional" data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system, document clustering is performed using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing, for each word, the value of a TF-ISF (term frequency -- inverse sentence frequency) measure, which is an adaptation of the conventional TF-IDF (term frequency -- inverse document frequency) measure of information retrieval. Sentences with high TF-ISF values are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory.
1. Introduction
Text mining is an emerging field at the intersection of several resea...
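The TF-ISF scoring can be sketched as follows, assuming ISF(w) = log(N / SF(w)), where N is the number of sentences and SF(w) the number of sentences containing w, and assuming a sentence's score is the average TF-ISF of its distinct words. The regex-based sentence splitter and tokenizer are simplifications, and `tf_isf_summary` is a hypothetical name.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def tf_isf_summary(text, k=1):
    """Return the k sentences with the highest average TF-ISF, in original order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence frequency: number of sentences in which each word occurs.
    sf = {}
    for toks in tokens:
        for w in set(toks):
            sf[w] = sf.get(w, 0) + 1
    scores = []
    for toks in tokens:
        if not toks:
            scores.append(0.0)
            continue
        # TF(w, s) * ISF(w), averaged over the distinct words of the sentence.
        vals = [toks.count(w) * math.log(n / sf[w]) for w in set(toks)]
        scores.append(sum(vals) / len(vals))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Words that occur in every sentence get ISF = log(1) = 0, so sentences made of rarer words rise to the top, mirroring how IDF down-weights terms common to all documents.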
CIA's view of the world and what neural networks learn from it: A comparison of geographical document space representation metaphors
1998
Cited by 10 (6 self)
Abstract
Text collections may be regarded as an almost perfect application arena for unsupervised neural networks. This is because many operations that computers have to perform on text documents are classification tasks based on noisy patterns. In particular, we rely on self-organizing maps, which produce a map of the document space after their training process. From geography, however, it is known that maps are not always the best way to represent information spaces. For most applications it is better to provide a hierarchical view of the underlying data collection in the form of an atlas: starting from a map representing the complete data collection, different regions are shown at finer levels of granularity. Using an atlas, the user can easily "zoom" into regions of particular interest while still having general maps for overall orientation. We show that a similar display can be obtained by using hierarchical feature maps to represent the contents of a document archive. These neural networks have a...
Finding Structure in Text Archives
In Proc. European Symp. on Artificial Neural Networks (ESANN98)
1998
Cited by 9 (8 self)
Abstract
With the advance and massive growth of electronic text archives, the need emerges for tools that help to gain insight into the basic structure of the underlying digital library. We present a neural network approach for the analysis and exploration of text archives aimed at detecting and visualizing the inherent structure of the text collection. This cluster visualization technique, called Adaptive Coordinates, is based on an extended learning rule for the self-organizing map. It provides an intuitive visualization by mapping clusters in a high-dimensional input space onto groups of nodes in a 2-dimensional output space. We further compare the results of this mapping with another prominent cluster visualization technique, namely Sammon's Mapping.
1. Introduction
Traditional text archives exhibit a kind of structure that allows the user to understand the overall organization of the text collection and provides a means to search and browse the collection to retrieve rel...
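Sammon's Mapping places points in a low-dimensional space so as to minimize Sammon's stress, E = (1 / sum of d*_ij) * sum of (d*_ij - d_ij)^2 / d*_ij over all pairs i < j, where d*_ij is the distance between items i and j in the input space and d_ij their distance in the projection. The sketch below only evaluates this stress (the iterative optimization that Sammon's Mapping performs is omitted); Euclidean distance is assumed and `sammon_stress` is a hypothetical helper name.

```python
import math


def sammon_stress(high, low):
    """Sammon's stress between original points `high` and their projection `low`."""
    n = len(high)
    total = 0.0  # normalizer: sum of original pairwise distances
    err = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_star = math.dist(high[i], high[j])
            if d_star == 0:
                continue  # skip duplicate input points to avoid division by zero
            d = math.dist(low[i], low[j])
            total += d_star
            err += (d_star - d) ** 2 / d_star
    return err / total if total else 0.0
```

A stress of 0 means all pairwise distances are preserved exactly; dividing each squared error by d*_ij gives small distances more weight, which is why Sammon's Mapping tends to preserve local cluster structure.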
The indiGo Project: Enhancement of Experience Management and Process Learning with Moderated Discourses
In P. Perner (Ed.), Data Mining in Marketing and Medicine
2002
Cited by 9 (6 self)
Abstract
In this paper we describe the indiGo approach to the preparation, moderation, and analysis of discourses to enhance experience management. In the indiGo project, this has been exemplified for the process learning domain. indiGo integrates approaches for e-participation, experience management, process modeling and publishing, as well as text mining. We describe both the methodology underlying indiGo and the indiGo platform. In addition, we compare indiGo to related work. A case study for an in-depth evaluation of the indiGo approach is currently under way.
Document Classification with Self-Organizing Maps
1999
Cited by 7 (0 self)
Abstract
In this paper we argue in favor of establishing a hierarchical organization of the document space based on an unsupervised neural network. Much as an atlas shows the world on different pages, where each page contains a map showing some portion of the world at a specific resolution, we suggest using a kind of atlas for document space representation [15,16]. A page of this atlas of the document space shows a portion of the library at some resolution while omitting other parts of the library. As long as general maps that provide an overview of the whole library are available, the user can find his or her way through the library by choosing maps that provide a sufficiently detailed view of the area of particular interest. More precisely, we show the effects of using the hierarchical feature map [18] for document archive organization. The distinguishing feature of this model is its layered architecture, where each layer consists of a number of independent self-organizing maps. The training process results in a hierarchical arrangement of the document collection, where self-organizing maps from higher layers of the hierarchy are used to represent the overall organizational principles of the document archive. Maps from lower layers of the hierarchy are used to provide fine-grained distinctions between individual documents. Such an organization comes close to what we would usually expect from conventional libraries.
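Navigating such an atlas can be pictured as descending the layers: a document is matched to its best unit on the top map, then to the best unit of the child map attached to that unit, and so on until no finer map exists. The sketch below assumes a hierarchy stored as nested dictionaries with hypothetical `weights` and `children` keys, and uses hand-picked weight vectors instead of trained ones.

```python
def nearest(weights, x):
    """Index of the weight vector closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))


def locate(node, x):
    """Descend the hierarchy; return the winning unit index on each layer visited."""
    path = []
    while node is not None:
        unit = nearest(node["weights"], x)
        path.append(unit)
        # Move to the finer-grained map attached to the winner, if there is one.
        node = node["children"].get(unit)
    return path


# Hypothetical two-layer hierarchy: only root unit 1 has a more detailed child map.
hierarchy = {
    "weights": [[0.0, 0.0], [1.0, 1.0]],
    "children": {1: {"weights": [[0.8, 1.0], [1.0, 0.8]], "children": {}}},
}
```

For example, `locate(hierarchy, [1.0, 0.85])` follows root unit 1 into its child map and returns `[1, 1]`, while `locate(hierarchy, [0.1, 0.0])` stops at the top layer with `[0]`.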
Using self-organizing maps to organize document archives and to characterize subject matters: How to make a map tell the news of the world
1999
Cited by 5 (2 self)
Abstract
While research on electronic document archives still focuses on information retrieval, the importance of interactive exploration has been recognized and is growing. The map metaphor, where documents are organized on a map according to their contents, has proven particularly useful as an interface to such a collection. The self-organizing map has been shown to produce stable, topically ordered organizations of documents on such a 2-dimensional map display. However, the characteristics of these topical clusters are not made explicit. In this paper we present the LabelSOM method, which takes the applicability of the self-organizing map for document archive organization one step further by automatically labeling the various topical clusters found in the map. This allows the user to get an instant overview of the various topics covered by a document collection.
1 Introduction
Today's information age may be characterized by constant massive production and dissemina...
En Route to Data Mining in Legal Text Corpora: Clustering, Neural Computation, and International Treaties
In Proc. International Workshop on Database and Expert Systems Applications
1997
Cited by 5 (1 self)
Abstract
The huge amount of data in legal information systems requires a new generation of techniques and tools to assist lawyers in analyzing data and finding critical nuggets of useful knowledge. A promising approach for data mining in legal text corpora is classification. What we are looking for are powerful methods for the exploration of such libraries, where the overall goal is to detect similarities between documents. These methods may be used to gain insight into the inherent structure of the various items contained in a text archive. In this paper we present the results of a case study in legal document classification based on an experimental document archive comprising important treaties in public international law. The essentials of our approach are a vector space document representation and an unsupervised artificial neural network for document classification.
1 Introduction
During the last years we witnessed an ever increasing flood of writt...
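A vector space document representation can be sketched as below. The abstract does not say which term weighting was used, so this example assumes plain TF-IDF over whitespace-separated terms and compares documents by cosine similarity; `tfidf_vectors` and `cosine` are hypothetical helper names. Vectors like these are the kind of input an unsupervised network such as a self-organizing map would be trained on.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document) under assumed weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)
```

Documents that share weighted terms get a positive cosine score, while documents with no terms in common score 0, which is the similarity signal the classification step works from.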
Uncovering Associations Between Documents
In Proc. International Joint Conference on Artificial Intelligence (IJCAI99)
1999
Cited by 3 (3 self)
Abstract
The self-organizing map is a very popular unsupervised neural network model for the analysis of high-dimensional input data, as typically found in information retrieval applications. However, interpreting the map requires much manual effort, especially the analysis of the learned features and of the characteristics of identified clusters. In this paper we present our novel LabelSOM method which, based on the features learned by the map, automatically selects the most descriptive features of the input patterns mapped onto a particular unit of the map, thus making the associations between the various clusters within the map explicit. We demonstrate the benefits of this approach with examples from text classification using two different real-world document archives. In this case, the features correspond to keywords describing the contents of a document. The benefit of this approach is obvious in that the various document clusters are character...
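A simplified LabelSOM-style labeling step, loosely following the description above: for a single unit, features whose weight value falls at or below a threshold are discarded as absent, and the remaining features are ranked by their quantization error over the input vectors mapped onto that unit, so the chosen labels are features the mapped documents share with the unit's weight vector. The threshold, the number of labels, the Euclidean per-feature error, and the function name `label_unit` are assumptions for illustration; the published method's exact criteria may differ.

```python
import math


def label_unit(weight, mapped_inputs, feature_names, n_labels=2, threshold=0.1):
    """Pick descriptive labels for one map unit (simplified, assumed criteria)."""
    if not mapped_inputs:
        return []
    candidates = []
    for k, name in enumerate(feature_names):
        # Assumed filter: ignore features essentially absent from the unit's weights.
        if weight[k] <= threshold:
            continue
        # Per-feature quantization error over all inputs mapped onto this unit.
        q = math.sqrt(sum((weight[k] - x[k]) ** 2 for x in mapped_inputs))
        candidates.append((q, name))
    candidates.sort()  # lowest error first: features the mapped inputs agree on
    return [name for _, name in candidates[:n_labels]]
```

With keyword features, a low error means every document on the unit carries that keyword with a similar weight, which is what makes it a useful cluster label.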

