Results 11 - 20 of 23
Using self-organizing maps to organize document archives and to characterize subject matters: How to make a map tell the news of the world
1999
Cited by 5 (2 self)
While the focus of research on electronic document archives is still on information retrieval, the importance of interactive exploration has been recognized and is growing. The map metaphor, where documents are organized on a map according to their contents, has proven particularly useful as an interface to such a collection. The self-organizing map has been shown to produce stable, topically ordered organizations of documents on such a 2-dimensional map display. However, the characteristics of these topical clusters are not made explicit. In this paper we present the LabelSOM method, which takes the applicability of the self-organizing map for document archive organization one step further by automatically labeling the various topical clusters found in the map. This allows the user to get an instant overview of the various topics covered by a document collection. 1 Introduction Today's information age may be characterized by constant massive production and dissemina...
The Exploration of Legal Text Corpora with Hierarchical Neural Networks: A Guided Tour in Public International Law
- Proc. Int. Conf. on Artificial Intelligence and Law, 1997
Cited by 5 (5 self)
The classification of feature vectors representing the interpretation of legal documents improves the search for similar or related documents, the interpretation of these documents, and the navigation within the text corpus. The need for effective approaches to classification has increased dramatically with the advent of massive digital libraries containing free-form legal text documents. What we are looking for are powerful methods for the exploration of such libraries, where the detection of similarities between groups of documents is the overall goal. In other words, methods are needed that may be used to gain insight into the inherent structure of the various items contained in a text archive. In this paper we present the results from a case study in legal document classification based on an experimental document archive comprising important treaties in public international law. The core task of classification is performed by a non-standard neural network model with ...
En Route to Data Mining in Legal Text Corpora: Clustering, Neural Computation, and International Treaties
- In Proc. International Workshop on Database and Expert Systems Applications, 1997
Cited by 5 (1 self)
The huge amount of data in legal information systems requires a new generation of techniques and tools to assist lawyers in analyzing data and finding critical nuggets of useful knowledge. A promising approach for data mining in legal text corpora is classification. What we are looking for are powerful methods for the exploration of such libraries, where the detection of similarities between documents is the overall goal. These methods may be used to gain insight into the inherent structure of the various items contained in a text archive. In this paper we present the results from a case study in legal document classification based on an experimental document archive comprising important treaties in public international law. The essentials of our approach are the use of a vector space document representation and of an unsupervised artificial neural network for document classification. 1 Introduction During the last years we witnessed an ever increasing flood of writt...
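The vector space document representation mentioned in this abstract can be illustrated with a minimal sketch. The toy documents, the raw term-count weighting, and the function names `term_vector` and `cosine_similarity` are assumptions made for illustration; the abstract does not state which term weighting or similarity measure the authors used.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Return raw term counts for `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents, not taken from the treaty archive.
docs = [
    "treaty on trade and tariffs",
    "trade treaty and customs tariffs",
    "convention on maritime law",
]
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [term_vector(doc, vocabulary) for doc in docs]

sim_close = cosine_similarity(vectors[0], vectors[1])  # documents sharing four terms
sim_far = cosine_similarity(vectors[0], vectors[2])    # documents sharing one term
```

Vectors of this kind are what an unsupervised network would receive as input; documents with overlapping vocabulary end up with a higher similarity than documents that share few terms.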
Self-Organizing Maps And Software Reuse
- Computational Intelligence in Software Engineering, 1998
Cited by 5 (0 self)
Software reuse is the process of building new systems from existing components instead of developing these systems from scratch. For a long time now, software reuse has been repeatedly acknowledged as playing an essential role in overcoming the so-called software crisis, i.e. the late delivery of then still faulty software products. Current development practices such as object-oriented analysis, design, and programming should in principle assist the proliferation of the reuse idea. However, before existing components may be considered for reuse they have to be found in a software library. As in any area relying on the retrieval of particular objects from a large data store, the process of retrieval may turn out to be rather cumbersome, especially when the data store contains a large number of objects and the success of the whole operation depends on the retrieval of a small number of relevant objects. With this work we address the assistance of such a retrieval process...
Uncovering Associations Between Documents
- In Proc. International Joint Conference on Artificial Intelligence (IJCAI99), 1999
Cited by 3 (3 self)
The self-organizing map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as it is typically found in information retrieval applications. However, the interpretation of the map requires much manual effort, especially as far as the analysis of the learned features and the characteristics of identified clusters is concerned. In this paper we present our novel LabelSOM method which, based on the features learned by the map, automatically selects the most descriptive features of the input patterns mapped onto a particular unit of the map, thus making the associations between the various clusters within the map explicit. We demonstrate the benefits of this approach with examples from text classification using two different real-world document archives. In this particular case, the features correspond to keywords describing the contents of a document. The benefit of this approach is obvious in that the various document clusters are character...
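The idea of selecting descriptive features for one map unit can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the selection criterion, so the sketch assumes that a feature is descriptive when the inputs mapped onto the unit deviate little from the unit's weight for that feature and the weight itself is not close to zero. The function name `select_labels`, the parameters `max_labels` and `min_weight`, and all numbers are hypothetical.

```python
def select_labels(weight_vector, mapped_inputs, feature_names,
                  max_labels=3, min_weight=0.1):
    """Pick the features that the inputs on one map unit agree on most closely.

    For each feature, sum the squared deviation between the unit's weight and
    the corresponding value of every input mapped onto the unit. Features with
    a small deviation and a weight of at least `min_weight` are returned.
    """
    deviations = []
    for k, name in enumerate(feature_names):
        if weight_vector[k] < min_weight:
            continue  # feature is (almost) absent from this cluster
        error = sum((x[k] - weight_vector[k]) ** 2 for x in mapped_inputs)
        deviations.append((error, name))
    deviations.sort()  # smallest deviation first
    return [name for _, name in deviations[:max_labels]]


# Hypothetical keyword features and values for a single map unit.
features = ["court", "treaty", "fishing", "tariff"]
unit_weights = [0.05, 0.9, 0.8, 0.4]
inputs_on_unit = [
    [0.0, 1.0, 0.8, 0.9],
    [0.1, 0.8, 0.8, 0.0],
]
labels = select_labels(unit_weights, inputs_on_unit, features, max_labels=2)
```

Here "court" is dropped because its weight is near zero, and "tariff" ranks last because the two inputs disagree strongly on it.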
Neural Network Agents for Learning Semantic Text Classification
- Information Retrieval, 2000
Cited by 3 (1 self)
The research project AgNeT develops Agents for Neural Text routing on the Internet. Unrestricted, potentially faulty text messages arrive at a certain delivery point (e.g. an email address or World Wide Web address). These text messages are scanned and then distributed to one of several expert agents according to a certain task criterion. Possible specific scenarios within this framework include learning the routing of publication titles or news titles. In this paper we describe extensive experiments for semantic text routing based on classified library titles and newswire titles.
Exploratory Analysis of Concept and Document Spaces with Connectionist Networks
- Artificial Intelligence and Law, 1999
Cited by 2 (0 self)
Exploratory analysis is an area of increasing interest in computational linguistics. Pragmatically speaking, exploratory analysis may be paraphrased as natural language processing by means of analyzing large corpora of text. Appropriate means for the analysis are statistics, on the one hand, and artificial neural networks, on the other hand. As a challenging application area for exploratory analysis of text corpora we may certainly identify text databases, be they information retrieval or information filtering systems. With this paper we present recent findings of exploratory analysis based on both statistical and neural models applied to legal text corpora. Concerning the artificial neural networks, we rely on a model adhering to the unsupervised learning paradigm. This choice arises naturally when taking into account the specific properties of large text corpora where one is faced with the fact that input-output-mappings as required by supervised learning models ca...
Providing Topically Sorted Access to Subsequently Released Newspaper Editions or: How to Build Your Private Digital Library
- In Proc. of the 11. Int'l Conf. on Database and Expert Systems Applications (DEXA2000), Springer LNCS 1873, 2000
Cited by 2 (2 self)
Self-organizing maps are a popular neural network model for presenting high-dimensional input data on a two-dimensional map, providing a particularly useful interface to electronic document collections. However, as the size of the training data increases, both the necessary computational power and the training time required exceed tolerable limits. More importantly, not all training data may be available in one central location; it may instead be collected and managed at different repositories or released in subsequent periods of time. This paper describes an approach for combining independent, distributed self-organizing maps to build a higher order map, allowing the creation and maintenance of scalable, independent map systems, which can be built to suit the needs of individual users. This is achieved by training higher order maps using the trained lower order maps as input data. We demonstrate this approach by creating an integrated view of subsequent releas...
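The step of training a higher order map on trained lower order maps can be sketched with a very small self-organizing map written from scratch. Everything in this sketch is an assumption made for illustration: the abstract does not say how the lower order maps are encoded as input, so the sketch simply feeds their weight vectors to the higher order map, and the function `train_som`, its decay schedules, and the random toy data are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, epochs=20, seed=0):
    """Train a small rectangular SOM and return its weight vectors, one per unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)                          # decaying learning rate
            radius = max(rows, cols) / 2 * (1.0 - frac) + 0.5  # shrinking neighborhood
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = min(range(len(weights)),
                      key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))
            br, bc = divmod(bmu, cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                dist_sq = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-dist_sq / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += rate * influence * (x[k] - w[k])
            step += 1
    return weights


# Two hypothetical repositories, each holding 3-dimensional document vectors.
gen = random.Random(42)
repo_a = [[gen.random() for _ in range(3)] for _ in range(30)]
repo_b = [[gen.random() for _ in range(3)] for _ in range(30)]

# Lower order maps are trained independently, one per repository.
map_a = train_som(repo_a, rows=3, cols=3)
map_b = train_som(repo_b, rows=3, cols=3)

# The higher order map sees only the trained weight vectors, not the documents.
higher_map = train_som(map_a + map_b, rows=2, cols=2)
```

In this setup the higher order map is trained on 18 vectors instead of 60 documents, which hints at why such a combination could scale better than retraining a single map on all data.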
Data Mining in Large Free Text Document Archives
- Proc. of the Int. Symposium on Cooperative Database Systems for Advanced Applications, 1996
Cited by 2 (1 self)
Document classification may be regarded as one of the central issues in information retrieval research during the last decades. The challenge of classification is to uncover the similarities between groups of data in order to improve the retrieval effectiveness of the overall system. From an exploratory data analysis point of view, the same process of classification may be used to gain insight into the structure of the various data items and may thus be referred to as data mining in text archives. In this paper we show the results from applying a neural network model, the hierarchical feature map, to such a data mining task. The neural network is carefully designed to impose a hierarchical structure on the underlying document collection, which leads to a straightforward representation of data similarities. Apart from the benefit for text data mining, we are able to demonstrate that the hierarchical feature map leads to a tremendous speed-up of the training process as compared to more tradit...
Document Classification with Unsupervised Artificial Neural Networks
- In F. Crestani & G. Pasi (Eds.), Soft Computing in Information Retrieval (pp. 102–121). Würzburg (Wien): Physica-Verlag, 2000
Cited by 2 (0 self)
Text collections may be regarded as an almost perfect application arena for unsupervised neural networks. This is because many operations computers have to perform on text documents are classification tasks based on noisy patterns. In particular we rely on self-organizing maps, which produce a map of the document space after their training process. From geography, however, it is known that maps are not always the best way to represent information spaces. For most applications it is better to provide a hierarchical view of the underlying data collection in the form of an atlas where, starting from a map representing the complete data collection, different regions are shown at finer levels of granularity. Using an atlas, the user can easily "zoom" into regions of particular interest while still having general maps for overall orientation. We show that a similar display can be obtained by using hierarchical feature maps to represent the contents of a document archive. These neural networks have a layered architecture where each layer consists of a number of individual self-organizing maps. In this way, the contents of the text archive may be represented at arbitrary detail while still having the general maps available for global orientation.

