Results 1 - 6 of 6
LabelSOM: On the Labeling of Self-Organizing Maps
- In Proc. International Joint Conference on Neural Networks, 1999
Abstract - Cited by 43 (14 self)
Self-organizing maps are a prominent unsupervised neural network model providing cluster analysis of high-dimensional input data. However, in spite of enhanced visualization techniques for self-organizing maps, interpreting a trained map proves to be difficult because the features responsible for a specific cluster assignment are not evident from the resulting map representation. In this paper we present our LabelSOM approach for automatically labeling a trained self-organizing map with the features of the input data that are the most relevant ones for the assignment of a set of input data to a particular cluster. The resulting labeled map allows the user to understand the structure and the information available in the map and the reason for a specific map organization, especially when only little prior information on the data set and its characteristics is available. We demonstrate the applicability of the LabelSOM method in the field of data mining providing an example from real world...
Uncovering the Hierarchical Structure of Text Archives by Using an Unsupervised Neural Network with Adaptive Architecture
- In: Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD’2000), 2000
Abstract - Cited by 6 (0 self)
Discovering the inherent structure in data has become one of the major challenges in data mining applications. It requires the development of stable and adaptive models that are capable of handling the typically very high-dimensional feature spaces. In this paper we present the Growing Hierarchical Self-Organizing Map (GH-SOM), a neural network model based on the self-organizing map. The main feature of this extended model is its capability of growing both in terms of map size as well as in a three-dimensional tree-structure in order to represent the hierarchical structure present in a data collection. This capability, combined with the stability of the self-organizing map for high-dimensional feature space representation, makes it an ideal tool for data analysis and exploration. We demonstrate the potential of this method with an application from the information retrieval domain, which is prototypical of the high-dimensional feature spaces frequently encountered in toda...
Uncovering Associations Between Documents
- In Proc. International Joint Conference on Artificial Intelligence (IJCAI99), 1999
Abstract - Cited by 3 (3 self)
The self-organizing map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as it is typically found in information retrieval applications. However, the interpretation of the map requires much manual effort, especially as far as the analysis of the learned features and the characteristics of identified clusters is concerned. In this paper we present our novel LabelSOM method which, based on the features learned by the map, automatically selects the most descriptive features of the input patterns mapped onto a particular unit of the map, thus making the associations between the various clusters within the map explicit. We demonstrate the benefits of this approach with examples from text classification using two different real-world document archives. In this particular case, the features correspond to keywords describing the contents of a document. The benefit of this approach is obvious in that the various document clusters are character...
Text Mining Using HMM and PPM
- 2001
Abstract - Cited by 1 (0 self)
Text mining involves the use of statistical and machine learning techniques to learn structural elements of text in order to search for useful information in previously unseen text. The need for these techniques has emerged out of the rapidly growing information era. Token identification is an important component of any text mining tool. The accomplishment of this task enhances the function of diverse applications involving searching for patterns in textual data. Several different identification methods have been reported in the literature. HMMs and PPM models have been successfully used in language processing tasks. They have also been applied separately to learning-based token identification. Most of the existing systems are domain- and language-dependent. In this thesis, we implement a system that bridges the two well-known methods through words new to the identification model. The system is fully domain- and language-independent. No changes of code are necessary when applying it to other domains or languages. The only thing required is an annotated corpus. The system has been tested on two corpora and achieved an overall F-measure of 76.59% for TCC and 69.02% for BIB. This is not as good as would be expected from a system which includes language-dependent components. However, our system is more general. The identification of dates gives the best result: 73% and 92% of correct tokens are identified, respectively. The system also performs reasonably well on people's names, with 68% of tokens correct for TCC and 76% for BIB.
Acknowledgements: During the time of my MPhil study, I have been so lucky to have had a huge amount of academic, financial and personal help from a number of people. First and foremost, I would like to thank my chief supervisor, Ian Witte...
Knowledge Discovery in Literature Data Bases
- 1998
Abstract - Cited by 1 (0 self)
The concept of knowledge discovery as defined through "establishing previously unknown and unsuspected relations of features in a data base" is, cum grano salis, relatively easy to implement for data bases containing numerical data. Increasingly we find at our disposal data bases containing scientific literature. Computer assisted detection of unknown relations of features in such data bases would be extremely valuable and would lead to new scientific insights. However, the current representation of scientific knowledge in such data bases is not conducive to computer processing. Any correlation of features still has to be done by the human reader, a process which is plagued by ineffectiveness and incompleteness. On the other hand we note that considerable progress is being made in an area where reading all available material is totally prohibitive: the World Wide Web. Robots and Web crawlers mine the Web continuously and construct data bases which allow the identification of pages of ...
Methods in Biomedical Text Mining
- 2008
Abstract
Methods to improve text mining of molecular biology interactions are needed to capture a richer information space and qualify the quality of extraction. Simple interaction models fail to describe contextual and confidence information that would help with more fine-grained analyses. Herein a method is presented to streamline curation of text-mined data and a way to improve text mining of biomedical terms that can be adapted to other domains using different machine learning techniques. These advances can be integrated into more powerful text-mining systems to meet user demand and to further promote the adoption of text-mining tools. Additionally, three studies on the nature of biomedical publications are presented: their novelty hinges on the fact that each asks questions that had not been posed before. They cover the phenomena of retraction, ways to improve the impact of research, and the writing style used in biomedical literature. Retraction is a hot topic in recent times

