Results 1 - 10
of
83
WEBSOM - Self-Organizing Maps of Document Collections
- Neurocomputing
, 1997
"... Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been l ..."
Abstract
-
Cited by 121 (14 self)
- Add to MetaCart
Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of miscellaneous document collections. The WEBSOM method organizes a document collection on a map display that provides an overview of the collection and facilitates interactive browsing. Interesting documents can be retrieved by a content addressable search of interesting map locations. The interesting locations could also be marked as filters for collecting interesting new documents.
Relational Learning Techniques for Natural Language Information Extraction
, 1998
"... The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a t ..."
Abstract
-
Cited by 73 (4 self)
- Add to MetaCart
The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be ext...
Newsgroup Exploration with WEBSOM Method and Browsing Interface
, 1996
"... The current availability of large collections of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. Especially in the rapidly growing World Wide Web it is important to have methods for exploring miscellaneous document collections automaticall ..."
Abstract
-
Cited by 66 (20 self)
- Add to MetaCart
The current availability of large collections of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. Especially in the rapidly growing World Wide Web it is important to have methods for exploring miscellaneous document collections automatically. In the report, we introduce the WEBSOM method for this task. Self-Organizing Maps (SOMs) are used to position encoded documents onto a map that provides a general view into the text collection. The general view visualizes similarity relations between the documents on a map display, which can be utilized in exploring the material rather than having to rely on traditional search expressions. Similar documents become mapped close to each other. The potential of the WEBSOM method is demonstrated in a case study where articles from the Usenet newsgroup "comp.ai.neural-nets" are organized. The map is available for exploration at the WWW address http://websom.hut.fi/websom/
Self-Organizing Maps of Document Collections: A New Approach to Interactive Exploration
, 1996
"... Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, WEBSOM, for automatic organization of full-text document collections using the self-orga ..."
Abstract
-
Cited by 57 (15 self)
- Add to MetaCart
Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, WEBSOM, for automatic organization of full-text document collections using the self-organizing map (SOM) algorithm. The document collection is ordered onto a map in an unsupervised manner utilizing statistical information of short word contexts. The resulting ordered map where similar documents lie near each other thus presents a general view of the document space. With the aid of a suitable (WWWbased) interface, documents in interesting areas of the map can be browsed. The browsing can also be interactively extended to related topics, which appear in nearby areas on the map. Along with the method we present a case study of its use. Keywords: data visualization, document organization, full-text analysis, interactive exploration, self-organizing map. Introduction Finding relev...
Subsymbolic case-role analysis of sentences with embedded clauses
- Cognitive Science
, 1996
"... A distributed neural network model called SPEC for processing sentences with recursive relative clauses is described. The model is based on separating the tasks of segmenting the input word sequence into clauses, forming the case-role representations, and keeping track of the recursive embeddings in ..."
Abstract
-
Cited by 48 (6 self)
- Add to MetaCart
A distributed neural network model called SPEC for processing sentences with recursive relative clauses is described. The model is based on separating the tasks of segmenting the input word sequence into clauses, forming the case-role representations, and keeping track of the recursive embeddings into di erent modules. The system needs to be trained only with the basic sentence constructs, and it generalizes not only to new instances of familiar relative clause structures, but to novel structures as well. SPEC exhibits plausible memory degradation as the depth of the center embeddings increases, its memory is primed by earlier constituents, and its performance is aided by semantic constraints between the constituents. The ability to process structure is largely due to a central executive network that monitors and controls the execution of the entire system. This way, in contrast to earlier subsymbolic systems, parsing is modeled as a controlled high-level process rather than one based on automatic re ex responses. 1
Very Large Two-Level SOM for the Browsing of Newsgroups
, 1996
"... . On January 19, 1996 we published in the Internet a demo of how to use Self-Organizing Maps (SOMs) for the organization of large collections of full-text files. Later we added other newsgroups to the demo. It can be found at the address http://websom.hut.fi/websom/. In the present paper we describe ..."
Abstract
-
Cited by 45 (13 self)
- Add to MetaCart
. On January 19, 1996 we published in the Internet a demo of how to use Self-Organizing Maps (SOMs) for the organization of large collections of full-text files. Later we added other newsgroups to the demo. It can be found at the address http://websom.hut.fi/websom/. In the present paper we describe the main features of this system, called the WEBSOM, as well as some newer developments of it. 1 Introduction When organizing large collections of free-form full-text document files that contain no keywords, e.g. the newsgroups in the Internet, it is difficult to base their analysis on traditional search expressions. The main information one can resort to in the classification of such documents is statistical. SOMs of document collections have previously been constructed on the basis of their word histograms (published works are [5], [6], [7], [10], [11], [12]). Thereby, however, the size of the selected vocabulary cannot be large. In other studies (cf., e.g., [1], [2], [8], [9], [11], and...
Exploration of Text Collections with Hierarchical Feature Maps
, 1997
"... Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight in the structure of the various data items contained in the text archive. In this pape ..."
Abstract
-
Cited by 37 (14 self)
- Add to MetaCart
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight in the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and gets specialized to a subset of the input data. As a consequence, the choice of this particular artificial neural network model enables the true establishment of a document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. In particular, the hierarchical representation of document collections is appealing because it is the underlying organizational principle in use by librarians providing the necessary familiarity...
Creating an Order in Digital Libraries with Self-Organizing Maps
, 1996
"... Formulation of suitable search expressions for information retrieval from large full-text databases may currently require considerable efforts. Changing the scope of the search when, e.g., too many or too few hits have been obtained, requires re-formulation of the search expression. For an alternati ..."
Abstract
-
Cited by 36 (15 self)
- Add to MetaCart
Formulation of suitable search expressions for information retrieval from large full-text databases may currently require considerable efforts. Changing the scope of the search when, e.g., too many or too few hits have been obtained, requires re-formulation of the search expression. For an alternative scheme we suggest an explorative full-text information retrieval method, where the Self-Organizing Map (SOM) algorithm is used to order documents based on their full textual contents. The visualized order can then be utilized for an explorative search or exploration of novel knowledge areas, whereby the scope can be changed interactively. The ordering of the documents is achieved by a two-level analysis: First, word categories are extracted from the text by a "semantic" SOM. Second, the textual context of the documents is encoded on the basis of the histograms of words formed on the word category map. 1 Introduction The information age is characterized by an uncontrolled flood of miscel...
Self-Organizing Maps In Natural Language Processing
, 1997
"... Kohonen's Self-Organizing Map (SOM) is one of the most popular artificial neural network algorithms. Word category maps are SOMs that have been organized according to word similarities, measured by the similarity of the short contexts of the words. Conceptually interrelated words tend to fall into t ..."
Abstract
-
Cited by 33 (2 self)
- Add to MetaCart
Kohonen's Self-Organizing Map (SOM) is one of the most popular artificial neural network algorithms. Word category maps are SOMs that have been organized according to word similarities, measured by the similarity of the short contexts of the words. Conceptually interrelated words tend to fall into the same or neighboring map nodes. Nodes may thus be viewed as word categories. Although no a priori information about classes is given, during the self-organizing process a model of the word classes emerges. The central topic of the thesis is the use of the SOM in natural language processing. The approach based on the word category maps is compared with the methods that are widely used in artificial intelligence research. Modeling gradience, conceptual change, and subjectivity of natural language interpretation are considered. The main application area is information retrieval and textual data mining for which a specific SOM-based method called the WEBSOM has been developed. The WEBSOM metho...

