A survey of information retrieval and filtering methods
, 1995
Cited by 82 (0 self)
We survey the major techniques for information retrieval. In the first part, we provide an overview of the traditional ones (full text scanning, inversion, signature files and clustering). In the second part we discuss attempts to include semantic information (natural language processing, latent semantic indexing and neural networks).
Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms
- Journal of the American Society for Information Science
, 1995
Cited by 56 (9 self)
Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to “intelligent” information retrieval and indexing. More recently, information science researchers have turned to other newer artificial-intelligence-based inductive learning techniques including neural networks, symbolic learning, and genetic algorithms. These newer techniques, which are grounded on diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information storage and retrieval systems. In this article, we first provide an overview of these newer techniques and their use in information science research. To familiarize readers with these techniques, we present three popular methods: the connectionist Hopfield network; the symbolic ID3/ID5R; and evolution-based genetic algorithms. We discuss their knowledge representations and algorithms in the context of information retrieval. Sample implementation and testing results from our own research are also provided for each technique. We believe these techniques are promising in their ability to analyze user queries, identify users’ information needs, and suggest alternatives for search. With proper user-system interactions, these methods can greatly complement the prevailing full-text, keyword-based, probabilistic, and knowledge-based techniques.
Self-Organizing Maps In Natural Language Processing
, 1997
Cited by 33 (2 self)
Kohonen's Self-Organizing Map (SOM) is one of the most popular artificial neural network algorithms. Word category maps are SOMs that have been organized according to word similarities, measured by the similarity of the short contexts of the words. Conceptually interrelated words tend to fall into the same or neighboring map nodes. Nodes may thus be viewed as word categories. Although no a priori information about classes is given, during the self-organizing process a model of the word classes emerges. The central topic of the thesis is the use of the SOM in natural language processing. The approach based on the word category maps is compared with the methods that are widely used in artificial intelligence research. Modeling gradience, conceptual change, and subjectivity of natural language interpretation are considered. The main application area is information retrieval and textual data mining for which a specific SOM-based method called the WEBSOM has been developed. The WEBSOM metho...
Effective information retrieval using genetic algorithms based matching function adaptation
- in Proceedings of the 33rd Hawaii International Conference on System Science (HICSS
, 2000
Cited by 21 (9 self)
Knowledge-intensive organizations have a vast array of information contained in large document repositories. With the advent of E-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document collections, it has become increasingly difficult to retrieve relevant information from these large document collections. This paper addresses the issue of improving retrieval performance (in terms of precision and recall) for retrieval from document collections. There are three important paradigms of research in the area of information retrieval (IR): probabilistic IR, knowledge-based IR, and artificial-intelligence-based techniques like neural networks and symbolic learning. Very few researchers have tried to use evolutionary algorithms like genetic algorithms (GAs). Previous attempts at using GAs have concentrated on modifying document representations or modifying query representations. This work looks at the possibility of applying GAs to adapt various matching functions. It is hoped that such an adaptation of the matching functions will lead to better retrieval performance than that obtained by using a single matching function. An overall matching function is treated as a weighted combination of scores produced by individual matching functions. This overall score is used to rank and retrieve documents. The weights associated with the individual functions are searched for using a genetic algorithm. The idea is tested on a real document collection called the Cranfield collection. The results look very encouraging.
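The weighted-combination idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two matching functions (`overlap_score`, `jaccard_score`), the precision-at-k fitness, and every GA parameter below are assumptions chosen only to keep the example small.

```python
import random

def overlap_score(query, doc):
    """Fraction of query terms that appear in the document (assumed matcher)."""
    q, d = set(query), set(doc)
    return len(q & d) / len(q) if q else 0.0

def jaccard_score(query, doc):
    """Jaccard coefficient between query and document term sets (assumed matcher)."""
    q, d = set(query), set(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0

# Hypothetical pool of individual matching functions.
MATCHERS = [overlap_score, jaccard_score]

def combined_score(weights, query, doc):
    """Overall score: weighted sum of the individual matching scores."""
    return sum(w * m(query, doc) for w, m in zip(weights, MATCHERS))

def rank(weights, query, docs):
    """Document indices sorted by decreasing overall score."""
    return sorted(range(len(docs)),
                  key=lambda i: combined_score(weights, query, docs[i]),
                  reverse=True)

def fitness(weights, query, docs, relevant, k=2):
    """Precision at k, used here as an assumed fitness measure."""
    top = rank(weights, query, docs)[:k]
    return sum(1 for i in top if i in relevant) / k

def evolve_weights(query, docs, relevant, pop_size=20, generations=30, seed=0):
    """Search the weight vector with a very small genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in MATCHERS] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        pop.sort(key=lambda w: fitness(w, query, docs, relevant), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: perturb one weight and clip it to [0, 1].
            i = rng.randrange(len(child))
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, query, docs, relevant))
```

Calling `evolve_weights(query, docs, relevant)` with tokenized documents and a set of relevant document indices returns one weight per matching function; `rank` then orders the collection by the combined score.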
Verifying the Proximity Hypothesis for Self-Organizing Maps
- Journal of Management Information Systems
, 2000
Cited by 16 (6 self)
The Kohonen Self-Organizing Map (SOM) is an unsupervised learning technique for summarizing high-dimensional data so that similar inputs are, in general, mapped close to each other. When applied to textual data, SOM has been shown to be able to group together related concepts in a data collection. This article presents research in which we sought to validate this property of SOM, called the Proximity Hypothesis, through a user evaluation study. Built upon our previous research in automatic concept generation and classification, we demonstrated that the Kohonen SOM was able to perform concept clustering effectively, based on its concept precision and recall scores judged by human experts. We believe this research has established the Kohonen SOM algorithm as an intuitively appealing and promising neural network based textual classification technique for addressing part of the long-standing “information overload” problem.
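For readers unfamiliar with the SOM, the following is a minimal sketch of the general training loop, not the system evaluated in the article. The one-dimensional chain of nodes, the Gaussian neighborhood, and the linear decay schedules are all assumptions made for brevity.

```python
import math
import random

def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(nodes)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(nodes[i], x)))

def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM (a chain of nodes) on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighborhood radius shrink linearly (assumed schedule).
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.1)
        for x in data:
            bmu = best_matching_unit(nodes, x)
            for i, node in enumerate(nodes):
                # Gaussian neighborhood: nodes near the winner move more.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    node[d] += lr * h * (x[d] - node[d])
    return nodes
```

After training, `best_matching_unit(nodes, x)` gives the map position of an input vector; the property discussed in the abstract is that similar inputs tend to land on the same or neighboring positions.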
Learning Similarity Functions in Information Retrieval
- EUFIT ‘98. 6th European Congress on Intelligent Techniques and Soft Computing
, 1998
Cited by 5 (1 self)
Most models for Information Retrieval (IR) using neural networks are simple spreading activation models. Some of them were successfully applied to real world document collections. Nevertheless, they do not exploit the subsymbolic paradigm of neural processing. In this paper a model using a simple backpropagation network for IR is proposed. The COSIMIR model implements the central process in IR. It is a backpropagation network which calculates the similarity between a document and a query representation. The similarity function is learned through examples. Hence, it implements a cognitive similarity function. The first evaluation demonstrates that COSIMIR works well for short vectors.
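A similarity function learned by backpropagation can be sketched as follows. This is only an illustration of the general idea, not the COSIMIR architecture itself: the single hidden layer, its size, the sigmoid units, the squared-error loss, and the class name `SimilarityNet` are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class SimilarityNet:
    """One-hidden-layer network mapping [query ; document] to a similarity in (0, 1)."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def similarity(self, query, doc):
        """Learned similarity between a query vector and a document vector."""
        return self.forward(list(query) + list(doc))[1]

    def train_step(self, query, doc, target, lr=0.5):
        """One backpropagation update on a single example; returns its loss."""
        x = list(query) + list(doc)
        hidden, out = self.forward(x)
        # Gradient of 0.5 * (out - target)^2 through the output sigmoid.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w2, hidden)]
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for j, dh in enumerate(delta_hidden):
            for i, v in enumerate(x):
                self.w1[j][i] -= lr * dh * v
            self.b1[j] -= lr * dh
        return 0.5 * (out - target) ** 2

def train(net, examples, epochs=500, lr=0.5):
    """Train on (query, document, target similarity) triples; returns mean loss per epoch."""
    losses = []
    for _ in range(epochs):
        total = sum(net.train_step(q, d, t, lr) for q, d, t in examples)
        losses.append(total / len(examples))
    return losses
```

Training examples are triples of a query vector, a document vector, and a target similarity judged by a person; once trained, `net.similarity(query, doc)` plays the role of the matching function.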
PAGE CONTENT RANK: AN APPROACH TO THE WEB CONTENT MINING
Cited by 4 (0 self)
Methods of web data mining can be divided into several categories according to the kind of information mined and the goals each category sets: Web structure mining (WSM), Web usage mining (WUM), and Web Content Mining (WCM). The objective of this paper is to propose a new WCM method of page relevance ranking based on exploration of the page content. The method, which we call Page Content Rank (PCR) in the paper, combines a number of heuristics that seem to be important for analysing the content of Web pages. The importance of a page is determined on the basis of the importance of the terms it contains. The importance of a term is specified with respect to a given query q and is based on its statistical and linguistic features. As the source set of pages for mining we use the set of pages returned by a search engine in response to the query q. PCR uses a neural network as its inner classification structure. We describe an implementation of the proposed method and a comparison of its results with those of another existing classification system, the PageRank algorithm.
Using Background Contextual Knowledge for Documents Representation
, 1996
Cited by 3 (1 self)
We describe our approach to document representation that captures contextual dependencies between terms in a corpus and makes use of these dependencies to represent documents. We have tried our representation scheme for automatic document categorisation on the Reuters test set of documents. We achieve a precision-recall break-even point of 84%, which is comparable to the best known published results. Our approach acts as a feature selection technique that is an alternative to applying the techniques from machine learning and numerical taxonomy.
Information Filtering and Retrieval: An Overview
Cited by 2 (0 self)
The areas of information retrieval (IR) and information filtering (IF) have become very active research domains. The problems created by the large increase of available online information, of which the vast majority is largely unstructured, have accentuated the need for effective mechanisms to separate the relevant information from the irrelevant. This paper reviews the main approaches and systems used in IR and in the newer field of IF. The paper also includes an overview of systems which utilise social or collaborative filtering techniques to deal with the problem of information overload.
Neural Information Retrieval
, 1995
Cited by 2 (0 self)
As one would expect, the total number of distinct words grows to an asymptote as the collection size is increased. The difference between the fields is where this asymptote lies. For the most typical free text field, the abstract, one can see that the number of words can grow very large and does not level off very fast. The same is true of the title field. Not only are the absolute numbers beyond the capabilities of the computing machinery which we had available, it is also likely that the steepness of the growth will cause a problem in the case of extension of a collection. The experiments that will be described below were conducted with the terms from the keyword field. This field levelled off at about 2500 distinct terms.
7.2 Reducing the number of terms
For effective clustering the terms used have to be as semantically discriminative as possible. However, it would be problematic to select important index terms manually, especially because the importance of term...

