Information filtering and information retrieval: two sides of the same coin (1992)

by N. J. Belkin, W. B. Croft
Venue: Communications of the ACM

Results 1 - 10 of 223

Machine Learning in Automated Text Categorization

by Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche) - ACM Computing Surveys , 2002
Cited by 839 (13 self)
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
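
The inductive process described above (learn a classifier from preclassified documents, then apply it to new text) can be sketched with a small multinomial Naive Bayes learner. This is only an illustration: Naive Bayes is one of many learners that fit the machine learning paradigm, the whitespace tokenizer is a deliberately crude document representation, and the training texts and the labels `grain` and `finance` are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Build per-category word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()  # assumption: bag-of-words on whitespace
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training set, standing in for a preclassified corpus.
training = [
    ("wheat prices rise", "grain"),
    ("corn and wheat exports", "grain"),
    ("bank raises interest rates", "finance"),
    ("interest rates fall", "finance"),
]
model = train(training)
label_a = classify(model, "wheat exports")
label_b = classify(model, "interest rates rise")
```

Evaluation, the third problem the survey names, would compare such predictions against held-out labeled documents; that step is omitted here.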

Using Linear Algebra for Intelligent Information Retrieval

by Susan T. Dumais, Michael W. Berry - SIAM Review , 1995
Cited by 450 (14 self)
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term by document matrices. Terms and documents represented by 200-300 of the largest singular vectors are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to improve users...

Efficient Filtering of XML Documents for Selective Dissemination of Information

by Mehmet Altınel , 2000
Cited by 272 (13 self)
Information Dissemination applications are gaining increasing popularity due to dramatic improvements in communications bandwidth and ubiquity. The sheer volume of data available necessitates the use of selective approaches to dissemination in order to avoid overwhelming users with unnecessary information. Existing mechanisms for selective dissemination typically rely on simple keyword matching or "bag of words" information retrieval techniques. The advent of XML as a standard for information exchange and the development of query languages for XML data enables the development of more sophisticated filtering mechanisms that take structure information into account. We have developed several index organizations and search algorithms for performing efficient filtering of XML documents for large-scale information dissemination systems. In this paper we describe these techniques and examine their performance across a range of document, workload, and scale scenarios.

Combining Collaborative Filtering with Personal Agents for Better Recommendations

by Nathaniel Good, J. Ben Schafer, Joseph A. Konstan, Al Borchers, Badrul Sarwar, Jon Herlocker, John Riedl - In Proceedings of the Sixteenth National Conference on Artificial Intelligence , 1999
Cited by 178 (10 self)
Information filtering agents and collaborative filtering both attempt to alleviate information overload by identifying which items a user will find worthwhile. Information filtering (IF) focuses on the analysis of item content and the development of a personal user interest profile. Collaborative filtering (CF) focuses on identification of other users with similar tastes and the use of their opinions to recommend items. Each technique has advantages and limitations that suggest that the two could be beneficially combined. This paper shows that a CF framework can be used to combine personal IF agents and the opinions of a community of users to produce better recommendations than either agents or users can produce alone. It also shows that using CF to create a personal combination of a set of agents produces better results than either individual agents or other combination mechanisms. One key implication of these results is that users can avoid having to select among ag...
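
The collaborative filtering half of this description (find users with similar tastes, then use their opinions) can be sketched as a neighbor-weighted prediction. This is a generic sketch under stated assumptions, not the paper's implementation: similarity is Pearson correlation over co-rated items, predictions are the target user's mean plus a similarity-weighted average of neighbors' deviations from their own means, and the user names and ratings are invented.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two users over the items both have rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)
               * sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict user's rating of item from the opinions of other raters."""
    target = ratings[user]
    mean_user = sum(target.values()) / len(target)
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = pearson(target, their)
        mean_other = sum(their.values()) / len(their)
        num += weight * (their[item] - mean_other)
        den += abs(weight)
    # With no usable neighbors, fall back to the user's own mean rating.
    return mean_user + num / den if den else mean_user


# Invented ratings on a 1-5 scale.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 5, "b": 3, "c": 4, "d": 5},
    "cat": {"a": 1, "b": 5, "c": 2, "d": 1},
}
prediction = predict(ratings, "ann", "d")
```

In this framing, a personal IF agent could be added as one more entry in `ratings`, so that its content-based scores are weighted by how well they have agreed with the user in the past.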

Ontology-Based Integration of Information - A Survey of Existing Approaches

by H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, S. Hübner , 2001
Cited by 171 (1 self)
We review the use of ontologies for the integration of heterogeneous information sources. Based on an in-depth evaluation of existing approaches to this problem we discuss how ontologies are used to support the integration task. We evaluate and compare the languages used to represent the ontologies and the use of mappings between ontologies as well as to connect ontologies with information sources. We also enquire into ontology engineering methods and tools used to develop ontologies for information integration. Based on the results of our analysis we summarize the state-of-the-art in ontology-based information integration and name areas of further research activities.

Information Filtering Based on User Behavior Analysis

by Masahiro Morita, Yoichi Shinoda - (In Japanese). Master’s thesis, School of Information Science, Japan Advanced Institute of Science and Technology , 1994
Cited by 129 (0 self)
Information filtering systems have potential power that may provide an efficient means of navigating through large and diverse data space. However, current information filtering technology heavily depends on a user’s active participation for describing the user’s interest to information items, forcing the user to accept extra load to overcome the already loaded situation. Furthermore, because the user’s interests are often expressed in discrete format such as a set of keywords sometimes augmented with if-then rules, it is difficult to express ambiguous interests, which users often want to do. We propose a technique that uses user behavior monitoring to transparently capture the user’s interest in information, and a technique to use this interest to filter incoming information in a very efficient way. The proposed techniques are verified to perform very well by having conducted a field experiment and a series of simulations.

Combining the Evidence of Multiple Query Representations for Information Retrieval

by N. J. Belkin, P. Kantor - Information Processing & Management , 1995
Cited by 108 (7 self)
We report on two studies in the TREC-2 program that investigated the effect on retrieval performance of combination of multiple representations of TREC topics. In one of the projects, five separate Boolean queries for each of the 50 TREC routing topics and 25 of the TREC ad hoc topics were generated by 75 experienced online searchers. Using the INQUERY retrieval system, these queries were both combined into single queries, and used to produce five separate retrieval results for each topic. In the former case, progressive combination of queries led to progressively improving retrieval performance, significantly better than that of single queries, and at least as good as the best individual single-query formulations. In the latter case, data fusion of the ranked lists also led to performance better than that of any single list. In the second project, two automatically produced vector queries and three versions of a manually produced P-norm extended Boolean query for each routing and ad hoc topic were compared and combined. This project investigated six different methods of combination of queries, and the combination of the same queries on different databases. As in the first project, progressive combination led to progressively improving results, with the best results, on average, being achieved by combination through summing of retrieval status values. Both projects found that the best method of combination often led to results that were better than the best performing single query. The combined results from the two projects have also been combined by data fusion. The results of this procedure show that combining evidence from completely different systems also leads to performance improvement.
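
Combination by summing retrieval status values can be sketched in a few lines. The sketch assumes that each run is a mapping from document id to score, that scores from different runs are already on comparable scales, and that a document missing from a run contributes zero; the abstract does not state how the studies handled normalization or missing documents, and the document ids and scores below are invented.

```python
from collections import defaultdict


def sum_fusion(runs):
    """Fuse retrieval runs by summing each document's retrieval status values."""
    fused = defaultdict(float)
    for run in runs:
        for doc_id, score in run.items():
            fused[doc_id] += score
    # Highest fused score first; ties are broken by document id for stability.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Three invented runs for the same topic, e.g. from different query formulations.
runs = [
    {"d1": 0.9, "d2": 0.4, "d3": 0.1},
    {"d2": 0.8, "d3": 0.7},
    {"d1": 0.2, "d3": 0.6, "d4": 0.5},
]
ranking = sum_fusion(runs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Here `d3` is never the top document in any single run, yet it ranks first after fusion because every run gives it some support, which is the kind of effect that combining evidence is meant to capture.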

Using Filtering Agents to Improve Prediction Quality in the GroupLens Research Collaborative Filtering System

by Badrul M. Sarwar, Joseph A. Konstan, Al Borchers, Jon Herlocker, Brad Miller, John Riedl , 1998
Cited by 106 (11 self)
Collaborative filtering systems help address information overload by using the opinions of users in a community to make personal recommendations for documents to each user. Many collaborative filtering systems have few user opinions relative to the large number of documents available. This sparsity problem can reduce the utility of the filtering system by reducing the number of documents for which the system can make recommendations and adversely affecting the quality of recommendations. This paper defines and implements a model for integrating content-based ratings into a collaborative filtering system. The filterbot model allows collaborative filtering systems to address sparsity by tapping the strength of content filtering techniques. We identify and evaluate metrics for assessing the effectiveness of filterbots specifically, and filtering system enhancements in general. Finally, we experimentally validate the filterbot approach by showing that even simple filterbots such as spell ...
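
The idea of integrating content-based ratings into a collaborative filtering system can be sketched by treating a filterbot as a pseudo-user that rates every document. Everything specific here is an assumption made for illustration: the bot scores documents by word count, the helper names `length_filterbot`, `add_filterbot`, and `coverage` are hypothetical, and the documents and ratings are invented.

```python
def length_filterbot(documents, low=1.0, high=5.0):
    """Hypothetical bot: rate each document by word count, scaled to [low, high]."""
    counts = {doc_id: len(text.split()) for doc_id, text in documents.items()}
    smallest, largest = min(counts.values()), max(counts.values())
    span = (largest - smallest) or 1  # avoid division by zero if all lengths match
    return {doc_id: low + (high - low) * (n - smallest) / span
            for doc_id, n in counts.items()}


def add_filterbot(ratings, bot_name, bot_ratings):
    """Return a copy of the ratings table with the bot added as one more user."""
    merged = {user: dict(user_ratings) for user, user_ratings in ratings.items()}
    merged[bot_name] = dict(bot_ratings)
    return merged


def coverage(ratings, doc_ids):
    """Fraction of documents that have at least one rating from anyone."""
    rated = {doc_id for user_ratings in ratings.values() for doc_id in user_ratings}
    return len(rated & set(doc_ids)) / len(doc_ids)


documents = {
    "d1": "short note",
    "d2": "a somewhat longer message about filtering",
    "d3": "one two three four",
}
ratings = {"ann": {"d1": 4}}  # sparse: only one document has a human rating

bot_ratings = length_filterbot(documents)
with_bot = add_filterbot(ratings, "length_bot", bot_ratings)
before = coverage(ratings, documents)
after = coverage(with_bot, documents)
```

Because the bot rates every document, each document now has at least one rating for a collaborative filtering algorithm to work from, which is how this kind of model addresses sparsity. Whether the bot's opinions are useful to a given user is left to the collaborative filtering step to decide.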

Path Sharing and Predicate Evaluation for High-Performance XML Filtering

by Yanlei Diao, Mehmet Altinel, Michael J. Franklin, Hao Zhang, Peter Fischer - ACM TRANS. DATABASE SYST , 2003
Cited by 105 (5 self)
... In this paper we first describe the XFilter and YFilter approaches and present results of a detailed performance comparison of structure matching for these algorithms as well as a hybrid approach. The results show that the path sharing employed by YFilter can provide order-of-magnitude performance benefits. We then propose two alternative techniques for extending YFilter's shared structure matching with support for value-based predicates, and compare the performance of these two techniques. The results of this latter study demonstrate some key differences between shared XML filtering and traditional database query processing. Finally, we describe how the YFilter approach is extended to handle more complicated queries containing nested path expressions.
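
The benefit of path sharing can be illustrated with a prefix tree over path queries, so that a common prefix such as `/news/item` is matched once for all queries that contain it. This is a much simplified sketch and not the YFilter algorithm itself: it assumes queries use only child steps (no wildcards, descendant axes, or predicates), and the names `PathNode`, `build_index`, and `filter_document` as well as the queries are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


class PathNode:
    """One step in the shared prefix tree of path queries."""

    def __init__(self):
        self.children = {}   # element name -> PathNode
        self.query_ids = []  # queries that end at this step


def build_index(queries):
    """Merge all queries into one tree so common prefixes are stored once."""
    root = PathNode()
    for query_id, path in queries.items():
        node = root
        for step in path.strip("/").split("/"):
            node = node.children.setdefault(step, PathNode())
        node.query_ids.append(query_id)
    return root


def filter_document(index, xml_text):
    """Stream through the document once and return ids of matching queries."""
    matched = set()
    stack = [index]  # tree node reached by each open element; None = no match
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            parent = stack[-1]
            node = parent.children.get(elem.tag) if parent is not None else None
            if node is not None:
                matched.update(node.query_ids)
            stack.append(node)
        else:
            stack.pop()
    return matched


queries = {
    "q1": "/news/item/title",
    "q2": "/news/item/body",
    "q3": "/news/ad",
}
index = build_index(queries)
document = "<news><item><title>XML filtering</title></item></news>"
matches = filter_document(index, document)
```

All three queries share the single `news` node at the top of the tree, so the work of matching that step is done once per document element rather than once per query.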

Information Extraction as a Basis for High-Precision Text Classification

by Ellen Riloff, Wendy Lehnert - ACM Transactions on Information Systems , 1994
Cited by 102 (5 self)
... this article. For the purpose of text classification, the answer keys serve only as a set of correct classifications for each text. If a text has instantiated key templates associated with it in the corpus, then it should be classified as a relevant text. If a text has no instantiated key templates associated with it (i.e., only a dummy template) then it should be classified as an irrelevant text. This is a binary classification problem: a text is either relevant to the terrorism domain or irrelevant. The texts were selected by keyword search from a database of newswire articles because they contained words associated with terrorism. However, many of them did not mention any relevant terrorist incidents. Of the 1700 texts in the MUC-4 corpus, only 53% described a relevant terrorist event. Because many of the texts in the corpus were irrelevant, the MUC-4 systems had to distinguish the relevant from the irrelevant texts. Although the MUC-4 task was information extraction, information detection (i.e., text classification) was an implicit subtask. To be successful in MUC-4, the information extraction systems also had to be good at detection. Our MUC-4 system did not use a separate text classification module. Instead, we extracted information from every text and relied on a discourse analysis module to discard irrelevant templates. This strategy was very effective, but it was expensive. A reliable text classification module could have filtered out irrele... [Footnote 1: MUC-3 was the Third Message Understanding Conference, held in 1991 (MUC-3 Proceedings 1991).]