Results 1 - 10
of
28
Projections for Efficient Document Clustering
, 1997
"... Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for improving the ..."
Abstract
-
Cited by 86 (0 self)
- Add to MetaCart
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for improving the cost of distance calculations, LSI and truncation, and determine both how much these techniques speed up clustering and how much they affect the quality of the resulting clusters. We find that the speed increase is significant while --- surprisingly --- the quality of clustering is not adversely affected. We conclude that truncation yields clusters as good as those produced by full-profile clustering while offering a significant speed advantage.
A language modeling framework for resource selection and results merging
- IN CIKM 2002
, 2002
"... Statistical language models have been proposed recently for several information retrieval tasks, including the resource selection task in distributed information retrieval. This paper extends the language modeling approach to integrate resource selection, ad-hoc searching, and merging of results fro ..."
Abstract
-
Cited by 60 (5 self)
- Add to MetaCart
Statistical language models have been proposed recently for several information retrieval tasks, including the resource selection task in distributed information retrieval. This paper extends the language modeling approach to integrate resource selection, ad-hoc searching, and merging of results from different text databases into a single probabilistic retrieval model. This new approach is designed primarily for Intranet environments, where it is reasonable to assume that resource providers are relatively homogeneous and can adopt the same kind of search engine. Experiments demonstrate that this new, integrated approach is at least as effective as the prior state-of-the-art in distributed IR.
A Taxonomy of Recommender Agents on the Internet
- ARTIFICIAL INTELLIGENCE REVIEW
, 2003
"... Recently, Artificial Intelligence techniques have proved useful in helping users to handle the large amount of information on the Internet. The idea of personalized search engines, intelligent software agents, and recommender systems has been widely accepted among users who require assistance in sea ..."
Abstract
-
Cited by 44 (1 self)
- Add to MetaCart
Recently, Artificial Intelligence techniques have proved useful in helping users to handle the large amount of information on the Internet. The idea of personalized search engines, intelligent software agents, and recommender systems has been widely accepted among users who require assistance in searching, sorting, classifying, filtering and sharing this vast quantity of information. In this paper, we present a state-of-the-art taxonomy of intelligent recommender agents on the Internet. We have analyzed 37 different systems and their references and have sorted them into a list of 8 basic dimensions. These dimensions are then used to establish a taxonomy under which the systems analyzed are classified. Finally, we conclude this paper with a cross-dimensional analysis with the aim of providing a starting point for researchers to construct their own recommender system.
A Semisupervised Learning Method to Merge Search Engine Results
- ACM Transactions on Information Systems
, 2003
"... This article presents a semisupervised learning solution to the result merging problem. The key contribution is the observation that information used to create resource descriptions for resource selection can also be used to create a centralized sample database to guide the normalization of document ..."
Abstract
-
Cited by 34 (8 self)
- Add to MetaCart
This article presents a semisupervised learning solution to the result merging problem. The key contribution is the observation that information used to create resource descriptions for resource selection can also be used to create a centralized sample database to guide the normalization of document scores returned by different databases. At retrieval time, the query is sent to the selected databases, which return database-specific document scores, and to a centralized sample database, which returns database-independent document scores. Documents that have both a database-specific score and a database-independent score serve as training data for learning to normalize the scores of other documents. An extensive set of experiments demonstrates that this method is more effective than the well-known CORI result-merging algorithm under a variety of conditions
Using Sampled Data and Regression to Merge Search Engine Results
- In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
, 2002
"... This paper addresses the problem of merging results obtained from different databases and search engines in a distributed information retrieval environment. The prior research on this problem either assumed the exchange of statistics necessary for normalizing scores (cooperative solutions) or is heu ..."
Abstract
-
Cited by 32 (10 self)
- Add to MetaCart
This paper addresses the problem of merging results obtained from different databases and search engines in a distributed information retrieval environment. The prior research on this problem either assumed the exchange of statistics necessary for normalizing scores (cooperative solutions) or is heuristic. Both approaches have disadvantages.
A stemming procedure and stopword list for general French corpora
- Journal of the American Society for Information Science
, 1999
"... to appear in ..."
Efficient Semantic-Based Content Search in P2P Network
- IEEE Transactions on Knowledge and Data Engineering
, 2004
"... Abstract — Most existing Peer-to-Peer (P2P) systems support only title-based searches and are limited in functionality when compared to today’s search engines. In this paper, we present the design of a distributed P2P information sharing system that supports semantic-based content searches of releva ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
Abstract — Most existing Peer-to-Peer (P2P) systems support only title-based searches and are limited in functionality when compared to today’s search engines. In this paper, we present the design of a distributed P2P information sharing system that supports semantic-based content searches of relevant documents. First, we propose a general and extensible framework for searching similar documents in P2P network. The framework is based on the novel concept of Hierarchical Summary Structure. Second, based on the framework, we develop our efficient document searching system, by effectively summarizing and maintaining all documents within the network with different granularity. Finally, an experimental study is conducted on a real P2P prototype, and a large-scale network is further simulated. The results show the effectiveness, efficiency and scalability of the proposed system. Index Terms — content-based, similarity search, peer-to-peer, hierarchical summary, indexing
Modeling Search Engine Effectiveness for Federated Search
"... Federated search links multiple search engines into a single, virtual search system. Most prior research of federated search focused on selecting search engines that have the most relevant contents, but ignored the retrieval effectiveness of individual search engines. This omission can cause serious ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
Federated search links multiple search engines into a single, virtual search system. Most prior research of federated search focused on selecting search engines that have the most relevant contents, but ignored the retrieval effectiveness of individual search engines. This omission can cause serious problems when federating search engines of different qualities.
Report on clef-2002 experiments: Combining multiple sources of evidence
- Proceedings CLEF-2002
, 2002
"... Abstract. For our second participation in the CLEF retrieval tasks, our first objective was to propose better and more general stopword lists for various European languages (namely, French, Italian, German, Spanish and Finnish) along with improved, simpler and efficient stemming procedures. Our seco ..."
Abstract
-
Cited by 15 (5 self)
- Add to MetaCart
Abstract. For our second participation in the CLEF retrieval tasks, our first objective was to propose better and more general stopword lists for various European languages (namely, French, Italian, German, Spanish and Finnish) along with improved, simpler and efficient stemming procedures. Our second goal was to propose a combined query-translation approach that could cross language barriers and also an effective merging strategy based on logistic regression for accessing the multilingual collection. Finally, within the Amaryllis experiment, we wanted to analyze how a specialized thesaurus might improve retrieval effectiveness.

