Searching Distributed Collections With Inference Networks
- In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995
- Cited by 359 (31 self)
The use of information retrieval systems in networked environments raises a new set of issues that have received little attention. These issues include ranking document collections for relevance to a query, selecting the best set of collections from a ranked list, and merging the document rankings that are returned from a set of collections. This paper describes methods of addressing each issue in the inference network model, discusses their implementation in the INQUERY system, and presents experimental results demonstrating their effectiveness.
Semantic File Systems
- In 13th ACM Symposium on Operating Systems Principles, 1991
- Cited by 200 (4 self)
A semantic file system is an information storage system that provides flexible associative access to the system's contents by automatically extracting attributes from files with file type specific transducers. Associative access is provided by a conservative extension to existing tree-structured file system protocols, and by protocols that are designed specifically for content based access. Compatibility with existing file system protocols is provided by introducing the concept of a virtual directory. Virtual directory names are interpreted as queries, and thus provide flexible associative access to files and directories in a manner compatible with existing software. Rapid attribute-based access to file system contents is implemented by automatic extraction and indexing of key properties of file system objects.
The automatic indexing of files and directories is called "semantic" because user programmable transducers use information about the semantics of updated file system objects to extract the properties for indexing. Experimental results from a semantic file system implementation support the thesis that semantic file systems present a more effective storage abstraction than do traditional tree structured file systems for information sharing and command level programming.
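The idea of treating virtual directory names as queries can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `attr:value` path syntax, the `AttributeIndex` class, and the `mail_transducer` function are hypothetical and are not taken from the paper's implementation.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy in-memory attribute index (hypothetical, for illustration only)."""

    def __init__(self):
        # (attribute, value) -> set of file names carrying that pair
        self._index = defaultdict(set)

    def add_file(self, name, text, transducer):
        # A transducer extracts (attribute, value) pairs from a file.
        for attr, value in transducer(name, text):
            self._index[(attr, value)].add(name)

    def resolve(self, virtual_path):
        """Treat each 'attr:value' path component as one conjunctive query term."""
        parts = [p for p in virtual_path.split("/") if p]
        result = None
        for part in parts:
            attr, _, value = part.partition(":")
            matches = self._index.get((attr, value), set())
            result = set(matches) if result is None else result & matches
        return sorted(result or [])


def mail_transducer(name, text):
    """Hypothetical transducer: yields from/to header values and the file extension."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in {"from", "to"}:
            yield key.strip().lower(), value.strip().lower()
    yield "ext", name.rsplit(".", 1)[-1] if "." in name else ""


idx = AttributeIndex()
idx.add_file("a.msg", "From: alice\nTo: bob\nhello", mail_transducer)
idx.add_file("b.msg", "From: carol\nTo: bob", mail_transducer)
print(idx.resolve("/to:bob/from:alice"))
```

Each extra path component narrows the result set, which is one way a directory listing could behave like a query while remaining usable by software that only understands paths.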
Automatic Discovery of Language Models for Text Databases
- In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, 1999
- Cited by 104 (7 self)
The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate th...
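The sampling loop described in this abstract can be sketched roughly as follows. This is a sketch under stated assumptions, not the paper's algorithm: the `ToyDatabase` class, the `sample_language_model` function, and the choice of picking the next query term uniformly at random from the vocabulary learned so far are all hypothetical stand-ins for the "carefully selected queries" the abstract mentions.

```python
import random
from collections import Counter


class ToyDatabase:
    """Hypothetical uncooperative database: it exposes only a search interface."""

    def __init__(self, docs):
        self._docs = docs

    def search(self, term, limit):
        # Return up to `limit` documents containing the query term.
        return [d for d in self._docs if term in d.split()][:limit]


def sample_language_model(db, seed_term, max_docs=10, docs_per_query=2,
                          max_queries=50, rng=None):
    """Build a term-frequency model from documents retrieved by running queries."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    model = Counter()
    seen = set()
    term = seed_term
    for _ in range(max_queries):
        if len(seen) >= max_docs:
            break
        for doc in db.search(term, docs_per_query):
            if doc not in seen and len(seen) < max_docs:
                seen.add(doc)
                model.update(doc.split())
        if not model:
            break  # the seed retrieved nothing, so there is no vocabulary to draw from
        # Assumption: next query term is drawn at random from the learned vocabulary.
        term = rng.choice(sorted(model))
    return model, len(seen)


db = ToyDatabase(["apple banana", "banana cherry", "cherry date", "date egg"])
model, n_docs = sample_language_model(db, "apple", max_docs=3)
print(n_docs, dict(model))
```

The selection service never sees the database's index; it learns term statistics only from the documents that ordinary queries return.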
Server Ranking for Distributed Text Retrieval Systems on the Internet
- In Proceedings of the Fifth International Conference on Database Systems for Advanced Applications, 1997
- Cited by 68 (4 self)
Keyword-based search services have become necessary tools for finding information resources on the Internet today. The rapid growth of information on the Internet renders centralized keyword index services incapable of collecting comprehensive resource meta-data in a timely manner. We argue that delegating the task of meta-data collection to local index servers is a more scalable approach. We propose a mechanism for integrating distributed autonomous index servers into a cooperative resource discovery system. Focusing on the retrieval effectiveness of the system, we propose a set of methods, called CVV-based methods, for ranking and selecting index servers with respect to a query, and merging the results returned by the index servers. Through experiments, we evaluate the effectiveness of the CVV-based methods, and compare our server ranking method with methods proposed by other researchers. Keywords: information retrieval, internet databases.
Index Structures for Selective Dissemination of Information
- 1992
- Cited by 64 (5 self)
The number, size, and user population of bibliographic and full text document databases are rapidly growing. With a high document arrival rate, it becomes essential for users of such databases to have access to the very latest documents; yet the high document arrival rate also makes it difficult for the users to keep themselves updated. It is desirable to allow users to subscribe profiles, i.e., queries that are constantly evaluated, so that they will be automatically informed of new additions that may be of interest. Such service is traditionally called Selective Dissemination of Information (SDI). The high document arrival rate, the huge number of users, and the timeliness requirement of the service pose a challenge in achieving efficient SDI. In this paper, we propose several index structures for indexing profiles and algorithms that efficiently match documents against large number of profiles. We also present analysis and simulations results to compare their performance under different scenarios.
Internet Resource Discovery Services
- IEEE Computer, 1993
- Cited by 63 (5 self)
This paper presents an overview of resource discovery services currently available on the Internet. First, we survey a number of existing Internet discovery services. Then, we present a taxonomy of design decisions and characteristics of tools for the Internet resource discovery problem [30]. The Wide Area Information Server
The Prospero File System: A Global File System Based on the Virtual System Model
- 1992
- Cited by 45 (6 self)
Distributed file systems have come into widespread use in recent years. Many allow files to be accessed over large geographic areas and across organizational boundaries. However, few systems to date have given much thought to how information should be organized in such a global environment. This paper describes the Prospero File System, a file system based on the Virtual System Model, a model for building large systems within which users construct their own virtual systems by selecting and organizing the objects and services of interest. This customized view of a global file system makes it easier for users to keep track of files that they have identified as being of interest. The use of multiple name spaces can cause confusion. Such confusion is eliminated by support for closure: every object has an associated name space, and names specified by the object are resolved in that name space. Tools are provided to allow views to be kept up-to-date, and to allow views to be defined as functions of other (possibly changing) views. These tools promote sharing and enable the organization of files in ways that make it easier to identify information of interest than it is in existing systems. The prototype implementation has been used to organize information available from Internet archive sites; its directory service has been used from more than 7,500 systems in 29 countries. This paper discusses the goals of the Prospero File System, describes the prototype implementation, and discusses experience with the use of the system to date.
Index Structures for Information Filtering Under the Vector Space Model
- In Proc. International Conference on Data Engineering, 1993
- Cited by 31 (4 self)
With the ever increasing volumes of electronic information generation, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanism. The number of users, and thus profiles (representing users' long-term interests), handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which we, instead of indexing every term in a profile, select only the significant ones to index. We evaluate thei...
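Applying a standard inverted index to user profiles, as this abstract describes, might look like the sketch below. It is illustrative only: the `ProfileIndex` class, the dot-product score, and the per-profile threshold are assumptions rather than the paper's data structures, and the paper's alternative that indexes only significant terms is not shown.

```python
from collections import defaultdict


class ProfileIndex:
    """Toy inverted index over user profiles (hypothetical, for illustration)."""

    def __init__(self):
        self._postings = defaultdict(list)  # term -> [(profile_id, weight), ...]
        self._thresholds = {}               # profile_id -> minimum score to notify

    def add_profile(self, profile_id, weights, threshold):
        self._thresholds[profile_id] = threshold
        for term, weight in weights.items():
            self._postings[term].append((profile_id, weight))

    def match(self, doc_weights):
        """Return ids of profiles whose dot product with the document meets their threshold."""
        scores = defaultdict(float)
        # Only postings lists for terms in the document are visited.
        for term, doc_weight in doc_weights.items():
            for profile_id, profile_weight in self._postings.get(term, ()):
                scores[profile_id] += doc_weight * profile_weight
        return sorted(pid for pid, score in scores.items()
                      if score >= self._thresholds[pid])


index = ProfileIndex()
index.add_profile("u1", {"index": 1.0, "filter": 0.5}, threshold=1.0)
index.add_profile("u2", {"file": 1.0}, threshold=0.5)
print(index.match({"index": 1.0, "filter": 1.0}))
```

Because an incoming document touches only the postings lists of its own terms, profiles that share no term with the document are never scored, which is the property that makes the inverted index attractive when the number of profiles is large.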

