Self-Indexing Inverted Files for Fast Text Retrieval
- ACM Transactions on Information Systems
, 1996
Cited by 127 (23 self)
Query processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Here we show that query response time for conjunctive Boolean queries and for informal ranked queries can be dramatically reduced, at little cost in terms of storage, by the inclusion of an internal index in each inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents. Our experimental results show that the self-indexing strategy adds less than 20% to the size of the inverted file, but, for Boolean queries of 5-10 terms, can reduce processing time to under one fifth of the previous cost. Similarly, ranked queries of 40-50 terms can be evaluated in as little as 25% of the previous time, with little or no loss of retrieval effectiveness.
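The idea of an internal index inside each inverted list can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `SkippedList`, the fixed block size `step`, and the use of uncompressed Python lists are all assumptions made for illustration. In a compressed inverted file the benefit comes from not having to decode the blocks that are skipped; here the sketch only shows how the skip entries select the single block that needs to be examined.

```python
from bisect import bisect_right


class SkippedList:
    """Sorted posting list with a small internal index (hypothetical layout)."""

    def __init__(self, postings, step=4):
        self.postings = postings          # sorted document ids
        self.step = step                  # entries per block (assumed fixed)
        self.skip_ids = postings[::step]  # first document id of each block

    def contains(self, target):
        # Locate the last block whose first id is <= target.
        block = bisect_right(self.skip_ids, target) - 1
        if block < 0:
            return False
        start = block * self.step
        # Only this one block is scanned; all other blocks are skipped.
        return target in self.postings[start:start + self.step]


def intersect(candidates, other):
    """Conjunctive (AND) query: keep candidates that also occur in `other`."""
    return [doc for doc in candidates if other.contains(doc)]


long_list = SkippedList(list(range(1, 100, 2)))  # odd ids 1..99
result = intersect([3, 50, 97], long_list)
```

Processing the shortest list first and probing the longer lists through their skip entries is what keeps the work proportional to the number of candidates rather than to the length of the long lists.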
Adding Compression to a Full-Text Retrieval System
, 1995
Cited by 75 (25 self)
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text...
A Text Compression Scheme That Allows Fast Searching Directly In The Compressed File
- ACM Transactions on Information Systems
, 1993
Cited by 56 (1 self)
A new text compression scheme is presented in this paper. The main purpose of this scheme is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the string-matching algorithm, which is used as a black box; any string-matching procedure can be used. Instead, the pattern is modified; only the outcome of the matching of the modified pattern against the compressed file is decompressed. Since the compressed file is smaller than the original file, the search is faster both in terms of I/O time and processing time than a search in the original file. For typical text files, we achieve about a 30% reduction in space and a slightly smaller reduction in search time. A 30% space saving is not competitive with good text compression schemes, so the scheme should not be used where space is the predominant concern. The intended applications of this scheme are files that are searched often, such as catalogs, bibliographic files, and address books. Such files are ty...
Modeling and assessing inference exposure in encrypted databases
- ACM Transactions on Information and System Security (TISSEC)
, 2005
Cited by 28 (22 self)
The scope and character of today’s computing environments are progressively shifting from traditional, one-on-one client-server interaction to the new cooperative paradigm. It then becomes of primary importance to provide means of protecting the secrecy of the information, while guaranteeing its availability to legitimate clients. Operating online querying services securely on open networks is very difficult; therefore many enterprises outsource their data center operations to external application service providers. A promising direction toward prevention of unauthorized access to outsourced data is represented by encryption. However, data encryption is often supported for the sole purpose of protecting the data in storage while allowing access to plaintext values by the server, which decrypts data for query execution. In this paper, we present a simple yet robust single-server solution for remote querying of encrypted databases on external servers. Our approach is based on the use of indexing information attached to the encrypted database, which can be used by the server to select the data to be... (This paper extends the authors' previous work, which appeared under the title “Balancing...)
Memory Efficient Ranking
- Information Processing & Management
, 2002
Cited by 16 (2 self)
Fast and effective ranking of a collection of documents with respect to a query requires several structures, including a vocabulary, inverted file entries, arrays of term weights and document lengths, an array of partial similarity accumulators, and address tables for inverted file entries and documents. Of all of these structures, the array of document lengths and the array of accumulators are the components accessed most frequently in a ranked query, and it is crucial to acceptable performance that they be held in main memory. Here we describe an approximate ranking process that makes use of a compact array of in-memory low precision approximations for the lengths. Combined with another simple rule for reducing the memory required by the partial similarity accumulators, the approximation heuristic allows the ranking of large document collections using less than one byte of memory per document, an eight-fold reduction compared with the space required by conventional techniques. Moreover, in our experiments retrieval effectiveness was unaffected by the use of these heuristics.
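A compact array of low-precision length approximations might look like the sketch below. The abstract does not say how the approximations are encoded, so everything here is an assumption: the function name `make_length_quantizer`, the geometric bucketing, the 6-bit code width, and the use of a `bytearray` (one byte per document, with no bit packing) are illustrative choices only.

```python
import math


def make_length_quantizer(min_len, max_len, bits=6):
    """Build encode/decode functions mapping lengths to small integer codes.

    Buckets grow geometrically, so the relative error of the decoded
    length is roughly the same for short and long documents.
    """
    levels = 1 << bits
    ratio = (max_len / min_len) ** (1.0 / levels)  # width of one bucket
    log_ratio = math.log(ratio)

    def encode(length):
        code = int(math.log(length / min_len) / log_ratio)
        return min(max(code, 0), levels - 1)       # clamp to valid codes

    def decode(code):
        # Use the geometric midpoint of the bucket as its representative.
        return min_len * ratio ** (code + 0.5)

    return encode, decode


lengths = [1.0, 7.0, 120.0, 3500.0, 10000.0]       # made-up example lengths
encode, decode = make_length_quantizer(1.0, 10000.0, bits=6)
codes = bytearray(encode(x) for x in lengths)      # in-memory approximation
approx = [decode(c) for c in codes]
```

With these assumed parameters each decoded length stays within about 8% of the true value, which is the kind of small perturbation that an approximate ranking process would need to tolerate.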
Compression, Information Theory and Grammars: A Unified Approach
- ACM Trans. on Information Systems
, 1990
Cited by 9 (5 self)
Text compression is of considerable theoretical and practical interest. It is, for example, becoming increasingly important for satisfying the requirements of fitting a large database onto a single CD-ROM. Many of the compression techniques discussed in the literature are model based. Here we propose the notion of a formal grammar as a flexible model of text generation that encompasses most of the models offered before as well as, in principle, extending the possibility of compression to a much more general class of languages. Assuming a general model of text generation, a derivation is given of the well-known Shannon entropy formula, making possible a theory of information based upon text representation rather than on communication. The ideas are shown to apply to a number of commonly used text models. Finally, we focus on a Markov model of text generation, suggest an information-theoretic measure of similarity between two probability distributions, and develop a clustering algorith...
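For reference, the Shannon entropy formula mentioned in the abstract is usually stated as follows for a source that emits symbol $i$ with probability $p_i$. The notation ($H$, $p_i$, $n$) is the standard textbook one and is assumed here; the paper's own derivation and symbols are not shown in this listing.

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i
\qquad \text{(bits per symbol, with } \sum_{i=1}^{n} p_i = 1\text{)}
```

For example, a source with two equally likely symbols has $H = 1$ bit per symbol, which is the lower bound on the average code length achievable by any lossless code for that source.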
Implementation of a Storage Mechanism for Untrusted DBMSs
- In Proc. of the Second International IEEE Security in Storage Workshop
, 2003
Cited by 7 (6 self)
Several architectures have recently been proposed that store relational data in encrypted form on untrusted relational databases. Such architectures permit the creation of novel Internet services and also offer an opportunity for a better construction of ASP solutions. Environments where limited resources do not permit efficient management of databases, or where it is critical to offer robust Internet access to private data, may all benefit from the above architectures. In this paper we analyze the impact that this architecture has on the typical services of a database. The analysis is based on the experience gained in the construction of a prototype of a complete architecture for the management of encrypted databases. Specifically, we illustrate the impact on query translation and optimization, and the main components of the software architecture of the prototype.
WAP may Stumble over the Gateway (Security in WAP-based Mobile Commerce)
, 2001
Cited by 5 (2 self)
The key design idea underlying the Wireless Application Protocol (WAP) is to use a gateway at the intersection of the wireless mobile network and the traditional, wired network. The WAP gateway forwards web content to the mobile phone in a way intended to accommodate the limited bandwidth of the mobile network and the mobile phone's limited processing capability. However, the gateway introduces a security hole which may render WAP unsuitable for m-commerce and other security-sensitive transactions and services on the emerging mobile Internet. The paper explains the security hole and the gateway-based design that has led to it, including the technical and business considerations underlying the design. A number of ways to correct the situation are discussed, including a complete re-design of WAP as proposed for the future version 2.0 of the protocol. Index Terms: WAP, gateway, Internet, end-to-end security, protocols, mobile commerce.
Complexity Aspects of Guessing Prefix Codes
- Algorithmica
, 1994
Cited by 2 (0 self)
Given a natural language cleartext and a ciphertext obtained by Huffman coding, the problem of guessing the code is shown to be NP-complete for various variants of the encoding process. One of the best known compression techniques is due to Huffman [3], which is optimal for any given probability distribution in the sense that it achieves a minimum redundancy code, provided each codeword consists of an integral number of bits. The use of Huffman codes as an encryption method has also been considered in [6] and recently in [4], where it was motivated by an application to storing a large textual database on a CD-ROM. The text of the database had not only to be compressed, but also to be encrypted to prevent illegal use of copyrighted material. In this paper we show that various decoding problems involving variable length prefix codes, of which Huffman codes are a special case, are NP-complete, and suggest some methods by which this could be exploited to increase the cryptographic se...
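As background for the abstract above, the following is a minimal sketch of standard Huffman code construction over the characters of a string. It illustrates only the prefix-code property that the paper builds on, not its NP-completeness results or any encryption method. The function names `huffman_code`, `encode`, and `decode` are hypothetical, and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each character of `text` to its codeword."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap items: (weight, tie-breaker, partial code table for the subtree).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # two lightest subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in left.items()}
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    inverse = {w: s for s, w in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:               # prefix-free: first hit is right
            out.append(inverse[current])
            current = ""
    return "".join(out)


message = "abracadabra"
code = huffman_code(message)
bits = encode(message, code)
```

Decoding is trivial when the code table is known; the paper's question is how hard it is to recover the table from the cleartext and the encoded bit string alone.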

