Beyond Market Baskets: Generalizing Association Rules To Dependence Rules
1998
Cited by 414 (5 self)
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm’s effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
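As a rough illustration of the chi-squared test for independence that the abstract refers to, the sketch below computes the statistic for a single pair of items over a list of baskets, counting all four presence/absence cells. The function name `chi_squared_2x2` and the toy baskets are assumptions made for this example; it is not the paper's mining algorithm and does no lattice search or pruning.

```python
def chi_squared_2x2(baskets, a, b):
    """Chi-squared statistic for independence of items a and b.

    baskets is a list of sets; the 2x2 table counts presence and
    absence of each item, so absence contributes to the statistic too.
    """
    n = len(baskets)
    if n == 0:
        raise ValueError("need at least one basket")

    # Observed counts for the four (a present?, b present?) cells.
    observed = {(ia, ib): 0 for ia in (True, False) for ib in (True, False)}
    for basket in baskets:
        observed[(a in basket, b in basket)] += 1

    count_a = observed[(True, True)] + observed[(True, False)]
    count_b = observed[(True, True)] + observed[(False, True)]

    stat = 0.0
    for (ia, ib), obs in observed.items():
        # Expected count under independence: n * P(a state) * P(b state).
        pa = count_a / n if ia else 1 - count_a / n
        pb = count_b / n if ib else 1 - count_b / n
        expected = n * pa * pb
        if expected > 0:
            stat += (obs - expected) ** 2 / expected
    return stat


# Toy data (hypothetical): A and B always appear together or not at all.
dependent = [{"A", "B"}, {"A", "B"}, set(), set()]
# Toy data (hypothetical): every presence/absence combination once.
independent = [{"A", "B"}, {"A"}, {"B"}, set()]
```

For a 2x2 table the statistic has one degree of freedom, so a value above roughly 3.84 would reject independence at the 95% level; choosing that cutoff is an assumption of this note, not something the abstract specifies.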
Min-wise Independent Permutations
Journal of Computer and System Sciences, 1998
Cited by 151 (10 self)
We define and study the notion of min-wise independent families of permutations. We say that F ⊆ S_n is min-wise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr(min{π(X)} = π(x)) = 1/|X|. In other words we require that all the elements of any fixed set X have an equal chance to become the minimum element of the image of X under π. Our research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and filter near-duplicate documents. However, in the course of our investigation we have discovered interesting and challenging theoretical questions related to this concept – we present the solutions to some of them and we list the rest as open problems.
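The definition above can be checked by brute force on a tiny universe. The sketch below is only an illustration under stated assumptions: the helper names `min_probabilities` and `is_min_wise_independent` are made up for this example, the universe is indexed 0..n-1 rather than [n], and permutations are stored as tuples where `pi[x]` is the image of x. It confirms that the full symmetric group satisfies the condition while the family of cyclic shifts does not.

```python
from fractions import Fraction
from itertools import permutations


def min_probabilities(family, X):
    """For each x in X, the probability that pi(x) is the minimum of pi(X)
    when pi is drawn uniformly at random from family."""
    counts = {x: 0 for x in X}
    for pi in family:
        winner = min(X, key=lambda x: pi[x])  # element whose image is smallest
        counts[winner] += 1
    return {x: Fraction(c, len(family)) for x, c in counts.items()}


def is_min_wise_independent(family, n):
    """Check Pr(min pi(X) = pi(x)) = 1/|X| for every nonempty X and x in X."""
    for mask in range(1, 2 ** n):
        X = [i for i in range(n) if (mask >> i) & 1]
        probs = min_probabilities(family, X)
        if any(p != Fraction(1, len(X)) for p in probs.values()):
            return False
    return True


n = 4
full_group = list(permutations(range(n)))                            # all of S_4
cyclic_shifts = [tuple((i + s) % n for i in range(n)) for s in range(n)]
```

With the cyclic shifts and X = {0, 1}, element 0 becomes the minimum under three of the four shifts, so that family fails the definition even though each single element is mapped uniformly.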
Faster Algorithms for the Shortest Path Problem
J. Assoc. Comput. Mach., 1990
Cited by 91 (8 self)
Efficient implementations of Dijkstra’s shortest path algorithm are investigated. A new data structure, called the radix heap, is proposed for use in this algorithm. On a network with n vertices, m edges, and nonnegative integer arc costs bounded by C, a one-level form of radix heap gives a time bound for Dijkstra’s algorithm of O(m + n log C). A two-level form of radix heap gives a bound of O(m + n log C/log log C). A combination of a radix heap and a previously known data structure called a Fibonacci heap gives a bound of O(m + n√(log C)). The best previously known bounds are O(m + n log n) using Fibonacci heaps alone and O(m log log C) using the priority queue structure of Van Emde Boas et al. [17].
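To make the radix heap idea more concrete, here is a minimal sketch of a monotone integer priority queue driving Dijkstra's algorithm. It is an assumption-laden illustration, not the paper's structure: it uses the common XOR-based bucketing variant (bucket index is the bit length of `key ^ last`), it pushes duplicate entries instead of performing decrease-key, and the names `RadixHeap` and `dijkstra` are chosen for this example. It therefore does not reproduce the bounds quoted above.

```python
class RadixHeap:
    """Monotone priority queue for non-negative integer keys.

    Pushed keys must never be smaller than the last extracted minimum,
    which holds in Dijkstra's algorithm with non-negative arc costs.
    """

    def __init__(self):
        self.last = 0        # most recently extracted minimum key
        self.buckets = [[]]  # buckets[i]: keys with (key ^ last).bit_length() == i
        self.size = 0

    def _index(self, key):
        return (key ^ self.last).bit_length()

    def push(self, key, value):
        if key < self.last:
            raise ValueError("key is below the last extracted minimum")
        i = self._index(key)
        while len(self.buckets) <= i:
            self.buckets.append([])
        self.buckets[i].append((key, value))
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty RadixHeap")
        if not self.buckets[0]:
            # Take the first non-empty bucket, advance `last` to its minimum,
            # and redistribute its items; they all land in lower buckets.
            i = next(j for j, b in enumerate(self.buckets) if b)
            items, self.buckets[i] = self.buckets[i], []
            self.last = min(k for k, _ in items)
            for k, v in items:
                self.buckets[self._index(k)].append((k, v))
        self.size -= 1
        return self.buckets[0].pop()


def dijkstra(graph, source):
    """graph maps a vertex to a list of (neighbor, non-negative int cost)."""
    dist = {source: 0}
    heap = RadixHeap()
    heap.push(0, source)
    while heap.size:
        d, u = heap.pop()
        if d > dist[u]:
            continue  # stale entry left behind by a later improvement
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heap.push(nd, v)
    return dist


# Hypothetical example network.
example_graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
}
```

Bucket 0 only ever holds keys equal to `last`, so each pop is a constant-time list pop once the redistribution step has run.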
Let Sleeping Files Lie: Pattern Matching in Z-compressed Files
1994
Cited by 86 (2 self)
The current explosion of stored information necessitates a new model of pattern matching, that of compressed matching. In this model one tries to find all occurrences of a pattern in a compressed text in time proportional to the compressed text size, i.e., without decompressing the text. The most effective general purpose compression algorithms are adaptive, in that the text represented by each compression symbol is determined dynamically by the data. As a result, the encoding of a substring depends on its location. Thus the same substring may "look different" every time it appears in the compressed text. In this paper we consider pattern matching without decompression in the UNIX Z-compression. This is a variant of the Lempel-Ziv adaptive compression scheme. If n is the length of the compressed text and m is the length of the pattern, our algorithms find the first pattern occurrence in time O(n + m^2) or O(n log m + m). We also introduce a new criterion to measure compressed matching ...
Closest-Point Problems in Computational Geometry
1997
Cited by 60 (14 self)
This is the preliminary version of a chapter that will appear in the Handbook on Computational Geometry, edited by J.-R. Sack and J. Urrutia. A comprehensive overview is given of algorithms and data structures for proximity problems on point sets in R^D. In particular, the closest pair problem, the exact and approximate post-office problem, and the problem of constructing spanners are discussed in detail.
Contents
1 Introduction
2 The static closest pair problem
2.1 Preliminary remarks
2.2 Algorithms that are optimal in the algebraic computation tree model
2.2.1 An algorithm based on the Voronoi diagram
2.2.2 A divide-and-conquer algorithm
2.2.3 A plane sweep algorithm
2.3 A deterministic algorithm that uses indirect addressing
2.3.1 The degraded grid ...
A Perfect Hash Function Generator
Cited by 49 (34 self)
gperf is a "software-tool generating-tool" designed to automate the generation of perfect hash functions. This paper describes the features, algorithms, and object-oriented design and implementation strategies incorporated in gperf. It also presents the results from an empirical comparison between gperf-generated recognizers and other popular techniques for reserved word lookup. gperf is distributed with the GNU libg++ library and is used to generate the keyword recognizers for the GNU C and GNU C++ compilers.
1 Introduction
Perfect hash functions are a time and space efficient implementation of static search sets, which are ADTs with operations like initialize, insert, and retrieve. Static search sets are common in system software applications. Typical static search sets include compiler and interpreter reserved words, assembler instruction mnemonics, and shell interpreter builtin commands. Search set elements are called keywords. Keywords are inserted into the set once, usually at c...
Spheres, Molecules, and Hidden Surface Removal
1996
Cited by 45 (14 self)
We devise techniques to manipulate a collection of loosely interpenetrating spheres in three-dimensional space. Our study is motivated by the representation and manipulation of molecular configurations, modeled by a collection of spheres. We analyze the sphere model and point to its favorable properties that make it easier to manipulate than an arbitrary collection of spheres. For this special sphere model we present efficient algorithms for computing its union boundary and for hidden surface removal. The efficiency and practicality of our approach are demonstrated by experiments on actual molecule data.
The Snapshot Index, an I/O-Optimal Access Method for Timeslice Queries
Information Systems, An International Journal, 1995
Cited by 44 (15 self)
We present an access method for timeslice queries that reconstructs a past state s(t) of a time-evolving collection of objects in O(log_b n + |s(t)|/b) I/O’s, where |s(t)| denotes the size of the collection at time t, n is the total number of changes in the collection’s evolution, and b is the size of an I/O transfer. Changes include the addition, deletion or attribute modification of objects; they are assumed to occur in increasing time order and always affect the most current state of the collection (thus our index supports transaction-time). The space used is O(n/b) while the update processing is constant per change, i.e., independent of n. This is the first I/O-optimal access method for this problem using O(n/b) space and O(1) updating (in the expected amortized sense due to the use of hashing). This performance is also achieved for interval intersection temporal queries. An advantage of our approach is that its performance can be tuned to match particular application needs (trading space for query time and vice versa). In addition, the Snapshot Index can naturally migrate data on a write-once optical medium while maintaining the same performance bounds.
Accountable Certificate Management Using Undeniable Attestations
Computer and Communications Security, 2000
Cited by 36 (3 self)
This paper initiates a study of accountable certificate management methods, necessary to support long-term authenticity of digital documents. Our main contribution is a model for accountable certificate management, where clients receive attestations confirming inclusion/removal of their certificates from the database of valid certificates. We explain why accountability depends on the inability of the third parties to create contradictory attestations. After that we define an undeniable attester as a primitive that provides efficient attestation creation, publishing and verification, so that it is intractable to create contradictory attestations. We introduce authenticated search trees and build an efficient undeniable attester upon them. The proposed system is the first accountable long-term certificate management system. Moreover, authenticated search trees can be used in many security-critical applications instead of the (sorted) hash trees to reduce trust in the authorities, without decrease in efficiency. Therefore, the undeniable attester looks like a very useful cryptographic primitive with a wide range of applications.
Space Efficient Hash Tables With Worst Case Constant Access Time
In STACS, 2003
Cited by 34 (4 self)
We generalize Cuckoo Hashing [23] to d-ary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ε) n memory cells, for any constant ε > 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln(1/ε)) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ε) n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ε ≈ 0.03. We also describe variants of the data structure that allow the use of hash functions that can be evaluated in constant time.
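The access pattern described above, where a lookup or deletion inspects at most d candidate cells, can be sketched as follows. Everything specific here is an assumption for illustration: the class name `DaryCuckooTable`, the salted use of Python's built-in `hash` as a stand-in for uniform hashing, the random-walk eviction strategy on insertion, and the rebuild into a table twice as large when eviction gives up. The abstract does not specify these details, and the sketch makes no attempt to reach the (1 + ε) n space bound.

```python
import random


class DaryCuckooTable:
    """Hash table in which every key has d candidate cells (a sketch)."""

    def __init__(self, capacity=16, d=4, seed=0):
        self.d = d
        self.rng = random.Random(seed)
        self._reset(capacity)

    def _reset(self, capacity):
        self.capacity = capacity
        # One salt per hash function; salted built-in hash is an assumption.
        self.salts = [self.rng.getrandbits(64) for _ in range(self.d)]
        self.cells = [None] * capacity
        self.count = 0

    def _positions(self, key):
        return [hash((salt, key)) % self.capacity for salt in self.salts]

    def get(self, key, default=None):
        # Worst case: d probes, independent of the number of stored keys.
        for pos in self._positions(key):
            entry = self.cells[pos]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def delete(self, key):
        for pos in self._positions(key):
            entry = self.cells[pos]
            if entry is not None and entry[0] == key:
                self.cells[pos] = None
                self.count -= 1
                return True
        return False

    def put(self, key, value, max_kicks=200):
        for pos in self._positions(key):
            entry = self.cells[pos]
            if entry is not None and entry[0] == key:
                self.cells[pos] = (key, value)  # overwrite existing key
                return
        entry = (key, value)
        for _ in range(max_kicks):
            positions = self._positions(entry[0])
            for pos in positions:
                if self.cells[pos] is None:
                    self.cells[pos] = entry
                    self.count += 1
                    return
            # All d cells are taken: evict a random occupant and re-place it.
            victim = self.rng.choice(positions)
            self.cells[victim], entry = entry, self.cells[victim]
        # Eviction walk gave up: rebuild with fresh salts and double capacity.
        pending = [e for e in self.cells if e is not None] + [entry]
        self._reset(self.capacity * 2)
        for k, v in pending:
            self.put(k, v, max_kicks)


table = DaryCuckooTable()
for i in range(200):
    table.put(i, i * i)
```

Because the rebuild step reinserts every stored entry, the table stays correct even when an eviction walk fails; only the space usage grows.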

