Results 1  10
of
1,343
Randomized Algorithms
, 1995
"... Randomized algorithms, once viewed as a tool in computational number theory, have by now found widespread application. Growth has been fueled by the two major benefits of randomization: simplicity and speed. For many applications a randomized algorithm is the fastest algorithm available, or the simp ..."
Abstract

Cited by 1876 (38 self)
 Add to MetaCart
Randomized algorithms, once viewed as a tool in computational number theory, have by now found widespread application. Growth has been fueled by the two major benefits of randomization: simplicity and speed. For many applications a randomized algorithm is the fastest algorithm available, or the simplest, or both. A randomized algorithm is an algorithm that uses random numbers to influence the choices it makes in the course of its computation. Thus its behavior (typically quantified as running time or quality of output) varies from
The space complexity of approximating the frequency moments
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1996
"... The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, ..."
Abstract

Cited by 704 (12 self)
 Add to MetaCart
The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, it turns out that the numbers F0, F1 and F2 can be approximated in logarithmic space, whereas the approximation of Fk for k ≥ 6 requires nΩ(1) space. Applications to data bases are mentioned as well.
Mining Generalized Association Rules
, 1995
"... We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (isa hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy th ..."
Abstract

Cited by 457 (7 self)
 Add to MetaCart
We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (isa hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets isa outerwear isa clothes, we may infer a rule that "people who buy outerwear tend to buy shoes". This rule may hold even if rules that "people who buy jackets tend to buy shoes", and "people who buy clothes tend to buy shoes" do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining association rules on these "extended transactions ". However, this "Basic" algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one reallife dataset). We also present a new interes...
Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics
, 1996
"... For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has ..."
Abstract

Cited by 410 (13 self)
 Add to MetaCart
For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop, and that outputs samples in exact accordance with the desired distribution. The method uses couplings, which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the al...
Sampling Large Databases for Association Rules
, 1996
"... Discovery of association rules is an important database mining problem. Current algorithms for nding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very signi cant for very large databases. We present new algorithms that reduce the data ..."
Abstract

Cited by 376 (4 self)
 Add to MetaCart
Discovery of association rules is an important database mining problem. Current algorithms for nding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very signi cant for very large databases. We present new algorithms that reduce the database activity considerably. Theidea is to pick a random sample, to ndusingthis sample all association rules that probably hold in the whole database, and then to verify the results with the restofthe database. The algorithms thus produce exact association rules, not approximations based on a sample. The approach is, however, probabilistic, and inthose rare cases where our sampling method does not produce all association rules, the missing rules can be found inasecond pass. Our experiments show that the proposed algorithms can nd association rules very e ciently in only onedatabase pass. 1
The Capacity of LowDensity ParityCheck Codes Under MessagePassing Decoding
, 2001
"... In this paper, we present a general method for determining the capacity of lowdensity paritycheck (LDPC) codes under messagepassing decoding when used over any binaryinput memoryless channel with discrete or continuous output alphabets. Transmitting at rates below this capacity, a randomly chos ..."
Abstract

Cited by 367 (8 self)
 Add to MetaCart
In this paper, we present a general method for determining the capacity of lowdensity paritycheck (LDPC) codes under messagepassing decoding when used over any binaryinput memoryless channel with discrete or continuous output alphabets. Transmitting at rates below this capacity, a randomly chosen element of the given ensemble will achieve an arbitrarily small target probability of error with a probability that approaches one exponentially fast in the length of the code. (By concatenating with an appropriate outer code one can achieve a probability of error that approaches zero exponentially fast in the length of the code with arbitrarily small loss in rate.) Conversely, transmitting at rates above this capacity the probability of error is bounded away from zero by a strictly positive constant which is independent of the length of the code and of the number of iterations performed. Our results are based on the observation that the concentration of the performance of the decoder around its average performance, as observed by Luby et al. [1] in the case of a binarysymmetric channel and a binary messagepassing algorithm, is a general phenomenon. For the particularly important case of beliefpropagation decoders, we provide an effective algorithm to determine the corresponding capacity to any desired degree of accuracy. The ideas presented in this paper are broadly applicable and extensions of the general method to lowdensity paritycheck codes over larger alphabets, turbo codes, and other concatenated coding schemes are outlined.
On the Resemblance and Containment of Documents
 In Compression and Complexity of Sequences (SEQUENCES’97
, 1997
"... Given two documents A and B we define two mathematical notions: their resemblance r(A, B)andtheircontainment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be eas ..."
Abstract

Cited by 345 (7 self)
 Add to MetaCart
Given two documents A and B we define two mathematical notions: their resemblance r(A, B)andtheircontainment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document.
A Random Graph Model for Massive Graphs
, 2000
"... We propose a random graph model which is a special case of sparse random graphs with given degree sequences. This model involves only a small number of parameters, called logsize and loglog growth rate. These parameters capture some universal characteristics of massive graphs. Furthermore, from the ..."
Abstract

Cited by 332 (25 self)
 Add to MetaCart
We propose a random graph model which is a special case of sparse random graphs with given degree sequences. This model involves only a small number of parameters, called logsize and loglog growth rate. These parameters capture some universal characteristics of massive graphs. Furthermore, from these parameters, various properties of the graph can be derived. For example, for certain ranges of the parameters, we will compute the expected distribution of the sizes of the connected components which almost surely occur with high probability. We will illustrate the consistency of our model with the behavior of some massive graphs derived from data in telecommunications. We will also discuss the threshold function, the giant component, and the evolution of random graphs in this model.
A PolynomialTime Approximation Algorithm for the Permanent of a Matrix with NonNegative Entries
 Journal of the ACM
, 2004
"... Abstract. We present a polynomialtime randomized algorithm for estimating the permanent of an arbitrary n ×n matrix with nonnegative entries. This algorithm—technically a “fullypolynomial randomized approximation scheme”—computes an approximation that is, with high probability, within arbitrarily ..."
Abstract

Cited by 316 (23 self)
 Add to MetaCart
Abstract. We present a polynomialtime randomized algorithm for estimating the permanent of an arbitrary n ×n matrix with nonnegative entries. This algorithm—technically a “fullypolynomial randomized approximation scheme”—computes an approximation that is, with high probability, within arbitrarily small specified relative error of the true value of the permanent. Categories and Subject Descriptors: F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical
CollusionSecure Fingerprinting for Digital Data
 IEEE Transactions on Information Theory
, 1996
"... This paper discusses methods for assigning codewords for the purpose of fingerprinting digital data (e.g., software, documents, and images). Fingerprinting consists of uniquely marking and registering each copy of the data. This marking allows a distributor to detect any unauthorized copy and trac ..."
Abstract

Cited by 287 (1 self)
 Add to MetaCart
This paper discusses methods for assigning codewords for the purpose of fingerprinting digital data (e.g., software, documents, and images). Fingerprinting consists of uniquely marking and registering each copy of the data. This marking allows a distributor to detect any unauthorized copy and trace it back to the user. This threat of detection will hopefully deter users from releasing unauthorized copies. A problem arises when users collude: For digital data, two different fingerprinted objects can be compared and the differences between them detected. Hence, a set of users can collude to detect the location of the fingerprint. They can then alter the fingerprint to mask their identities. We present a general fingerprinting solution which is secure in the context of collusion. In addition, we discuss methods for distributing fingerprinted data. 1 Introduction Fingerprinting is an old cryptographic technique. For instance, several hundred years ago logarithm tables were protec...