Results 1–10 of 37
Mining Generalized Association Rules
, 1995
Abstract

Cited by 481 (7 self)
We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets is-a outerwear is-a clothes, we may infer a rule that "people who buy outerwear tend to buy shoes". This rule may hold even if the rules "people who buy jackets tend to buy shoes" and "people who buy clothes tend to buy shoes" do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining association rules on these "extended transactions". However, this "Basic" algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). We also present a new interes...
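A minimal sketch of the "Basic" approach this abstract describes: add every taxonomy ancestor of each item to its transaction, then mine the extended transactions with an ordinary support count. The taxonomy, item names, and the restriction to 2-itemsets are illustrative, not from the paper.

```python
from itertools import combinations

# Toy is-a taxonomy, child -> parent (illustrative, not from the paper)
taxonomy = {"jacket": "outerwear", "outerwear": "clothes", "shirt": "clothes"}

def ancestors(item):
    """All ancestors of an item in the is-a hierarchy."""
    result = []
    while item in taxonomy:
        item = taxonomy[item]
        result.append(item)
    return result

def extend(transaction):
    """The 'Basic' trick: add every ancestor of every item."""
    ext = set(transaction)
    for item in transaction:
        ext.update(ancestors(item))
    return ext

def frequent_pairs(transactions, minsup):
    """Support counts of all 2-itemsets over extended transactions."""
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(extend(t)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {p: c for p, c in counts.items() if c >= minsup}

baskets = [{"jacket", "shoes"}, {"shirt", "shoes"}, {"jacket", "hat"}]
# ("clothes", "shoes") is frequent even though ("jacket", "shoes") is not:
# the association only emerges at the ancestor level of the taxonomy.
print(frequent_pairs(baskets, minsup=2))
```

On these three baskets the pair ("clothes", "shoes") reaches support 2 while ("jacket", "shoes") does not, mirroring the abstract's point that a rule can hold at an ancestor level even when it fails for the specific items.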
The Power of Two Random Choices: A Survey of Techniques and Results
 in Handbook of Randomized Computing
, 2000
Abstract

Cited by 102 (2 self)
To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e., ...
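A quick simulation makes the gap concrete. This sketch (function name and parameters are my own) throws n balls into n bins with d uniform choices per ball and reports the maximum load; d = 1 recovers the classical setting and d = 2 the two-choice scheme.

```python
import random

def max_load(n, d, seed=0):
    """Throw n balls into n bins; each ball goes to the least loaded
    of d bins chosen independently and uniformly at random."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        choices = [rng.randrange(n) for _ in range(d)]
        best = min(choices, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

n = 100_000
# d=1 grows like log n / log log n; d=2 like log log n / log 2 + O(1).
print("d=1:", max_load(n, 1), " d=2:", max_load(n, 2))
```

For n in the tens of thousands the single-choice maximum load is typically noticeably larger than the two-choice one, which stays very small.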
Parallel Randomized Load Balancing
 In Symposium on Theory of Computing. ACM
, 1995
Abstract

Cited by 57 (8 self)
It is well known that after placing n balls independently and uniformly at random into n bins, the fullest bin holds Θ(log n / log log n) balls with high probability. Recently, Azar et al. analyzed the following: randomly choose d bins for each ball, and then sequentially place each ball in the least full of its chosen bins [2]. They show that the fullest bin contains only log log n / log d + Θ(1) balls with high probability. We explore extensions of this result to parallel and distributed settings. Our results focus on the tradeoff between the amount of communication and the final load. Given r rounds of communication, we provide lower bounds on the maximum load of Ω((log n / log log n)^(1/r)) for a wide class of strategies. Our results extend to the case where the number of rounds is allowed to grow with n. We then demonstrate parallelizations of the sequential strategy presented in Azar et al. that achieve loads within a constant factor of the lower bound for two ...
Evolutionary Algorithms – How to Cope With Plateaus of Constant Fitness and When to Reject Strings of The Same Fitness
, 2000
Abstract

Cited by 44 (11 self)
The simplest evolutionary algorithm, the so-called (1+1) EA, accepts a child if its fitness is at least as large (in the case of maximization) as the fitness of its parent. The variant (1+1)* EA only accepts a child if its fitness is strictly larger than the fitness of its parent. Here two functions related to the class of long-path functions are presented such that the (1+1) EA maximizes one of them in polynomial time and needs exponential time for the other, while the (1+1)* EA has the opposite behavior. These results prove that small changes of an evolutionary algorithm may change its behavior significantly. Since the (1+1) EA and the (1+1)* EA differ only on plateaus of constant fitness, the results also show how evolutionary algorithms behave on such plateaus. The (1+1) EA can pass a path of constant fitness and polynomial length in polynomial time. Finally, for these functions it is shown that local performance measures like the quality gain and the progress rate do not de...
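The two acceptance rules being compared can be sketched in a few lines. Here they are run on the standard OneMax benchmark (count of one-bits) rather than the long-path functions the paper actually analyzes; all names are illustrative.

```python
import random

def onemax(x):
    """Number of one-bits; a standard toy maximization benchmark."""
    return sum(x)

def evolve(n, strict, steps, seed=0):
    """(1+1) EA on n bits: flip each bit independently with prob. 1/n.
    strict=False accepts fitness >= the parent's (the (1+1) EA);
    strict=True accepts only fitness > the parent's (the (1+1)* EA)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = onemax(x)
    for _ in range(steps):
        y = [b ^ (rng.random() < 1 / n) for b in x]
        fy = onemax(y)
        accept = fy > fx if strict else fy >= fx
        if accept:
            x, fx = y, fy
        if fx == n:      # optimum found
            break
    return fx

print(evolve(30, strict=False, steps=100_000))
```

On OneMax both variants behave almost identically, since the function has no plateaus; the paper's point is precisely that on plateaus of constant fitness the >= versus > distinction becomes decisive.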
Measure, stochasticity, and the density of hard languages
 Proceedings of the Tenth Symposium on Theoretical Aspects of Computer Science
, 1993
Abstract

Cited by 41 (14 self)
The main theorem of this paper is that, for every real number α < 1 (e.g., α = 0.99), only a measure 0 subset of the languages decidable in exponential time are ≤^P_{n^α-tt}-reducible to languages that are not exponentially dense. Thus every ≤^P_{n^α-tt}-hard language for E is exponentially dense. This strengthens Watanabe's 1987 result that every ≤^P_{O(log n)-tt}-hard language for E is exponentially dense. The combinatorial technique used here, the sequentially most frequent query selection, also gives a new, simpler proof of Watanabe's result. The main theorem also has implications for the structure of NP under strong hypotheses. Ogiwara and Watanabe (1991) have shown that the hypothesis P ≠ NP implies that every ≤^P_{btt}-hard language for NP is nonsparse (i.e., not polynomially sparse). Their technique does not appear to allow significant relaxation of either the query bound or the sparseness criterion. It is shown here that a stronger hypothesis, namely that NP does not have measure 0 in exponential time, implies the stronger conclusion that, for every real α < 1, every ≤^P_{n^α-tt}-hard language for NP is exponentially dense. Evidence is presented that this stronger hypothesis is reasonable. The proof of the main theorem uses a new, very general weak stochasticity theorem, ensuring that almost every language in E is statistically unpredictable by feasible deterministic algorithms, even ... How dense must a language A ⊆ {0, 1}* be in order to be hard for a complexity class C? The ongoing investigation of this question, especially important ...
On learning monotone DNF under product distributions
 In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory
, 2001
Abstract

Cited by 35 (16 self)
We show that the class of monotone 2^{O(√log n)}-term DNF formulae can be PAC learned in polynomial time under the uniform distribution from random examples only. This is an exponential improvement over the best previous polynomial-time algorithms in this model, which could learn monotone o(log² n)-term DNF. We also show that various classes of small constant-depth circuits which compute monotone functions are PAC learnable in polynomial time under the uniform distribution. All of our results extend to learning under any constant-bounded product distribution.
Optimization with Randomized Search Heuristics – The (A)NFL Theorem, Realistic Scenarios, and Difficult Functions
, 2000
Abstract

Cited by 31 (1 self)
The No Free Lunch (NFL) theorem due to Wolpert and Macready (1997) has led to controversial discussions on the usefulness of randomized search heuristics, in particular evolutionary algorithms. Here a short and simple proof of the NFL theorem is given to show its elementary character. Moreover, the proof method leads to a generalization of the NFL theorem. Afterwards, realistic, complexity-theoretically based scenarios for black-box optimization are presented, and it is argued why NFL theorems are not possible in such situations. However, an Almost No Free Lunch (ANFL) theorem shows that for each function which can be optimized efficiently by a search heuristic, many related functions can be constructed on which the same heuristic is bad. As a consequence, search heuristics use some idea of how to look for good points and can be successful only for functions "giving the right hints". The consequences of these theoretical considerations for some well-known classes of functions ar...
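The NFL flavor can be checked by brute force on a tiny domain. This toy setup is mine, not the paper's: for fixed (non-adaptive) search orders, averaged over all functions from a 4-point space to 3 fitness values, every order needs the same average number of evaluations to first hit a maximum.

```python
from itertools import product, permutations

X = range(4)   # a 4-point search space
Y = range(3)   # 3 possible fitness values

def hits_max_at(order, f):
    """Evaluations until the search order first queries a maximum of f."""
    best = max(f)
    for t, x in enumerate(order, 1):
        if f[x] == best:
            return t

def avg_time(order):
    """Average, over all functions X -> Y, of the time to find a maximum."""
    fs = list(product(Y, repeat=len(X)))
    return sum(hits_max_at(order, f) for f in fs) / len(fs)

# Every fixed search order has the same average performance over all f.
print({avg_time(order) for order in permutations(X)})
```

The printed set has a single element: averaged over the whole function class, no search order beats any other, which is the NFL intuition in miniature.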
Randomized Robot Navigation Algorithms
, 1996
Abstract

Cited by 24 (0 self)
We consider the problem faced by a mobile robot that has to reach a given target by traveling through an unmapped region in the plane containing oriented rectangular obstacles. We assume the robot has no prior knowledge about the positions or sizes of the obstacles, and acquires such knowledge only when obstacles are encountered. Our goal is to minimize the distance the robot must travel, using the competitive ratio as our measure. We give a new randomized algorithm...
Dynamic Parameter Control in Simple Evolutionary Algorithms
, 2000
Abstract

Cited by 20 (7 self)
Evolutionary algorithms are general randomized search heuristics that are influenced by many parameters. Though evolutionary algorithms are assumed to be robust, it is well known that choosing the parameters appropriately is crucial for the success and efficiency of the search. It has been shown in many experiments that non-static parameter settings can be far superior to static ones, but theoretical verifications are hard to find. We investigate a very simple evolutionary algorithm and rigorously prove that employing dynamic parameter control can greatly speed up optimization.
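A sketch of one dynamic control scheme in the spirit of this line of work: a (1+1) EA whose mutation probability cycles through 1/n, 2/n, 4/n, ... up to roughly 1/2. Whether this matches the exact schedule analyzed in the paper is an assumption; the code and objective are illustrative.

```python
import random

def dynamic_ea(fitness, n, steps, seed=0):
    """(1+1) EA on n bits whose mutation probability cycles through
    1/n, 2/n, 4/n, ... up to about 1/2 (illustrative schedule)."""
    rng = random.Random(seed)
    rates = []
    p = 1 / n
    while p <= 1 / 2:        # doubling schedule of mutation rates
        rates.append(p)
        p *= 2
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    for t in range(steps):
        p = rates[t % len(rates)]
        y = [b ^ (rng.random() < p) for b in x]
        fy = fitness(y)
        if fy >= fx:         # accept if at least as fit
            x, fx = y, fy
    return fx

# OneMax (count of one-bits) as the toy objective.
print(dynamic_ea(sum, n=30, steps=50_000))
```

Because acceptance requires fitness at least as large as the parent's, the high-rate phases cannot hurt the current best, while the low-rate phases reliably supply the small improving steps; this is one intuition for why cycling schedules can be safe.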