Results 1-10 of 27
Clustering Gene Expression Patterns
1999
Cited by 446 (11 self)
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multi-condition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is O(n^2 (log n)^c). We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its p...
Cryptographic Limitations on Learning Boolean Formulae and Finite Automata
Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, 1989
Cited by 348 (15 self)
In this paper we prove the intractability of learning several classes of Boolean functions in the distribution-free model (also called the Probably Approximately Correct or PAC model) of learning from examples. These results are representation independent, in that they hold regardless of the syntactic form in which the learner chooses to represent its hypotheses. Our methods reduce the problems of cracking a number of well-known public-key cryptosystems to the learning problems. We prove that a polynomial-time learning algorithm for Boolean formulae, deterministic finite automata or constant-depth threshold circuits would have dramatic consequences for cryptography and number theory: in particular, such an algorithm could be used to break the RSA cryptosystem, factor Blum integers (composite numbers equivalent to 3 modulo 4), and detect quadratic residues. The results hold even if the learning algorithm is only required to obtain a slight advantage in prediction over random guessing. The techniques used demonstrate an interesting duality between learning and cryptography. We also apply our results to obtain strong intractability results for approximating a generalization of graph coloring.
Randomized routing and sorting on fixed-connection networks
Journal of Algorithms, 1994
Cited by 86 (13 self)
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized online algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
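As a rough illustration of the parameters named in this abstract, the sketch below computes the congestion c of a set of packet paths (taken here to mean the largest number of paths crossing any single edge) and the quantity c + L + log N that the stated step bound grows with. The function names, the representation of paths as node lists, the base-2 logarithm, and the tiny example network are all assumptions made for illustration; this is not the paper's scheduling algorithm.

```python
import math
from collections import Counter

def congestion(paths):
    """Maximum number of paths that traverse any single directed edge."""
    edge_use = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            edge_use[(u, v)] += 1
    return max(edge_use.values(), default=0)

def schedule_bound_terms(paths, depth):
    """Return (c, L, log2 N, c + L + log2 N) for N = len(paths) packets."""
    c = congestion(paths)
    n_packets = len(paths)
    log_n = math.log2(n_packets) if n_packets > 0 else 0.0
    return c, depth, log_n, c + depth + log_n

# Hypothetical leveled network of depth 2; nodes are named (level, index).
example_paths = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 1), (1, 0), (2, 0)],
    [(0, 2), (1, 1), (2, 0)],
    [(0, 3), (1, 1), (2, 1)],
]

if __name__ == "__main__":
    print(schedule_bound_terms(example_paths, depth=2))
```

In this example only the edge from (1, 0) to (2, 0) is shared by two paths, so the congestion is 2. The sum is only the expression inside the O(...) bound, with constant factors ignored.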
Approximate Equilibria and Ball Fusion
Theory of Computing Systems, 2002
Cited by 63 (25 self)
We consider selfish routing over a network consisting of m parallel links through which n selfish users route their traffic trying to minimize their own expected latency. We study the class of mixed strategies in which the expected latency through each link is at most a constant multiple of the optimum maximum latency had global regulation been available. For the case of uniform links it is known that all Nash equilibria belong to this class of strategies. We are interested in bounding the coordination ratio (or price of anarchy) of these strategies, defined as the worst-case ratio of the maximum (over all links) expected latency over the optimum maximum latency. The load balancing aspect of the problem immediately implies a lower bound Ω(ln m / ln ln m) of the coordination ratio. We give a tight (up to a multiplicative constant) upper bound. To show the upper bound, we analyze a variant of the classical balls and bins problem, in which balls with arbitrary weights are placed into bins according to arbitrary probability distributions. At the heart of our approach is a new probabilistic tool that we call
Measure, stochasticity, and the density of hard languages
Proceedings of the Tenth Symposium on Theoretical Aspects of Computer Science, 1993
Cited by 47 (18 self)
The main theorem of this paper is that, for every real number α < 1 (e.g., α = 0.99), only a measure 0 subset of the languages decidable in exponential time are ≤^P_{n^α-tt}-reducible to languages that are not exponentially dense. Thus every ≤^P_{n^α-tt}-hard language for E is exponentially dense. This strengthens Watanabe's 1987 result, that every ≤^P_{O(log n)-tt}-hard language for E is exponentially dense. The combinatorial technique used here, the sequentially most frequent query selection, also gives a new, simpler proof of Watanabe's result. The main theorem also has implications for the structure of NP under strong hypotheses. Ogiwara and Watanabe (1991) have shown that the hypothesis P ≠ NP implies that every ≤^P_{btt}-hard language for NP is non-sparse (i.e., not polynomially sparse). Their technique does not appear to allow significant relaxation of either the query bound or the sparseness criterion. It is shown here that a stronger hypothesis, namely that NP does not have measure 0 in exponential time, implies the stronger conclusion that, for every real α < 1, every ≤^P_{n^α-tt}-hard language for NP is exponentially dense. Evidence is presented that this stronger hypothesis is reasonable. The proof of the main theorem uses a new, very general weak stochasticity theorem, ensuring that almost every language in E is statistically unpredictable by feasible deterministic algorithms, even
How dense must a language A ⊆ {0,1}* be in order to be hard for a complexity class C? The ongoing investigation of this question, especially important
Probabilistic bounds on the coefficients of polynomials with only real zeros
1997
Cited by 34 (0 self)
The work of Harper and subsequent authors has shown that finite sequences (a_0, ..., a_n) arising from combinatorial problems are often such that the polynomial A(z) := Σ_{k=0}^{n} a_k z^k has only real zeros. Basic examples include rows from the arrays of binomial coefficients, Stirling numbers of the first and second kinds, and Eulerian numbers. Assuming the a_k are nonnegative, A(1) > 0 and that A(z) is not constant, it is known that A(z) has only real zeros iff the normalized sequence (a_0/A(1), ..., a_n/A(1)) is the probability distribution of the number of successes in n independent trials for some sequence of success probabilities. Such sequences (a_0, ..., a_n) are also known to be characterized by total positivity of the infinite matrix (a_{i-j}) indexed by nonnegative integers i and j. This paper reviews inequalities and approximations for such sequences, called Polya frequency sequences, which follow from their probabilistic representation. In combinatorial examples these inequalities yield a number of improvements of known estimates.
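The equivalence stated in this abstract can be checked by hand on its simplest example. The sketch below takes row n = 4 of the binomial coefficients, so A(z) = (1 + z)^4 with all zeros at -1, and compares the normalized row with the distribution of the number of successes in four independent trials, each with success probability 1/2. The function names and the choice of example are assumptions made for illustration. The general recipe used here, that a zero at -r with r ≥ 0 corresponds to a trial with success probability 1/(1 + r), follows from factoring A(z)/A(1) into terms (z + r)/(1 + r).

```python
from fractions import Fraction
from math import comb

def success_count_distribution(probs):
    """Distribution of the number of successes in independent trials
    with the given success probabilities, built up by convolution."""
    dist = [Fraction(1)]
    for p in probs:
        p = Fraction(p)
        nxt = [Fraction(0)] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        dist = nxt
    return dist

def normalize(coeffs):
    """Divide the coefficients a_0..a_n by A(1) = a_0 + ... + a_n."""
    total = sum(coeffs)
    return [Fraction(a, total) for a in coeffs]

n = 4
binomial_row = [comb(n, k) for k in range(n + 1)]   # coefficients of (1 + z)^4
normalized_row = normalize(binomial_row)
trial_distribution = success_count_distribution([Fraction(1, 2)] * n)

if __name__ == "__main__":
    print(normalized_row == trial_distribution)
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.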
A bound on the deviation probability for sums of nonnegative random variables
2003), Art. 15. [ONLINE: http://jipam.vu.edu.au/article.php?sid=251]
Cited by 23 (3 self)
ABSTRACT. A simple bound is presented for the probability that the sum of nonnegative independent random variables is exceeded by its expectation by more than a positive number t. If the variables have the same expectation the bound is slightly weaker than the Bennett and Bernstein inequalities, otherwise it can be significantly stronger. The inequality extends to one-sidedly bounded martingale difference sequences.
Information Theoretic Methods in Probability and Statistics
2001
Cited by 21 (0 self)
Ideas of information theory have found fruitful applications not only in various fields of science and engineering but also within mathematics, both pure and applied. This is illustrated by several typical applications of information theory specifically in probability and statistics.
Circuit size relative to pseudorandom oracles
Theoretical Computer Science A 107, 1993
Cited by 17 (4 self)
Circuit-size complexity is compared with deterministic and nondeterministic time complexity in the presence of pseudorandom oracles. The following separations are shown to hold relative to every pspace-random oracle A, and relative to almost every oracle A ∈ ESPACE. (i) NP^A is not contained in SIZE^A(2^{αn}) for any real α < 1/3. (ii) E^A is not contained in SIZE^A(2^n / n). Thus, neither NP^A nor E^A is contained in P^A/Poly. In fact, these separations are shown to hold for almost every n. Since a randomly selected oracle is pspace-random with probability one, (i) and (ii) immediately imply the corresponding random oracle separations, thus improving a result of Bennett and Gill [9] and answering open questions of Wilson [47].