Clustering Gene Expression Patterns
, 1999
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multi-condition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is O(n^2 (log(n))^c). We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its p...
Learning in the Presence of Malicious Errors
 SIAM Journal on Computing
, 1993
In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worst-case model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems.
1 Introduction
In this paper, we study a practical extension to Valiant's distribution-free model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distribution-free model typically makes the idealize...
Approximation Algorithms for Disjoint Paths Problems
, 1996
The construction of disjoint paths in a network is a basic issue in combinatorial optimization: given a network, and specified pairs of nodes in it, we are interested in finding disjoint paths between as many of these pairs as possible. This leads to a variety of classical NP-complete problems for which very little is known from the point of view of approximation algorithms. It has recently been brought into focus in work on problems such as VLSI layout and routing in high-speed networks; in these settings, the current lack of understanding of the disjoint paths problem is often an obstacle to the design of practical heuristics.
A Lower Bound for Radio Broadcast
, 1991
A radio network is a synchronous network of processors that communicate by transmitting messages to their neighbors, where a processor receives a message in a given step if and only if it is silent in this step and precisely one of its neighbors transmits. In this paper we prove the existence of a family of radius-2 networks on n vertices for which any broadcast schedule requires at least Ω(log^2 n) rounds of transmissions. This matches an upper bound of O(log^2 n) rounds for networks of radius 2 proved earlier by Bar-Yehuda, Goldreich, and Itai, in
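The reception rule in this abstract can be sketched in a few lines of Python. This is only an illustration of the rule as stated (a silent processor receives exactly when one neighbor transmits); the function name `radio_round` and the four-node example network are hypothetical and do not come from the paper.

```python
def radio_round(neighbors, transmitters):
    """Simulate one synchronous step of a radio network.

    neighbors: dict mapping each node to the set of its neighbors.
    transmitters: set of nodes that transmit in this step.
    Returns a dict {receiver: sender} for every node that receives.
    """
    received = {}
    for node, nbrs in neighbors.items():
        if node in transmitters:
            continue  # a transmitting processor is not silent, so it hears nothing
        senders = [v for v in nbrs if v in transmitters]
        if len(senders) == 1:  # exactly one transmitting neighbor: no collision
            received[node] = senders[0]
    return received


# Hypothetical network: source 0 is adjacent to 1 and 2; node 3 is adjacent to 1 and 2.
neighbors = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}

step1 = radio_round(neighbors, {0})      # the source reaches both neighbors
step2 = radio_round(neighbors, {1, 2})   # both transmit: collisions everywhere
step3 = radio_round(neighbors, {1})      # only node 1 transmits: node 3 is reached
```

In the second step nobody receives anything, because every silent node has two transmitting neighbors. Avoiding such collisions is what makes broadcast scheduling in this model nontrivial.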
A Tradeoff between Space and Efficiency for Routing Tables
, 1988
Abstract. Two conflicting goals play a crucial role in the design of routing schemes for communication networks. A routing scheme should use paths that are as short as possible for routing messages in the network, while keeping the routing information stored in the processors' local memory as succinct as possible. The efficiency of a routing scheme is measured in terms of its stretch factor: the maximum ratio between the length of a route computed by the scheme and that of a shortest path connecting the same pair of vertices. Most previous work has concentrated on finding good routing schemes (with a small fixed stretch factor) for special classes of network topologies. In this paper the problem for general networks is studied, and the entire range of possible stretch factors is examined. The results exhibit a tradeoff between the efficiency of a routing scheme and its space requirements. Almost tight upper and lower bounds for this tradeoff are presented. Specifically, it is proved that any routing scheme for general n-vertex networks that achieves a stretch factor k ≥ 1 must use a total of Ω(n^(1+1/(2k+4))) bits of routing information in the networks. This lower bound is complemented by a family Z(k) of hierarchical routing schemes (for every k ≥ 1) for unit-cost general networks, which guarantee a stretch factor of O(k), require storing a total of O(k^3 n^(1+1/k) log n) bits of routing information in the network, name the vertices with O(log^2 n)-bit names and use O(log n)-bit headers.
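The stretch factor defined in this abstract can be computed directly on a small example. The sketch below assumes an unweighted (unit-cost) graph given as an adjacency dict and a hypothetical callback `route_length(u, v)` that reports how long the scheme's route is; neither name comes from the paper, and the "always clockwise" scheme is just a deliberately poor toy.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def stretch_factor(adj, route_length):
    """Maximum ratio of route length to shortest-path length over all pairs."""
    worst = 1.0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                worst = max(worst, route_length(u, v) / d)
    return worst


# Toy example: a 4-cycle 0-1-2-3-0 with a scheme that always routes clockwise.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}


def clockwise(u, v):
    return (v - u) % 4  # number of hops when only increasing indices are used


stretch = stretch_factor(cycle, clockwise)
```

Here the pair (1, 0) is adjacent, but the clockwise route takes three hops, so the stretch factor of this toy scheme is 3. Its appeal is that each node stores almost no routing information, which is the kind of tradeoff the abstract describes.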
Competitive auctions and digital goods
 In Proc. 12th Symp. on Discrete Alg
, 2001
Abstract. We study a class of single-round, sealed-bid auctions for items in unlimited supply such as digital goods. We focus on auctions that are truthful and competitive. Truthful auctions encourage bidders to bid their utility; competitive auctions yield revenue within a constant factor of the revenue for optimal fixed pricing. We show that for any truthful auction, even a multi-price auction, the expected revenue does not exceed that for optimal fixed pricing. We also give a bound on how far the revenue for optimal fixed pricing can be from the total market utility. We show that several randomized auctions are truthful and competitive under certain assumptions, and that no truthful deterministic auction is competitive. We present simulation results which confirm that our auctions compare favorably to fixed pricing. Some of our results extend to bounded supply markets, for which we also get truthful and competitive auctions.
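The benchmark used in this abstract, the revenue of optimal fixed pricing, is easy to compute when supply is unlimited: every bidder whose bid is at least the posted price buys one copy, and an optimal price can always be taken to be one of the bids. The following sketch assumes the bids are the bidders' true utilities; the function name `optimal_fixed_price` is hypothetical, and the code illustrates only the benchmark, not any of the paper's auctions.

```python
def optimal_fixed_price(bids):
    """Return (price, revenue) maximizing price * (number of bids >= price)."""
    best_price, best_revenue = 0.0, 0.0
    # After sorting in decreasing order, the bid at rank i is met by i bidders.
    for rank, price in enumerate(sorted(bids, reverse=True), start=1):
        revenue = rank * price
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue


bids = [10, 6, 5, 1]                      # illustrative utilities
price, revenue = optimal_fixed_price(bids)
total_utility = sum(bids)
```

For these bids the best single price is 5, sold to three bidders for a revenue of 15, while the total market utility is 22. The gap between those two numbers is the quantity the abstract says the paper bounds.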
On Hiding Information from an Oracle
, 1989
We consider the problem of computing with encrypted data. Player A wishes to know the value f(x) for some x but lacks the power to compute it. Player B has the power to compute f and is willing to send f(y) to A if she sends him y, for any y. Informally, an encryption scheme for the problem f is a method by which A, using her inferior resources, can transform the cleartext instance x into an encrypted instance y, obtain f(y) from B, and infer f(x) from f(y) in such a way that B cannot infer x from y. When such an encryption scheme exists, we say that f is encryptable. The framework defined in this paper enables us to prove precise statements about what an encrypted instance hides and what it leaks, in an information-theoretic sense. Our definitions are cast in the language of probability theory and do not involve assumptions such as the intractability of factoring or the existence of one-way functions. We use our framework to describe encryption schemes for some well-known function...
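The A/B protocol described above can be made concrete with a toy instance. The abstract is truncated before it names the functions treated, so the choice of f here (the discrete logarithm in a small prime field) is an assumption made purely for illustration, as are the parameters `P` and `G` and both function names. A blinds x by multiplying with a random power of the generator, so the instance y that B sees is uniformly distributed over the group regardless of x.

```python
import secrets

P = 23  # small prime, chosen only for illustration
G = 5   # a generator of the multiplicative group modulo 23


def oracle_discrete_log(y):
    """Player B: find e with G**e == y (mod P) by brute force."""
    for e in range(P - 1):
        if pow(G, e, P) == y:
            return e
    raise ValueError("y is not in the multiplicative group")


def query_hidden(x):
    """Player A: learn f(x) while sending B only a blinded instance y."""
    r = secrets.randbelow(P - 1)      # secret random exponent in [0, P-2]
    y = (x * pow(G, r, P)) % P        # encrypted instance: uniform over the group
    answer = oracle_discrete_log(y)   # B computes f(y) = f(x) + r (mod P-1)
    return (answer - r) % (P - 1)     # A removes the blinding to get f(x)


recovered = query_hidden(7)
```

Because r is uniform and G generates the whole group, y carries no information about x beyond membership in the group, which is the information-theoretic style of hiding the abstract refers to.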
Two-way communication channels
 in Proc. 4th Berkeley Symp. Math. Statist. Probab.
, 1961
A two-way communication channel is shown schematically in figure 1. Here x1 is an input letter to the channel at terminal 1 and y1 an output while x2 is an
Practical Bayesian Density Estimation Using Mixtures Of Normals
 Journal of the American Statistical Association
, 1995
this paper, we propose some solutions to these problems. Our goal is to come up with a simple, practical method for estimating the density. This is an interesting problem in its own right, as well as a first step towards solving other inference problems, such as providing more flexible distributions in hierarchical models. To see why the posterior is improper under the usual reference prior, we write the model in the following way. Let Z = (Z_1, ..., Z_n) and X = (X_1, ..., X_n). The Z