A few logs suffice to build (almost) all trees (I)
II. THEORETICAL COMPUTER SCIENCE, 1999
Cited by 101 (24 self)
A phylogenetic tree (also called an "evolutionary tree") is a leaf-labelled tree which represents the evolutionary history for a set of species, and the construction of such trees is a fundamental problem in biology. Here we address the issue of how many sequence sites are required in order to recover the tree with high probability when the sites evolve under standard Markov-style i.i.d. mutation models. We provide analytic upper and lower bounds for the required sequence length, by developing a new (and polynomial time) algorithm. In particular we show that when the mutation probabilities are bounded the required sequence length can grow surprisingly slowly (a power of log n) in the number n of sequences, for almost all trees.
Adaptive Parallel Aggregation Algorithms
In ACM SIGMOD, 1995
Cited by 42 (2 self)
Aggregation and duplicate removal are common in SQL queries. However, in the parallel query processing literature, aggregate processing has received surprisingly little attention; furthermore, for each of the traditional parallel aggregation algorithms, there is a range of grouping selectivities where the algorithm performs poorly. In this work, we propose new algorithms that dynamically adapt, at query evaluation time, in response to observed grouping selectivities. Performance analysis via analytical modeling and an implementation on a workstation cluster shows that the proposed algorithms are able to perform well for all grouping selectivities. Finally, we study the effect of data skew and show that for certain data sets the proposed algorithms can even outperform the best of traditional approaches.
1 Introduction
SQL queries are replete with aggregate and duplicate elimination operations. One measure of the perceived importance of aggregation is that in the proposed TPC-D benchmark...
Write Amplification Analysis in Flash-Based Solid State Drives
Cited by 18 (0 self)
Write amplification is a critical factor limiting the random write performance and write endurance in storage devices based on NAND-flash memories such as solid-state drives (SSD). The impact of garbage collection on write amplification is influenced by the level of over-provisioning and the choice of reclaiming policy. In this paper, we present a novel probabilistic model of write amplification for log-structured flash-based SSDs. Specifically, we quantify the impact of over-provisioning on write amplification analytically and by simulation assuming workloads of uniformly-distributed random short writes. Moreover, we propose modified versions of the greedy garbage-collection reclaiming policy and compare their performance. Finally, we analytically evaluate the benefits of separating static and dynamic data in reducing write amplification, and how to address endurance with proper wear leveling.
Some new aspects of the coupon collector’s problem
SIAM J. Discret. Math, 2004
Cited by 8 (0 self)
We extend the classical coupon collector’s problem to one in which two collectors are simultaneously and independently seeking collections of d coupons. We find, in finite terms, the probability that the two collectors finish at the same trial, and we find, using the methods of Gessel-Viennot, the probability that the game has the following “ballot-like” character: the two collectors are tied with each other for some initial number of steps, and after that the player who first gains the lead remains ahead throughout the game. As a by-product we obtain the evaluation in finite terms of certain infinite series whose coefficients are powers and products of Stirling numbers of the second kind. We study the variant of the original coupon collector’s problem in which a single collector wants to obtain at least h copies of each coupon. Here we give a simpler derivation of results of Newman and Shepp, and extend those results. Finally we obtain the distribution of the number of coupons that have been obtained exactly once.
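For orientation, the classical single-collector baseline that this abstract extends can be sketched as follows. This is not taken from the paper: the function name `expected_trials` is hypothetical, and the sketch assumes d equally likely coupons drawn independently, for which the expected number of trials to complete the collection is the well-known d * H_d (H_d being the d-th harmonic number).

```python
from fractions import Fraction


def expected_trials(d: int) -> Fraction:
    """Expected number of draws needed to see all d equally likely coupons.

    When k distinct coupons are still missing, each draw finds a new one
    with probability k/d, so that stage takes d/k draws on average.
    Summing over k = 1..d gives d * H_d.
    """
    if d < 1:
        raise ValueError("d must be a positive integer")
    harmonic = sum(Fraction(1, k) for k in range(1, d + 1))
    return d * harmonic


if __name__ == "__main__":
    # Exact rational values for a few small collection sizes.
    for d in (1, 2, 3, 10):
        print(d, expected_trials(d))
```

Exact rational arithmetic is used so the values can be compared against closed forms without rounding error.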
On random matrices II
Studia Sci. Math. Hungar, 1968
Cited by 4 (0 self)
This paper is a continuation of our paper [1]. Let $\mathcal{M}(n)$ denote the set of all $n$ by $n$ zero-one matrices; let us denote the elements of a matrix $M_n \in \mathcal{M}(n)$ by $e_{jk}$ ($1 \le j \le n$; $1 \le k \le n$). Let $\pi$ denote an arbitrary permutation $\pi = (p_1, p_2, \ldots, p_n)$ of the integers $(1, 2, \ldots, n)$ and $\Pi$ the set of all $n!$ such permutations. Let us put
The Number of Nucleotide Sites Needed to Accurately Reconstruct Large Evolutionary Trees
1996
Cited by 4 (2 self)
Biologists seek to reconstruct evolutionary trees for an increasing number of species, n, from aligned genetic sequences. How fast must the sequence length N grow, as a function of n, in order to accurately recover the underlying tree with probability $1 - \epsilon$, if the sequences evolve according to simple stochastic models of nucleotide substitution? We show that for a certain model, a reconstruction method exists for which the sequence length N can grow surprisingly slowly with n (sublinearly for a wide range of parameters, and even as a power of log n in a narrow range, which roughly meets the lower bound from information theory). By contrast, a more traditional technique (maximum compatibility) provably requires N to grow faster than linearly in n. Our approach is based on a new and computationally efficient method for reconstructing phylogenetic trees from aligned DNA sequences.
Architectural Considerations For Parallel Query Evaluation Algorithms
Cited by 3 (0 self)
Parallelism is key to high performance relational database systems. Since there are several parallel architectures suitable for database systems, a few interesting problems arise, mostly from an emphasis on the differences among the architectures. Specifically, in the literature, differences rather than similarities between the architectures are pointed out, and the specific details of a particular architecture, crucial to high performance, are generally ignored. In this thesis we have attempted to remedy this situation by emphasizing the similarities and a deeper understanding of two popular parallel architectures, shared-nothing and shared-memory, from a database perspective. We show that there is complementarity and similarity in the two architectures by showing that software shared-memory support can be used to improve performance on shared-nothing hardware and by showing that shared-nothing software can run on shared-memory hardware with performance comparable to that of "native" ...
The collector's brotherhood problem using the Newman-Shepp symbolic method
Cited by 3 (0 self)
Further computations are made on the traditional coupon collector’s problem when the collector shares his harvest with his younger brothers. When the book of the p-th brother of the collector is completed, the books of the younger brothers have certain numbers of empty spots. On the average, how many? Several answers can be brought to that question.
1. Introduction. This paper on the traditional and recurrent Collector’s Problem, that uses the Newman-Shepp method, one of those symbolic approaches dear to Gian-Carlo Rota, is dedicated to his memory. Suppose that m different coupons are needed for a collector, to complete his picture-book. Each time, that is, at each purchase, he can get a given
A network coding approach to IP traceback
In Proc. of IEEE International Symposium on Network Coding (NetCod), 2010
Cited by 3 (1 self)
Traceback schemes aim at identifying the source(s) of a sequence of packets and the nodes these packets traversed. This is useful for tracing the sources of high volume traffic, e.g., in Distributed Denial-of-Service (DDoS) attacks. In this paper, we are particularly interested in Probabilistic Packet Marking (PPM) schemes, where intermediate nodes probabilistically mark packets with information about their identity and the receiver uses information from several packets to reconstruct the paths they have traversed. Our work is inspired by two observations. First, PPM is essentially a coupon collector’s problem [1], [2]. Second, the coupon collector’s problem significantly benefits from network coding ideas [3], [4]. Based on these observations, we propose a network coding-based approach (PPM+NC) that marks packets with random linear combinations of router IDs, instead of individual router IDs. We demonstrate its benefits through analysis. We then propose a practical PPM+NC scheme based on the main PPM+NC idea, but also taking into account the limited bit budget in the IP header available for marking and other practical constraints. Simulation results show that our scheme significantly reduces the number of packets needed to reconstruct the attack graph, in both single- and multi-path scenarios, thus increasing the speed of tracing the attack back to its source(s).
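The intuition behind the two observations can be illustrated with an idealized comparison, sketched below. This is not the paper's scheme or analysis. It assumes n routers, that each uncoded packet carries the ID of one uniformly random router, and that each coded packet carries a uniformly random linear combination of the n IDs over GF(2); the function names and the choice of GF(2) are assumptions made only for this illustration.

```python
from fractions import Fraction


def coupon_expected(n: int) -> Fraction:
    """Expected packets to see all n router IDs when each packet carries
    one uniformly random ID (classical coupon collector: n * H_n)."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))


def coded_expected(n: int) -> Fraction:
    """Expected packets until n uniformly random vectors over GF(2)^n
    reach full rank, i.e. until all n IDs can be decoded.

    With current rank r, the received vectors span 2**r of the 2**n
    possible vectors, so the next packet is innovative with probability
    1 - 2**(r - n). Each stage is geometric with that success probability.
    """
    total = Fraction(0)
    for rank in range(n):
        p_innovative = 1 - Fraction(2**rank, 2**n)
        total += 1 / p_innovative
    return total


if __name__ == "__main__":
    # Compare the two idealized models for a few path lengths.
    for n in (4, 16, 64):
        print(n, float(coupon_expected(n)), float(coded_expected(n)))
```

Under these assumptions the uncoded count grows like n log n, while the coded count stays within a small additive constant of n, which is the kind of gain the abstract attributes to network coding. Real PPM marking probabilities are not uniform along a path, so the numbers are indicative only.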
Random Walks on Nilpotent Groups
2001
Cited by 2 (0 self)
We obtain sharp bounds on mixing time of random walks on nilpotent groups, with Hall bases as generating sets.