Tail Bounds for Occupancy and the Satisfiability Threshold Conjecture
, 1995
Cited by 94 (1 self)
The classical occupancy problem is concerned with studying the number of empty bins resulting from a random allocation of m balls to n bins. We provide a series of tail bounds on the distribution of the number of empty bins. These tail bounds should find application in randomized algorithms and probabilistic analysis. Our motivating application is the following well-known conjecture on threshold phenomenon for the satisfiability problem. Consider random 3-SAT formulas with cn clauses over n variables, where each clause is chosen uniformly and independently from the space of all clauses of size 3. It has been conjectured that there is a sharp threshold for satisfiability at c ≈ 4.2. We provide a strong upper bound on the value of c, showing that for c > 4.758 a random 3-SAT formula is unsatisfiable with high probability. This result is based on a structural property, possibly of independent interest, whose proof needs several applications of the occupancy tail bounds. Supporte...
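The occupancy setting described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: it only samples the number of empty bins and compares the sample mean with the standard expectation n(1 - 1/n)^m. The function names, the seed, and the parameter values are assumptions chosen for illustration.

```python
import random


def empty_bins(m, n, rng):
    """Throw m balls uniformly at random into n bins; return the number of empty bins."""
    occupied = {rng.randrange(n) for _ in range(m)}
    return n - len(occupied)


def expected_empty_bins(m, n):
    """Exact expectation: each bin stays empty with probability (1 - 1/n)^m."""
    return n * (1 - 1 / n) ** m


# Illustrative parameters (assumed, not taken from the paper).
rng = random.Random(0)
m, n, trials = 1000, 1000, 200
samples = [empty_bins(m, n, rng) for _ in range(trials)]
mean = sum(samples) / trials

print("sample mean of empty bins:", mean)
print("expected empty bins:      ", expected_empty_bins(m, n))
print("observed range:           ", min(samples), "to", max(samples))
```

The observed range gives a rough feel for how tightly the count concentrates around its mean, which is the quantity that tail bounds of this kind control.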
On the Performance of Object Clustering Techniques
Cited by 65 (0 self)
We investigate the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark. For all four workloads, stochastic clustering gave the best performance for a variety of performance metrics. Since stochastic clustering is computationally expensive, it is interesting that for every workload there was at least one cheaper clustering algorithm that matched or almost matched stochastic clustering. Unfortunately, for each workload, the algorithm that approximated stochastic clustering was different. Our experiments also demonstrated that even when the workload and object graph are fixed, the choice of the clustering algorithm depends upon the goals of the system. For example, if the goal is to perform well on traversals of small portions of the database starting with a cold cache, the important metric is the per-traversal expansion factor, and a well-chosen placement tree will be nearly optimal; if the goal is to achieve a...
Simple Randomized Mergesort on Parallel Disks
- PARALLEL COMPUTING
, 1996
Cited by 59 (11 self)
We consider the problem of sorting a file of N records on the D-disk model of parallel I/O [VS94] in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the technique of forecasting [Knu73], our algorithm is able to read in, at any time, the "right" block from any disk, and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the "right" blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamenta...
A Multi-Level WDM Access Protocol for an Optically Interconnected Multiprocessor System
- IEEE/OSA Journal of Lightwave Technology
, 1999
Cited by 33 (14 self)
Scalable, hierarchical, all-optical WDM networks for processor interconnection in multiprocessor systems have been recently considered. The principal objective of this paper is to introduce an access protocol for this type of network which supports a distributed shared memory (DSM) environment. The objectives of the protocol are reduced average latency per packet, support of broadcast/multicast, collisionless communication, and exploitation of inherent DSM traffic characteristics. The protocol is based on a hybrid approach that combines reservation access and pre-allocated reception channels for a WDM system. The proposed approach trades maximum capacity for reduced communication latency to improve system response. The performance of the protocol is analyzed through semi-Markov analytic and simulation models with varying system parameters such as number of nodes and channels. The performance of the new protocol is compared to a TDM-based protocol and their relative merits are examined. ...
Optimizing Result Prefetching in Web Search Engines with Segmented Indices
- In VLDB
, 2001
Cited by 26 (1 self)
We study the process in which search engines with segmented indices serve queries. In particular, we investigate the number of result pages which search engines should prepare during the query processing phase. Search engine users have been observed to browse through very few pages of results for queries which they submit. This behavior of users suggests that prefetching many results upon processing an initial query is not efficient, since most of the prefetched results will not be requested by the user who initiated the search. However, a policy which abandons result prefetching in favor of retrieving just the first page of search results might not make optimal use of system resources as well. We argue that for a certain behavior of users, engines should prefetch a constant number of result pages per query. We define a concrete query processing model for search engines with segmented indices, and analyze the cost of such prefetching policies. Based on these costs, we show how to determine the constant which optimizes the prefetching policy. Our results are mostly applicable to local index partitions of the inverted files, but are also applicable to processing of short queries in global index architectures.
Balanced Allocations (Extended Abstract)
- SIAM Journal on Computing
, 1994
Cited by 17 (0 self)
Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability (ln n / ln ln n)(1 + o(1)) balls in it. Suppose instead, that for each ball we choose two boxes at random and place the ball into the one which is less full at the time of placement. We show that with high probability, the fullest box contains only ln ln n / ln 2 + O(1) balls -- exponentially less than before. Furthermore, we show that a similar gap exists in the infinite process, where at each step one ball, chosen uniformly at random, is deleted, and one ball is added in the manner above. We discuss consequences of this and related theorems for dynamic resource allocation, hashing, and on-line load balancing. 1 Introduction Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. Properties of this random allocation process have been extensively studied in ...
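The gap between one random choice and two random choices described in this abstract can be observed empirically. The following is a minimal sketch under stated assumptions: the two candidate boxes are drawn independently with replacement, ties go to the first candidate drawn, and the function name, seeds, and value of n are illustrative. None of these details are specified in the abstract.

```python
import random


def max_load(n, d, rng):
    """Place n balls into n boxes; each ball goes to the least full of d random boxes.

    d = 1 is the classical single-choice process; d = 2 is the two-choice process.
    Assumption: candidates are sampled with replacement, ties go to the first drawn.
    """
    loads = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda i: loads[i])
        loads[target] += 1
    return max(loads)


n = 10_000  # illustrative size, assumed
one_choice = max_load(n, 1, random.Random(1))
two_choice = max_load(n, 2, random.Random(2))

print("fullest box, one choice: ", one_choice)
print("fullest box, two choices:", two_choice)
```

For moderate n the difference is only a few balls, but it grows with n in the way the two asymptotic expressions suggest.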
Approximation Algorithms Via Randomized Rounding: A Survey
- Series in Advanced Topics in Mathematics, Polish Scientific Publishers PWN
, 1999
Cited by 14 (2 self)
Approximation algorithms provide a natural way to approach computationally hard problems. There are currently many known paradigms in this area, including greedy algorithms, primal-dual methods, methods based on mathematical programming (linear and semidefinite programming in particular), local improvement, and "low distortion" embeddings of general metric spaces into special families of metric spaces. Randomization is a useful ingredient in many of these approaches, and particularly so in the form of randomized rounding of a suitable relaxation of a given problem. We survey this technique here, with a focus on correlation inequalities and their applications.
Deterministic Packet Marking for Congestion Price Estimation
, 2004
Cited by 13 (1 self)
Several recent price-based congestion control schemes require relatively accurate path price estimates for successful operation. The proposed addition of the two-bit Explicit Congestion Notification (ECN) field in the IP header provides routers with a mechanism for conveying price information. Recently, two proposals have emerged for probabilistic packet marking at the routers; the proposals allow receivers to estimate path price from the fraction of marked packets. In this paper we introduce an alternative deterministic marking scheme for encoding path price. Under our approach, each router quantizes the price of its outgoing link to a fixed number of bits. We then make use of the IP identification (IPid) field to map data packets to different probe types, and each probe type calculates a partial sum of the path price bits. A router deduces its marking behaviour according to the IPid and the TTL (Time To Live) field of each packet. We evaluate the performance of our algorithm in terms of its error in representing the end-to-end price, and compare it to probabilistic marking. We show that based on empirical Internet traffic characteristics, our algorithm performs better when estimating path price using small blocks of packets. We also derive the probability distribution of the error for our scheme, and provide a relatively simple bound on its maximum mean-squared error.
Reducing network congestion and blocking probability through balanced allocation
- in: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, FOCS
, 1999
Cited by 12 (3 self)
We compare the performance of a variant of the standard Dynamic Alternative Routing (DAR) technique commonly used in telephone and ATM networks to a path selection algorithm that is based on the balanced allocations principle [4, 18]: the Balanced Dynamic Alternative Routing (BDAR) algorithm. While the standard technique checks alternative routes sequentially until available bandwidth is found, the BDAR algorithm compares and chooses the best among a small number of alternatives. We show that, at the expense of a minor increase in routing overhead, the BDAR gives a substantial improvement in network performance in terms of both network congestion and blocking probabilities.

