Incremental Clustering and Dynamic Information Retrieval
, 1997
Cited by 129 (3 self)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters. We consider the following problem: as a sequence of points from a metric...
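To make the incremental model concrete, here is a minimal sketch of one natural greedy rule: after each insertion, merge the pair of clusters whose union has the smallest diameter until at most k clusters remain. The merge rule, the Euclidean metric, and all function names are assumptions chosen for illustration; the abstract does not say which greedy rules the paper analyzes, and this is not one of the algorithms the paper proposes.

```python
from itertools import combinations

def dist(p, q):
    """Euclidean distance between two points given as tuples (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def diameter(cluster):
    """Largest pairwise distance inside a cluster (0 for a singleton)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def greedy_insert(clusters, point, k):
    """Insert one point as a new cluster, then merge greedily down to k clusters."""
    clusters = clusters + [[point]]
    while len(clusters) > k:
        # Hypothetical rule: merge the pair whose union has the smallest diameter.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: diameter(clusters[ij[0]] + clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

def incremental_cluster(points, k):
    """Process points one at a time; clusters may be merged but never split."""
    clusters = []
    for p in points:
        clusters = greedy_insert(clusters, p, k)
    return clusters
```

Calling incremental_cluster([(0, 0), (10, 0), (1, 0), (11, 0)], 2) groups the two points near the origin and the two points near x = 10. The sketch only merges clusters and never splits them, which is the kind of restriction that makes the incremental setting harder than offline clustering.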
The Online Median Problem
- In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science
, 2000
Cited by 69 (2 self)
We introduce a natural variant of the (metric uncapacitated) k-median problem that we call the online median problem. Whereas the k-median problem involves optimizing the simultaneous placement of k facilities, the online median problem imposes the following additional constraints: the facilities are placed one at a time; a facility cannot be moved once it is placed, and the total number of facilities to be placed, k, is not known in advance. The objective of an online median algorithm is to minimize the competitive ratio, that is, the worst-case ratio of the cost of an online placement to that of an optimal offline placement. Our main result is a linear-time constant-competitive algorithm for the online median problem. In addition, we present a related, though substantially simpler, linear-time constant-factor approximation algorithm for the (metric uncapacitated) facility location problem. The latter algorithm is similar in spirit to the recent primal-dual-based facility location algorithm of Jain and Vazirani, but our approach is more elementary and yields an improved running time.
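The ratio that an online median algorithm tries to keep small can be illustrated on a single small instance. The sketch below is not the paper's algorithm: it takes a fixed placement order and, for every prefix length k, compares the cost of the first k facilities with a brute-force offline optimum. It assumes facilities may only be opened at input points, and the function names are hypothetical. The true competitive ratio is the worst case of this quantity over all instances.

```python
from itertools import combinations

def median_cost(points, facilities, dist):
    """Sum over all points of the distance to the nearest open facility."""
    return sum(min(dist(p, f) for f in facilities) for p in points)

def optimal_cost(points, k, dist):
    """Brute-force offline k-median cost, with facilities chosen among the points."""
    return min(median_cost(points, fs, dist) for fs in combinations(points, k))

def instance_ratio(points, order, dist):
    """Worst ratio, over every prefix length k, of online cost to offline optimum."""
    worst = 1.0
    for k in range(1, len(order) + 1):
        opt = optimal_cost(points, k, dist)
        if opt > 0:  # skip prefixes where the optimum is zero
            worst = max(worst, median_cost(points, order[:k], dist) / opt)
    return worst
```

On the line instance [0, 1, 2, 10] with absolute difference as the metric, the order [1, 10, 0, 2] is optimal at every prefix, while the order [10, 0, 1, 2] starts with a poor first facility and pays for it at k = 1. Because a placed facility cannot be moved, an early bad choice affects every later prefix.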
Algorithmic problems in power management
- SIGACT News
, 2005
Cited by 46 (3 self)
We survey recent research that has appeared in the theoretical computer science literature on algorithmic
Competitive Hill-Climbing Strategies for Replica Placement in a Distributed File System
- In DISC
, 2001
Cited by 37 (3 self)
This paper analyzes algorithms for automated placement of file replicas in the Farsite [3] system, using both theory and simulation. In the Farsite distributed file system, multiple replicas of files are stored on multiple machines, so that files can be accessed even if some of the machines are down or inaccessible. The purpose of the placement algorithm is to determine an assignment of file replicas to machines that maximally exploits the availability provided by machines. The file placement algorithm is given a fixed value, R, for the number of replicas of each file. For systems reasons, we are most interested in a value of R = 3 [9]. However, to ensure that our results are not excessively sensitive to the file replication factor, we also provide tight bounds for R = 2 and lower bounds for all R (tight at different values of R).
On broadcast disk paging
- SIAM Journal on Computing
, 1998
Cited by 16 (5 self)
Broadcast disks are an emerging paradigm for massive data dissemination. In a broadcast disk, data is divided into n equal-sized pages, and pages are broadcast in a round-robin fashion by a server. Broadcast disks are effective because many clients can simultaneously retrieve any transmitted data. Paging is used by the clients to improve performance, much as in virtual memory systems. However, paging on broadcast disks differs from virtual memory paging in at least two fundamental aspects:
• A page fault in the broadcast disk model has a variable cost that depends on the requested page as well as the current state of the broadcast.
• Prefetching is both natural and a provably essential mechanism for achieving significantly better competitive ratios in broadcast disk paging.
In this paper, we design a deterministic algorithm that uses prefetching to achieve an O(n log k) competitive ratio for the broadcast disk paging problem, where k denotes the size of the client’s cache. We also show a matching lower bound of Ω(n log k) that applies even when the adversary is not allowed to use prefetching. In contrast, we show that when prefetching is not allowed, no deterministic online algorithm can achieve a competitive ratio better than Ω(nk). Moreover, we show a lower bound of Ω(n log k) on the competitive ratio achievable by any nonprefetching randomized algorithm against an oblivious adversary. These lower bounds are trivially matched from above by known results about deterministic and randomized marking algorithms for paging. An interpretation of our results is that in broadcast disk paging, prefetching is a perfect substitute for randomization.
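The variable cost of a page fault can be sketched in a few lines. The timing convention here is an assumption made for illustration, not taken from the paper: page i is transmitted at every time step congruent to i modulo n, and the cost of a fault is the number of steps the client waits for the next transmission of the requested page.

```python
def fault_cost(page, t, n):
    """Steps a client waits at time t until `page` is next broadcast.

    Assumes page i is transmitted at every time step congruent to i modulo n.
    """
    return (page - t) % n

def total_fault_cost(faults, n):
    """Sum the waiting costs for a list of (page, time) fault events."""
    return sum(fault_cost(page, t, n) for page, t in faults)
```

With n = 8, a fault on page 3 costs 3 steps at time 0 but 6 steps at time 5, so the same page has different costs depending on the state of the broadcast. In virtual memory paging, by contrast, every fault is usually charged the same unit cost.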
The Accommodating Function - a generalization of the competitive ratio
- In Sixth International Workshop on Algorithms and Data Structures, volume 1663 of Lecture Notes in Computer Science
, 1998
Cited by 14 (9 self)
A new measure, the accommodating function, for the quality of on-line algorithms is presented. The accommodating function, which is a generalization of both the competitive ratio and the accommodating ratio, measures the quality of an on-line algorithm as a function of the resources that would be sufficient for an optimal algorithm to fully grant all requests. More precisely, if we have some amount of resources n, the function value at α is the usual ratio (still on some fixed amount of resources n), except that input sequences are restricted to those where all requests could have been fully granted by an optimal algorithm if it had had the amount of resources αn. The accommodating functions for three specific on-line problems are investigated: a variant of bin-packing in which the goal is to maximize the number of objects put in n bins, the seat reservation problem, and the problem of optimizing total flow time when preemption is allowed.
Caching and Scheduling for Broadcast Disk Systems
- In Proceedings of the 2nd Workshop on Algorithm Engineering and Experiments (ALENEX)
, 1998
Cited by 10 (3 self)
Unicast connections lead to performance and scalability problems when a large client population attempts to access the same data. Broadcast push and broadcast disk technology address the problem by broadcasting data items from a server to a large number of clients. Broadcast disk performance depends mainly on caching strategies at the client site and on how the broadcast is scheduled at the server site. An on-line broadcast disk paging strategy makes caching decisions without knowing access probabilities. In this paper, we subject on-line paging algorithms to extensive empirical investigation. The Gray algorithm [25] always outperformed other on-line strategies on both synthetic and Web traces. Moreover, caching limited the skewness needed from a broadcast schedule, and led to favor efficient caching algorithms over refined scheduling strategies when the cache was not small. Prior to this paper, no work had empirically investigated on-line paging algorithms and their relation with serv...
Modeling Replica Placement in a Distributed File System: Narrowing the Gap between Competitive Analysis and Simulation
- Proceedings of 9th ESA
, 2001
Cited by 8 (2 self)
We examine the replica placement aspect of a distributed peer-to-peer file system that replicates and stores files on ordinary desktop computers. It has been shown that some desktop machines are available for a greater fraction of time than others, and it is crucial not to place all replicas of any file on machines with low availability. In this paper we study the efficacy of three hill-climbing algorithms for file replica placement. Based on large-scale measurements, we assume that the distribution of machine availabilities is uniform. Among other results, we show that the MinMax algorithm is competitive, and that for growing replication factor the MinMax and MinRand algorithms have the same asymptotic worst-case efficacy.
Multiagent Cooperative Search for Portfolio Selection
, 2001
Cited by 8 (1 self)
this paper because we assume throughout that the total initial wealth of all systems of agents is $1
The Dynamic Servers Problem
, 1998
Cited by 3 (0 self)
We introduce the dynamic servers problem, a generalization of the k-server problem [11]. This problem is a simultaneous abstraction for problems arising in a variety of applications described below and appears to be of theoretical significance as a natural new paradigm in online algorithms. We study both the offline and online versions of this problem. Our results are based on a geometric reformulation of the dynamic servers problem that leads to interesting connections with Steiner trees and geometric partitioning problems, and our results may be of independent interest in that context. The k-server problem is the following: coordinate the movement of k mobile servers in a metric space, so as to process a sequence of requests at the points of the metric space, where a request is processed by moving a server to its location. The goal is to minimize the total distance traveled by the servers.
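The k-server cost model described above can be sketched with the simplest possible strategy: serve each request with the nearest server. This greedy rule is used only to illustrate how the total distance is accounted; it is not the paper's method and is known not to be competitive in general. The function name and the tie-breaking rule (lowest server index) are assumptions.

```python
def greedy_k_server(servers, requests, dist):
    """Serve each request by moving the nearest server; return (total distance, final positions)."""
    servers = list(servers)  # copy so the caller's list is not modified
    total = 0
    for r in requests:
        # Nearest server; ties go to the lowest index.
        i = min(range(len(servers)), key=lambda j: dist(servers[j], r))
        total += dist(servers[i], r)
        servers[i] = r  # the chosen server now sits at the request point
    return total, servers
```

For two servers at positions 0 and 10 on a line and the requests [1, 9, 1], the greedy rule moves each server once by distance 1 and serves the last request for free, for a total distance of 2.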

