Results 1-10 of 15
Combinatorial algorithms for DNA sequence assembly
Algorithmica, 1993
Cited by 43 (3 self)
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four-phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover, it uses a limited form ...
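To make the underlying superstring formulation concrete, here is a minimal sketch of the classic greedy overlap-merging heuristic. It is not the four-phase method described in the abstract: it assumes error-free fragments, ignores reverse complements, and the function names `overlap` and `greedy_superstring` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    """Greedy heuristic: repeatedly merge the pair with the largest overlap.

    Assumes exact (error-free) fragments on a single strand.
    """
    unique = set(fragments)
    # Drop fragments that are fully contained in another fragment.
    frags = sorted(f for f in unique
                   if not any(f != g and f in g for g in unique))
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)]
        frags.append(merged)
    return frags[0] if frags else ""


fragments = ["AGGTC", "GTCAA", "CAATT"]
print(greedy_superstring(fragments))  # AGGTCAATT
```

The greedy choice gives no optimality guarantee in general, which is consistent with the abstract's remark that efficient procedures must resort to heuristics.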
Gathering Correlated Data in Sensor Networks
In Proc. ACM Joint Workshop on Foundations of Mobile Computing (DIALM-POMC), 2004
Cited by 27 (4 self)
In this paper, we consider energy-efficient gathering of correlated data in sensor networks. We focus on single-input coding strategies in order to aggregate correlated data. For foreign coding we propose the MEGA algorithm, which yields a minimum-energy data gathering topology in O(n^3) time. We also consider self-coding, for which the problem of finding an optimal data gathering tree was recently shown to be NP-complete; with LEGA, we present the first approximation algorithm for this problem with approximation ratio 2(1 + √2) and running time O(m + n log n).
Algorithms for Delta Compression and Remote File Synchronization
In Khalid Sayood, editor, Lossless Compression Handbook, 2002
Cited by 17 (9 self)
Delta compression and remote file synchronization techniques are concerned with efficient file transfer over a slow communication link in the case where the receiving party already has a similar file (or files). This problem arises naturally, e.g., when distributing updated versions of software over a network or synchronizing personal files between different accounts and devices. More generally, the problem is becoming increasingly common in many network-based applications where files and content are widely replicated, frequently modified, and cut and reassembled in different contexts and packagings.
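As a rough illustration of the delta compression idea, the sketch below encodes a target string as copy and add instructions relative to a reference the receiver already holds. It uses Python's standard `difflib.SequenceMatcher` as a stand-in matcher; the instruction format and the names `make_delta` and `apply_delta` are assumptions for illustration, not the encoding used by any tool discussed in the chapter.

```python
from difflib import SequenceMatcher


def make_delta(reference: str, target: str) -> list[tuple]:
    """Encode target as ('copy', start, length) / ('add', text) instructions."""
    delta = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The receiver can copy this span from its own reference file.
            delta.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New material has to be transmitted literally.
            delta.append(("add", target[j1:j2]))
        # 'delete' spans need no instruction: they are simply never copied.
    return delta


def apply_delta(reference: str, delta: list[tuple]) -> str:
    """Rebuild the target from the reference and the delta instructions."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


old = "the quick brown fox"
new = "the quick red fox jumps"
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

Only the literal `add` payloads and a few small integers cross the slow link, which is where the savings come from when the two files are similar.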
Cluster-Based Delta Compression of a Collection of Files
In Third Int. Conf. on Web Information Systems Engineering, 2002
Cited by 16 (5 self)
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.
Band Ordering in Lossless Compression of Multispectral Images
IEEE Transactions on Computers, 1994
Cited by 13 (0 self)
This paper examines the compression benefits that can be obtained by reordering the bands of a multispectral image. In particular, we consider a model of lossless image compression in which each band of a multispectral image is coded using a prediction function involving values from a previously coded band of the compression. Clearly, the ordering of the bands determines which bands can be used for prediction, and this, in turn, can strongly influence compression performance. We present an efficient algorithm for computing the optimal band ordering for a multispectral image. This algorithm has time complexity O(n^2) for an n-band image, while the naive algorithm takes time Ω(n!). We also define a slight variant of the optimal ordering problem that is motivated by some practical concerns on band extraction, and prove that this problem is NP-hard, and hence computationally infeasible, in all cases except for the most trivial possibility. In addition, we report on our experi...
Arborescence optimization problems solvable by Edmonds’ algorithm
Theoretical Computer Science, 2003
Cited by 12 (1 self)
We consider a general class of optimization problems regarding spanning trees in directed graphs (arborescences). We present an algorithm for solving such problems, which can be considered as a generalization of Edmonds’ algorithm for the solution of the minimum-cost arborescence problem. The considered class of optimization problems includes as special cases the standard minimum-cost arborescence problem, the bottleneck and the lexicographically optimal arborescence problem.
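For orientation, the standard minimum-cost arborescence problem mentioned above can be solved with the classical cycle-contraction scheme (Chu-Liu/Edmonds). The sketch below is a textbook-style O(VE) version that returns only the optimal cost; it is not the generalized algorithm of the paper, and the function name `min_cost_arborescence` and the edge-list input format are assumptions.

```python
def min_cost_arborescence(n, root, edges):
    """Cost of a minimum-cost arborescence rooted at `root`.

    n: number of vertices (labelled 0..n-1)
    edges: iterable of (u, v, w) for a directed edge u -> v with cost w
    Returns None if some vertex is unreachable from the root.
    """
    INF = float("inf")
    edges = list(edges)
    total = 0
    while True:
        # Step 1: pick the cheapest incoming edge for every non-root vertex.
        in_w = [INF] * n
        pre = [-1] * n
        for u, v, w in edges:
            if u != v and w < in_w[v]:
                in_w[v], pre[v] = w, u
        if any(v != root and in_w[v] == INF for v in range(n)):
            return None
        in_w[root] = 0

        # Step 2: add the chosen costs and detect cycles among chosen edges.
        comp = [-1] * n      # contracted component id per vertex
        visited = [-1] * n   # which walk last touched the vertex
        count = 0
        for v in range(n):
            total += in_w[v]
            x = v
            while visited[x] != v and comp[x] == -1 and x != root:
                visited[x] = v
                x = pre[x]
            if x != root and comp[x] == -1:
                # x lies on a new cycle: label every vertex on it.
                y = pre[x]
                while y != x:
                    comp[y] = count
                    y = pre[y]
                comp[x] = count
                count += 1
        if count == 0:
            return total  # chosen edges already form an arborescence

        # Step 3: contract cycles and reduce edge costs, then repeat.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = count
                count += 1
        edges = [(comp[u], comp[v], w - in_w[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root = comp[root]
        n = count


edges = [(0, 1, 5), (0, 2, 4), (1, 2, 1), (2, 1, 1), (1, 3, 3), (2, 3, 6)]
print(min_cost_arborescence(4, 0, edges))  # 8
```

In the example, the cheapest incoming edges of vertices 1 and 2 form a cycle, so one contraction round is needed before the optimum (edges 0→2, 2→1, 1→3) is found.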
Optimal peer-to-peer technique for massive content distribution
In Proceedings of INFOCOM, 2008
Cited by 5 (1 self)
A distinct trend has emerged that the Internet is used to transport data on a more and more massive scale. Capacity shortage in the backbone networks has become a genuine possibility, which will be more serious with fiber-based access. The problem addressed in this paper is how to conduct massive content distribution efficiently in the future network environment where the capacity limitation can equally be at the core or the edge. We propose a novel peer-to-peer technique as a main content transport mechanism to achieve efficient network resource utilization. The technique uses multiple trees for distributing different file pieces, which at the heart is a version of swarming. In this paper, we formulate an optimization problem for determining an optimal set of distribution trees as well as the rate of distribution on each tree under bandwidth limitation at arbitrary places in the network. The optimal solution can be found by a distributed algorithm. The results of the paper not only provide standalone solutions to the massive content distribution problem, but should also help the understanding of existing distribution techniques such as BitTorrent or FastReplica.
Learning Linear Dynamical Systems without Sequence Information
Cited by 5 (4 self)
Virtually all methods of learning dynamic systems from data start from the same basic assumption: that the learning algorithm will be provided with a sequence, or trajectory, of data generated from the dynamic system. In this paper we consider the case where the data is not sequenced. The learning algorithm is presented with a set of data points from the system’s operation but with no temporal ordering. The data are simply drawn as individual disconnected points. While making this assumption may seem absurd at first glance, we observe that many scientific modeling tasks have exactly this property. In this paper we restrict our attention to learning linear, discrete-time models. We propose several algorithms for learning these models based on optimizing approximate likelihood functions and test the methods on several synthetic data sets.
Compressing file collections with a TSP-based approach
2004
Cited by 4 (0 self)
Delta compression techniques solve the problem of encoding a given target file with respect to one or more reference files. Recent work in [15, 12, 7] has demonstrated the benefits of using such techniques in the context of file collection compression. In these scenarios, files are often better compressed by computing deltas with respect to other similar files from the same collection, as opposed to compressing each file by itself. It is known that the optimal set of such delta encodings, assuming that only a single reference file is used for each target file, can be found by computing an optimal branching on a directed graph. In this paper we propose two techniques for improving the compression of file collections. The first one utilizes deltas computed with respect to more than one file, while the second one improves the compressibility of batched file collections, such as tar archives, using standard compression tools. Both techniques are based on a reduction to the Traveling Salesperson problem on directed weighted graphs. We present experiments demonstrating the benefits of our methods.
An additive branch-and-bound algorithm for the pickup and delivery traveling salesman problem with LIFO loading
Submitted
Cited by 4 (2 self)
This paper introduces an additive branch-and-bound algorithm for two variants of the pickup and delivery traveling salesman problem in which loading and unloading operations have to be performed either in a Last-In-First-Out (LIFO) or in a First-In-First-Out (FIFO) order. Two relaxations are used within the additive approach: the assignment problem and the shortest spanning r-arborescence problem. The quality of the lower bounds is further improved by a set of elimination rules applied at each node of the search tree to remove from the problem arcs that cannot belong to feasible solutions because of precedence relationships. The performance of the algorithm and the effectiveness of the elimination rules are assessed on instances from the literature.