Combinatorial algorithms for DNA sequence assembly
Algorithmica, 1993
Cited by 33 (3 self)
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four-phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover, it uses a limited form ...
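To make the underlying superstring formulation concrete, here is a minimal sketch of the textbook greedy heuristic for the simpler shortest common superstring problem. It is not the four-phase method of the paper: it assumes error-free fragments and ignores reverse complements, and the function names `overlap` and `greedy_superstring` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    """Greedy heuristic: repeatedly merge the pair with the largest overlap.

    Assumes exact (error-free) fragments on a single strand.
    """
    # Drop duplicates and fragments fully contained in another fragment.
    frags = [f for f in set(fragments)
             if not any(f != g and f in g for g in fragments)]
    frags.sort()  # fixed order so that ties are broken deterministically
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, left index, right index)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)]
        frags.append(merged)
    return frags[0] if frags else ""
```

The greedy merge gives no optimality guarantee, which is consistent with the abstract's point that efficient procedures for this NP-hard problem must rely on heuristics.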
Gathering Correlated Data in Sensor Networks
In Proc. of the ACM Joint Workshop on Foundations of Mobile Computing (DIALM-POMC), 2004
Cited by 14 (4 self)
In this paper, we consider energy-efficient gathering of correlated data in sensor networks. We focus on single-input coding strategies in order to aggregate correlated data. For foreign coding we propose the MEGA algorithm which yields a minimum-energy data gathering topology in O(n^3) time. We also consider self-coding for which the problem of finding an optimal data gathering tree was recently shown to be NP-complete; with LEGA, we present the first approximation algorithm for this problem with approximation ratio 2(1 + √2) and running time O(m + n log n).
Cluster-Based Delta Compression of a Collection of Files
In Third Int. Conf. on Web Information Systems Engineering, 2002
Cited by 13 (5 self)
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.
Algorithms for Delta Compression and Remote File Synchronization
In Khalid Sayood, editor, Lossless Compression Handbook, 2002
Cited by 13 (8 self)
Delta compression and remote file synchronization techniques are concerned with efficient file transfer over a slow communication link in the case where the receiving party already has a similar file (or files). This problem arises naturally, e.g., when distributing updated versions of software over a network or synchronizing personal files between different accounts and devices. More generally, the problem is becoming increasingly common in many network-based applications where files and content are widely replicated, frequently modified, and cut and reassembled in different contexts and packagings.
Band Ordering in Lossless Compression of Multispectral Images
IEEE Transactions on Computers, 1994
Cited by 11 (0 self)
This paper examines the compression benefits that can be obtained by reordering the bands of a multispectral image. In particular, we consider a model of lossless image compression in which each band of a multispectral image is coded using a prediction function involving values from a previously coded band of the compression. Clearly, the ordering of the bands determines which bands can be used for prediction, and this, in turn, can strongly influence compression performance. We present an efficient algorithm for computing the optimal band ordering for a multispectral image. This algorithm has time complexity O(n^2) for an n-band image, while the naive algorithm takes time Ω(n!). We also define a slight variant of the optimal ordering problem that is motivated by some practical concerns on band extraction, and prove that this problem is NP-hard, and hence computationally infeasible, in all cases except for the most trivial possibility. In addition, we report on our experi...
Arborescence optimization problems solvable by Edmonds’ algorithm
Theoretical Computer Science, 2003
Cited by 8 (1 self)
We consider a general class of optimization problems regarding spanning trees in directed graphs (arborescences). We present an algorithm for solving such problems, which can be considered as a generalization of Edmonds' algorithm for the solution of the minimum-cost arborescence problem. The considered class of optimization problems includes as special cases the standard minimum-cost arborescence problem, the bottleneck arborescence problem, and the lexicographically optimal arborescence problem.
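For readers unfamiliar with the baseline being generalized, the following is a minimal sketch of the classical contraction algorithm (Chu-Liu/Edmonds) for the standard minimum-cost arborescence problem. It is not the generalized algorithm of the paper; it returns only the optimal cost, assumes vertices are numbered 0..n-1, and the function name `min_arborescence_cost` is hypothetical.

```python
from typing import Optional

Edge = tuple[int, int, float]  # (source, target, cost)


def min_arborescence_cost(n: int, edges: list[Edge], root: int) -> Optional[float]:
    """Cost of a minimum-cost arborescence rooted at `root`, or None if
    some vertex cannot be reached from the root."""
    INF = float("inf")
    total = 0.0
    while True:
        # Step 1: pick the cheapest incoming edge of every vertex.
        in_w = [INF] * n
        pre = [-1] * n
        for u, v, w in edges:
            if u != v and w < in_w[v]:
                in_w[v], pre[v] = w, u
        if any(v != root and in_w[v] == INF for v in range(n)):
            return None
        in_w[root] = 0.0

        # Step 2: look for cycles among the chosen edges.
        comp = [-1] * n   # id of the contracted component of each vertex
        vis = [-1] * n    # start vertex of the walk that last visited it
        cnt = 0
        for v in range(n):
            total += in_w[v]
            x = v
            while x != root and vis[x] != v and comp[x] == -1:
                vis[x] = v
                x = pre[x]
            if x != root and comp[x] == -1:  # the walk closed a new cycle
                y = pre[x]
                while y != x:
                    comp[y] = cnt
                    y = pre[y]
                comp[x] = cnt
                cnt += 1
        if cnt == 0:
            return total  # chosen edges already form an arborescence

        # Step 3: contract each cycle to one vertex and reduce edge costs.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = cnt
                cnt += 1
        edges = [(comp[u], comp[v], w - in_w[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root, n = comp[root], cnt
```

This simple version runs in O(nm) time; faster implementations exist but are longer.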
Compressing file collections with a TSP-based approach
2004
Cited by 4 (0 self)
Delta compression techniques solve the problem of encoding a given target file with respect to one or more reference files. Recent work in [15, 12, 7] has demonstrated the benefits of using such techniques in the context of file collection compression. In these scenarios, files are often better compressed by computing deltas with respect to other similar files from the same collection, as opposed to compressing each file by itself. It is known that the optimal set of such delta encodings, assuming that only a single reference file is used for each target file, can be found by computing an optimal branching on a directed graph. In this paper we propose two techniques for improving the compression of file collections. The first one utilizes deltas computed with respect to more than one file, while the second one improves the compressibility of batched file collections, such as tar archives, using standard compression tools. Both techniques are based on a reduction to the Traveling Sales Person problem on directed weighted graphs. We present experiments demonstrating the benefits of our methods.
Learning Linear Dynamical Systems without Sequence Information
Cited by 3 (2 self)
Virtually all methods of learning dynamic systems from data start from the same basic assumption: that the learning algorithm will be provided with a sequence, or trajectory, of data generated from the dynamic system. In this paper we consider the case where the data is not sequenced. The learning algorithm is presented with a set of data points from the system's operation but with no temporal ordering. The data are simply drawn as individual disconnected points. While making this assumption may seem absurd at first glance, we observe that many scientific modeling tasks have exactly this property. In this paper we restrict our attention to learning linear, discrete-time models. We propose several algorithms for learning these models based on optimizing approximate likelihood functions and test the methods on several synthetic data sets.
An additive branch-and-bound algorithm for the pickup and delivery traveling salesman problem with LIFO loading
Submitted
Cited by 2 (1 self)
This paper introduces an additive branch-and-bound algorithm for two variants of the pickup and delivery traveling salesman problem in which loading and unloading operations have to be performed either in a Last-In-First-Out (LIFO) or in a First-In-First-Out (FIFO) order. Two relaxations are used within the additive approach: the assignment problem and the shortest spanning r-arborescence problem. The quality of the lower bounds is further improved by a set of elimination rules applied at each node of the search tree to remove from the problem arcs that cannot belong to feasible solutions because of precedence relationships. The performance of the algorithm and the effectiveness of the elimination rules are assessed on instances from the literature.
Approximate Maximum Weight Branchings
2005
Cited by 2 (1 self)
We consider a special subgraph of a weighted directed graph: one comprising only the k heaviest edges incoming to each vertex. We show that the maximum weight branching in this subgraph closely approximates the maximum weight branching in the original graph. Specifically, it is within a factor of k/(k+1). Our interest in finding branchings in this subgraph is motivated by a data compression application in which calculating edge weights is expensive but estimating which are the heaviest k incoming edges is easy. An additional benefit is that, since algorithms for finding branchings run in time linear in the number of edges, our results imply faster algorithms, although we sacrifice optimality by a small factor. We also extend our results to the case of edge-disjoint branching of maximum weight and to maximum weight spanning forests.
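The pruning step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: edges are `(source, target, weight)` tuples, the helper names are hypothetical, and the branching is found by brute force, which is only feasible on tiny graphs and stands in for a real branching algorithm.

```python
from itertools import product

Edge = tuple[str, str, float]  # (source, target, weight)


def k_heaviest_incoming(edges: list[Edge], k: int) -> list[Edge]:
    """Keep, for every target vertex, only its k heaviest incoming edges."""
    by_target: dict[str, list[Edge]] = {}
    for e in edges:
        by_target.setdefault(e[1], []).append(e)
    kept: list[Edge] = []
    for incoming in by_target.values():
        kept.extend(sorted(incoming, key=lambda e: e[2], reverse=True)[:k])
    return kept


def _acyclic(parent: dict[str, str]) -> bool:
    """True if following parent pointers never revisits a vertex."""
    for start in parent:
        seen = {start}
        v = parent[start]
        while v in parent:
            if v in seen:
                return False
            seen.add(v)
            v = parent[v]
    return True


def max_branching_weight(edges: list[Edge]) -> float:
    """Brute-force maximum weight branching (exponential time)."""
    by_target: dict[str, list[Edge]] = {}
    for e in edges:
        by_target.setdefault(e[1], []).append(e)
    # Each vertex selects either no incoming edge (None) or exactly one.
    options = [[None] + incoming for incoming in by_target.values()]
    best = 0.0
    for choice in product(*options):
        chosen = [e for e in choice if e is not None]
        if _acyclic({e[1]: e[0] for e in chosen}):
            best = max(best, sum(e[2] for e in chosen))
    return best


# Small example: the two heaviest edges form a cycle between a and b.
edges = [("a", "b", 5.0), ("b", "a", 5.0), ("c", "a", 4.0), ("c", "b", 4.0)]
full = max_branching_weight(edges)
pruned = max_branching_weight(k_heaviest_incoming(edges, 1))
```

On this example the full graph admits a branching of weight 9, while the k = 1 subgraph only allows weight 5, which still respects the k/(k+1) = 1/2 bound stated in the abstract.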

