Results 1–10 of 138
Approximate Join Processing Over Data Streams
2003
Cited by 122 (3 self)
We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We first discuss alternate architectural models for data stream join processing, and we survey suitable measures for the quality of an approximation of a set-valued query result. We then consider the number of generated result tuples as the quality measure, and we give optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data we show the efficacy of our solutions. For applications that demand exact results we introduce a new Archive-metric which captures the amount of work needed to complete the join in case the streams are archived for later processing.
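The load-shedding model described above — drop tuples at random before they enter the join, so the output is a subset of the exact result — can be sketched as follows. This is an illustrative toy, not the paper's architecture; the stream format, the random drop policy, and all function names are assumptions.

```python
import random
from collections import deque

def windowed_join(stream_r, stream_s, window, drop_prob, seed=0):
    """Approximate sliding-window equi-join with random load shedding.

    Each element of stream_r / stream_s is a (timestamp, key) pair, each
    stream sorted by timestamp. A tuple is dropped with probability
    drop_prob before joining, so the result is a subset of the exact join.
    """
    rng = random.Random(seed)
    win_r, win_s = deque(), deque()
    results = []
    # merge the two streams into one timestamp-ordered event sequence
    events = sorted([(t, k, 'R') for t, k in stream_r] +
                    [(t, k, 'S') for t, k in stream_s])
    for t, key, side in events:
        # expire tuples that fell out of the sliding window
        for win in (win_r, win_s):
            while win and win[0][0] <= t - window:
                win.popleft()
        if rng.random() < drop_prob:   # shed load: drop this tuple
            continue
        own, other = (win_r, win_s) if side == 'R' else (win_s, win_r)
        results += [(key, t, t2) for t2, k2 in other if k2 == key]
        own.append((t, key))
    return results
```

With `drop_prob=0.0` the sketch computes the exact windowed join; any positive drop probability yields a subset of it, which is the quality measure (result-tuple count) the paper optimizes.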
Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects
Cited by 93 (1 self)
We analyze the computational problem of multi-object tracking in video sequences. We formulate the problem using a cost function that requires estimating the number of tracks, as well as their birth and death states. We show that the global solution can be obtained with a greedy algorithm that sequentially instantiates tracks using shortest path computations on a flow network. Greedy algorithms allow one to embed pre-processing steps, such as non-max suppression, within the tracking algorithm. Furthermore, we give a near-optimal algorithm based on dynamic programming which runs in time linear in the number of objects and linear in the sequence length. Our algorithms are fast, simple, and scalable, allowing us to process dense input data. This results in state-of-the-art performance.
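The dynamic-programming idea — a cheapest path through a trellis of per-frame detections, with tracks instantiated greedily — can be sketched as below. The detection/cost encoding and the disjointness-by-deletion step are illustrative simplifications; the paper's greedy algorithm operates on a flow network with residual updates, which this sketch does not reproduce.

```python
def best_track(frames, trans_cost):
    """DP shortest path through a detection trellis.

    frames: list of lists of (id, detection_cost); trans_cost(a, b) gives
    the cost of linking detection a to detection b in the next frame.
    Returns (total_cost, [ids]) of the cheapest track spanning all frames.
    """
    # prev[i] = best (cost, path) for a track ending at detection i
    prev = [(c, [d]) for d, c in frames[0]]
    for frame in frames[1:]:
        cur = []
        for d, c in frame:
            best = min((pc + trans_cost(pd[-1], d) + c, pd + [d])
                       for pc, pd in prev)
            cur.append(best)
        prev = cur
    return min(prev)

def greedy_tracks(frames, trans_cost, k):
    """Greedily instantiate k disjoint tracks, cheapest first.

    Assumes k disjoint full-length tracks exist in the trellis.
    """
    frames = [list(f) for f in frames]
    tracks = []
    for _ in range(k):
        cost, path = best_track(frames, trans_cost)
        tracks.append((cost, path))
        # remove used detections so later tracks are disjoint
        frames = [[(d, c) for d, c in f if d not in path] for f in frames]
    return tracks
```

Each `best_track` call is linear in sequence length, matching the abstract's complexity claim for the DP variant (per extracted track).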
Quincy: Fair Scheduling for Distributed Computing Clusters
Cited by 72 (1 self)
This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations close to their data is crucial for performance, is increasingly common and arises in systems such as MapReduce, Hadoop, and Dryad as well as many grid-computing environments. We argue that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures. The problem of scheduling with locality and fairness constraints has not previously been extensively studied under this model of resource sharing. We introduce a powerful and flexible new framework for scheduling concurrent distributed jobs with fine-grain resource sharing. The scheduling problem is mapped to a graph data structure, where edge weights and capacities encode the competing demands of data locality, fairness, and starvation-freedom, and a standard solver computes the optimal online schedule according to a global cost model. We evaluate our implementation of this framework, which we call Quincy, on a cluster of a few hundred computers using a varied workload of data- and CPU-intensive jobs. We evaluate Quincy against an existing queue-based algorithm and implement several policies for each scheduler, with and without fairness constraints. Quincy gets better fairness when fairness is requested, while substantially improving data locality. The volume of data transferred across the cluster is reduced by up to a factor of 3.9 in our experiments, leading to a throughput increase of up to 40%.
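A Quincy-style graph encoding — each task routes one unit of flow to a sink via data-local machines, rack aggregators, a cluster aggregator, or an unscheduled node — might be built along these lines. The node names and cost constants here are invented for illustration and are not the paper's actual cost model; a min-cost flow solver over these edges would then pick the schedule.

```python
def quincy_graph(tasks, machines, racks, unsched_cost, rack_of):
    """Build a Quincy-style flow network as (u, v, capacity, cost) edges.

    tasks: {task_id: {machine_id: locality_cost}} for data-local machines.
    Each task sends one unit of flow toward "sink", choosing among its
    preferred machines (cheap, data is local), rack aggregators (medium),
    a cluster aggregator (expensive), or an "unsched" node whose cost
    encodes the penalty for leaving the task unscheduled.
    """
    edges = []
    for t, prefs in tasks.items():
        for m, cost in prefs.items():            # data-local machines
            edges.append((t, m, 1, cost))
        for r in racks:                          # rack aggregators
            edges.append((t, "rack:" + r, 1, 5))
        edges.append((t, "cluster", 1, 10))      # run anywhere
        edges.append((t, "unsched", 1, unsched_cost))
    for m in machines:
        edges.append(("rack:" + rack_of[m], m, len(tasks), 0))
        edges.append(("cluster", m, len(tasks), 0))
        edges.append((m, "sink", 1, 0))          # one task slot per machine
    edges.append(("unsched", "sink", len(tasks), 0))
    return edges
```

Raising `unsched_cost` expresses a stronger preference for placing a task somewhere non-local rather than queueing it, which is how the global cost model trades locality against fairness.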
An Efficient Cost Scaling Algorithm for the Assignment Problem
Math. Program., 1995
Cited by 51 (1 self)
The cost scaling push-relabel method has been shown to be efficient for solving minimum-cost flow problems. In this paper we apply the method to the assignment problem and investigate implementations of the method that take advantage of assignment's special structure. The results show that the method is very promising for practical use.
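As a rough illustration of price-based methods for assignment, here is Bertsekas' auction algorithm, a close relative of cost scaling (its ε parameter plays the role of the scaling granularity). This is a sketch of the related technique, not the paper's push-relabel implementation.

```python
def auction_assignment(benefit, eps=None):
    """Bertsekas' auction algorithm for the max-benefit assignment problem.

    benefit[i][j] is the benefit of assigning person i to object j
    (square instance). With integer benefits and eps < 1/n the final
    assignment is optimal.
    """
    n = len(benefit)
    if eps is None:
        eps = 1.0 / (n + 1)
    prices = [0.0] * n
    owner = [None] * n            # owner[j] = person holding object j
    assigned = [None] * n         # assigned[i] = object held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # person i bids for its best object at current prices
        values = [benefit[i][j] - prices[j] for j in range(n)]
        j = max(range(n), key=values.__getitem__)
        v1 = values[j]
        v2 = max(values[k] for k in range(n) if k != j) if n > 1 else v1
        prices[j] += v1 - v2 + eps     # raise price by the bid increment
        if owner[j] is not None:       # evict the previous owner
            assigned[owner[j]] = None
            unassigned.append(owner[j])
        owner[j] = i
        assigned[i] = j
    return assigned
```

Like cost scaling, the method maintains approximate complementary slackness (within ε) and tightens it as prices rise; the special structure of assignment keeps each bidding step cheap.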
A Network-Aware Distributed Storage Cache for Data Intensive Environments
Proceedings of the IEEE High Performance Distributed Computing Conference (HPDC-8), 1999
Cited by 47 (5 self)
Modern scientific computing involves organizing, moving, visualizing, and analyzing massive amounts of data at multiple sites around the world. The technologies, the middleware services, and the architectures that are used to build useful high-speed, wide area distributed systems constitute the field of data intensive computing. In this paper we describe an architecture for data intensive applications in which we use a high-speed distributed data cache as a common element for all of the sources and sinks of data. This cache-based approach provides standard interfaces to a large, application-oriented, distributed, online, transient storage system. We describe our implementation of this cache, how we have made it "network aware," and how we do dynamic load balancing based on the current network conditions. We also show large increases in application throughput enabled by access to knowledge of the network conditions.
A Truncated Primal-Infeasible Dual-Feasible Network Interior Point Method
1996
Cited by 31 (3 self)
In this paper we introduce the truncated primal-infeasible dual-feasible interior point algorithm for linear programming and describe an implementation of this algorithm for solving the minimum-cost network flow problem. In each iteration, the linear system that determines the search direction is computed inexactly, and the norm of the resulting residual vector is used in the stopping criteria of the iterative solver employed for the solution of the system. In the implementation, a preconditioned conjugate gradient method is used as the iterative solver. The details of the implementation are described and the code, pdnet, is tested on a large set of standard minimum-cost network flow test problems. Computational results indicate that the implementation is competitive with state-of-the-art network flow codes.
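The inner inexact solve can be illustrated with a plain conjugate gradient loop whose stopping test uses the residual norm, as the abstract describes. Preconditioning is omitted for brevity, and the fixed tolerance is a simplification of a truncated method's adaptive criteria.

```python
def cg(matvec, b, tol, max_iter=100):
    """Conjugate gradient for A x = b (A symmetric positive definite),
    stopped when the residual norm falls below tol — the 'inexact' inner
    solve of a truncated interior point iteration.

    matvec(p) must return A @ p as a list. Returns (x, residual_norm).
    """
    n = len(b)
    x = [0.0] * n
    r = b[:]                         # residual r = b - A x (x = 0 here)
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:          # truncation: stop on residual norm
            break
        Ap = matvec(p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, rs ** 0.5
```

In the truncated scheme, the returned residual norm feeds back into the outer iteration's convergence analysis instead of being driven to machine precision every step.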
A minutia-based partial fingerprint recognition system
The Journal of Pattern Recognition, 2005
Cited by 27 (2 self)
Matching incomplete or partial fingerprints continues to be an important challenge today, despite the advances made in fingerprint identification techniques. While the introduction of compact silicon chip-based sensors that capture only part of the fingerprint has made this problem important from a commercial perspective, there is also considerable interest in processing partial and latent fingerprints obtained at crime scenes. When the partial print does not include structures such as core and delta, common matching methods based on alignment of singular structures fail. We present an approach that uses localized secondary features derived from relative minutiae information. A flow-network-based matching technique is introduced to obtain one-to-one correspondence of secondary features. Our method balances the trade-offs between maximizing the number of matches and minimizing total feature distance between query and reference fingerprints. A two-hidden-layer fully connected neural network is trained to generate the final similarity score based on minutiae matched in the overlapping areas. Since the minutia-based fingerprint representation is an ANSI/NIST standard, our approach has the advantage of being directly applicable to existing databases. We present results of testing on FVC2002's DB1 and DB2 databases.
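The one-to-one correspondence step can be illustrated by minimizing total feature distance over all pairings. The brute-force search below stands in for the paper's flow-network matcher and is only practical for a handful of features; the 2-D point features are an illustrative stand-in for the paper's secondary features.

```python
from itertools import permutations

def match_features(query, reference):
    """One-to-one correspondence minimizing total Euclidean distance.

    query, reference: lists of (x, y) feature points, with
    len(reference) >= len(query). Returns [(query_idx, ref_idx), ...].
    Brute force over permutations; a flow network does this efficiently.
    """
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    best = min(permutations(range(len(reference)), len(query)),
               key=lambda p: sum(dist(query[i], reference[j])
                                 for i, j in enumerate(p)))
    return list(enumerate(best))
```

Minimizing total distance alone ignores the paper's second objective (maximizing the number of matches under a distance threshold); the flow formulation lets both be encoded as edge costs and capacities.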
Mobility limited flip-based sensor networks deployment
In Proceedings of IEEE MASS, 2005
Cited by 26 (0 self)
An important phase of sensor network operation is deployment of sensors in the field of interest. Critical goals during sensor network deployment include coverage, connectivity, and load balancing. A class of work has recently appeared in which mobility in sensors is leveraged to meet deployment objectives. In this paper, we study deployment of sensor networks using mobile sensors. The distinguishing feature of our work is that the sensors in our model have limited mobilities. More specifically, the mobility in the sensors we consider is restricted to a flip, where the distance of the flip is bounded. We call such sensors flip-based sensors. Given an initial deployment of flip-based sensors in a field, our problem is to determine a movement plan for the sensors in order to maximize the sensor network coverage and minimize the number of flips. We propose a minimum-cost maximum-flow based solution to this problem. We prove that our solution optimizes both the coverage and the number of flips. We also study the sensitivity of coverage and the number of flips to flip distance under different initial deployment distributions of sensors. We observe that increased flip distance achieves better coverage and reduces the number of flips required per unit increase in coverage. However, such improvements are constrained by initial deployment distributions of sensors, due to the limitations on sensor mobility.
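For intuition, the objective — maximize covered cells first, then minimize flips — can be checked by brute force on a tiny grid. The paper solves this with minimum-cost maximum-flow; this exhaustive sketch only evaluates the same objective on small instances, with a grid-cell model and four-direction flips as assumptions.

```python
from itertools import product

def plan_flips(sensors, cells, flip_dist):
    """Brute-force flip plan over a cell grid.

    sensors: list of (x, y) initial cells; cells: set of valid (x, y)
    cells; each sensor may stay put or flip exactly flip_dist in one of
    four directions. Returns final positions maximizing distinct covered
    cells, breaking ties by fewest flips.
    """
    def moves(pos):
        x, y = pos
        opts = [(x, y)]                       # staying is always allowed
        for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            q = (x + dx * flip_dist, y + dy * flip_dist)
            if q in cells:
                opts.append(q)
        return opts
    best = None
    for choice in product(*[moves(s) for s in sensors]):
        coverage = len(set(choice))           # distinct cells covered
        flips = sum(c != s for c, s in zip(choice, sensors))
        score = (-coverage, flips)            # coverage first, then flips
        if best is None or score < best[0]:
            best = (score, choice)
    return list(best[1])
```

The lexicographic score mirrors the paper's claim that its flow solution optimizes coverage and, among coverage-optimal plans, the number of flips.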
Multi-target tracking by Lagrangian relaxation to min-cost network flow
In CVPR, 2013
Cited by 26 (0 self)
We propose a method for global multi-target tracking that can incorporate higher-order track smoothness constraints such as constant velocity. Our problem formulation readily lends itself to path estimation in a trellis graph, but unlike previous methods, each node in our network represents a candidate pair of matching observations between consecutive frames. Extra constraints on binary flow variables in the graph result in a problem that can no longer be solved by min-cost network flow. We therefore propose an iterative solution method that relaxes these extra constraints using Lagrangian relaxation, resulting in a series of problems that are solvable by min-cost flow and that progressively improve towards a high-quality solution to our original optimization problem. We present experimental results showing that our method outperforms the standard network-flow formulation as well as other recent algorithms that attempt to incorporate higher-order smoothness constraints.
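The overall scheme — dualize the hard coupling constraints, solve the relaxed problem cheaply, and update the multipliers by a subgradient step — can be illustrated on a toy problem with one coupling constraint. In the paper each relaxed subproblem is itself a min-cost flow; here it is a pair of independent binary decisions, an assumption made purely for brevity.

```python
def lagrangian_relaxation(c1, c2, iters=50, step=1.0):
    """Toy Lagrangian relaxation.

    Minimize c1*x1 + c2*x2 over binary x1, x2 subject to the coupling
    constraint x1 == x2, which is dualized with multiplier lam:
        L(x, lam) = (c1 + lam)*x1 + (c2 - lam)*x2.
    The relaxed problem separates, each part is solved independently,
    and lam takes a diminishing subgradient step on the violation.
    """
    lam = 0.0
    for t in range(1, iters + 1):
        x1 = 1 if c1 + lam < 0 else 0     # cheap independent subproblem
        x2 = 1 if c2 - lam < 0 else 0     # cheap independent subproblem
        if x1 == x2:                      # coupling satisfied: feasible
            return x1, x2, lam
        lam += (step / t) * (x1 - x2)     # subgradient update
    return x1, x2, lam
```

Each iteration is as cheap as the relaxed solver, and successive multiplier updates steer the separable solutions toward satisfying the dualized constraint, which is the same progression the paper exploits with min-cost flow subproblems.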
Maximum Likelihood Genome Assembly
2009
Cited by 25 (0 self)
Whole genome shotgun assembly is the process of taking many short sequenced segments (reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial-time algorithm for the assembly of a double-stranded genome. Furthermore, we propose a maximum likelihood framework for assembling the genome that is the most likely source of the reads, in lieu of the standard maximum parsimony approach (which finds the shortest genome subject to some constraints). In this setting, we give a bidirected network flow-based algorithm that, by taking advantage of high coverage, accurately estimates the copy counts of repeats in a genome. Our second algorithm combines these predicted copy counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from Escherichia coli and predict copy counts with extremely high accuracy, while assembling long contigs.
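A single-stranded de Bruijn construction plus Eulerian path reconstruction can be sketched as below. The bidirected graph and the reverse-complement handling the paper relies on are omitted, so this shows only the shape of the idea: k-mers become edges between (k-1)-mer nodes, and a path using every edge once spells a sequence containing every k-mer.

```python
from collections import Counter

def de_bruijn(reads, k):
    """Single-stranded de Bruijn graph: nodes are (k-1)-mers, edges are
    k-mers, stored as an adjacency dict {prefix: [suffixes]}."""
    edges = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.setdefault(kmer[:-1], []).append(kmer[1:])
    return edges

def eulerian_path(edges):
    """Hierholzer's algorithm: spell a sequence using every k-mer edge
    exactly once (assumes an Eulerian path exists)."""
    out_deg = Counter({u: len(vs) for u, vs in edges.items()})
    in_deg = Counter(v for vs in edges.values() for v in vs)
    # start where out-degree exceeds in-degree, if such a node exists
    start = next((u for u in edges if out_deg[u] > in_deg[u]),
                 next(iter(edges)))
    adj = {u: list(vs) for u, vs in edges.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # glue: first node whole, then one new character per following node
    return path[0] + "".join(p[-1] for p in path[1:])
```

The paper's contribution replaces this Eulerian-path view with a Chinese Postman formulation on a bidirected graph, so that both DNA strands are modeled at once and repeat copy counts can be recovered via flow.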