Cuckoo hashing
 Journal of Algorithms
, 2001
Cited by 124 (6 self)
We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that of binary search trees, i.e., three words per key on average. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee. Key Words: data structures, dictionaries, information retrieval, searching, hashing, experiments * Partially supported by the Future and Emerging Technologies programme of the EU
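The abstract's central idea, open addressing in which stored keys may be moved to an alternative position, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class name `CuckooSet`, the defaults `capacity=8` and `max_kicks=32`, the use of Python's built-in `hash` salted with a random seed, and the policy of doubling the tables on failure are all assumptions made for the example.

```python
import random


class CuckooSet:
    """Set of hashable, non-None keys stored with cuckoo hashing (illustrative sketch)."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity      # slots per table (hypothetical default)
        self.max_kicks = max_kicks    # eviction limit before rehashing (hypothetical)
        self._reset()

    def _reset(self):
        # Two tables, each with its own seeded hash function; None marks an empty slot.
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        self.seeds = (random.getrandbits(32), random.getrandbits(32))
        self.size = 0

    def _slot(self, which, key):
        # Stand-in hash function (assumption): built-in hash salted with a seed.
        return hash((self.seeds[which], key)) % self.capacity

    def lookup(self, key):
        # A key can only live in one of two slots, so lookup probes at most twice.
        return any(self.tables[i][self._slot(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return
        current, which = key, 0
        for _ in range(self.max_kicks):
            slot = self._slot(which, current)
            evicted = self.tables[which][slot]
            self.tables[which][slot] = current
            if evicted is None:
                self.size += 1
                return
            # The evicted key moves on to its slot in the other table.
            current, which = evicted, 1 - which
        # Too many evictions: rebuild with new hash functions and larger tables.
        self._rehash(current)

    def _rehash(self, homeless):
        keys = [k for table in self.tables for k in table if k is not None]
        keys.append(homeless)
        self.capacity *= 2
        self._reset()
        for k in keys:
            self.insert(k)
```

In this sketch only `lookup` has a worst-case constant bound (two probes); `insert` may trigger a full rebuild, so it should be read as a demonstration of the eviction loop rather than of the paper's performance guarantees.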
On the design of CGAL, a computational geometry algorithms library
 Softw. – Pract. Exp
, 1998
Cited by 90 (15 self)
CGAL is a Computational Geometry Algorithms Library written in C++, which is being developed by research groups in Europe and Israel. The goal is to make the large body of geometric algorithms developed in the field of computational geometry available for industrial application. We discuss the major design goals for CGAL, which are correctness, flexibility, ease-of-use, efficiency, and robustness, and present our approach to reach these goals. Generic programming using templates in C++ plays a central role in the architecture of CGAL. We give a short introduction to generic programming in C++, compare it to the object-oriented programming paradigm, and present examples where both paradigms are used effectively in CGAL. Moreover, we give an overview of the current structure of the CGAL-library and consider software engineering aspects in the CGAL-project. Copyright © 1999 John Wiley & Sons, Ltd. KEY WORDS: computational geometry; software library; C++; generic programming;
Geometric Speed-Up Techniques for Finding Shortest Paths in Large Sparse Graphs
, 2003
Cited by 53 (14 self)
In this paper, we consider Dijkstra's algorithm for the single source single target shortest paths problem in large sparse graphs. The goal is to reduce the response time for online queries by using precomputed information. For the result of the preprocessing, we admit at most linear space. We assume that a layout of the graph is given. From this layout, in the preprocessing, we determine for each edge a geometric object containing all nodes that can be reached on a shortest path starting with that edge. Based on these geometric objects, the search space for online computation can be reduced significantly. We present an extensive experimental study comparing the impact of different types of objects. The test data we use are traffic networks, the typical field of application for this scenario.
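The preprocessing and pruned query described in this abstract can be sketched as follows, using axis-aligned bounding boxes as the geometric object. This is a sketch under stated assumptions, not the authors' code: the function names, the adjacency-list format `graph[u] = [(v, weight), ...]`, the coordinate map `pos`, strictly positive edge weights, and the absence of parallel edges are all choices made for the example, and the preprocessing simply runs one Dijkstra search per node.

```python
import heapq


def first_hops(graph, source):
    """For each node reachable from source, return the neighbour of source
    that starts one shortest path to it (ties broken arbitrarily)."""
    dist = {source: 0}
    hop_of = {}
    done = set()
    heap = [(0, source, None)]
    while heap:
        d, u, hop = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if hop is not None:
            hop_of[u] = hop
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                # Edges leaving the source define the first hop; others inherit it.
                heapq.heappush(heap, (nd, v, v if u == source else hop))
    return hop_of


def build_boxes(graph, pos):
    """Preprocessing: one bounding box (xmin, ymin, xmax, ymax) per edge (u, v),
    covering the targets whose chosen shortest path from u starts with (u, v)."""
    boxes = {}
    for u in graph:
        for target, hop in first_hops(graph, u).items():
            x, y = pos[target]
            b = boxes.get((u, hop), (x, y, x, y))
            boxes[(u, hop)] = (min(b[0], x), min(b[1], y), max(b[2], x), max(b[3], y))
    return boxes


def pruned_query(graph, pos, boxes, s, t):
    """Dijkstra from s to t that skips edges whose box does not contain t.
    Returns (distance or None, number of settled nodes)."""
    tx, ty = pos[t]
    dist = {s: 0}
    done = set()
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            return d, len(done)
        for v, w in graph[u]:
            b = boxes.get((u, v))
            if b is None or not (b[0] <= tx <= b[2] and b[1] <= ty <= b[3]):
                continue  # no recorded shortest path to t starts with this edge
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, len(done)
```

Each edge stores four numbers, so the preprocessed data stays linear in the size of the graph, which matches the space bound the abstract admits. Other geometric objects would replace only the box construction and the containment test.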
An Algorithm for Clustering cDNAs for Gene Expression Analysis
 In RECOMB99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
Cited by 45 (4 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set. 1 Introduction Cluster analysis seeks grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
Managing Uncertainty in Schema Matching with Top-K Schema Mappings
 Journal on Data Semantics
, 2006
Cited by 29 (6 self)
In this paper, we propose to extend current practice in schema matching with the simultaneous use of top-K schema mappings rather than a single best mapping. This is a natural extension of existing methods (which can be considered to fall into the top-1 category), taking into account the imprecision inherent in the schema matching process. The essence of this method is the simultaneous generation and examination of K best schema mappings to identify useful mappings. The paper discusses efficient methods for generating top-K mappings and proposes a generic methodology for the simultaneous utilization of top-K mappings. We also propose a concrete heuristic that aims at improving precision at the cost of recall. We have tested the heuristic on real as well as synthetic data and analyze the empirical results. The novelty of this paper lies in the robust extension of existing methods for schema matching, one that can gracefully accommodate less-than-perfect scenarios in which the exact mapping cannot be identified in a single iteration. Our proposal represents a step forward in achieving fully automated schema matching, which is currently semi-automated at best.
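To make the notion of the K best schema mappings concrete, the following sketch ranks every one-to-one mapping between two small schemas by total similarity and keeps the K highest. It is a brute-force illustration and not the efficient generation method the paper discusses: the function name `top_k_mappings`, the square similarity matrix `sim`, and scoring a mapping by the sum of its attribute similarities are assumptions made for the example.

```python
import heapq
from itertools import permutations


def top_k_mappings(sim, k):
    """Return the k highest-scoring one-to-one mappings as (score, mapping) pairs.

    sim[i][j] is the similarity between source attribute i and target
    attribute j (square matrix assumed); mapping[i] is the target index
    assigned to source attribute i.
    """
    n = len(sim)
    scored = (
        (sum(sim[i][p[i]] for i in range(n)), p)
        for p in permutations(range(n))
    )
    # Enumerates all n! mappings, so this is only feasible for small schemas.
    return heapq.nlargest(k, scored)
```

With K = 1 this reduces to the single best mapping that the abstract calls the top-1 category; larger K exposes the runner-up mappings for joint examination.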
A Branch-and-Cut Algorithm for Multiple Sequence Alignment
 IN PROC. OF THE 1ST ANN. INTERN. CONF. ON COMP. MOLEC. BIO. (RECOMB 97)
, 1997
Cited by 28 (5 self)
Multiple sequence alignment is an important problem in computational biology. We study the Maximum Trace formulation introduced by Kececioglu [Kec91]. We first phrase the problem in terms of forbidden subgraphs, which enables us to express Maximum Trace as an integer linear-programming problem, and then solve the integer linear program using methods from polyhedral combinatorics. The trace polytope is the convex hull of all feasible solutions to the Maximum Trace problem; for the case of two sequences, we give a complete characterization of this polytope. This yields a polynomial-time algorithm for a general version of pairwise sequence alignment that, perhaps surprisingly, does not use dynamic programming; this yields, for instance, a non-dynamic-programming algorithm for sequence comparison under the 0-1 metric, which gives another answer to a long-open question in the area of string algorithms [PW93]. For the multiple-sequence case, we derive several classes of facet-defining inequali...
Graph Based Modeling and Implementation with EER/GRAL
, 1996
Cited by 21 (11 self)
This paper gives a cohesive approach to modeling and implementation with graphs. This approach uses extended entity relationship (EER) diagrams supplemented with the Z-like constraint language GRAL. Due to the foundation of EER/GRAL on Z a common formal basis exists. EER/GRAL descriptions give conceptual models which can be implemented in a seamless manner by efficient data structures using the GraLab graph library. Descriptions of four medium size EER/GRAL-applications conclude the paper to demonstrate the usefulness of the approach in practice.
A polyhedral approach to sequence alignment problems
 DISCRETE APPL. MATH
, 2000
Cited by 20 (1 self)
We study two new problems in sequence alignment both from a practical and a theoretical view, using tools from combinatorial optimization to develop branch-and-cut algorithms. The Generalized Maximum Trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of Maximum Trace. The RNA Sequence Alignment Problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. Both problems have a characterization in terms of graphs which we reformulate in terms of integer linear programming. We then study the polytopes (or convex hulls of all feasible solutions) associated with the integer linear program for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches.
Methods for Achieving Fast Query Times in Point Location Data Structures
, 1997
Cited by 20 (1 self)
Given a collection S of n line segments in the plane, the planar point location problem is to construct a data structure that can efficiently determine for a given query point p the first segment(s) in S intersected by vertical rays emanating out from p. It is well known that linear-space data structures can be constructed so as to achieve O(log n) query times. But applications, such as those common in geographic information systems, motivate a reexamination of this problem with the goal of improving query times further while also simplifying the methods needed to achieve such query times. In this paper we perform such a reexamination, focusing on the issues that arise in three different classes of point-location query sequences: • sequences that are reasonably uniform spatially and temporally (in which case the constant factors in the query times become critical), • sequences that are nonuniform spatially or temporally (in which case one desires data structures that adapt to s...
Radial Level Planarity Testing and Embedding in Linear Time
 Journal of Graph Algorithms and Applications
, 2005
Cited by 19 (9 self)
A graph with a given partition of the vertices on k concentric circles is radial level planar if there is a vertex permutation such that the edges can be routed strictly outwards without crossings. Radial level planarity extends level planarity, where the vertices are placed on k horizontal lines and the edges are routed strictly downwards without crossings. The extension is characterised by rings, which are level non-planar biconnected components. Our main results are linear time algorithms for radial level planarity testing and for computing an embedding. We introduce PQR-trees as a new data structure where R-nodes and associated templates for their manipulation are introduced to deal with rings. Our algorithms extend level planarity testing and embedding algorithms which use PQ-trees.