Results 11-20 of 27
Fast Set Intersection in Memory
Cited by 6 (1 self)
Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear-space data structures that represent sets such that their intersection can be computed in a worst-case efficient way. In general, given k preprocessed sets with n elements in total, we show how to compute their intersection in expected time $O(n/\sqrt{w} + kr)$, where r is the intersection size and w is the number of bits in a machine word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform state-of-the-art techniques on both synthetic and real data sets and workloads.
Finger Search Trees
2005
Cited by 5 (0 self)
One of the most studied problems in computer science is that of maintaining a sorted sequence of elements to facilitate efficient searches. The prominent solution is to organize the sorted sequence as a balanced search tree, enabling insertions, deletions and searches in logarithmic time. Many different search trees have been developed and studied intensively in the literature. A discussion of balanced binary search trees can be found, e.g., in [4]. This chapter is devoted to finger search trees, which are search trees supporting fingers, i.e. pointers to elements in the search tree, and supporting efficient updates and searches in the vicinity of the fingers. If the sorted sequence is a static set of n elements, then a simple and space-efficient representation is a sorted array. Searches can be performed by binary search using $1 + \lfloor \log n \rfloor$ comparisons (throughout this chapter, $\log x$ denotes $\log_2 \max\{2, x\}$). A finger search starting at a particular element of the array can be performed by an exponential search, inspecting elements at distance $2^i - 1$ from the finger for increasing i, followed by a binary search in a range of $2^{\lfloor \log d \rfloor} - 1$ elements, where d is the rank difference in the sequence between the finger and the search element. Figure 11.1 shows an exponential search for the element 42 starting at 5; in this example d = 20. An exponential search requires
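The exponential search on a sorted array described above can be sketched roughly as follows. This is a minimal illustration, not the chapter's own code: the function name `finger_search`, the use of an index as the finger, and the assumption of distinct elements are all choices made here. It probes positions at distance 2^i − 1 from the finger and then finishes with a binary search (via the standard `bisect` module) inside the bracket found.

```python
from bisect import bisect_left


def finger_search(a, finger, x):
    """Return the index of x in the sorted list a, or -1 if absent.

    `finger` is an index into `a` (hypothetical convention for this sketch).
    Elements of `a` are assumed to be distinct and sorted ascending.
    """
    n = len(a)
    if n == 0:
        return -1
    if a[finger] == x:
        return finger

    if x > a[finger]:
        # Probe finger + (2^i - 1) for i = 1, 2, ... until we overshoot x.
        lo = finger  # invariant: a[lo] < x
        i = 1
        while True:
            hi = finger + (1 << i) - 1
            if hi >= n:
                hi = n  # ran off the array; exclusive upper bound
                break
            if a[hi] >= x:
                hi += 1  # include the probed position in the bracket
                break
            lo = hi
            i += 1
        pos = bisect_left(a, x, lo + 1, hi)
    else:
        # Symmetric case: probe finger - (2^i - 1) to the left.
        hi = finger  # invariant: a[hi] > x (exclusive upper bound)
        i = 1
        while True:
            lo = finger - ((1 << i) - 1)
            if lo < 0:
                lo = 0
                break
            if a[lo] <= x:
                break
            hi = lo
            i += 1
        pos = bisect_left(a, x, lo, hi)

    return pos if pos < n and a[pos] == x else -1
```

The number of probes grows with the logarithm of the rank distance between the finger and the target rather than with the array length, which is the point of starting from a finger.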
Identifying Occurrences of Maximal Pairs in Multiple Strings
Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, 2002
Cited by 5 (2 self)
A molecular sequence "model" is a (structured) sequence of distinct or identical strings separated by gaps; here we design and analyze efficient algorithms for variations of the "Model Matching" and "Model Identification" problems.
An Efficient Nominal Unification Algorithm
Cited by 3 (1 self)
Nominal unification is an extension of first-order unification in which terms can contain binders and unification is performed modulo α-equivalence. Here we prove that the existence of nominal unifiers can be decided in quadratic time. First, we linearly reduce nominal unification problems to a sequence of freshness constraints and equalities between atoms, modulo a permutation, using ideas similar to those of Paterson and Wegman for first-order unification. Second, we prove that the solvability of these reduced problems can be checked in quadratic time. Finally, we point out how, using ideas of Brown and Tarjan for unbalanced merging, these reduced problems could be solved more efficiently.
Checking Determinism of XML Schema Content Models in Optimal Time
Cited by 3 (0 self)
We consider the determinism checking of XML Schema content models, as required by the W3C Recommendation. We argue that currently applied solutions have flaws and make processors vulnerable to exponential resource needs caused by pathological schemas, and we help to eliminate this potential vulnerability of XML Schema based systems. XML Schema content models are essentially regular expressions extended with numeric occurrence indicators. A previously published polynomial-time solution for checking the determinism of such expressions is improved to run in linear time, and the improved algorithm is implemented and evaluated experimentally. When compared to the corresponding method of a popular production-quality XML Schema processor, the new implementation runs orders of magnitude faster. Enhancing the solution to take further extensions of XML Schema into account without compromising its linear scalability is also discussed. Key words: regular expression, numeric occurrence indicator, one-unambiguity, weak determinism, unique particle attribution, Java
Motif Extraction from Weighted Sequences
Proc. 11th Symposium on String Processing and Information Retrieval (SPIRE), volume 3246 of LNCS, 2004
Cited by 3 (2 self)
This paper presents three algorithms. The first extracts repeated motifs from a weighted sequence. The motifs correspond to words which occur at least q times and with Hamming distance e in a weighted sequence with probability 1/k each time, where k is a small constant. The second algorithm extracts common motifs from a set of N 2 weighted sequences with Hamming distance e. In the second case, the motifs must occur twice with probability 1/k, in 1 distinct sequences of the set. The third algorithm extracts maximal pairs from a weighted sequence. A pair in a sequence is the occurrence of the same substring twice. In addition, the algorithms presented in this paper improve slightly on previous work on these problems.
Lipschitz unimodal and isotonic regression on paths and trees
2008
Cited by 2 (1 self)
Let $M = (V, A)$ be a planar graph, let $\gamma \ge 0$ be a real parameter, and let $t: V \to \mathbb{R}$ be a height function. A γ-Lipschitz unimodal regression (γ-LUR) of t is a function $s: V \to \mathbb{R}$ such that s has a unique local minimum, $|s(u) - s(v)| \le \gamma$ for each $\{u, v\} \in A$, and $\|s - t\|^2 = \sum_{v \in V} (s(v) - t(v))^2$ is minimized. Here, a local minimum of s is a vertex v such that $s(u) > s(v)$ for any neighbor u of v. For a directed planar graph, $s: V \to \mathbb{R}$ is the γ-Lipschitz isotonic regression (γ-LIR) of t if $s(u) \le s(v) \le s(u) + \gamma$ for each directed edge $(u, v)$ and $\|s - t\|^2$ is minimized. These problems arise, for example, in topological simplification of a height function. We present near-linear-time algorithms for the LUR and LIR problems for two special cases, where M is a path or a tree.
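To make the γ-LIR definition concrete for the path case, the following sketch checks feasibility of a candidate s and evaluates the squared-error objective. It does not compute the optimal regression and is not the paper's algorithm; the function names and the convention that the path is a list with directed edges (i, i+1) are assumptions made for this illustration.

```python
def is_lipschitz_isotonic_on_path(s, gamma):
    """Check s[i] <= s[i+1] <= s[i] + gamma along a directed path.

    The path is assumed (for this sketch) to be vertices 0..n-1 with
    directed edges (i, i+1).
    """
    return all(0 <= s[i + 1] - s[i] <= gamma for i in range(len(s) - 1))


def squared_error(s, t):
    """Objective from the definition: sum over vertices of (s(v) - t(v))^2."""
    return sum((sv - tv) ** 2 for sv, tv in zip(s, t))


# Example: the input heights t violate both constraints, the candidate s does not.
t = [0, 5, 1, 2]
s = [0, 1, 2, 3]
feasible_t = is_lipschitz_isotonic_on_path(t, gamma=1)
feasible_s = is_lipschitz_isotonic_on_path(s, gamma=1)
cost_s = squared_error(s, t)
```

A γ-LIR of t is a feasible s of minimum cost; the candidate above is only one feasible point, not necessarily the minimizer.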
Computational Biology
2000
Cited by 1 (0 self)
During four years of arduous service, a Ph.D. student is expected to familiarise himself with his field of research and, hopefully, to contribute to it. This is reflected by the division of this dissertation into two parts. Part I is a (partial) overview of the field of computational biology as I conceive it, aimed at presenting the context for my contributions to the field. These contributions are presented in Part II as five independent articles
Finding the Most Relevant Fragments in Networks
2009
Cited by 1 (0 self)
We study a point pattern detection problem on networks, motivated by applications in geographical analysis, such as crime hotspot detection. Given a network N (a connected graph with non-negative edge lengths) together with a set of sites, which lie on the edges or vertices of N, we look for a connected subnetwork F of N of small total length that contains many sites. The edges of F can form parts of the edges of N. We consider different variants of this problem where N is either a general graph or restricted to a tree, and the subnetwork F that we are looking for is either a simple path, a path with self-intersections at vertices, or a tree. We give polynomial-time algorithms, NP-hardness and NP-completeness proofs, approximation algorithms, and also fixed-parameter tractable algorithms.
Algorithms for $L_\infty$ Isotonic Regression
2009
Cited by 1 (1 self)
This paper gives algorithms for determining $L_\infty$ weighted isotonic regressions satisfying order constraints given by a DAG with n vertices and m edges. Throughout, topological sorting plays an important role. A modification to an algorithm of Kaufman and Tamir gives an algorithm taking $\Theta(m \log n)$ time for the general case, improving upon theirs when the graph is sparse. When the regression values are restricted to a set S, scaling can be used to find an optimal regression in $\Theta(m \log |S|)$ time. The prefix isotonic regression problem is used as an intermediate step in finding isotonic regressions for some specific orders. For rooted trees the prefix isotonic regression problem is solved in $\Theta(n \log n)$ time, allowing one to find the unimodal regression of a linear order in the same time bound. When the vertices are points in d-dimensional space ordered by domination, the prefix isotonic problem can be solved, and hence the isotonic regression determined, in $\Theta(n \log^d n)$ time.
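As a small illustration of what an $L_\infty$ isotonic regression is, the sketch below handles only the simplest setting: unweighted data on a linear order. This is a special case chosen here for clarity, not one of the paper's DAG or weighted algorithms, and the function name is hypothetical. In this case, taking at each position the midpoint of the running prefix maximum and the running suffix minimum gives a nondecreasing fit that minimizes the maximum absolute error.

```python
from itertools import accumulate


def linf_isotonic_linear(y):
    """Unweighted L-infinity isotonic regression on a linear order.

    Returns a nondecreasing list s minimizing max_i |s[i] - y[i]|.
    Assumption for this sketch: all weights are equal and the order
    constraint is simply s[0] <= s[1] <= ... <= s[n-1].
    """
    prefix_max = list(accumulate(y, max))                  # max of y[0..i]
    suffix_min = list(accumulate(reversed(y), min))[::-1]  # min of y[i..n-1]
    return [(p + q) / 2 for p, q in zip(prefix_max, suffix_min)]


fit = linf_isotonic_linear([1, 3, 2, 4])
# fit == [1.0, 2.5, 2.5, 4.0]; the largest deviation from the data is 0.5
```

The fit is optimal because any out-of-order pair y[j] > y[k] with j < k forces an error of at least (y[j] − y[k]) / 2 on one of the two points, and the midpoint construction never exceeds that bound.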