Fast Set Intersection in Memory
Abstract

Cited by 6 (1 self)
Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear-space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. In general, given k (preprocessed) sets with n elements in total, we will show how to compute their intersection in expected time O(n/√w + kr), where r is the intersection size and w is the number of bits in a machine word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform the state-of-the-art techniques on both synthetic and real data sets and workloads.
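The abstract does not spell out the data structure, so the following is only a minimal sketch of the general idea of using machine words to rule out parts of an intersection cheaply. Each set is split into groups by a hash, and each group also gets a w-bit "image" word with one bit per element. Two groups can share an element only if their words share a bit, so a zero AND lets the pair be skipped without touching the elements. The group count, the two hash functions and W = 64 are assumptions for illustration, and this is not claimed to be the paper's algorithm.

```python
from collections import defaultdict

W = 64  # assumed machine-word size w, in bits

def _group_of(x: int, num_groups: int) -> int:
    # Hypothetical hash choosing the group of x (the same function for every set).
    return (x * 0x9E3779B97F4A7C15 >> 17) % num_groups

def _bit_of(x: int) -> int:
    # Hypothetical second hash choosing which of the W bits stands for x.
    return (x * 0xC2B2AE3D27D4EB4F >> 23) % W

def preprocess(s: set[int], num_groups: int) -> dict[int, tuple[int, list[int]]]:
    """Partition s into groups; keep each group's elements and a W-bit word image."""
    members: dict[int, list[int]] = defaultdict(list)
    for x in s:
        members[_group_of(x, num_groups)].append(x)
    index = {}
    for g, xs in members.items():
        word = 0
        for x in xs:
            word |= 1 << _bit_of(x)  # set the bit that represents x
        index[g] = (word, xs)
    return index

def intersect(a, b) -> set[int]:
    """Intersect two preprocessed sets that were built with the same num_groups."""
    result = set()
    for g, (word_a, xs_a) in a.items():
        if g not in b:
            continue
        word_b, xs_b = b[g]
        # A common element sets the same bit in both words, so an empty AND
        # proves this pair of groups has nothing in common and can be skipped.
        if word_a & word_b == 0:
            continue
        result.update(set(xs_a).intersection(xs_b))
    return result

A = preprocess({1, 3, 5, 7, 9, 11}, num_groups=4)
B = preprocess({2, 3, 4, 9, 10}, num_groups=4)
print(sorted(intersect(A, B)))  # [3, 9]
```

The word test never drops a true common element; it can only fail to skip a pair (a false positive), in which case the small groups are intersected directly.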
Fast evaluation of union-intersection expressions, 2007
Abstract

Cited by 5 (1 self)
We show how to represent sets in a linear-space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in, e.g., information retrieval and database systems. We mainly consider the RAM model of computation and sets of machine words, but also state our results in the I/O model. On a RAM with word size w, a special case of our result is that the intersection of m (preprocessed) sets, containing n elements in total, can be computed in expected time O(n(log w)²/w + km), where k is the number of elements in the intersection. If the first of the two terms dominates, this is a factor w^(1−o(1)) faster than the standard solution of merging sorted lists. We show a cell probe lower bound of time Ω(n/(wm log m) + (1 − (log k)/w)k), meaning that our upper bound is nearly optimal for small m. Our algorithm uses a novel combination of approximate set representations and word-level parallelism.
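For reference, the "standard solution of merging sorted lists" that the abstract compares against can be sketched as follows: keep one cursor per sorted, duplicate-free list, emit a value when all cursors agree, and otherwise advance every cursor that is behind the current maximum. The function name is hypothetical; this illustrates the baseline only, not the paper's data structure.

```python
def merge_intersect(lists: list[list[int]]) -> list[int]:
    """Intersect m sorted, duplicate-free lists by advancing one cursor per list."""
    if not lists:
        return []
    pos = [0] * len(lists)
    out: list[int] = []
    while all(p < len(lst) for p, lst in zip(pos, lists)):
        heads = [lst[p] for p, lst in zip(pos, lists)]
        top = max(heads)
        if min(heads) == top:
            out.append(top)  # every list agrees: the value is in the intersection
            pos = [p + 1 for p in pos]
        else:
            # Advance every cursor whose head is below the current maximum.
            pos = [p + 1 if h < top else p for p, h in zip(pos, heads)]
    return out

print(merge_intersect([[1, 2, 4, 7, 9], [2, 4, 5, 9], [0, 2, 9, 12]]))  # [2, 9]
```

Cursors only move forward, so every list element is passed over once; the word-level techniques in the abstract aim to beat this element-by-element scan.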
Adaptive search algorithm for patterns, in succinctly encoded XML, 2006
Abstract

Cited by 2 (2 self)
We propose an adaptive algorithm for context queries (queries expressed as preorder and ancestor-descendant relations on labeled nodes), which can be used to find patterns in XML documents. Our algorithm takes advantage of the correlation between terms of the query without any preprocessed information, and it runs in time O(kd(lg lg min(n, s) + lg lg r)) in the RAM model, where k is the number of terms in the query, d is the nondeterministic complexity of the query on the multi-labeled tree (i.e., the minimum number of operations required to check the answer to the query), n is the number of nodes in the tree, s is the number of relations between nodes and labels, and r is the maximal number of nodes matching a label on any rooted path in the tree.
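Context queries here are phrased in terms of preorder and ancestor-descendant relations. One common way to test such relations on a tree (an assumption for illustration; the abstract does not say how the paper encodes them) is to number nodes in preorder and record, for each node, the last preorder rank in its subtree: u is a proper ancestor of v exactly when pre(u) < pre(v) ≤ end(u). The sketch below uses that test to answer a simple a//b query naively; it is not the paper's adaptive algorithm, and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)

def number(root: Node) -> tuple[list[str], list[int]]:
    """Return labels in preorder and, per preorder rank, the last rank in that subtree."""
    labels: list[str] = []
    end: list[int] = []

    def visit(node: Node) -> None:
        rank = len(labels)
        labels.append(node.label)
        end.append(rank)  # placeholder, fixed once the children are numbered
        for child in node.children:
            visit(child)
        end[rank] = len(labels) - 1

    visit(root)
    return labels, end

def is_ancestor(u: int, v: int, end: list[int]) -> bool:
    # u is a proper ancestor of v iff v's rank lies inside u's subtree interval.
    return u < v <= end[u]

def descendants_with_ancestor(labels, end, anc_label, desc_label) -> list[int]:
    """Naively answer anc_label//desc_label: desc-labeled nodes below an anc-labeled node."""
    ancs = [u for u, lab in enumerate(labels) if lab == anc_label]
    return [v for v, lab in enumerate(labels)
            if lab == desc_label and any(is_ancestor(u, v, end) for u in ancs)]

tree = Node("r", [Node("a", [Node("b"), Node("c", [Node("b")])]), Node("b")])
labels, end = number(tree)
print(descendants_with_ancestor(labels, end, "a", "b"))  # [2, 4]
```

The naive query checks every candidate pair; an adaptive algorithm instead tries to spend time proportional to how hard the particular query and tree are.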
Doctoral theses at NTNU, 2011:96
Abstract
NTNU, Trondheim, with Bjørn Olstad as main supervisor, and Øystein Torbjørnsen and Magnus Lie Hetland as co-supervisors. The candidate was supported by the Research Council of Norway under the grant NFR 162349, and by the iAD project, also funded by the Research Council of Norway.
Summary: This PhD thesis is a collection of papers presented with a general introduction to the topic, which is twig pattern matching (TPM) on indexed tree data. TPM is a pattern matching problem where occurrences of a query tree are found in a usually much larger data tree. This has applications in XML search, where the data is tree-shaped and the queries specify tree patterns. The included papers present contributions on how to construct and use structure indexes, which can speed up pattern matching, and on how to efficiently join together results for the different parts of the query with so-called twig joins.
• Paper 1 [18] shows how to perform more efficient matching of root-to-leaf query paths in so-called path indexes, by using new opportunistic algorithms on existing
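To make "path index" concrete, here is a minimal sketch of one simple kind of structure index, assumed for illustration rather than taken from the thesis: map every distinct root-to-node label path to the (preorder) ids of the nodes reached by that path. An absolute child-axis query such as /r/a/b is then a single lookup, and a query such as //c/b scans only the distinct paths, which are usually far fewer than the nodes. The tuple tree representation and all function names are hypothetical.

```python
from collections import defaultdict

def build_path_index(root) -> dict[tuple[str, ...], list[int]]:
    """Map each distinct root-to-node label path to the preorder ids of its nodes.

    A tree is represented (by assumption) as a (label, [child trees]) tuple.
    """
    index: dict[tuple[str, ...], list[int]] = defaultdict(list)
    counter = 0

    def visit(node, prefix: tuple[str, ...]) -> None:
        nonlocal counter
        label, children = node
        path = prefix + (label,)
        index[path].append(counter)
        counter += 1
        for child in children:
            visit(child, path)

    visit(root, ())
    return dict(index)

def match_absolute(index, query: list[str]) -> list[int]:
    """Nodes matching an absolute child-axis path such as /r/a/b: one dictionary lookup."""
    return index.get(tuple(query), [])

def match_suffix(index, query: list[str]) -> list[int]:
    """Nodes matching //x/y/z: scan the distinct paths for ones ending with the query."""
    q = tuple(query)
    hits: list[int] = []
    for path, nodes in index.items():
        if path[-len(q):] == q:
            hits.extend(nodes)
    return sorted(hits)

doc = ("r", [("a", [("b", []), ("c", [("b", [])])]), ("b", [])])
idx = build_path_index(doc)
print(match_absolute(idx, ["r", "a", "b"]))  # [2]
print(match_suffix(idx, ["b"]))              # [2, 4, 5]
```

Full twig queries branch, so results for the individual root-to-leaf paths still have to be joined, which is where the twig joins mentioned above come in.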
Workload-Aware Indexing for Keyword Search in Social Networks
Abstract
More and more data is accumulated inside social networks. Keyword search provides a simple interface for exploring this content. However, a lot of the content is private, and a search system must enforce the privacy settings of the social network. In this paper, we present a workload-aware keyword search system with access control based on a social network. We make two technical contributions: (1) HeapUnion, a novel union operator that improves processing of search queries with access control by up to a factor of two compared to the best previous solution; and (2) highly accurate cost models that vary in sophistication and accuracy; these cost models provide input to an optimization algorithm that selects the most efficient organization of access control metadata for a given workload. Our experimental results with real and synthetic data show that our approach outperforms previous work by up to a factor of three.
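The abstract does not describe how HeapUnion works, so the sketch below is not that operator. It shows the textbook heap-based k-way union of sorted posting lists that a union operator over many lists (for example, one list per user whose content a searcher may see, a hypothetical setup) would typically start from: a min-heap holds one cursor per list, and each document id is emitted once. The function name is hypothetical.

```python
import heapq

def heap_union(lists: list[list[int]]) -> list[int]:
    """Union of sorted posting lists via a min-heap that holds one cursor per list."""
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out: list[int] = []
    while heap:
        value, i, j = heapq.heappop(heap)
        if not out or out[-1] != value:
            out.append(value)  # emit each document id only once
        if j + 1 < len(lists[i]):
            # Replace the popped cursor with the next entry of the same list.
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out

print(heap_union([[1, 4, 9], [2, 4, 10], [], [4, 5]]))  # [1, 2, 4, 5, 9, 10]
```

Each output step costs O(log k) heap work for k lists, which is why the number and size of the lists, and hence how access control metadata is organized, affect query cost.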
Efficient Algorithms for Context Query Evaluation over a Tagged Corpus
Abstract
We present an optimal adaptive algorithm for context queries in tagged content. The queries consist of locating instances of a tag within a context specified by the query using patterns with preorder, ancestor-descendant and proximity operators in the document tree implied by the tagged content. The time taken to resolve a query Q on a document tree T is logarithmic in the size of T, proportional to the size of Q, and proportional to the difficulty of the combination of Q with T, as measured by the minimal size of a certificate of the answer. The performance of the algorithm is no worse than the classical worst-case optimum, while provably better on simpler queries and corpora. More formally, the algorithm runs in time O(δk lg(n/δk)) in the standard RAM model and in time O(δk lg lg min(n, σ)) in the Θ(lg n)-word RAM model, where k is the number of edges in the query, δ is the minimum number of operations required to certify the answer to the query, n is the number of nodes in the tree, and σ is the number of labels indexed.
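A standard building block of adaptive intersection algorithms is doubling (galloping) search, which makes the cost depend on how interleaved the inputs are rather than only on their lengths, in the same spirit as the certificate-based bound above. The sketch below applies it to two sorted, duplicate-free lists; it illustrates adaptivity only and is not the paper's algorithm over document trees. Function names are hypothetical.

```python
from bisect import bisect_left

def gallop(arr: list[int], target: int, lo: int) -> int:
    """First index >= lo with arr[index] >= target: doubling steps, then binary search."""
    step = 1
    hi = lo
    while hi < len(arr) and arr[hi] < target:
        lo = hi + 1  # everything up to hi is known to be < target
        hi += step
        step *= 2
    return bisect_left(arr, target, lo, min(hi, len(arr)))

def adaptive_intersect(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted, duplicate-free lists, galloping past runs that cannot match."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = gallop(a, b[j], i)
        else:
            j = gallop(b, a[i], j)
    return out

evens = list(range(0, 1000, 2))
print(adaptive_intersect(evens, [3, 500, 998]))  # [500, 998]
```

Skipping a run of length g costs O(log g) comparisons, so long stretches that cannot contribute to the answer are cheap; inputs that interleave tightly need more comparisons, matching the intuition of a larger certificate.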