Fast evaluation of union-intersection expressions
, 2007
Cited by 4 (1 self)
Abstract. We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size $w$, a special case of our result is that the intersection of $m$ (preprocessed) sets, containing $n$ elements in total, can be computed in expected time $O(n (\log w)^2 / w + km)$, where $k$ is the number of elements in the intersection. If the first of the two terms dominates, this is a factor $w^{1-o(1)}$ faster than the standard solution of merging sorted lists. We show a cell probe lower bound of time $\Omega(n/(wm \log m) + (1 - \frac{\log k}{w}) k)$, meaning that our upper bound is nearly optimal for small $m$. Our algorithm uses a novel combination of approximate set representations and word-level parallelism.
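For orientation, the "standard solution of merging sorted lists" that the abstract uses as its baseline can be sketched as follows. This is only an illustration of that baseline, not the paper's data structure; the function names `intersect_two` and `intersect_sorted` are hypothetical, and the inputs are assumed to be sorted, duplicate-free lists of integers.

```python
from typing import List, Sequence


def intersect_two(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Intersect two sorted, duplicate-free lists with a linear merge."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_sorted(sets: Sequence[Sequence[int]]) -> List[int]:
    """Intersect m sorted lists by repeated merging, smallest list first."""
    if not sets:
        return []
    ordered = sorted(sets, key=len)
    result = list(ordered[0])
    for s in ordered[1:]:
        result = intersect_two(result, s)
        if not result:
            break  # the intersection is already empty
    return result
```

Every element of every list is touched at most once, so the running time is linear in n; this is the cost that the bound quoted above improves on when its first term dominates.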
Adaptive search algorithm for patterns, in succinctly encoded XML
, 2006
Cited by 2 (2 self)
Abstract. We propose an adaptive algorithm for context queries (queries expressed as preorder and ancestor-descendant relations on labeled nodes), which can be used to find patterns in XML documents. Our algorithm takes advantage of the correlation between terms of the query without any preprocessed information, and it runs in time $O(kd(\lg\lg \min(n,s) + \lg\lg r))$ in the RAM model, where $k$ is the number of terms in the query, $d$ is the non-deterministic complexity of the query on the multi-labeled tree (i.e. the minimum number of operations required to check the answer to the query), $n$ is the number of nodes in the tree, $s$ is the number of relations between nodes and labels, and $r$ is the maximal number of nodes matching a label on any rooted path in the tree.
Doctoral theses at NTNU, 2011:96
NTNU, Trondheim, with Bjørn Olstad as main supervisor, and Øystein Torbjørnsen and Magnus Lie Hetland as co-supervisors. The candidate was supported by the Research Council of Norway under the grant NFR 162349, and by the iAD project, also funded by the Research Council of Norway.
Summary. This PhD thesis is a collection of papers presented with a general introduction to the topic, which is twig pattern matching (TPM) on indexed tree data. TPM is a pattern matching problem where occurrences of a query tree are found in a usually much larger data tree. This has applications in XML search, where the data is tree shaped and the queries specify tree patterns. The papers included present contributions on how to construct and use structure indexes, which can speed up pattern matching, and on how to efficiently join together results for the different parts of the query with so-called twig joins.
• Paper 1 [18] shows how to perform more efficient matching of root-to-leaf query paths in so-called path indexes, by using new opportunistic algorithms on existing
Workload-Aware Indexing for Keyword Search in Social Networks
More and more data is accumulated inside social networks. Keyword search provides a simple interface for exploring this content. However, a lot of the content is private, and a search system must enforce the privacy settings of the social network. In this paper, we present a workload-aware keyword search system with access control based on a social network. We make two technical contributions: (1) HeapUnion, a novel union operator that improves processing of search queries with access control by up to a factor of two compared to the best previous solution; and (2) highly accurate cost models that vary in sophistication and accuracy; these cost models provide input to an optimization algorithm that selects the most efficient organization of access control meta-data for a given workload. Our experimental results with real and synthetic data show that our approach outperforms previous work by up to a factor of three.
Fast Set Intersection in Memory
Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. In general, given $k$ (preprocessed) sets, with $n$ elements in total, we will show how to compute their intersection in expected time $O(n/\sqrt{w} + kr)$, where $r$ is the intersection size and $w$ is the number of bits in a machine word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform the state-of-the-art techniques for both synthetic and real data sets and workloads.
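The role of the word size $w$ in bounds of this kind can be illustrated with a toy filter: each small group of elements gets a $w$-bit signature, and one bitwise AND can rule out a whole pair of groups at once. The sketch below is not the algorithm from the paper. The grouping by high bits, the hash in `bit_for`, and the constants `W` and `BUCKET_BITS` are all assumptions chosen for illustration, and elements are assumed to be non-negative integers.

```python
from typing import Dict, Iterable, List, Tuple

W = 64           # assumed machine-word size in bits
BUCKET_BITS = 8  # hypothetical: elements sharing x >> BUCKET_BITS form one group

# group key -> (W-bit signature, sorted members of the group)
Index = Dict[int, Tuple[int, List[int]]]


def bit_for(x: int) -> int:
    """Map an element to one bit of a W-bit signature (a toy hash)."""
    return 1 << ((x * 2654435761) % W)


def preprocess(elements: Iterable[int]) -> Index:
    """Group elements by their high bits and keep a signature per group."""
    groups: Index = {}
    for x in sorted(set(elements)):
        key = x >> BUCKET_BITS
        sig, members = groups.get(key, (0, []))
        members.append(x)
        groups[key] = (sig | bit_for(x), members)
    return groups


def intersect(a: Index, b: Index) -> List[int]:
    """Intersect two preprocessed sets, skipping groups whose signatures are disjoint."""
    out: List[int] = []
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    for key in sorted(small):
        if key not in large:
            continue
        sig_s, mem_s = small[key]
        sig_l, mem_l = large[key]
        common = sig_s & sig_l
        if common == 0:
            continue  # a single AND shows the two groups share no element
        other = set(mem_l)
        # Only elements whose signature bit survived the AND can be shared.
        out.extend(x for x in mem_s if bit_for(x) & common and x in other)
    return out
```

The signature is an approximate representation: a zero AND proves two groups are disjoint, while a non-zero AND only says that they might overlap, so the surviving candidates are still checked exactly.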

