Early Profile Pruning on XML-aware Publish/Subscribe Systems
- In VLDB 2007
Cited by 6 (1 self)
Publish-subscribe applications are an important class of content-based dissemination systems in which message transmission is determined by the message content rather than its destination IP address. With the increasing use of XML as the standard format in many Internet-based applications, XML-aware pub-sub applications have become necessary. In such systems, the messages (generated by publishers) are encoded as XML documents, and the profiles (defined by subscribers) as XML query statements. As the number of documents and query requests grows, the performance and scalability of the matching phase (i.e., matching queries to incoming documents) become vital. Current solutions have limited or no flexibility to prune out queries in advance. In this paper, we overcome this limitation by proposing a novel early pruning approach called Bounding-based XML Filtering, or BoXFilter. BoXFilter is based on a new tree-like indexing structure that organizes the queries by similarity and provides the lower- and upper-bound estimations needed to prune queries not related to the incoming documents. Our experimental evaluation shows that the early profile pruning approach offers drastic performance improvements over the current state of the art in XML filtering.
RoXSum: Leveraging Data Aggregation and Batch Processing for XML Routing
- In Proc. of ICDE, 2007
Cited by 5 (3 self)
Content-based routing is the primary form of communication within publish/subscribe systems. In those systems, data transmission is performed by sophisticated overlay networks of content-based routers, which match data messages against registered subscriptions and forward them based on this matching. Despite their inherent complexities, such systems are expected to deliver information in a timely and scalable fashion. As a result, their successful deployment is a strenuous task. Relevant efforts have so far focused on the construction of the overlay network and the filtering of messages at each broker. However, the efficient transmission of messages has received less attention. In this work, we propose a solution that gracefully handles the transmission task while also providing performance benefits for the matching task. Along those lines, we design RoXSum, a message representation scheme that aggregates the routing information from multiple documents in a way that permits subscription matching directly on the aggregated content. Our performance study shows that RoXSum is a viable and effective technique, as it speeds up message routing by more than an order of magnitude.
Cache-Oblivious Databases: Limitations and Opportunities
- 2008
Cited by 4 (1 self)
Cache-oblivious techniques, proposed in the theory community, have optimal asymptotic bounds on the amount of data transferred between any two adjacent levels of an arbitrary memory hierarchy. Moreover, this optimal performance is achieved without any tuning specific to the hardware platform. These properties are highly attractive to autonomous databases, especially because hardware architectures are becoming increasingly complex and diverse. In this paper, we present our design, implementation, and evaluation of the first cache-oblivious in-memory query processor, EaseDB. Moreover, we discuss the inherent limitations of the cache-oblivious approach as well as the opportunities given by upcoming hardware architectures. Specifically, a cache-oblivious technique usually requires sophisticated algorithm design to achieve performance comparable to that of its cache-conscious counterpart. Nevertheless, this development-time effort is compensated by the automaticity of performance achievement and the reduced ownership cost. Furthermore, this automaticity enables cache-oblivious techniques to outperform their cache-conscious counterparts on multi-threading processors.
Value-Aware RoXSum: Effective Message Aggregation for XML-Aware Information Dissemination
- In Proc. of WebDB, 2007
Cited by 2 (2 self)
Publish/subscribe (or pub/sub) systems perform asynchronous message transmission from publishers to subscribers, without either party having knowledge of the other. The pub/sub infrastructure manages the delivery of the messages, which is guided by user subscriptions that specify the type of information the subscribers are interested in. Since XML prevails as the standard for information exchange, efficient XML-aware pub/sub systems become necessary. Within that context, we propose VA-RoXSum, a novel message representation scheme that aggregates the content of messages in a space-efficient manner. Coupled with specialized processing algorithms that operate on its aggregated content, VA-RoXSum enables the batch processing of groups of messages and considerably improves the performance of the subscription-guided filtering task. Our preliminary experiments show that a pub/sub infrastructure with VA-RoXSum achieves up to two orders of magnitude faster matching compared with state-of-the-art alternatives, which operate on the original messages.
unknown title
Partitioning has been used to improve the performance of the hash join in main memory; however, cache-conscious partitioning requires knowledge of the cache parameters, such as the capacity and unit size, of a chosen level of the CPU caches, e.g., the L2 cache. Obtaining this knowledge and subsequently tuning the algorithm may be inconvenient, and sometimes infeasible, for complex systems. As evidence, our experiments on three different hardware platforms show that, on each platform, the best partitioning granularity was none of the cache parameters. Therefore, we propose a cache-oblivious approach to partitioned hash joins, in which the algorithm is aware of the existence of the memory hierarchy but requires no knowledge of the parameter values. Specifically, we perform binary partitioning on a join relation recursively until the base case is reached. To improve efficiency, we have designed a novel cache-oblivious buffering structure to facilitate this partitioning and have proposed a cache-oblivious cost model to estimate the base case size. Our theoretical and empirical results both show that this cache-oblivious join matches the performance of its manually tuned, cache-conscious counterparts.
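The recursive binary partitioning described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of one hash bit per recursion level, and the fixed `base_case_size` are assumptions made for illustration (the abstract says the base case size is estimated by a cost model and that a buffering structure assists the partitioning, neither of which is modeled here).

```python
from collections import defaultdict


def hash_join(build, probe):
    """Plain in-memory hash join on the first field of each tuple."""
    table = defaultdict(list)
    for row in build:
        table[row[0]].append(row)
    return [(b, p) for p in probe for b in table.get(p[0], [])]


def recursive_partitioned_join(r, s, base_case_size=4, bit=0, max_bits=16):
    """Split both relations in two per level, then join the small pieces.

    base_case_size is a hypothetical constant standing in for the value
    that the abstract says a cost model would estimate.
    """
    if not r or not s:
        return []
    # Base case: the build side is small enough, or no hash bits are left.
    if len(r) <= base_case_size or bit >= max_bits:
        return hash_join(r, s)
    mask = 1 << bit
    r0, r1, s0, s1 = [], [], [], []
    # Binary partitioning: one bit of the key hash picks the side.
    for row in r:
        (r1 if hash(row[0]) & mask else r0).append(row)
    for row in s:
        (s1 if hash(row[0]) & mask else s0).append(row)
    # Matching keys always land in the same half, so the halves join independently.
    return (recursive_partitioned_join(r0, s0, base_case_size, bit + 1, max_bits)
            + recursive_partitioned_join(r1, s1, base_case_size, bit + 1, max_bits))
```

No cache parameter appears anywhere in the sketch; the recursion simply keeps halving until the pieces are small, which is the property the abstract attributes to the cache-oblivious approach.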
Advanced Institutes of Convergence Technology
XML-enabled publish-subscribe (pub-sub) systems have emerged as an increasingly important tool for e-commerce and Internet applications. In a typical pub-sub system, subscribed users specify their interests in a profile expressed in the XPath language. Each new piece of content is then matched against the user profiles so that it is delivered only to the interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the service is critical to the success of pub-sub systems. In this article, we propose a novel scalable filtering system called iFiST that transforms user profiles, each a twig pattern expressed in XPath, into sequences using Prüfer's method. Consequently, instead of breaking a twig pattern into multiple linear paths and matching them separately, iFiST performs holistic matching of twig patterns against each incoming document in a bottom-up fashion. iFiST organizes the sequences into a dynamic hash-based index for efficient filtering, and exploits the commonality among user profiles to enable shared processing during the filtering phase. We demonstrate that the holistic matching approach reduces filtering cost and memory consumption, thereby improving the scalability of iFiST.
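For readers unfamiliar with Prüfer's method, the sketch below shows the textbook construction of a Prüfer sequence from a tree with integer node labels: repeatedly remove the smallest-labeled leaf and record its neighbor. Whether iFiST uses exactly this variant, and how it numbers twig-pattern nodes, is not stated in the abstract; the function name and the edge-list input are assumptions for illustration.

```python
import heapq
from collections import defaultdict


def prufer_sequence(edges):
    """Return the classic Prüfer sequence of a tree given as (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of current leaves so the smallest label is removed first.
    leaves = [node for node, nbrs in adj.items() if len(nbrs) == 1]
    heapq.heapify(leaves)
    seq = []
    # A tree with n nodes yields a sequence of length n - 2.
    for _ in range(len(adj) - 2):
        leaf = heapq.heappop(leaves)
        (neighbor,) = adj[leaf]      # a leaf has exactly one neighbor
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:  # the neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return seq


# Tree: 1, 2, 3 attached to 4; 4 attached to 5; 5 attached to 6.
print(prufer_sequence([(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]))  # [4, 4, 4, 5]
```

Because the sequence encodes the whole tree shape, two trees can be compared through their sequences without first decomposing them into root-to-leaf paths, which is the intuition behind the holistic matching the abstract describes.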

