Results 1 - 5 of 5
ObjectGlobe: Ubiquitous Query Processing on the Internet
2001
Cited by 41 (11 self)
We present the design of ObjectGlobe, a distributed and open query processor for Internet data sources. Today, data is published on the Internet via Web servers which have, if at all, very localized query processing capabilities. The goal of the ObjectGlobe project is to establish an open marketplace in which data and query processing capabilities can be distributed and used by any kind of Internet application. Furthermore, ObjectGlobe integrates cycle providers (i.e., machines) which carry out query processing operators. The overall picture is to make it possible to execute a query with, in principle, unrelated query operators, cycle providers, and data sources. Such an infrastructure can serve as enabling technology for scalable e-commerce applications, e.g., B2B and B2C marketplaces, to be able to integrate data and data processing operations of a large number of participants. One of the main challenges in the design of such an open system is to ensure privacy and security.
End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems
Cited by 5 (0 self)
We address the problem of supporting a large number of select-join subscriptions in a wide-area publish/subscribe system. Subscriptions are interested in joins over different data sources (tables), with varying interests expressed as range selection predicates over table attributes. Naive schemes, such as computing and sending join results from a server, are inefficient because they produce redundant data, and are unable to share dissemination costs across subscribers and events. We propose a novel, scalable scheme that group-processes and disseminates a general mix of multi-way select-join subscriptions. We also propose a simple and application-agnostic extension to content-driven networks (CN), which further improves sharing of dissemination costs across events and subscribers. We develop and experimentally evaluate our scheme, and show that it can generate an order of magnitude lower network traffic at very low processing cost. Our extension to CN can further reduce traffic by another order of magnitude, with almost no increase in notification latency.
On Finding a Memory Lower Bound for Query Evaluation in Lightweight Devices
2003
Cited by 3 (1 self)
Pervasive computing introduces data management requirements that must be tackled in a growing variety of lightweight computing devices. Personal folders on chip (e.g., healthcare folders on smartcards), networks of sensors (e.g., pollution sensors) and data hosted by autonomous mobile computers (e.g., tourist information downloaded on a car computer) are different illustrations of the need for evaluating queries confined in hardware-constrained computing devices. RAM is the most limiting factor in this context. This paper gives a thorough analysis of the RAM consumption problem and tries to answer three important questions: (1) does a memory lower bound exist regardless of the volume of the queried data? (2) how can a query be optimized without hurting this lower bound? (3) how does an incremental growth of RAM impact the query execution and optimization techniques devised in a lower bound context? Answering these questions paves the way for setting up co-design rules required to calibrate a hardware platform according to a given application's requirements as well as to adapt an application to an existing hardware platform. To the best of our knowledge, this work is the first attempt to answer these questions. We illustrate the effectiveness of our answers through a performance evaluation.
End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems
We address the problem of supporting a large number of select-join subscriptions for wide-area publish/subscribe. Subscriptions are joins over different tables, with varying interests expressed as range selection conditions over table attributes. Naive schemes, such as computing and sending join results from a server, are inefficient because they produce redundant data, and are unable to share dissemination costs across subscribers and events. We propose a novel, scalable scheme that group-processes and disseminates a general mix of multi-way select-join subscriptions. We also propose a simple and application-agnostic extension to content-driven networks (CN), which further improves sharing of dissemination costs. Experimental evaluations show that our schemes can generate orders of magnitude lower network traffic at very low processing cost. Our extension to CN can further reduce traffic by another order of magnitude, with almost no increase in notification latency.
An Optimal Evaluation of "GroupBy-Join" Queries in Distributed Architectures
SQL queries involving join and group-by operations are fairly common in many decision support applications where the size of the input relations is usually very large, so the parallelization of these queries is highly recommended in order to obtain a desirable response time. Several parallel algorithms that treat this kind of query have been presented in the literature. However, their most significant drawbacks are that they are very sensitive to data skew and involve expensive communication and Input/Output costs in the evaluation of the join operation. In this paper, we present an algorithm that overcomes these drawbacks because it evaluates the "GroupBy-Join" query without directly evaluating the costly join operation, thus reducing its Input/Output and communication costs. Furthermore, the performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model, which predicts a linear speedup even for highly skewed data.

