Network-aware operator placement for stream-processing systems
- In ICDE, 2006
Cited by 82 (7 self)
To use their pool of resources efficiently, distributed stream-processing systems push query operators to nodes within the network. Currently, these operators, ranging from simple filters to custom business logic, are placed manually at intermediate nodes along the transmission path to meet application-specific performance goals. Determining placement locations is challenging because network and node conditions change over time and because streams may interact with each other, opening venues for reuse and repositioning of operators. This paper describes a stream-based overlay network (SBON), a layer between a stream-processing system and the physical network that manages operator placement for stream-processing systems. Our design is based on a cost space, an abstract representation of the network and on-going streams, which permits decentralized, large-scale multi-query optimization decisions. We present an evaluation of the SBON approach through simulation, experiments on PlanetLab, and an integration with Borealis, an existing stream-processing engine. Our results show that an SBON consistently improves network utilization, provides low stream latency, and enables dynamic optimization at low engineering cost.
Operator Placement for In-Network Stream Query Processing
- In Proc. of the 24th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), 2005
Cited by 48 (0 self)
In sensor networks, data acquisition frequently takes place at low-capability devices. The acquired data is then transmitted through a hierarchy of nodes having progressively increasing network bandwidth and computational power. We consider the problem of executing queries over these data streams, posed at the root of the hierarchy. To minimize data transmission, it is desirable to perform “in-network” query processing: do some part of the work at intermediate nodes as the data travels to the root. Most previous work on in-network query processing has focused on aggregation and inexpensive filters. In this paper, we address in-network processing for queries involving possibly expensive conjunctive filters and joins. We consider the problem of placing operators along the nodes of the hierarchy so that the overall cost of computation and data transmission is minimized. We show that the problem is tractable, give an optimal algorithm, and demonstrate that a simpler greedy operator placement algorithm can fail to find the optimal solution. Finally, we define a number of interesting variations of the basic operator placement problem and demonstrate their hardness.
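To make the cost trade-off in this abstract concrete, here is a minimal Python sketch for the simplest possible case: one filter placed on a single leaf-to-root chain. It is an illustration under stated assumptions, not the paper's algorithm. The function name, the cost model (per-tuple compute and link costs, one fixed selectivity), and all parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def best_filter_placement(
    rate: float,
    selectivity: float,
    compute_cost: Sequence[float],
    link_cost: Sequence[float],
) -> Tuple[int, float]:
    """Return (node index, total cost) of the cheapest single-filter placement.

    Assumed model (hypothetical, for illustration only):
    - nodes are numbered 0 (leaf) to n-1 (root) along one chain;
    - compute_cost[i] is the per-tuple cost of running the filter at node i;
    - link_cost[i] is the per-tuple cost of sending from node i to node i+1;
    - the filter passes a fraction `selectivity` of the incoming tuples.
    """
    n = len(compute_cost)
    if len(link_cost) != n - 1:
        raise ValueError("need exactly one link cost per hop")

    best_node, best_cost = -1, float("inf")
    for node in range(n):
        # Links below the filter carry the full stream.
        before = rate * sum(link_cost[:node])
        # Links above the filter carry only the tuples that pass it.
        after = rate * selectivity * sum(link_cost[node:])
        total = before + rate * compute_cost[node] + after
        if total < best_cost:
            best_node, best_cost = node, total
    return best_node, best_cost
```

With a stream of 100 tuples per time unit, selectivity 0.25, compute costs `[5.0, 1.0, 0.5]`, and link costs `[2.0, 2.0]`, the exhaustive search selects the middle node: filtering at the leaf is too expensive to compute, and filtering at the root ships the unfiltered stream over both links.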
Design Considerations for High Fan-in Systems: The HiFi Approach
- In CIDR, 2005
Cited by 46 (7 self)
Advances in data acquisition and sensor technologies are leading towards the development of “high fan-in” architectures: widely distributed systems whose edges consist of numerous receptors such as sensor networks, RFID readers, or probes, and whose interior nodes are traditional host computers organized using the principles of cascading streams and successive aggregation. Examples include RFID-enabled supply chain management, large-scale environmental monitoring, and various types of network and computing infrastructure monitoring. In this paper, we identify the key characteristics and data management challenges presented by high fan-in systems, and argue for a uniform, query-based approach towards addressing them. We then present our initial design concepts behind HiFi, the system we are building to embody these ideas, and describe a proof-of-concept prototype.
Cobra: Content-based filtering and aggregation of blogs and rss feeds
- In Proc. NSDI ’07
Cited by 9 (1 self)
Blogs and RSS feeds are becoming increasingly popular. The blogging site LiveJournal has over 11 million user accounts, and according to one report, over 1.6 million postings are made to blogs every day. The “Blogosphere” is a new hotbed of Internet-based media that represents a shift from mostly static content to dynamic, continuously-updated discussions. The problem is that finding and tracking blogs with interesting content is an extremely cumbersome process. In this paper, we present Cobra (Content-Based RSS Aggregator), a system that crawls, filters, and aggregates vast numbers of RSS feeds, delivering to each user a personalized feed based on their interests. Cobra consists of a three-tiered network of crawlers that scan web feeds, filters that match crawled articles to user subscriptions, and reflectors that provide recently-matching articles on each subscription as an RSS feed, which can be browsed using a standard RSS reader. We present the design, implementation, and evaluation of Cobra in three settings: a dedicated cluster, the Emulab testbed, and PlanetLab. We present a detailed performance study of the Cobra system, demonstrating that the system is able to scale well to support a large number of source feeds and users; that the mean update detection latency is low (bounded by the crawler rate); and that an offline service provisioning step combined with several performance optimizations is effective at reducing memory usage and network load.
Wide-Scale Data Stream Management
Cited by 9 (1 self)
This paper describes Mortar, a distributed stream processing platform for building very large queries across federated systems (enterprises, grids, datacenters, testbeds). Nodes in such systems can be queried for distributed debugging, application control and provisioning, anomaly detection, and measurement. We address the primary challenges of managing continuous queries that have thousands of wide-area sources that may periodically be down, disconnected, or overloaded, e.g., multiple data centers filled with cheap PCs, Internet testbeds such as PlanetLab, or country-wide sensor installations. Mortar presents a clean-slate design for best-effort in-network processing. For each query, it builds multiple, static overlays and leverages the union of overlay paths to provide resilient query installation and data routing. Further, a unique data management scheme mitigates the impact of clock skew on distributed stream processing, reducing result latency by a factor of 8, and allows users to specify custom in-network operators that transparently benefit from multipath routing. When compared to a contemporary distributed snapshot querying substrate, Mortar uses 78% less bandwidth while providing increased query resolution, responsiveness, and accuracy during failures.
Path Optimization in Stream-Based Overlay Networks
- 2004
Cited by 8 (3 self)
The emergence of sensor networks and distributed applications that generate data streams has created a need for Internet overlays designed for streaming data. Such stream-based overlay networks (SBONs) consist of a set of Internet hosts that collect, process, and deliver stream-based data to multiple applications. A key challenge in the design and implementation of SBONs is efficient path optimization when mapping logical query streams to physical network hosts and paths. Suboptimal placements can induce poor utilization of network resources, leading to severe performance penalties, link saturation, and network hotspots. Our goal is to realize efficient stream placement that takes the physical topology of the Internet into account, thereby minimizing overall network utilization. In this paper, we describe a novel, network-aware path optimization algorithm for stream-based overlay networks. Our approach is based on a spring relaxation model that operates in a metric space defined by the
Alleviating hot-spots in peer-to-peer stream processing environments
- In DBISP2P, 2007
Cited by 6 (4 self)
Many emerging distributed applications require the processing of massive amounts of data in real-time. As a result, distributed stream processing systems have been introduced, offering a scalable and efficient means of in-network processing. However, managing the load among the nodes of such a large-scale, dynamic system in real-time is challenging. The peer-to-peer paradigm can help address these challenges via self-organization. We describe a self-managing architecture for identifying and alleviating hot-spots in a peer-to-peer stream processing environment. Resource monitoring and hot-spot detection are carried out by all peers independently, building upon a completely decentralized architecture. To alleviate hot-spots we empower peers to autonomously migrate the execution of stream processing components using a non-disruptive migration protocol. We have integrated our techniques in Synergy, our distributed stream processing middleware. The experimental evaluation of our implementation over PlanetLab demonstrates substantial performance benefits for distributed stream processing applications, with moderate monitoring and migration overheads.
Hot-spot prediction and alleviation in distributed stream processing applications
- In Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN, 2008
Cited by 6 (5 self)
Many emerging distributed applications require the real-time processing of large amounts of data that are being updated continuously. Distributed stream processing systems offer a scalable and efficient means of in-network processing of such data streams. However, the large scale and the distributed nature of such systems, as well as the fluctuation of their load, render it difficult to ensure that distributed stream processing applications meet their Quality of Service demands. We describe a decentralized framework for proactively predicting and alleviating hot-spots in distributed stream processing applications in real-time. We base our hot-spot prediction techniques on statistical forecasting methods, while for hot-spot alleviation we employ a non-disruptive component migration protocol. The experimental evaluation of our techniques, implemented in our Synergy distributed stream processing middleware over PlanetLab, using a real stream processing application operating on real streaming data, demonstrates high prediction accuracy and substantial performance benefits.
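The abstract does not say which statistical forecasting methods the authors use. As one possible illustration of forecast-based hot-spot prediction, the Python sketch below applies simple exponential smoothing to a node's recent load samples and flags a hot-spot when the one-step-ahead forecast exceeds capacity. The function names, the smoothing factor `alpha`, and the capacity threshold are all assumptions made for this example, not details from the paper or from Synergy.

```python
from typing import Iterable


def forecast_load(samples: Iterable[float], alpha: float = 0.5) -> float:
    """One-step-ahead load forecast via simple exponential smoothing.

    `alpha` in (0, 1] weights the newest sample; this choice of method
    is an assumption, not the technique described in the paper.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    it = iter(samples)
    try:
        level = next(it)  # initialise the level with the first sample
    except StopIteration:
        raise ValueError("need at least one load sample") from None
    for x in it:
        # Blend the new observation with the running smoothed level.
        level = alpha * x + (1.0 - alpha) * level
    return level


def predicts_hot_spot(
    samples: Iterable[float], capacity: float, alpha: float = 0.5
) -> bool:
    """Flag a hot-spot when the forecast load exceeds the node's capacity."""
    return forecast_load(samples, alpha) > capacity
```

A node observing the rising load fractions `[0.4, 0.6, 0.8, 1.0]` gets a forecast of about 0.825 with `alpha=0.5`, so it would be flagged against a capacity of 0.8 but not against 0.9. In a system like the one described, a positive prediction would be the trigger for migrating a component before the overload occurs.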
Instruction Issue Logic in Pipelined Supercomputers
- In COMSOC’06: International Workshop on Computational Social Choice, 2006
Cited by 6 (3 self)
In this paper we deal with the problem of optimally placing a set of query operators in an overlay network. Each user is interested in performing a query on streaming data and each query has an associated set of in-network operators that filter, aggregate and process the data in various ways. Each user has private information about the operators associated with a query and about the utility from different combinations of operator placements. Each server in the overlay network is able to perform some set of operators, and servers differ in their network and computational characteristics. We model this problem as a Distributed Constraint Optimization Problem (DCOP), and apply the M-DPOP algorithm from Petcu et al. [19], executed here by clients associated with users and situated at nodes on the overlay network. M-DPOP makes truth-telling an ex-post Nash equilibrium and determines the social-welfare maximizing placement of operators to servers. No client can benefit by deviating from the M-DPOP algorithm and nodes need only communicate with other nodes that have an interest in placing an operator on the same server. The only central authority required is a bank that can extract payments from users. Preliminary results from simulation show that message size will be a bottleneck in applying M-DPOP to operator placement unless structure can be enforced and then exploited.
Distributed Evaluation of Continuous Equi-join Queries over Large Structured Overlay Networks
- In ICDE 2006, 2005

