Results 1 - 10
of
33
Gigascope: a stream database for network applications
- In SIGMOD
, 2003
"... We have developed Gigascope, a stream database for network applications including traffic analysis, intrusion detection, router configuration analysis, network research, network monitoring, and and performance monitoring and debugging. Gigascope is undergoing installation at many sites within the AT ..."
Abstract
-
Cited by 212 (13 self)
- Add to MetaCart
We have developed Gigascope, a stream database for network applications including traffic analysis, intrusion detection, router configuration analysis, network research, network monitoring, and and performance monitoring and debugging. Gigascope is undergoing installation at many sites within the AT&T network, including at OC48 routers, for detailed monitoring. In this paper we describe our motivation for and constraints in developing Gigascope, the Gigascope architecture and query language, and performance issues. We conclude with a discussion of stream database research problems we have found in our application. 1.
Issues in Data Stream Management
, 2003
"... Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require sup ..."
Abstract
-
Cited by 105 (5 self)
- Add to MetaCart
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for online analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.
On the Design and Use of Internet Sinks for Network Abuse Monitoring
- In Proceedings of the 7 th International Symposium on Recent Advances in Intrusion Detection (RAID
, 2004
"... Monitoring unused or dark IP addresses offers opportunities to significantly improve and expand knowledge of abuse activity without many of the problems associated with typical network intrusion detection and firewall systems. ..."
Abstract
-
Cited by 68 (11 self)
- Add to MetaCart
Monitoring unused or dark IP addresses offers opportunities to significantly improve and expand knowledge of abuse activity without many of the problems associated with typical network intrusion detection and firewall systems.
Comparing data streams using hamming norms (how to zero in
- In VLDB
, 2002
"... Abstract—Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely stored in traditional databases and instead must be processed “on the fly” as they are produced. Similarly, senso ..."
Abstract
-
Cited by 59 (6 self)
- Add to MetaCart
Abstract—Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely stored in traditional databases and instead must be processed “on the fly” as they are produced. Similarly, sensor networks produce multiple data streams of observations from their sensors. There is growing focus on manipulating data streams and, hence, there is a need to identify basic operations of interest in managing data streams, and to support them efficiently. We propose computation of the Hamming norm as a basic operation of interest. The Hamming norm formalizes ideas that are used throughout data processing. When applied to a single stream, the Hamming norm gives the number of distinct items that are present in that data stream, which is a statistic of great interest in databases. When applied to a pair of streams, the Hamming norm gives an important measure of (dis)similarity: the number of unequal item counts in the two streams. Hamming norms have many uses in comparing data streams. We present a novel approximation technique for estimating the Hamming norm for massive data streams; this relies on what we call the “l0 sketch ” and we prove its accuracy. We test our approximation method on a large quantity of synthetic and real stream data, and show that the estimation is accurate to within a few percentage points. Index Terms—Data stream analysis, approximate query processing, data structures and algorithms, data reduction. 1
Finding hierarchical heavy hitters in data streams
- In Proc. of VLDB
, 2003
"... Aggregation along hierarchies is a critical summary technique in a large variety of online applications including decision support, and network management (e.g., IP clustering, denial-of-service attack monitoring). Despite the amount of recent study that has been dedicated to online aggregation on s ..."
Abstract
-
Cited by 35 (5 self)
- Add to MetaCart
Aggregation along hierarchies is a critical summary technique in a large variety of online applications including decision support, and network management (e.g., IP clustering, denial-of-service attack monitoring). Despite the amount of recent study that has been dedicated to online aggregation on sets (e.g., quantiles, hot items), surprisingly little attention has been paid to summarizing hierarchical structure in stream data. The problem we study in this paper is that of finding Hierarchical Heavy Hitters (HHH): given a hierarchy and a fraction φ, we want to find all HHH nodes that have a total number of descendants in the data stream larger than φ of the total number of elements in the data stream, after discounting the descendant nodes that are HHH nodes. The resulting summary gives a topological “cartogram ” of the hierarchical data. We present deterministic and randomized algorithms for finding HHHs, which builds upon existing techniques by incorporating the hierarchy into the algorithms. Our experiments demonstrate several factors of improvement in accuracy over
Identifying Frequent Items in Sliding Windows over
- IN PROCEEDINGS OF THE INTERNET MEASUREMENT CONFERENCE
, 2003
"... ..."
Holistic UDAFs at streaming speeds
- In SIGMOD
, 2004
"... Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are required to incorporate these algorithms in a data stream query processor, and to make them useful in pract ..."
Abstract
-
Cited by 22 (12 self)
- Add to MetaCart
Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are required to incorporate these algorithms in a data stream query processor, and to make them useful in practice. In this paper, we study the performance implications of using user-defined aggregate functions (UDAFs) to incorporate selectionbased and sketch-based algorithms for holistic aggregates into a data stream management system’s query processing architecture. We identify key performance bottlenecks and tradeoffs, and propose novel techniques to make these holistic UDAFs fast and spaceefficient for use in high-speed data stream applications. We evaluate performance using generated and actual IP packet data, focusing on approximating quantiles and heavy hitters. The best of our current implementations can process streaming queries at OC48 speeds (2x 2.4Gbps). 1.
A heartbeat mechanism and its application in Gigascope
- In VLDB
, 2005
"... Data stream management systems often rely on ordering properties of tuple attributes in order to implement non-blocking operators. However, query operators that work with multiple streams, such as stream merge or join, can often still block if one of the input stream is very slow or bursty. In princ ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
Data stream management systems often rely on ordering properties of tuple attributes in order to implement non-blocking operators. However, query operators that work with multiple streams, such as stream merge or join, can often still block if one of the input stream is very slow or bursty. In principle, punctuation and heartbeat mechanisms have been proposed to unblock streaming operators. In practice, it is a challenge to incorporate such mechanisms into a highperformance stream management system that is operational in an industrial application. In this paper, we introduce a system for punctuation-carrying heartbeat generation that we developed for Gigascope, a high-performance streaming database for network monitoring, that is operationally used within AT&T's IP backbone. We show how heartbeats can be regularly generated by low-level nodes in query execution plans and propagated upward unblocking all streaming operators on its way. Additionally, our heartbeat mechanism can be used for other applications in distributed settings such as detecting node failures, performance monitoring, and query optimization. A performance evaluation using live data feeds shows that our system is capable of working at multiple Gigabit line speeds in a live, industrial deployment and can significantly decrease the query memory utilization.
Sampling for Passive Internet Measurement: A Review
- Statistical Science
, 2004
"... Abstract. Sampling has become an integral part of passive network measurement. This role is driven by the need to control the consumption of resources in the measurement infrastructure under increasing traffic rates and the demand for detailed measurements from applications and service providers. Cl ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Abstract. Sampling has become an integral part of passive network measurement. This role is driven by the need to control the consumption of resources in the measurement infrastructure under increasing traffic rates and the demand for detailed measurements from applications and service providers. Classical sampling methods play an important role in the current practice of Internet measurement. The aims of this review are (i) to explain the classical sampling methodology in the context of the Internet to readers who are not necessarily acquainted with either, (ii) to give an account of newer applications and sampling methods for passive measurement and (iii) to identify emerging areas that are ripe for the application of statistical expertise. Key words and phrases: Traffic measurement, network management, sampling methods, estimation, packets, flows.

