Results 1 -
6 of
6
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
"... DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed.NET objects; and by s ..."
Abstract
-
Cited by 89 (18 self)
- Add to MetaCart
DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed.NET objects; and by supporting general-purpose imperative and declarative operations on datasets within a traditional high-level programming language. A DryadLINQ program is a sequential program composed of LINQ expressions performing arbitrary sideeffect-free transformations on datasets, and can be written and debugged using standard.NET development tools. The DryadLINQ system automatically and transparently translates the data-parallel portions of the program into a distributed execution plan which is passed to the Dryad execution platform. Dryad, which has been in continuous operation for several years on production clusters made up of thousands of computers, ensures efficient, reliable execution of this plan. We describe the implementation of the DryadLINQ compiler and runtime. We evaluate DryadLINQ on a varied set of programs drawn from domains such as web-graph analysis, large-scale log mining, and machine learning. We show that excellent absolute performance can be attained—a general-purpose sort of 1012 Bytes of data executes in 319 seconds on a 240-computer, 960disk cluster—as well as demonstrating near-linear scaling of execution time on representative applications as we vary the number of computers used for a job. 1
Intrusion Detection using Continuous Time Bayesian Networks
"... Intrusion detection systems (IDSs) fall into two high-level categories: network-based systems (NIDS) that monitor network behaviors, and host-based systems (HIDS) that monitor system calls. In this work, we present a general technique for both systems. We use anomaly detection, which identifies patt ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Intrusion detection systems (IDSs) fall into two high-level categories: network-based systems (NIDS) that monitor network behaviors, and host-based systems (HIDS) that monitor system calls. In this work, we present a general technique for both systems. We use anomaly detection, which identifies patterns not conforming to a historic norm. In both types of systems, the rates of change vary dramatically over time (due to burstiness) and over components (due to service difference). To efficiently model such systems, we use continuous time Bayesian networks (CTBNs) and avoid specifying a fixed update interval common to discrete-time models. We build generative models from the normal training data, and abnormal behaviors are flagged based on their likelihood under this norm. For NIDS, we construct a hierarchical CTBN model for the network packet traces and use Rao-Blackwellized particle filtering to learn the parameters. We illustrate the power of our method through experiments on detecting real worms and identifying hosts on two publicly available network traces, the MAWI dataset and the LBNL dataset. For HIDS, we develop a novel learning method to deal with the finite resolution of system log file time stamps, without losing the benefits of our continuous time model. We demonstrate the method by detecting intrusions in the DARPA 1998 BSM dataset. 1.
Modeling Events with Cascades of Poisson Processes
"... We present a probabilistic model of events in continuous time in which each event triggers a Poisson process of successor events. The ensemble of observed events is thereby modeled as a superposition of Poisson processes. Efficient inference is feasible under this model with an EM algorithm. Moreove ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We present a probabilistic model of events in continuous time in which each event triggers a Poisson process of successor events. The ensemble of observed events is thereby modeled as a superposition of Poisson processes. Efficient inference is feasible under this model with an EM algorithm. Moreover, the EM algorithm can be implemented as a distributed algorithm, permitting the model to be applied to very large datasets. We apply these techniques to the modeling of Twitter messages and the revision history of Wikipedia. 1
unknown title
"... Constellation: automated discovery of service and host dependencies in networked systems tvpitgdca03 ..."
Abstract
- Add to MetaCart
Constellation: automated discovery of service and host dependencies in networked systems tvpitgdca03
Conjoint Modeling of Temporal Dependencies in Event Streams
"... Many real world applications depend on modeling the temporal dynamics of streams of diverse events, many of which are rare. We introduce a novel model class, Conjoint Piecewise-Constant Conditional Intensity Models, and a learning algorithm that together yield a data-driven approach to parameter sha ..."
Abstract
- Add to MetaCart
Many real world applications depend on modeling the temporal dynamics of streams of diverse events, many of which are rare. We introduce a novel model class, Conjoint Piecewise-Constant Conditional Intensity Models, and a learning algorithm that together yield a data-driven approach to parameter sharing with the aim of better modeling such event streams. We empirically demonstrate that our approach yields more accurate models of two real world data sets: search query logs and data center system logs. 1

