Results 1 - 7 of 7
Rb-seeker: Auto-detection of redirection botnets
- In Network & Distributed System Security Symposium, 2009
Cited by 5 (0 self)
A Redirection Botnet (RBnet) is a vast collection of compromised computers (called bots) used as a redirection/proxy infrastructure and under the control of a botmaster. We present the design, implementation and evaluation of a system called Redirection Botnet Seeker (RB-Seeker) for automatic detection of RBnets by utilizing three cooperating subsystems. Two of the subsystems are used to generate a database of domains participating in redirection: one detects redirection bots by following links embedded in spam emails, and the other detects redirection behavior based on network traces at a large university edge router using sequential hypothesis testing. The database of redirection domains generated by these two subsystems is fed into the final subsystem, which then performs DNS query probing on the domains over time. Based on certain behavioral attributes extracted from the DNS queries, the final subsystem makes use of a 2-tier detection strategy utilizing hyperplane decision functions. This allows it to quickly identify aggressive RBnets with a low false-positive rate (< 0.008%), while also accurately detecting stealthy RBnets (i.e., those mimicking valid DNS behavior, such as CDNs) by monitoring their behavior over time. Using DNS behavior as a means of detecting RBnets, RB-Seeker is impervious to the botmaster’s choice of Command-and-Control (C&C) channel (i.e., how the botmaster communicates with and controls the bots) or use of encryption.
A FLOW BASED APPROACH FOR SSH TRAFFIC DETECTION
Cited by 4 (2 self)
The basic objective of this work is to assess the utility of two supervised learning algorithms, AdaBoost and RIPPER, for classifying SSH traffic from log files without using features such as payload, IP addresses and source/destination ports. Pre-processing is applied to the traffic data to express it as traffic flows. Results of 10-fold cross validation for each learning algorithm indicate that a detection rate of 99% and a false positive rate of 0.7% can be achieved using RIPPER. Moreover, promising preliminary results were obtained when RIPPER was employed to identify which service was running over SSH. Thus, it is possible to detect SSH traffic with high accuracy without using features such as payload, IP addresses and source/destination ports, which is particularly useful when generic, scalable solutions are required.
On the 95-percentile billing method
Cited by 4 (0 self)
The 95-percentile method is used widely for billing ISPs and websites. In this work, we characterize important aspects of the 95-percentile method using a large set of traffic traces. We first study how the 95-percentile depends on the aggregation window size. We observe that the computed value often follows a noisy decreasing trend along a convex curve as the window size increases. We provide theoretical justification for this dependence using the self-similar model for Internet traffic and discuss more complex observed dependencies in which the 95-percentile increases with the window size. Secondly, we quantify how variations in the window size affect the computed 95-percentile. In our experiments, we find that reasonable differences in the window size can account for an increase between 4.1% and 42.5% in the monthly bill of medium and low-volume sites. In contrast, for sites with average traffic rates above 10 Mbps the fluctuation of the 95-percentile is below 2.9%. Next, we focus on the use of flow data in hosting environments for billing individual sites. We describe the byte-shifting effect introduced by flow aggregation and quantify how it can affect the computed 95-percentile. We find that in our traces it can both decrease and increase the computed 95-percentile, with the largest change being a decrease of 9.3%.
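The dependence on the aggregation window size described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the nearest-rank percentile definition, the function names `aggregate_rates` and `percentile_95`, and the 20-sample trace are all assumptions made for illustration.

```python
def aggregate_rates(samples, window):
    """Average consecutive rate samples into windows of `window` samples each."""
    windows = [samples[i:i + window] for i in range(0, len(samples), window)]
    return [sum(w) / len(w) for w in windows]


def percentile_95(rates):
    """Nearest-rank 95th percentile (an assumed definition, not taken from the paper)."""
    ordered = sorted(rates)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical trace: 20 rate samples in Mbps with two isolated bursts.
trace = [1.0] * 20
trace[4] = 41.0
trace[15] = 41.0

for window in (1, 2, 5, 10):
    billed = percentile_95(aggregate_rates(trace, window))
    print(f"window={window:2d} samples -> 95-percentile = {billed} Mbps")
```

On this toy trace the computed value falls from 41.0 to 21.0, 9.0 and 5.0 Mbps as the window grows from 1 to 2, 5 and 10 samples, because larger windows average the short bursts away. Real traces need not behave this way; the abstract itself reports cases where the value increases with the window size.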
Building a Better Mousetrap
- 2007
Cited by 4 (0 self)
Routers in the network core are unable to maintain detailed statistics for every packet; thus, traffic statistics are often based on packet sampling, which reduces accuracy. Because tracking large (“heavy-hitter”) traffic flows is important both for pricing and for traffic engineering, much attention has focused on maintaining accurate statistics for such flows, often at the expense of small-volume flows. Eradicating these smaller flows makes it difficult to observe communication structure, which is sometimes more important than maintaining statistics about flow sizes. This paper presents FlexSample, a sampling framework that allows network operators to get the best of both worlds: For a fixed sampling budget, FlexSample can capture significantly more small-volume flows for only a small increase in relative error of large traffic flows. FlexSample uses a fast, lightweight counter array that provides a coarse estimate of the size (“class”) of each traffic flow; a router then can sample at different rates according to the class of the traffic using any existing sampling strategy. Given a fixed sampling rate and a target fraction of sampled packets to allocate across traffic classes, FlexSample computes packet sampling rates for each class that achieve these allocations online. Through analysis and trace-based experiments, we find that FlexSample can extract more communication structure, and can capture at least 50% more mouse flows, than strategies that do not perform class-dependent packet sampling. We also show how FlexSample can be used to capture unique flows for specific applications.
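The idea of class-dependent sampling rates can be sketched with simple budget arithmetic. This is an illustrative assumption, not FlexSample's actual algorithm: it takes the per-class traffic shares as known inputs (the abstract says they are estimated by a counter array, which is omitted here), and the function name `class_sampling_rates` and the two-class example are hypothetical.

```python
def class_sampling_rates(overall_rate, traffic_share, target_share):
    """Per-class packet sampling rates that spend a fixed sampling budget.

    overall_rate:  fraction of all packets that may be sampled (the budget).
    traffic_share: class -> fraction of all packets belonging to that class.
    target_share:  class -> desired fraction of the sampled packets.

    If class c carries a fraction f_c of the packets and should receive a
    fraction a_c of the samples, its rate is overall_rate * a_c / f_c.
    """
    rates = {}
    for cls, share in traffic_share.items():
        if share <= 0:
            rates[cls] = 0.0  # no packets in this class, nothing to sample
            continue
        # A sampling probability cannot exceed 1.
        rates[cls] = min(1.0, overall_rate * target_share[cls] / share)
    return rates


# Hypothetical example: mice are 20% of packets but should get half the samples.
traffic = {"mouse": 0.2, "elephant": 0.8}
target = {"mouse": 0.5, "elephant": 0.5}
rates = class_sampling_rates(0.01, traffic, target)
for cls, rate in rates.items():
    print(f"{cls}: sample with probability {rate:.5f}")
```

Under these assumed shares, mouse packets are sampled more aggressively than elephant packets while the expected total number of samples stays within the same 1% budget.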
WebClass: Adding rigor to manual labeling of traffic anomalies
- SIGCOMM Comput. Commun. Rev., 2008
Cited by 2 (1 self)
This article is an editorial note submitted to CCR. It has NOT been peer reviewed. Authors take full responsibility for this article’s technical content. Comments can be posted through CCR Online. Despite the flurry of anomaly-detection papers in recent years, effective ways to validate and compare proposed solutions have remained elusive. We argue that evaluating anomaly detectors on manually labeled traces is both important and unavoidable. In particular, it is important to evaluate detectors on traces from operational networks because it is in this setting that the detectors must ultimately succeed. In addition, manual labeling of such traces is unavoidable because new anomalies will be identified and characterized from manual inspection long before there are realistic models for them. It is well known, however, that manual labeling is slow and error-prone. In order to mitigate these challenges, we present WebClass, a web-based infrastructure that adds rigor to the manual labeling process. WebClass allows researchers to share, inspect, and label traffic timeseries through a common graphical user interface. We are releasing WebClass to the research community in the hope that it will foster greater collaboration in creating labeled traces and that the traces will be of higher quality because the entire community has access to all the information that led to a given label.
Privacy-Preserving Collaborative Anomaly Detection
Cited by 1 (0 self)
Unwanted traffic is a major concern in the Internet today. Unwanted traffic includes Denial of Service attacks, worms, and spam. Identifying and mitigating unwanted traffic costs businesses many billions of USD every year. The process of identifying this traffic is called anomaly detection, and Intrusion Detection Systems (IDS’es) are among the most prevalent techniques. IDS’es, such as Snort, allow users to write “rules” that specify the properties of traffic that should be detected and the corrective action to be taken in response. Unfortunately, applying these rules in an online setting can be prohibitively expensive for large networks, such as Tier-1 ISPs, which may have tens of thousands of links and many Gbps of traffic. In the first chapter of this thesis we present a system that leverages machine learning algorithms to detect the same type of unwanted traffic as Snort, but on summarized data for faster processing. Our results demonstrate that this system can effectively learn to classify many Snort rules with a high degree of accuracy. Unfortunately, distinguishing good traffic from unwanted traffic is challenging even in an offline setting because many types of unwanted traffic, such as
LatLong: Diagnosing Wide-Area Latency Changes for CDNs
- IEEE Transactions on Network and Service Management, accepted for publication
Minimizing user-perceived latency is crucial for Content Distribution Networks (CDNs) hosting interactive services. Latency may increase for many reasons, such as interdomain routing changes and the CDN’s own load-balancing policies. CDNs need greater visibility into the causes of latency increases, so they can adapt by directing traffic to different servers or paths. In this paper, we propose a tool for CDNs to diagnose large latency increases, based on passive measurements of performance, traffic, and routing. Separating the many causes from the effects is challenging. We propose a decision tree for classifying latency changes, and determine how to distinguish traffic shifts from increases in latency for existing servers, routers, and paths. Another challenge is that network operators group related clients to reduce measurement and control overhead, but the clients in a region may use multiple servers and paths during a measurement interval. We propose metrics that quantify the latency contributions across sets of servers and routers. Based on the design, we implement the LatLong tool for diagnosing large latency increases for CDNs. We use LatLong to analyze a month of data from Google’s CDN, and find that nearly 1% of the daily latency changes increase delay by more than 100 msec. Note that the latency increase of 100 msec is significant, since these are daily averages over groups of clients, and we only focus on latency-sensitive traffic for our study. More than 40% of these increases coincide with interdomain routing changes, and more than one-third involve a shift in traffic to different servers. This is the first work to diagnose latency problems in a large, operational CDN from purely passive measurements. Through case studies of individual events, we identify research challenges for managing wide-area latency for CDNs. Index Terms: Network diagnosis, latency increases, content distribution networks (CDNs).

