Results 1 - 10 of 25
Paragraph: Thwarting signature learning by training maliciously
- In Proc. Recent Advances in Intrusion Detection: 9th International Symposium (RAID), 2006
Cited by 36 (6 self)
Abstract. Defending a server against Internet worms and defending a user’s email inbox against spam bear certain similarities. In both cases, a stream of samples arrives, and a classifier must automatically determine whether each sample falls into a malicious target class (e.g., worm network traffic, or spam email). A learner typically generates a classifier automatically by analyzing two labeled training pools: one of innocuous samples, and one of samples that fall in the malicious target class. Learning techniques have previously found success in settings where the content of the labeled samples used in training is either random, or even constructed by a helpful teacher, who aims to speed learning of an accurate classifier. In the case of learning classifiers for worms and spam, however, an adversary controls the content of the labeled samples to a great extent. In this paper, we describe practical attacks against learning, in which an adversary constructs labeled samples that, when used to train a learner, prevent or severely delay generation of an accurate classifier. We show that even a delusive adversary, whose samples are all correctly labeled, can obstruct learning. We simulate and implement highly effective instances of these attacks against the Polygraph [15] automatic polymorphic worm signature generation algorithms. Key words: automatic signature generation, machine learning, worm, spam
Allergy attack against automatic signature generation
- In Proc. of RAID, 2006
Cited by 21 (2 self)
Abstract. Research in systems that automatically generate signatures to filter out zero-day worm instances at perimeter defense has received a lot of attention recently. While a well-known problem with these systems is that the signatures generated are usually not very useful against polymorphic worms, we investigate in this paper a different, and potentially more serious, problem facing automatic signature generation systems: attacks that manipulate the signature generation system and turn it into an active agent for a DoS attack against the protected system. We call this new attack the “allergy attack”. This type of attack should be anticipated and has in fact been an issue in the context of “detraining” in machine learning. However, we have not seen a demonstration of its practical impact in real intrusion detection/prevention systems. In this paper, we demonstrate the practical impact of “allergy attacks”.
Incentive compatible regression learning
- In the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2008
Cited by 19 (11 self)
We initiate the study of incentives in a general machine learning framework. We focus on a game-theoretic regression learning setting where private information is elicited from multiple agents with different, possibly conflicting, views on how to label the points of an input space. This conflict potentially gives rise to untruthfulness on the part of the agents. In the restricted but important case when every agent cares about a single point, and under mild assumptions, we show that agents are motivated to tell the truth. In a more general setting, we study the power and limitations of mechanisms without payments. We finally establish that, in the general setting, the VCG mechanism goes a long way in guaranteeing truthfulness and economic efficiency.
Exploiting machine learning to subvert your spam filter
- In Proceedings of the First Workshop on Large-scale Exploits and Emerging Threats (LEET), 2008
Cited by 12 (6 self)
Using statistical machine learning for making security decisions introduces new vulnerabilities in large scale systems. This paper shows how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless, even if the adversary’s access is limited to only 1% of the training messages. We further demonstrate a new class of focused attacks that successfully prevent victims from receiving specific email messages. Finally, we introduce two new types of defenses against these attacks.
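To make the idea of a focused attack concrete, here is a minimal sketch in Python. It is not SpamBayes and not the paper's method: the scoring rule, the function names `train` and `spam_score`, the 0.5 threshold, and the sample messages are all assumptions chosen for illustration. The sketch assumes the attacker sends spam containing words expected in the target message, and that this spam is then (correctly) labeled as spam in the training pool.

```python
from collections import Counter

def train(messages):
    """Build a toy model from (text, is_spam) pairs.

    Hypothetical helper: counts, per class, how many messages contain each token.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        tokens = set(text.lower().split())
        if is_spam:
            spam_counts.update(tokens)
            n_spam += 1
        else:
            ham_counts.update(tokens)
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham

def spam_score(text, model):
    """Average per-token spam probability (a simplified stand-in scoring rule)."""
    spam_counts, ham_counts, n_spam, n_ham = model
    tokens = set(text.lower().split())
    if not tokens:
        return 0.5
    scores = []
    for tok in tokens:
        # Laplace-smoothed fraction of spam / ham messages containing the token.
        p_spam = (spam_counts[tok] + 1) / (n_spam + 2)
        p_ham = (ham_counts[tok] + 1) / (n_ham + 2)
        scores.append(p_spam / (p_spam + p_ham))
    return sum(scores) / len(scores)

# Illustrative training pool (made-up messages).
clean_training = [
    ("meeting agenda for monday", False),
    ("project budget review", False),
    ("cheap pills online", True),
    ("win money now", True),
]
target = "budget meeting monday"  # legitimate mail the attacker wants blocked

# Attack spam reuses the target's vocabulary and is labeled spam by the victim.
attack = [("budget meeting monday", True)] * 3

clean_score = spam_score(target, train(clean_training))
poisoned_score = spam_score(target, train(clean_training + attack))
```

In this toy setting the target message scores below 0.5 with clean training and above 0.5 once the attack messages are included, so a filter thresholding at 0.5 would start rejecting it. The proportion of attack messages here is far larger than the 1% figure quoted in the abstract; the sketch shows the mechanism only, not the reported result.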
Limits of Learning-based Signature Generation with Adversaries
Cited by 10 (1 self)
Automatic signature generation is necessary because there may often be little time between the discovery of a vulnerability and exploits developed to target the vulnerability. Much research effort has focused on pattern-extraction techniques to generate signatures. These have included techniques that look for a single large invariant substring of the byte sequences, as well as techniques that look for many short invariant substrings. Pattern-extraction techniques are attractive because signatures can be generated and matched efficiently, and earlier work has shown the existence of invariants in exploits. In this paper, we show fundamental limits on the accuracy of pattern-extraction algorithms for signature generation in an adversarial setting. We formulate a framework that allows a unified analysis of these algorithms, and prove lower bounds on the number of mistakes any pattern-extraction learning algorithm must make under common assumptions, by showing how to adapt results from learning theory. While previous work has targeted specific algorithms, our work generalizes these attacks through theoretical analysis to any algorithm with similar assumptions, not just the techniques developed so far. We also analyze when pattern-extraction algorithms may work, by showing conditions under which these lower bounds are weakened. Our results are applicable to other kinds of signature-generation algorithms as well, namely those that use properties of the exploit that can be manipulated.
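As a rough illustration of what "many short invariant substrings" can mean, the following Python sketch builds a signature from the fixed-length byte substrings shared by every sample and matches a payload only if all of them are present. This is a simplified stand-in, not any of the algorithms analyzed in the paper; the function names `common_kgrams` and `matches`, the choice k = 4, and the sample payloads are hypothetical.

```python
def common_kgrams(samples, k=4):
    """Return the set of length-k byte substrings that occur in every sample."""
    gram_sets = [
        {s[i:i + k] for i in range(len(s) - k + 1)}
        for s in samples
    ]
    return set.intersection(*gram_sets) if gram_sets else set()

def matches(signature, payload):
    """A payload matches when it contains every token of a non-empty signature."""
    return bool(signature) and all(tok in payload for tok in signature)

# Made-up "worm" samples that differ everywhere except one 4-byte invariant.
samples = [
    b"GET /a?x=AAAA\xde\xad\xbe\xefpad1",
    b"POST /b\xde\xad\xbe\xefzzzz",
    b"\xde\xad\xbe\xefHEAD /cccc",
]
signature = common_kgrams(samples, k=4)
```

Because the signature is derived entirely from the content of the training samples, an adversary who controls that content influences which substrings look invariant, which is the kind of manipulation the abstract's lower bounds are concerned with.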
Design and Evaluation of a Real-Time URL Spam Filtering Service
Cited by 9 (5 self)
On the heels of the widespread adoption of web services such as social networks and URL shorteners, scams, phishing, and malware have become regular threats. Despite extensive research, email-based spam filtering techniques generally fall short for protecting other web services. To better address this need, we present Monarch, a real-time system that crawls URLs as they are submitted to web services and determines whether the URLs direct to spam. We evaluate the viability of Monarch and the fundamental challenges that arise due to the diversity of web service spam. We show that Monarch can provide accurate, real-time protection, but that the underlying characteristics of spam do not generalize across web services. In particular, we find that spam targeting email qualitatively differs in significant ways from spam campaigns targeting Twitter. We explore the distinctions between email and Twitter spam, including the abuse of public web hosting and redirector services. Finally, we demonstrate Monarch’s scalability, showing our system could protect a service such as Twitter, which needs to process 15 million URLs/day, for a bit under $800/day.
Large-Scale Automatic Classification of Phishing Pages
Cited by 7 (0 self)
Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. In this paper, we describe the design and performance characteristics of a scalable machine learning classifier we developed to detect phishing websites. We use this classifier to maintain Google’s phishing blacklist automatically. Our classifier analyzes millions of pages a day, examining the URL and the contents of a page to determine whether or not a page is phishing. Unlike previous work in this field, we train the classifier on a noisy dataset consisting of millions of samples from previously collected live classification data. Despite the noise in the training data, our classifier learns a robust model for identifying phishing pages which correctly classifies more than 90% of phishing pages several weeks after training concludes.
ANTIDOTE: Understanding and Defending against Poisoning of Anomaly Detectors
Cited by 3 (0 self)
Statistical machine learning techniques have recently garnered increased popularity as a means to improve network design and security. For intrusion detection, such methods build a model for normal behavior from training data and detect attacks as deviations from that model. This process invites adversaries to manipulate the training data so that the learned model fails to detect subsequent attacks. We evaluate poisoning techniques and develop a defense, in the context of a particular anomaly detector—namely the PCA-subspace method for detecting anomalies in backbone networks. For three poisoning schemes, we show how attackers can substantially increase their chance of successfully evading detection by only adding moderate amounts of poisoned data. Moreover such poisoning throws off the balance between false positives and false negatives thereby dramatically reducing the efficacy of the detector. To combat these poisoning activities, we propose an antidote based on techniques from robust statistics and present a new robust PCA-based detector. Poisoning has little effect on the robust model, whereas it significantly distorts the model produced by the original PCA method. Our technique substantially reduces the effectiveness of poisoning for a variety of scenarios and indeed maintains a significantly better balance between false positives and false negatives than the original method when under attack.
iTrustPage: A User-Assisted Anti-Phishing Tool
Cited by 3 (0 self)
Despite the many solutions proposed by industry and the research community to address phishing attacks, this problem continues to cause enormous damage. Because of our inability to deter phishing attacks, the research community needs to develop new approaches to anti-phishing solutions. Most of today’s anti-phishing technologies focus on automatically detecting and preventing phishing attacks. While automation makes anti-phishing tools user-friendly, automation also makes them suffer from false positives, false negatives, and various practical hurdles. As a result, attackers often find simple ways to escape automatic detection. This paper presents iTrustPage, an anti-phishing tool that does not rely completely on automation to detect phishing. Instead, iTrustPage relies on user input and external repositories of information to prevent users from filling out phishing Web forms. With iTrustPage, users help to decide whether or not a Web page is legitimate. Because iTrustPage is user-assisted, iTrustPage avoids the false positives and the false negatives associated with automatic phishing detection. We implemented iTrustPage as a downloadable extension to Firefox. After being featured on the Mozilla website for Firefox extensions, iTrustPage was downloaded by more than 5,000 users in a two-week period. We present an analysis of our tool’s effectiveness and ease of use based on our examination of usage logs collected from the 2,050 users who used iTrustPage for more than two weeks. Based on these logs, we find that iTrustPage disrupts users on fewer than 2% of the pages they visit, and the number of disruptions decreases over time.

