Results 1 - 10
of
25
Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs
- Proceedingsof theSIGKDD Conference. Paris,France
, 2009
"... Malicious Web sites are a cornerstone of Internet criminal activities. As a result, there has been broad interest in developing systems to prevent the end user from visiting such sites. In this paper, we describe an approach to this problem based on automated URL classification, using statistical me ..."
Abstract
-
Cited by 29 (6 self)
- Add to MetaCart
Malicious Web sites are a cornerstone of Internet criminal activities. As a result, there has been broad interest in developing systems to prevent the end user from visiting such sites. In this paper, we describe an approach to this problem based on automated URL classification, using statistical methods to discover the tell-tale lexical and host-based properties of malicious Web site URLs. These methods are able to learn highly predictive models by extracting and automatically analyzing tens of thousands of features potentially indicative of suspicious URLs. The resulting classifiers obtain 95–99 % accuracy, detecting large numbers of malicious Web sites from their URLs, with only modest false positives.
Behind Phishing: An Examination of Phisher Modi Operandi
- In Proceedings of the USENIX Workshop on Large-scale Exploits and Emergent Threats
, 2008
"... Phishing costs Internet users billions of dollars a year. Using various data sets collected in real-time, this paper analyzes various aspects of phisher modi operandi. We examine the anatomy of phishing URLs and domains, registration of phishing domains and time to activation, and the machines used ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
Phishing costs Internet users billions of dollars a year. Using various data sets collected in real-time, this paper analyzes various aspects of phisher modi operandi. We examine the anatomy of phishing URLs and domains, registration of phishing domains and time to activation, and the machines used to host the phishing sites. Our findings can be used as heuristics in filtering phishing-related e-mails and in identifying suspicious domain registrations. 1
An Empirical Analysis of Phishing Blacklists
"... In this paper, we study the effectiveness of phishing blacklists. We used 191 fresh phish that were less than 30 minutes old to conduct two tests on eight anti-phishing toolbars. We found that 63 % of the phishing campaigns in our dataset lasted less than two hours. Blacklists were ineffective when ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
In this paper, we study the effectiveness of phishing blacklists. We used 191 fresh phish that were less than 30 minutes old to conduct two tests on eight anti-phishing toolbars. We found that 63 % of the phishing campaigns in our dataset lasted less than two hours. Blacklists were ineffective when protecting users initially, as most of them caught less than 20 % of phish at hour zero. We also found that blacklists were updated at different speeds, and varied in coverage, as 47 %- 83 % of phish appeared on blacklists 12 hours from the initial test. We found that two tools using heuristics to complement blacklists caught significantly more phish initially than those using only blacklists. However, it took a long time for phish detected by heuristics to appear on blacklists. Finally, we tested the toolbars on a set of 13,458 legitimate URLs for false positives, and did not find any instance of mislabeling for either blacklists or heuristics. We present these findings and discuss ways in which anti-phishing tools can be improved. 1.
Improved phishing detection using model-based features
- In Fifth Conference on Email and Anti-Spam, CEAS
, 2008
"... Phishing emails are a real threat to internet communication and web economy. Criminals are trying to convince unsuspecting online users to reveal passwords, account numbers, social security numbers or other personal information. Filtering approaches using blacklists are not completely effective as a ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
Phishing emails are a real threat to internet communication and web economy. Criminals are trying to convince unsuspecting online users to reveal passwords, account numbers, social security numbers or other personal information. Filtering approaches using blacklists are not completely effective as about every minute a new phishing scam is created. We investigate the statistical filtering of phishing emails, where a classifier is trained on characteristic features of existing emails and subsequently is able to identify new phishing emails with different contents. We propose advanced email features generated by adaptively trained Dynamic Markov Chains and by novel latent Class-Topic Models. On a publicly available test corpus classifiers using these features are able to reduce the number of misclassified emails by two thirds compared to previous work. Using a recently proposed more expressive evaluation method we show that these results are statistically significant. In addition we successfully tested our approach on a non-public email corpus with a real-life composition. 1
Design and Evaluation of a Real-Time URL Spam Filtering Service
"... On the heels of the widespread adoption of web services such as social networks and URL shorteners, scams, phishing, and malware have become regular threats. Despite extensive research, email-based spam filtering techniques generally fall short for protecting other web services. To better address th ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
On the heels of the widespread adoption of web services such as social networks and URL shorteners, scams, phishing, and malware have become regular threats. Despite extensive research, email-based spam filtering techniques generally fall short for protecting other web services. To better address this need, we present Monarch, a real-time system that crawls URLs as they are submitted to web services and determines whether the URLs direct to spam. We evaluate the viability of Monarch and the fundamental challenges that arise due to the diversity of web service spam. We show that Monarch can provide accurate, real-time protection, but that the underlying characteristics of spam do not generalize across web services. In particular, we find that spam targeting email qualitatively differs in significant ways from spam campaigns targeting Twitter. We explore the distinctions between email and Twitter spam, including the abuse of public web hosting and redirector services. Finally, we demonstrate Monarch’s scalability, showing our system could protect a service such as Twitter— which needs to process 15 million URLs/day—for a bit under $800/day.
A Hybrid Phish Detection Approach by Identity Discovery and Keywords Retrieval ABSTRACT
"... Phishing is a significant security threat to the Internet, which causes tremendous economic loss every year. In this paper, we proposed a novel hybrid phish detection method based on state-of-the-art information extraction (IE) and information retrieval (IR) techniques. The identity-based component ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
Phishing is a significant security threat to the Internet, which causes tremendous economic loss every year. In this paper, we proposed a novel hybrid phish detection method based on state-of-the-art information extraction (IE) and information retrieval (IR) techniques. The identity-based component of our method detects phishing webpages by directly discovering the inconsistency between their identity and the identity they are imitating. The keywords-retrieval component utilizes IR algorithms exploiting the power of search engines to identify phish. Our method requires no training data, no prior knowledge of phishing signatures and specific implementations, and thus is able to adapt quickly to constantly appearing new phishing patterns. Comprehensive experiments over a diverse spectrum of data sources with 11449 pages show that both components have a low false positive rate and the stacked approach achieves a true positive rate of 90.06 % with a false positive rate of 1.95%.
Large-Scale Automatic Classification of Phishing Pages
"... Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. In this paper, we describe the design and performance characteristics of a scalable machine learning classifier we developed to ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. In this paper, we describe the design and performance characteristics of a scalable machine learning classifier we developed to detect phishing websites. We use this classifier to maintain Google’s phishing blacklist automatically. Our classifier analyzes millions of pages a day, examining the URL and the contents of a page to determine whether or not a page is phishing. Unlike previous work in this field, we train the classifier on a noisy dataset consisting of millionsofsamplesfrompreviously collectedliveclassification data. Despite the noise in the training data, our classifier learns a robust model for identifying phishing pages which correctly classifies more than 90 % of phishing pages several weeks after training concludes.
PhishNet: Predictive Blacklisting to Detect Phishing Attacks
"... Abstract—Phishing has been easy and effective way for trickery and deception on the Internet. While solutions such as URL blacklisting have been effective to some degree, their reliance on exact match with the blacklisted entries makes it easy for attackers to evade. We start with the observation th ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Abstract—Phishing has been easy and effective way for trickery and deception on the Internet. While solutions such as URL blacklisting have been effective to some degree, their reliance on exact match with the blacklisted entries makes it easy for attackers to evade. We start with the observation that attackers often employ simple modifications (e.g., changing top level domain) to URLs. Our system, PhishNet, exploits this observation using two components. In the first component, we propose five heuristics to enumerate simple combinations of known phishing sites to discover new phishing URLs. The second component consists of an approximate matching algorithm that dissects a URL into multiple components that are matched individually against entries in the blacklist. In our evaluation with real-time blacklist feeds, we discovered around 18,000 new phishing URLs from a set of 6,000 new blacklist entries. We also show that our approximate matching algorithm leads to very few false positives (3%) and negatives (5%). I.
ABSTRACT iTrustPage: A User-Assisted Anti-Phishing Tool
"... Despite the many solutions proposed by industry and the research community to address phishing attacks, this problem continues to cause enormous damage. Because of our inability to deter phishing attacks, the research community needs to develop new approaches to anti-phishing solutions. Most of toda ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Despite the many solutions proposed by industry and the research community to address phishing attacks, this problem continues to cause enormous damage. Because of our inability to deter phishing attacks, the research community needs to develop new approaches to anti-phishing solutions. Most of today’s anti-phishing technologies focus on automatically detecting and preventing phishing attacks. While automation makes anti-phishing tools user-friendly, automation also makes them suffer from false positives, false negatives, and various practical hurdles. As a result, attackers often find simple ways to escape automatic detection. This paper presents iTrustPage – an anti-phishing tool that does not rely completely on automation to detect phishing. Instead, iTrustPage relies on user input and external repositories of information to prevent users from filling out phishing Web forms. With iTrustPage, users help to decide whether or not a Web page is legitimate. Because iTrustPage is user-assisted, iTrustPage avoids the false positives and the false negatives associated with automatic phishing detection. We implemented iTrustPage as a downloadable extension to FireFox. After being featured on the Mozilla website for FireFox extensions, iTrustPage was downloaded by more than 5,000 users in a two week period. We present an analysis of our tool’s effectiveness and ease of use based on our examination of usage logs collected from the 2,050 users who used iTrustPage for more than two weeks. Based on these logs, we find that iTrustPage disrupts users on fewer than 2 % of the pages they visit, and the number of disruptions decreases over time.
A Hierarchical Adaptive Probabilistic Approach for Zero Hour Phish Detection
"... Abstract. Phishing attacks are a significant threat to users of the Internet, causing tremendous economic loss every year. In combating phish, industry relies heavily on manual verification to achieve a low false positive rate, which, however, tends to be slow in responding to the huge volume of uni ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract. Phishing attacks are a significant threat to users of the Internet, causing tremendous economic loss every year. In combating phish, industry relies heavily on manual verification to achieve a low false positive rate, which, however, tends to be slow in responding to the huge volume of unique phishing URLs created by toolkits. Our goal here is to combine the best aspects of human verified blacklists and heuristic-based methods, i.e., the low false positive rate of the former and the broad coverage of the latter. To that end, we present the design and evaluation of a hierarchical blacklist-enhanced phish detection framework. The key insight behind our detection algorithm is to leverage existing humanverified blacklists and apply the shingling technique, a popular nearduplicate detection algorithm used by search engines, to detect phish in a probabilistic fashion with very high accuracy. To achieve an extremely low false positive rate, we use a filtering module in our layered system, harnessing the power of search engines via information retrieval techniques to correct false positives. Comprehensive experiments over a diverse spectrum of data sources show that our method achieves 0 % false positive rate (FP) with a true positive rate (TP) of 67.74 % using searchoriented filtering, and 0.03 % FP and 73.53 % TP without the filtering module. With incremental model building capability via a sliding window mechanism, our approach is able to adapt quickly to new phishing variants, and is thus more responsive to the evolving attacks. 1

