Results 1 - 10 of 15
Trafficking Fraudulent Accounts: The Role of the Underground Market in Twitter Spam and Abuse
Cited by 23 (4 self)
As web services such as Twitter, Facebook, Google, and Yahoo now dominate the daily activities of Internet users, cyber criminals have adapted their monetization strategies to engage users within these walled gardens. To facilitate access to these sites, an underground market has emerged where fraudulent accounts – automatically generated credentials used to perpetrate scams, phishing, and malware – are sold in bulk by the thousands. In order to understand this shadowy economy, we investigate the market for fraudulent Twitter accounts to monitor prices, availability, and fraud perpetrated by 27 merchants over the course of a 10-month period. We use our insights to develop a classifier to retroactively detect several million fraudulent accounts sold via this marketplace, 95% of which we disable with Twitter’s help. During active months, the 27 merchants we monitor appeared responsible for registering 10–20% of all accounts later flagged for spam by Twitter, generating $127–459K for their efforts.
Funding
2014
Research Interests: My primary research interests are in the areas of cyber security and cyber-physical security. I am especially focused on empirical, measurement-based, and data-driven security that enables the systematic identification of potential intervention points and the evaluation of security defenses.
On the Precision of Social and Information Networks
2013
Cited by 2 (1 self)
The diffusion of information on online social and information networks has been a popular topic of study in recent years, but attention has typically focused on speed of dissemination and recall (i.e., the fraction of users getting a piece of information). In this paper, we study the complementary notion of the precision of information diffusion. Our model of information dissemination is “broadcast-based”, i.e., one where every message (original or forwarded) from a user goes to a fixed set of recipients, often called the user’s “friends” or “followers”, as in Facebook and Twitter. The precision of the diffusion process is then defined as the fraction of received messages that a user finds interesting. At first glance, it seems that broadcast-based information diffusion is a “blunt” targeting mechanism and must necessarily suffer from low precision. Somewhat surprisingly, we present preliminary experimental and analytical evidence to the contrary: it is possible to simultaneously have high precision (i.e., precision bounded below by a constant), high recall, and low diameter. We start by presenting a set of conditions on the structure of user interests, and analytically show the necessity of each of these conditions for obtaining high precision. We also present preliminary experimental evidence from Twitter verifying that these conditions are satisfied. We then prove that the Kronecker-graph based generative model of Leskovec et al. satisfies these conditions given an appropriate and natural definition of user interests. Further, we show that this model also has high precision, high recall, and low diameter. We finally present preliminary experimental evidence showing Twitter has high precision, validating our conclusion. This is perhaps a first step towards a formal understanding of the immense popularity of online social networks as an information dissemination mechanism.
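The precision and recall notions in this abstract can be illustrated with a small sketch. It is not the paper's model or code: the graph, the interest sets, and the function names are hypothetical, and it assumes that a user forwards exactly the messages they find interesting and that each user counts a given message at most once. Precision is aggregated over all deliveries of a single message rather than computed per user.

```python
from collections import deque


def broadcast_diffusion(followers, interests, source, topic):
    """Return the set of users who receive a message about `topic`.

    Assumption (not from the abstract): a recipient forwards the message
    to all of their followers only if the topic is among their interests.
    """
    received = set()
    forwarded = {source}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for follower in followers.get(user, ()):
            received.add(follower)
            interested = topic in interests.get(follower, set())
            if interested and follower not in forwarded:
                forwarded.add(follower)
                queue.append(follower)
    received.discard(source)  # the author does not count as a recipient
    return received


def precision_recall(followers, interests, source, topic):
    """Precision: interested recipients / all recipients.
    Recall: interested recipients / all interested users (excluding source)."""
    received = broadcast_diffusion(followers, interests, source, topic)
    interested = {
        user for user, topics in interests.items()
        if topic in topics and user != source
    }
    hits = received & interested
    precision = len(hits) / len(received) if received else 0.0
    recall = len(hits) / len(interested) if interested else 0.0
    return precision, recall


# Hypothetical toy network: followers[u] lists the users who follow u.
followers = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
interests = {"a": {"x"}, "b": {"x"}, "c": set(), "d": {"x"}, "e": {"x"}}

precision, recall = precision_recall(followers, interests, "a", "x")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy network, user c receives the message without being interested (lowering precision), and user e is interested but never receives it because c does not forward (lowering recall).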
Personalized spam filtering for gray mail
In Proceedings of the Fifth Conference on Email and Anti-Spam (CEAS), 2008
Cited by 5 (0 self)
Gray mail, messages that could reasonably be considered either spam or good by different email users, is a commonly observed issue in production spam filtering systems. In this paper we study this class of mail using a large real-world email corpus and signature-based campaign detection techniques. Our analysis shows that even an optimal filter will inevitably perform unsatisfactorily on gray mail, unless user preferences are taken into account. To overcome this difficulty we design a light-weight user model that is highly scalable and can be easily combined with a traditional global spam filter. Our approach is able to incorporate both partial and complete user feedback on message labels and catches up to 40% more spam from gray mail in the low false-positive region.
In Association with Thirteenth Pacific-Asia Conference
Recent years brought increased interest in applying data mining techniques to difficult "real-world" problems, many of which are characterized by imbalanced learning data, where at least one class is under-represented relative to others. Examples include (but are not limited to): fraud/intrusion detection, risk management, medical diagnosis/monitoring, bioinformatics, text categorization and personalization of information. The problem of imbalanced data is also often associated with asymmetric costs of misclassifying elements of different classes. Additionally, the distribution of the test data may differ from that of the learning sample, and the true misclassification costs may be unknown at learning time. The AAAI’2000 and ICML’2003 Workshops on "Learning from Imbalanced Data Sets" provided venues where this important problem was explicitly addressed and was received with much interest. Although much awareness of the issues related to class imbalance has been raised, many of the key problems still remain open and are in fact encountered more often, especially when applied to massive datasets. We believe that it would be of value to the data mining community to not only examine the progress achieved in this area over the last five years but also discuss the current school of thought on research in learning from imbalanced datasets and cost-sensitive learning. The Workshop on “Data Mining When Classes are Imbalanced and Errors Have Costs” (ICEC’2009)