Results 1 - 3 of 3
Graph-Based User Behavior Modeling: From Prediction to Fraud Detection
Abstract: How can we model users' preferences? How do anomalies, fraud, and spam affect our models of normal users? How can we modify our models to catch fraudsters? In this tutorial we will answer these questions, connecting graph analysis tools for user behavior modeling to anomaly and fraud detection. In particular, we will focus on the application of subgraph analysis, label propagation, and latent factor models to static, evolving, and attributed graphs. For each of these techniques we will give a brief explanation of the algorithms and the intuition behind them. We will then give examples of recent research using the techniques to model, understand, and predict normal behavior. With this intuition for how these methods are applied to graphs and user behavior, we will focus on state-of-the-art research showing how the outcomes of these methods are affected by fraud, and how they have been used to catch fraudsters.
Perspective and Target Audience
Perspective: In this tutorial we focus on understanding anomaly and fraud detection through the lens of normal user behavior modeling. The data mining and machine learning communities have developed a plethora of models and methods for understanding user behavior. However, these methods generally assume that the behavior is that of real, honest people. Fraud detection systems, on the other hand, frequently use techniques similar to those used in modeling "normal" behavior, yet fraud detection is often framed as an independent problem. By focusing on the relations and intersections of the two perspectives, we can gain a more complete understanding of the methods and hopefully inspire new research joining these two communities.
Target Audience: This tutorial is aimed at anyone interested in modeling and understanding user behavior, from data mining and machine learning researchers to practitioners from industry and government. For those new to the field, the tutorial will cover the background material necessary to understand these systems and will offer a concise, intuitive overview of the state of the art. Additionally, the tutorial aims to offer a new perspective that will be valuable and interesting even for researchers with more experience in these domains. For those who have worked in classic user behavior modeling, we will demonstrate how fraud can affect commonly used models that expect normal behavior, with the hope that future models will directly account for fraud. For those who have worked on fraud detection systems, we hope to inspire new research directions by connecting with recent developments in modeling "normal" behavior.
A General Suspiciousness Metric for Dense Blocks in Multimodal Data
Abstract: Which seems more suspicious: 5,000 tweets from 200 users on 5 IP addresses, or 10,000 tweets from 500 users on 500 IP addresses but all with the same trending topic and all in 10 minutes? The literature has many methods that try to find dense blocks in matrices and, recently, tensors, but no method gives a principled way to score the suspiciousness of dense blocks with different numbers of modes and rank them to draw human attention accordingly. Dense blocks are worth inspecting, typically indicating fraud, emerging trends, or some other noteworthy deviation from the usual. Our main contribution is that we show how to unify these methods and how to give a principled answer to questions like the above. Specifically, (a) we give a list of axioms that any metric of suspiciousness should satisfy; (b) we propose an intuitive, principled metric that satisfies the axioms and is fast to compute; (c) we propose CROSSSPOT, an algorithm to spot dense regions and sort them in order of importance (“suspiciousness”). Finally, we apply CROSSSPOT to real data, where it improves the F1 score over previous techniques by 68% and finds retweet-boosting in a real social dataset spanning 0.3 billion posts.
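The opening question of this abstract can be made concrete with a small sketch. The abstract does not spell out the proposed metric, so the code below is only an illustration under stated assumptions: it compares a naive density (events per cell) with a "surprise" score, the negative log-probability of the block's mass if all events were spread uniformly at random (a Poisson background). The dataset sizes in `TOTAL_DIMS` and `TOTAL_MASS` are hypothetical and do not come from the paper; a 2-mode block is treated as spanning the full range of the modes it does not constrain.

```python
import math

def naive_density(mass, dims):
    """Events per cell: block mass divided by the product of its mode sizes."""
    return mass / math.prod(dims)

def poisson_surprise(mass, dims, total_mass, total_dims):
    """Negative log-probability of exactly `mass` events in the block, assuming
    `total_mass` events are spread uniformly at random over the full data."""
    # Fraction of the full data covered by the block, mode by mode.
    p = math.prod(n / N for n, N in zip(dims, total_dims))
    lam = total_mass * p  # expected mass of the block under the background model
    # -log Poisson(mass; lam) = lam - mass*log(lam) + log(mass!)
    return lam - mass * math.log(lam) + math.lgamma(mass + 1)

# Hypothetical dataset: users, IP addresses, topics, minutes (assumed sizes).
TOTAL_DIMS = (1_000_000, 100_000, 1_000, 10_000)
TOTAL_MASS = 10_000_000  # assumed total number of tweets

# Block A: 5,000 tweets, 200 users x 5 IPs (topic and time unconstrained).
a_mass, a_dims = 5_000, (200, 5)
# Block B: 10,000 tweets, 500 users x 500 IPs x 1 topic x 10 minutes.
b_mass, b_dims = 10_000, (500, 500, 1, 10)

density_a = naive_density(a_mass, a_dims)
density_b = naive_density(b_mass, b_dims)
surprise_a = poisson_surprise(a_mass, a_dims, TOTAL_MASS, TOTAL_DIMS[:2])
surprise_b = poisson_surprise(b_mass, b_dims, TOTAL_MASS, TOTAL_DIMS)

print(f"naive density: A={density_a:.3f}  B={density_b:.3f}")
print(f"surprise:      A={surprise_a:.0f}  B={surprise_b:.0f}")
```

Under these assumed sizes, naive density ranks block A higher simply because it has fewer modes and therefore fewer cells, while the probabilistic score ranks block B higher. This is the kind of disagreement that motivates a metric designed to compare blocks with different numbers of modes.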
General
We present an analysis of taxi flows in Manhattan (NYC) using a variety of data mining approaches. The methods presented here can aid in the development of representative and accurate models of large-scale traffic flows, with applications to many areas, including outlier detection and characterization.