Outlier Detection for Temporal Data: A Survey

Cited by 6 (0 self)
Abstract—In the statistics community, outlier detection for time series data has been studied for decades. Recently, with advances in hardware and software technology, there has been a large body of work on temporal outlier detection from a computational perspective within the computer science community. In particular, advances in hardware technology have enabled various forms of temporal data collection, and advances in software technology have enabled a variety of data management mechanisms. This has fueled the growth of different kinds of data sets, such as data streams, spatiotemporal data, distributed streams, temporal networks, and time series data, generated by a multitude of applications. This creates a need for an organized and detailed study of the work done on outlier detection for such temporal datasets. In this survey, we provide a comprehensive and structured overview of a large set of interesting outlier definitions for various forms of temporal data, novel techniques, and the application scenarios in which specific definitions and techniques have been widely used.
Index Terms—temporal outlier detection, time series data, data streams, distributed data streams, temporal networks, spatiotemporal outliers
On Detecting Association-Based Clique Outliers in Heterogeneous Information Networks
In ASONAM, to appear, 2013

Cited by 6 (3 self)
Abstract—In the real world, various systems can be modeled as heterogeneous networks consisting of entities of different types. Users often want to discover groups (or cliques) of entities that are linked to each other through rare and surprising associations in such networks. We define such anomalous cliques as Association-Based Clique Outliers (ABC-Outliers) for heterogeneous information networks, and design effective approaches to detect them. The task of finding such outlier cliques can be formulated as a conjunctive select query consisting of a set of (type, predicate) pairs. Answering such conjunctive queries efficiently involves two main challenges: (1) computing all matching cliques that satisfy the query and (2) ranking these results by the rarity and interestingness of the associations among the entities in each clique. In this paper, we address these two challenges as follows. First, we introduce a new low-cost graph index to assist clique matching. Second, we define the outlierness of an association between two entities based on their attribute values and provide a methodology to efficiently compute such outliers given a conjunctive select query. Experimental results on several synthetic datasets and the Wikipedia dataset containing thousands of entities show the effectiveness of the proposed approach in computing interesting ABC-Outliers.
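A brute-force sketch of the two stages described above may help make the query model concrete: it matches a conjunctive select query of (type, predicate) pairs against a toy network and then ranks the matched cliques by association rarity. It is a hypothetical illustration, not the authors' method: it uses no graph index, and it scores rarity as the negative log frequency of each edge's endpoint-label pair, an assumed stand-in for the paper's attribute-based outlierness. The names `match_cliques`, `rank_cliques`, and the `label` attribute are invented for illustration.

```python
from collections import Counter
from itertools import combinations, product
from math import log

# Hypothetical toy network: entity -> (type, attributes); edges are unordered pairs.
ENTITIES = {
    "a1": ("author", {"label": "DM"}),
    "a2": ("author", {"label": "DM"}),
    "a3": ("author", {"label": "Theory"}),
    "p1": ("paper", {"label": "DM"}),
}
EDGES = {frozenset(e) for e in [("a1", "p1"), ("a2", "p1"), ("a3", "p1"), ("a1", "a2")]}


def match_cliques(entities, edges, query):
    """Brute-force matching: one distinct entity per (type, predicate) pair,
    with every pair of chosen entities linked (i.e., the choice is a clique)."""
    candidates = [
        [e for e, (etype, attrs) in entities.items() if etype == qtype and pred(attrs)]
        for qtype, pred in query
    ]
    matches = []
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # the same entity cannot fill two query slots
        if all(frozenset(pair) in edges for pair in combinations(combo, 2)):
            matches.append(combo)
    return matches


def label_pair(entities, u, v):
    """Unordered pair of the endpoints' (assumed) categorical 'label' values."""
    return frozenset((entities[u][1]["label"], entities[v][1]["label"]))


def rank_cliques(entities, edges, query):
    """Rank matched cliques by summed rarity, -log(frequency), of their associations."""
    counts = Counter(label_pair(entities, *tuple(e)) for e in edges)
    rarity = {pair: -log(c / len(edges)) for pair, c in counts.items()}
    scored = [
        (sum(rarity[label_pair(entities, u, v)] for u, v in combinations(clique, 2)), clique)
        for clique in match_cliques(entities, edges, query)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


query = [("author", lambda attrs: True), ("paper", lambda attrs: True)]
for score, clique in rank_cliques(ENTITIES, EDGES, query):
    print(f"{clique}: {score:.3f}")
```

In this toy graph the Theory author linked to a DM paper ranks first, because that label pair accounts for only one of the four edges.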
Community Distribution Outlier Detection in Heterogeneous Information Networks

Cited by 5 (2 self)
Abstract. Heterogeneous networks are ubiquitous. For example, bibliographic data, social data, medical records, movie data, and many more can be modeled as heterogeneous networks. The rich information associated with multi-typed nodes in heterogeneous networks motivates us to propose a new definition of outliers, different from those defined for homogeneous networks. In this paper, we propose the novel concept of Community Distribution Outliers (CD-Outliers) for heterogeneous information networks: objects whose community distribution does not follow any of the popular community distribution patterns. We extract such outliers using a type-aware joint analysis of multiple types of objects. Given community membership matrices for all types of objects, we follow an iterative two-stage approach that performs pattern discovery and outlier detection in a tightly integrated manner. We first propose a novel outlier-aware approach based on joint non-negative matrix factorization to discover popular community distribution patterns for all object types in a holistic manner, and then detect outliers based on these patterns. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting community distribution outliers.
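The outlier criterion itself (an object whose community distribution follows none of the popular patterns) can be illustrated with a deliberately simplified sketch. It assumes the patterns are already known and scores each object by the L1 distance between its normalized community distribution and the closest pattern. It does not reproduce the paper's outlier-aware joint non-negative matrix factorization, and the function names and data are hypothetical.

```python
def normalize(vec):
    """Turn raw community membership weights into a distribution."""
    total = sum(vec)
    return [x / total for x in vec] if total > 0 else [0.0] * len(vec)


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def cd_outlier_scores(memberships, patterns):
    """Score = distance to the closest pattern; high scores fit no pattern."""
    norm_patterns = [normalize(p) for p in patterns]
    return {
        obj: min(l1_distance(normalize(vec), pat) for pat in norm_patterns)
        for obj, vec in memberships.items()
    }


# Hypothetical example with three communities: [data mining, databases, theory].
patterns = [[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]]
memberships = {"u1": [8, 2, 0], "u2": [1, 1, 8], "u3": [0, 5, 5]}
scores = cd_outlier_scores(memberships, patterns)
print(max(scores, key=scores.get))
```

Here `u1` and `u2` each match a pattern exactly, while `u3`, split evenly between databases and theory, matches neither and receives the highest score.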
Local Learning for Mining Outlier Subgraphs from Network Datasets
 In: SDM

Cited by 3 (1 self)
In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on its connectivity structure, both internally and with its neighborhood. For example, in a co-authorship network, given a subgraph containing three authors, one expects all three authors to be, say, data mining authors, and one expects the neighborhood to consist mostly of data mining authors as well. A 3-author clique of data mining authors whose neighborhood consists entirely of theory authors therefore seems clearly interesting. Similarly, if one author in the clique is a theory author while all other authors (both in the clique and in the neighborhood) are data mining authors, the subgraph is also suspicious. Thus, the existence of low-probability links and the absence of high-probability links can be good indicators of subgraph outlierness. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes it links. We claim that the attribute weights must be learned locally for accurate computation of link existence probabilities. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers.
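The intuition that existing low-probability links and missing high-probability links signal an outlier can be sketched as a negative log-likelihood score. In this hypothetical sketch, the edge probability is a logistic function of a weighted count of matching attributes, with fixed, hand-set weights and bias (assumptions). The paper instead learns the weights locally via a linear optimization, which is not reproduced here. The score sums over all node pairs inside the subgraph plus the existing edges to its neighborhood.

```python
from itertools import combinations
from math import exp, log


def edge_probability(x, y, weights, bias):
    """Logistic link probability from a weighted count of matching attributes."""
    sim = sum(w for w, a, b in zip(weights, x, y) if a == b)
    return 1.0 / (1.0 + exp(-(sim + bias)))


def subgraph_outlierness(nodes, attrs, edges, weights, bias):
    """Negative log-likelihood of the subgraph's observed connectivity."""
    inside = set(nodes)
    score = 0.0
    # Internal pairs: penalize present low-probability and absent high-probability links.
    for u, v in combinations(nodes, 2):
        p = edge_probability(attrs[u], attrs[v], weights, bias)
        score += -log(p) if frozenset((u, v)) in edges else -log(1.0 - p)
    # Boundary edges to the neighborhood: penalize improbable existing links.
    for e in edges:
        u, v = tuple(e)
        if (u in inside) != (v in inside):
            score += -log(edge_probability(attrs[u], attrs[v], weights, bias))
    return score


# Hypothetical co-authorship graph; each author has one attribute: research area.
attrs = {n: ("DM",) for n in "abcdefghi"}
attrs.update({t: ("Theory",) for t in ("t1", "t2", "t3")})
edges = {frozenset(e) for e in [
    ("a", "b"), ("b", "c"), ("a", "c"), ("a", "t1"), ("b", "t2"), ("c", "t3"),
    ("d", "e"), ("e", "f"), ("d", "f"), ("d", "g"), ("e", "h"), ("f", "i"),
]}
weights, bias = [4.0], -2.0
dm_clique_theory_nbrs = subgraph_outlierness(["a", "b", "c"], attrs, edges, weights, bias)
dm_clique_dm_nbrs = subgraph_outlierness(["d", "e", "f"], attrs, edges, weights, bias)
print(f"{dm_clique_theory_nbrs:.3f} vs {dm_clique_dm_nbrs:.3f}")
```

Mirroring the abstract's example, the data mining clique surrounded by theory authors scores much higher than the one surrounded by data mining authors, because each of its boundary edges is improbable under the assumed model.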
Anomaly detection in dynamic networks: a survey
Wiley Interdisciplinary Reviews: Computational Statistics, 2015

Cited by 1 (0 self)
Anomaly detection is an important problem with multiple applications, and has therefore been studied for decades in various research domains. In the past decade there has been growing interest in anomaly detection in data represented as networks, or graphs, largely because of their robust expressiveness and their natural ability to represent complex relationships. Originally, techniques focused on anomaly detection in static graphs, which do not change and can represent only a single snapshot of data. As real-world networks are constantly changing, the focus has shifted to dynamic graphs, which evolve over time. In this survey, we aim to provide a comprehensive overview of anomaly detection in dynamic networks, concentrating on state-of-the-art methods. We first describe four types of anomalies that arise in dynamic networks, providing an intuitive explanation, applications, and a concrete example for each. Having established what constitutes an anomaly, we present a general two-stage approach to anomaly detection in dynamic networks that is common among the methods. We then construct a two-tiered taxonomy, first partitioning the methods based on the intuition behind their approach, and then subdividing them based on the types of anomalies they detect. Within each of the tier-one categories (community-, compression-, decomposition-, distance-, and probabilistic-model-based), we highlight the major similarities and differences, showing the wealth of techniques derived from similar conceptual approaches. © 2015 The Authors.
... financial systems connecting banks across the world, electric power grids connecting geographically distributed areas, and social networks that connect users, businesses, or customers using relationships such as friendship, collaboration, or transactional interactions. These are examples of dynamic networks, which, unlike static networks, are constantly undergoing changes to their structure or attributes.
Possible changes include insertion and deletion of vertices (objects), insertion and deletion of edges (relationships), and modification of attributes (e.g., vertex or edge labels). An important problem over dynamic networks is anomaly detection: finding objects, relationships, or
Declaration of Authorship
2012
I hereby declare that the content of this thesis is my own work and that it is the result of work done during the period of registration. To the best of my knowledge, it contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgement has been made in the text. Parts of this thesis appeared in the following publications, to each of which I have made substantial contributions:
Outlier Detection for Graph Data — Proposal for a Tutorial at ASONAM’13 Conference —
Outlier detection has been studied in the context of many research areas, such as statistics, data mining, sensor networks, environmental science, distributed systems, and spatiotemporal mining. It has been studied on a large variety of data types, including high-dimensional data, uncertain data, stream data, graph data, time series data, spatial data, and spatiotemporal data. We present an organized picture of recent research in outlier detection for graph data, covering both static and dynamic graphs. We begin by motivating the importance of graph outlier detection and briefly outlining the challenges beyond those of conventional outlier detection. Static graph outlier detection techniques include Minimum Description Length techniques, techniques based on egonet metrics, and random field models. For dynamic graphs, we discuss graph similarity based algorithms, evolutionary community based algorithms, and online graph outlier detection algorithms. We also present applications where such techniques have been applied to discover interesting outliers.
2 Rationale for Presenting the Tutorial at ASONAM 2013
With the rapid increase of stored data, interest in the discovery of hidden information has exploded in the last decade. One important problem that arises during the discovery process is treating data together with its links. Given such huge amounts of graph data, an important task is to find surprising in
OUTLIER DETECTION FOR INFORMATION NETWORKS
2013
The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. There has been a significant amount of work in network science that studies properties of networks, querying over networks, link analysis, influence propagation, network optimization, and many other forms of network analysis. Only recently has there been some work on outlier detection for information network data. Outlier (or anomaly) detection is a very broad field and has been studied in the context of a large number of application domains. Many algorithms have been proposed for outlier detection in high-dimensional data, uncertain data, stream data, and time series data. By its inherent nature, network data poses very different challenges that must be addressed in a specialized way. Network data is massive, contains nodes of different types with rich associated attribute data, has noisy attributes and noisy links, and evolves dynamically in multiple ways. This thesis focuses on outlier detection for such networks from two perspectives: (1) community-based outliers and (2) query-based outliers. For community-based outliers, we discuss the problem in both static and dynamic settings.
Query-based Graph Cuboid Outlier Detection
Abstract—Various projections or views of a heterogeneous information network can be modeled using the graph OLAP (Online Analytical Processing) framework for effective decision making. Detecting anomalous projections of the network can help analysts identify regions of interest in the graph specific to the projection attribute. While most previous studies on outlier detection in graphs deal with outlier nodes, edges, or subgraphs, we are the first to propose the detection of graph cuboid outliers. Furthermore, we perform this detection in a query-sensitive way. Given a general subgraph query on a heterogeneous network, we study the problem of finding outlier cuboids from the graph OLAP lattice. A Graph Cuboid Outlier (GC-Outlier) is a cuboid with an exceptionally high density of matches for the query. The GC-Outlier detection task is challenging because (1) finding matches for the query (subgraph isomorphism) is NP-hard; (2) the number of matches for the query can be very high; and (3) the number of cuboids can be large. We provide an approximate solution to the problem by computing only a fraction of the total matches, originating from a carefully selected set of candidate nodes and including a carefully selected set of edges. We perform extensive experiments on synthetic datasets to showcase the trade-off between execution time and accuracy. Experiments on real datasets such as Four Area and Delicious, containing thousands of nodes, reveal interesting GC-Outliers.
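To make the notion of a cuboid with an exceptionally high density of matches more concrete, here is a hypothetical sketch that assumes the query matches have already been computed. It enumerates the non-empty cuboids of the attribute lattice, defines a cuboid's match density (an assumed definition, not necessarily the paper's) as the number of matches falling entirely within one of its cells divided by its number of cells, and flags cuboids whose density z-score exceeds a threshold. It performs exact rather than approximate computation and does no subgraph matching; `gc_outliers`, `z_threshold`, and the toy data are invented for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def cuboids(dims):
    """All non-empty dimension subsets of the lattice (the apex is excluded)."""
    for r in range(1, len(dims) + 1):
        yield from combinations(dims, r)


def match_density(node_attrs, matches, cuboid):
    """Matches lying entirely inside one cell, per cell of this cuboid."""
    def project(node):
        return tuple(node_attrs[node][d] for d in cuboid)

    num_cells = len({project(n) for n in node_attrs})
    inside = sum(1 for m in matches if len({project(n) for n in m}) == 1)
    return inside / num_cells


def gc_outliers(node_attrs, matches, dims, z_threshold=1.0):
    density = {c: match_density(node_attrs, matches, c) for c in cuboids(dims)}
    values = list(density.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all cuboids equally dense: nothing stands out
    return [c for c, d in density.items() if (d - mu) / sigma > z_threshold]


# Hypothetical node attributes and precomputed matches of a 2-node query.
node_attrs = {
    "n1": {"area": "DM", "venue": "KDD"}, "n2": {"area": "DM", "venue": "VLDB"},
    "n3": {"area": "DM", "venue": "ICDE"}, "n4": {"area": "DB", "venue": "KDD"},
    "n5": {"area": "DB", "venue": "VLDB"}, "n6": {"area": "DB", "venue": "ICDE"},
}
matches = [("n1", "n2"), ("n2", "n3"), ("n4", "n5")]
print(gc_outliers(node_attrs, matches, ("area", "venue")))
```

In this toy data every match stays within a single research area but crosses venues, so only the `("area",)` cuboid stands out.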
Outlier Detection for Temporal Data — Proposal for a Tutorial at SDM’13 Conference —
Outlier (or anomaly) detection is a very broad field which has been studied in the context of a large number of research areas, such as statistics, data mining, sensor networks, environmental science, distributed systems, and spatiotemporal mining. The first articles on outlier detection focused on time series based outliers (in statistics). Since then, outlier detection has been studied on a large variety of data types, including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatiotemporal data. While there have been many tutorials and surveys on general outlier detection, in this tutorial we focus on outlier detection for temporal data. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records, such as credit, personnel, financial, judicial, and medical records, are all temporal. This underscores the need for an organized and detailed study of outliers with respect to such temporal data. In the past decade, there has been much research on various forms of temporal data, including consecutive data snapshots, series of data snapshots, and data streams. Besides the initial work on time series, researchers have focused on richer forms of data, including multiple data streams, spatiotemporal data, network data, and community distribution data. Compared to general outlier detection, techniques for temporal outlier