Community trend outlier detection using soft temporal pattern mining. In: Machine Learning and Knowledge Discovery in Databases. (2012)

by M Gupta, J Gao, Y Sun, J Han

Results 1 - 10 of 11

Outlier Detection for Temporal Data: A Survey

by Manish Gupta, Jing Gao, Charu C. Aggarwal, Jiawei Han
Cited by 6 (0 self)
Abstract: In the statistics community, outlier detection for time series data has been studied for decades. Recently, with advances in hardware and software technology, there has been a large body of work on temporal outlier detection from a computational perspective within the computer science community. In particular, advances in hardware technology have enabled the availability of various forms of temporal data collection mechanisms, and advances in software technology have enabled a variety of data management mechanisms. This has fueled the growth of different kinds of data sets such as data streams, spatiotemporal data, distributed streams, temporal networks, and time series data, generated by a multitude of applications. There arises a need for an organized and detailed study of the work done in the area of outlier detection with respect to such temporal datasets. In this survey, we provide a comprehensive and structured overview of a large set of interesting outlier definitions for various forms of temporal data, novel techniques, and application scenarios in which specific definitions and techniques have been widely used. Index Terms: temporal outlier detection, time series data, data streams, distributed data streams, temporal networks, spatiotemporal outliers

Citation Context

... processing based techniques. In the past few decades, outlier detection has been studied for high-dimensional data [4], uncertain data [5], streaming data [6], [7], [8], network data [8], [9], [10], [11], [12] and time series data [13], [14]. Outlier detection has been used extensively for intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor netwo...

On Detecting Association-Based Clique Outliers in Heterogeneous Information Networks

by Manish Gupta, Jing Gao, Xifeng Yan, Hasan Cam, Jiawei Han - In ASONAM, to appear, 2013
Cited by 6 (3 self)
Abstract: In the real world, various systems can be modeled using heterogeneous networks which consist of entities of different types. People like to discover groups (or cliques) of entities linked to each other with rare and surprising associations from such networks. We define such anomalous cliques as Association-Based Clique Outliers (ABCOutliers) for heterogeneous information networks, and design effective approaches to detect them. The need to find such outlier cliques from networks can be formulated as a conjunctive select query consisting of a set of (type, predicate) pairs. Answering such conjunctive queries efficiently involves two main challenges: (1) computing all matching cliques which satisfy the query and (2) ranking such results based on the rarity and the interestingness of the associations among entities in the cliques. In this paper, we address these two challenges as follows. First, we introduce a new low-cost graph index to assist clique matching. Second, we define the outlierness of an association between two entities based on their attribute values and provide a methodology to efficiently compute such outliers given a conjunctive select query. Experimental results on several synthetic datasets and the Wikipedia dataset containing thousands of entities show the effectiveness of the proposed approach in computing interesting ABCOutliers.

Citation Context

...detection on networks has been studied for both static [2], [9] and dynamic [1], [10] scenarios. Quite different from existing work which only considers outlier detection in homogeneous networks [4], [5], [6], the proposed work aims at discovering outliers from heterogeneous networks. Moreover, existing outlier detection work for network data sets has focused on finding outliers for the entire networ...

Community Distribution Outlier Detection in Heterogeneous Information Networks

by Manish Gupta, Jing Gao, Jiawei Han
Cited by 5 (2 self)
Abstract. Heterogeneous networks are ubiquitous. For example, bibliographic data, social data, medical records, movie data and many more can be modeled as heterogeneous networks. Rich information associated with multi-typed nodes in heterogeneous networks motivates us to propose a new definition of outliers, which is different from those defined for homogeneous networks. In this paper, we propose the novel concept of Community Distribution Outliers (CDOutliers) for heterogeneous information networks, which are defined as objects whose community distribution does not follow any of the popular community distribution patterns. We extract such outliers using a type-aware joint analysis of multiple types of objects. Given community membership matrices for all types of objects, we follow an iterative two-stage approach which performs pattern discovery and outlier detection in a tightly integrated manner. We first propose a novel outlier-aware approach based on joint non-negative matrix factorization to discover popular community distribution patterns for all the object types in a holistic manner, and then detect outliers based on such patterns. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting community distribution outliers.

Citation Context

...8] are defined based on comparison with all the other objects in the dataset. (3) Community Context: Different from existing community outlier detection approaches (Community Outliers [6], CTOutliers [9], ECOutliers [10]), we model multiple data types in a heterogeneous network simultaneously to find outliers. Homogeneous versus Heterogeneous Networks Recently there has been work on outlier detection...
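
As a rough illustration of the CDOutlier definition in the abstract above (an object whose community distribution follows none of the popular patterns), the sketch below scores each object by its distance to the nearest pattern. It is a simplification under stated assumptions: the patterns are supplied by hand rather than discovered through the joint non-negative matrix factorization the authors describe, Euclidean distance is an arbitrary choice, and the names `outlier_scores`, `memberships`, and `patterns` are hypothetical.

```python
import math


def outlier_scores(memberships, patterns):
    """Score each object by its distance to the closest pattern.

    memberships: dict mapping object id -> community distribution (tuple of floats)
    patterns:    list of popular community distributions (assumed given here,
                 not learned by joint NMF as in the paper)
    A larger score means the object matches no popular pattern well.
    """
    scores = {}
    for obj, distribution in memberships.items():
        scores[obj] = min(math.dist(distribution, p) for p in patterns)
    return scores


# Hypothetical toy data: two communities, two popular patterns.
patterns = [(1.0, 0.0), (0.0, 1.0)]
memberships = {
    "x": (1.0, 0.0),  # follows the first pattern exactly
    "y": (0.5, 0.5),  # split evenly, so it follows neither pattern
}
scores = outlier_scores(memberships, patterns)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked` lists objects from most to least outlying. In the paper's setting the patterns and the outliers are estimated together in an iterative two-stage loop, which this sketch does not attempt.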

Local Learning for Mining Outlier Subgraphs from Network Datasets

by Manish Gupta, Arun Mallya, Subhro Roy, Jason H. D. Cho, Jiawei Han - In: SDM
Cited by 3 (1 self)
In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example, for a co-authorship network, given a subgraph containing three authors, one expects all three authors to be, say, data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3-author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors is also suspicious. Thus, existence of low-probability links and absence of high-probability links can be a good indicator of subgraph outlierness. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers.

Citation Context

...lier with connectivity very different from their neighborhood. Comparison with Previous Work Quite different from existing work which only considers outlier detection of single vertices from networks [7, 9, 10], the proposed work aims at discovering subgraph outliers. Moreover, existing outlier detection work for network data sets has focused on finding outliers for the entire network or in the context of a...
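
The idea in the abstract above, that low-probability links which exist and high-probability links which are absent both signal subgraph outlierness, can be sketched as follows. This is only an illustration under stated assumptions: the attribute weights are fixed by hand rather than learned locally through the linear optimization the authors propose, edge probability is taken to be the weighted fraction of matching attributes, the neighborhood of the subgraph is ignored, and every function and variable name is hypothetical.

```python
from itertools import combinations


def edge_probability(attrs_u, attrs_v, weights):
    """Weighted fraction of attributes on which two nodes agree (assumed model)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    agree = sum(w for name, w in weights.items()
                if attrs_u.get(name) == attrs_v.get(name))
    return agree / total


def subgraph_outlierness(nodes, edges, attrs, weights):
    """Average surprise over all node pairs in the subgraph.

    A present link contributes 1 - p (unlikely links that exist are surprising);
    an absent link contributes p (likely links that are missing are surprising).
    """
    present = {frozenset(e) for e in edges}
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    surprise = 0.0
    for u, v in pairs:
        p = edge_probability(attrs[u], attrs[v], weights)
        surprise += (1.0 - p) if frozenset((u, v)) in present else p
    return surprise / len(pairs)


# Hypothetical co-authorship clique: two data mining authors and one theory author.
attrs = {"a": {"area": "dm"}, "b": {"area": "dm"}, "c": {"area": "theory"}}
weights = {"area": 1.0}  # fixed by hand here; the paper learns weights locally
clique = [("a", "b"), ("a", "c"), ("b", "c")]
mixed_score = subgraph_outlierness(["a", "b", "c"], clique, attrs, weights)

# The same clique with all three authors in data mining is unsurprising.
uniform = {n: {"area": "dm"} for n in "abc"}
uniform_score = subgraph_outlierness(["a", "b", "c"], clique, uniform, weights)
```

The mixed clique scores higher than the uniform one because two of its three links connect authors from different areas. Extending the sum to pairs that cross into the neighborhood would cover the other case the abstract mentions.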

Anomaly detection in dynamic networks: a survey

by Stephen Ranshous, Shitian Shen, Danai Koutra, Steve Harenberg, Christos Faloutsos, Nagiza F Samatova - Wiley Interdisciplinary Reviews: Computational Statistics, 2015
Cited by 1 (0 self)
Anomaly detection is an important problem with multiple applications, and thus has been studied for decades in various research domains. In the past decade there has been a growing interest in anomaly detection in data represented as networks, or graphs, largely because of their robust expressiveness and their natural ability to represent complex relationships. Originally, techniques focused on anomaly detection in static graphs, which do not change and are capable of representing only a single snapshot of data. As real-world networks are constantly changing, there has been a shift in focus to dynamic graphs, which evolve over time. In this survey, we aim to provide a comprehensive overview of anomaly detection in dynamic networks, concentrating on the state-of-the-art methods. We first describe four types of anomalies that arise in dynamic networks, providing an intuitive explanation, applications, and a concrete example for each. Having established an idea for what constitutes an anomaly, a general two-stage approach to anomaly detection in dynamic networks that is common among the methods is presented. We then construct a two-tiered taxonomy, first partitioning the methods based on the intuition behind their approach, and subsequently subdividing them based on the types of anomalies they detect. Within each of the tier one categories (community, compression, decomposition, distance, and probabilistic model based), we highlight the major similarities and differences, showing the wealth of techniques derived from similar conceptual approaches. © 2015 The Authors.

... financial systems connecting banks across the world, electric power grids connecting geographically distributed areas, and social networks that connect users, businesses, or customers using relationships such as friendship, collaboration, or transactional interactions. These are examples of dynamic networks, which, unlike static networks, are constantly undergoing changes to their structure or attributes. Possible changes include insertion and deletion of vertices (objects), insertion and deletion of edges (relationships), and modification of attributes (e.g., vertex or edge labels). An important problem over dynamic networks is anomaly detection: finding objects, relationships, or ...

Declaration of Authorship

by Muhammad Muzammal, 2012
I hereby declare that content of this thesis is my own work and that it is the result of work done during the period of registration. To the best of my knowledge, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgement has been made in the text. Parts of this thesis appeared in the following publications, to each of which I have made substantial contributions:

Citation Context

...ted by real life examples, thus establishing that SPM in probabilistic databases is an interesting topic of study. Indeed, our work has been followed up by Wan [74], Zhao et al. [46] and Gupta et al. [75]. (2) We discussed evaluating the interestingness predicate from a complexity-theoretic viewpoint, and we showed that different uncertainty models have different outcomes from a complexity theoretic v...

Outlier Detection for Graph Data — Proposal for a Tutorial at ASONAM’13 Conference —

by Manish Gupta, Jing Gao, Charu Aggarwal, Jiawei Han
Outlier detection has been studied in the context of many research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. Outlier detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, graph data, time series data, spatial data, and spatio-temporal data. We present an organized picture of recent research in outlier detection for graph data for both static as well as dynamic graphs. We begin by motivating the importance of graph outlier detection and briefing the challenges beyond usual outlier detection. Static graph outlier detection techniques include Minimum Description Length techniques, techniques based on egonet metrics and random field models. For dynamic graphs, we discuss graph similarity based algorithms, evolutionary community based algorithms and online graph outlier detection algorithms. We also present applications where such techniques have been applied to discover interesting outliers.

2 Rationale of Presenting the Tutorial at ASONAM 2013

With the rapid increase of stored data, the interest in the discovery of hidden information has exploded in the last decade. One important problem that arises during the discovery process is treating data along with links together. Given such huge amounts of graph data, an important task is to find surprising in-

OUTLIER DETECTION FOR INFORMATION NETWORKS

by Manish Gupta, 2013
The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. There has been a significant amount of work in network science which studies properties of networks, querying over networks, link analysis, influence propagation, network optimization, and many other forms of network analysis. Only recently has there been some work in the area of outlier detection for information network data. Outlier (or anomaly) detection is a very broad field and has been studied in the context of a large number of application domains. Many algorithms have been proposed for outlier detection in high-dimensional data, uncertain data, stream data and time series data. By its inherent nature, network data provides very different challenges that need to be addressed in a special way. Network data is gigantic, contains nodes of different types, rich nodes with associated attribute data, noisy attribute data, noisy link data, and is dynamically evolving in multiple ways. This thesis focuses on outlier detection for such networks with respect to two interesting perspectives: (1) community based outliers and (2) query based outliers. For community based outliers, we discuss the problem in both static as well as dynamic settings.

Citation Context

... much research in outlier detection for information networks, especially heterogeneous information networks. Our work as presented in this thesis focuses on outlier detection for information networks [66, 69, 70, 71]. 2.2 Outlier Detection for Information Networks We discuss the literature in the area of outlier detection on information networks in this section. Outlier detection for information networks can b...

Query-based Graph Cuboid Outlier Detection

by Ayushi Dalmia, Manish Gupta, Vasudeva Varma
Abstract: Various projections or views of a heterogeneous information network can be modeled using the graph OLAP (On-line Analytical Processing) framework for effective decision making. Detecting anomalous projections of the network can help the analysts identify regions of interest from the graph specific to the projection attribute. While most previous studies on outlier detection in graphs deal with outlier nodes, edges or subgraphs, we are the first to propose detection of graph cuboid outliers. Further we perform this detection in a query sensitive way. Given a general subgraph query on a heterogeneous network, we study the problem of finding outlier cuboids from the graph OLAP lattice. A Graph Cuboid Outlier (GCOutlier) is a cuboid with exceptionally high density of matches for the query. The GCOutlier detection task is clearly challenging because: (1) finding matches for the query (subgraph isomorphism) is NP-hard; (2) number of matches for the query can be very high; and (3) number of cuboids can be large. We provide an approximate solution to the problem by computing only a fraction of the total matches originating from a select set of candidate nodes and including a select set of edges, chosen smartly. We perform extensive experiments on synthetic datasets to showcase the execution time versus accuracy trade-off. Experiments on real datasets like Four Area and Delicious containing thousands of nodes reveal interesting GCOutliers.

Citation Context

...ngth [16], egonets [2], and community detection [17]. In case of temporal networks, techniques explored include similarity between graph snapshots [18], spectral methods [19], community outliers [3], [20], [21], etc. As discussed in Section I, recently query based outlier detection has become quite popular where outliers are discovered from a graph in the context of a subgraph query. However, the pro...

Outlier Detection for Temporal Data — Proposal for a Tutorial at SDM’13 Conference —

by Manish Gupta, Jing Gao, Charu Aggarwal, Jiawei Han
Outlier (or anomaly) detection is a very broad field which has been studied in the context of a large number of research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. The first few articles in outlier detection focused on time series based outliers (in statistics). Since then, outlier detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While there have been many tutorials and surveys for general outlier detection, we focus on outlier detection for temporal data in this tutorial. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records like credit, personnel, financial, judicial, medical, etc. are all temporal. This stresses the need for an organized and detailed study of outliers with respect to such temporal data. In the past decade, there has been a lot of research on various forms of temporal data including consecutive data snapshots, series of data snapshots and data streams. Besides the initial work on time series, researchers have focused on rich forms of data including multiple data streams, spatio-temporal data, network data, community distribution data, etc. Compared to general outlier detection, techniques for temporal outlier

Citation Context

...ased Approaches
  (a) Outlier detection using dynamic cluster maintenance [10, 27]
  (b) Community outlier detection
    i. Evolutionary community outlier detection [37]
    ii. Community trend outlier detection [36]
5. Network based Approaches
  (a) Centralized approaches
    i. Outlier detection by comparing graphs [67, 74]
    ii. Outlier detection using PCA on origin-destination flows in networks [51]
    iii. Graph outlie...
