Diagnosing Network-Wide Traffic Anomalies
In ACM SIGCOMM, 2004. Cited by 360 (19 self).
Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data.
Communication-efficient online detection of network-wide anomalies
In IEEE Conference on Computer Communications (INFOCOM), 2007. Cited by 51 (10 self).
There has been growing interest in building large-scale distributed monitoring systems for sensor, enterprise, and ISP networks. Recent work has proposed using Principal Component Analysis (PCA) over global traffic matrix statistics to effectively isolate network-wide anomalies. To allow such a PCA-based anomaly detection scheme to scale, we propose a novel approximation scheme that dramatically reduces the burden on the production network. Our scheme avoids the expensive step of centralizing all the data by performing intelligent filtering at the distributed monitors. This filtering reduces monitoring bandwidth overheads, but can result in the anomaly detector making incorrect decisions based on a perturbed view of the global data set. We employ stochastic matrix perturbation theory to bound such errors. Our algorithm selects the filtering parameters at local monitors such that the errors made by the detector are guaranteed to lie below a user-specified upper bound. Our algorithm thus allows network operators to explicitly balance the tradeoff between detection accuracy and the amount of data communicated over the network. In addition, our approach enables real-time detection because we exploit continuous monitoring at the distributed monitors. Experiments with traffic data from the Abilene backbone network demonstrate that our methods yield significant communication benefits while simultaneously achieving high detection accuracy.
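The core idea of filtering at local monitors can be sketched as below. This is a minimal illustration, not the paper's protocol: a monitor transmits a reading to the coordinator only when it drifts more than a slack parameter from the last value it reported; the class name, slack value, and readings are all assumptions for the example.

```python
class FilteredMonitor:
    """Report a local value to the coordinator only when it drifts
    more than `slack` from the last reported value; stay silent otherwise."""

    def __init__(self, slack):
        self.slack = slack
        self.last_sent = None

    def observe(self, value):
        """Return the value to transmit, or None to suppress the update."""
        if self.last_sent is None or abs(value - self.last_sent) > self.slack:
            self.last_sent = value
            return value
        return None

# With slack 0.5, small fluctuations are suppressed and only
# significant drifts reach the coordinator.
monitor = FilteredMonitor(slack=0.5)
readings = [10.0, 10.1, 10.3, 11.2, 11.3, 9.0]
sent = [monitor.observe(r) for r in readings]
print(sent)  # [10.0, None, None, 11.2, None, 9.0]
```

A larger slack saves more bandwidth but gives the coordinator a more perturbed view of the global data, which is exactly the accuracy/communication tradeoff the perturbation analysis bounds.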
In-network PCA and anomaly detection
In NIPS, 2006. Cited by 26 (4 self).
We consider the problem of network anomaly detection in large distributed systems. In this setting, Principal Component Analysis (PCA) has been proposed as a method for discovering anomalies by continuously tracking the projection of the data onto a residual subspace. This method was shown to work well empirically in highly aggregated networks, that is, those with a limited number of large nodes and at coarse time scales. This approach, however, has scalability limitations. To overcome these limitations, we develop a PCA-based anomaly detector in which adaptive local data filters send to a coordinator just enough data to enable accurate global detection. Our method is based on a stochastic matrix perturbation analysis that characterizes the tradeoff between the accuracy of anomaly detection and the amount of data communicated over the network.
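The residual-subspace detector that these papers build on can be sketched as follows: keep the top principal components as the "normal" subspace, and flag samples whose energy outside that subspace is large. The synthetic data, number of components, and fixed threshold below are illustrative assumptions, not the papers' calibration.

```python
import numpy as np

def residual_subspace_detector(X, k, threshold):
    """Flag rows of X whose residual-subspace energy exceeds a threshold.

    X: (n_samples, n_links) traffic matrix, one row per time step.
    k: number of principal components kept as the normal subspace.
    """
    # Center the data and find the top-k principal directions.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                        # (n_links, k) normal-subspace basis
    # Energy of each sample outside the normal subspace (squared
    # prediction error, the usual PCA anomaly statistic).
    residual = Xc - Xc @ P @ P.T
    spe = np.sum(residual**2, axis=1)
    return spe, spe > threshold

# Synthetic example: rank-one diurnal-like traffic on 8 links,
# plus one injected anomalous burst at t = 120.
rng = np.random.default_rng(0)
t = np.arange(200)
base = np.outer(np.sin(2 * np.pi * t / 100), rng.uniform(1, 5, size=8))
X = base + 0.05 * rng.standard_normal((200, 8))
X[120] += 3.0                           # anomalous burst on all links
spe, flags = residual_subspace_detector(X, k=1, threshold=1.0)
print(int(np.argmax(spe)))              # index of the strongest anomaly
```

Note that a strong anomaly can itself contaminate the estimated subspace if too many components are kept, which is one reason calibration is delicate in practice.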
Applying PCA for traffic anomaly detection: Problems and solutions
2008. Cited by 16 (3 self).
Spatial Principal Component Analysis (PCA) has been proposed for network-wide anomaly detection. Recent work has shown that PCA is very sensitive to calibration settings; unfortunately, the authors did not provide further explanations for this observation. In this paper, we fill this gap and provide the reasoning behind the observed discrepancies. First, we revisit PCA for anomaly detection and evaluate its performance on our data. We develop a slightly modified version of PCA that uses only data from a single router. Instead of correlating data across different spatial measurement points, we correlate the data across different metrics. With the help of the analyzed data, we explain the pitfalls of PCA and support our argument with measurement results. We show that the main problems that make PCA difficult to apply are (i) the temporal correlation in the data; (ii) the non-stationarity of the data; and (iii) the difficulty of choosing the right number of components. Moreover, we propose a solution to deal with the most dominant problem, the temporal correlation in the data. We find that when we account for temporal correlation, PCA detection results are significantly improved.
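One simple way to account for temporal correlation before applying PCA is to first-difference each metric, so that successive samples are closer to independent; this sketch illustrates the effect on lag-1 autocorrelation and is an assumption for illustration, not necessarily the remedy the paper proposes.

```python
import numpy as np

def decorrelate_in_time(X):
    """First-difference each column of X (samples x metrics).

    Trends and diurnal patterns induce strong temporal correlation,
    which inflates the leading principal components; differencing
    removes much of it.
    """
    return np.diff(X, axis=0)

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(1)
trend = np.linspace(0, 10, 500)[:, None]   # shared non-stationary trend
X = trend + rng.standard_normal((500, 3))  # 3 metrics riding on the trend

before = lag1_autocorr(X[:, 0])
after = lag1_autocorr(decorrelate_in_time(X)[:, 0])
print(round(before, 2), round(after, 2))
```

The trend makes the raw series strongly positively autocorrelated; after differencing, what remains is close to the autocorrelation of the noise increments.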
Adapting Multivariate Analysis for Monitoring and Modeling of Dynamic Systems
1991. Cited by 7 (1 self).
This work considers the application of several related multivariate data analysis techniques to the monitoring and modeling of dynamic processes. Included are the method of Principal Components Analysis (PCA) and the regression technique Continuum Regression (CR), which encompasses Principal Components Regression (PCR), Partial Least Squares (PLS) and Multiple Linear Regression (MLR), all of which are based on eigenvector decompositions. It is shown that proper application of PCA to the measurements from multivariate processes can facilitate the detection of failed sensors and process upsets. The relationship between PCA and the state-space process model form is shown, providing a theoretical basis for the use of PCA in dynamic systems. For processes with more measurements than sta...
Latent Fault Detection With Unbalanced Workloads
Big data means big datacenters, composed of hundreds or thousands of machines. With so many machines, failures are commonplace. Failure detection is crucial: undetected failures may lead to data loss and outages. Recent health monitoring approaches use anomaly detection to forecast failures: anomalous machines are considered to be at risk of future failures. Our previous work focused on detecting latent faults in large web services, which are often characterized by a scale-out architecture where load is dynamically balanced. We proposed a robust and unsupervised latent fault detector for such systems, with statistical bounds on the rate of false positives. That detector, however, is unsuitable for applications without dynamic load balancing, such as statically balanced key-value stores, Hadoop jobs, and supercomputer applications. We describe an improved latent fault detection method for unbalanced workloads. It retains the advantages of our previous methods: it is unsupervised, robust to changes, and statistically sound. Moreover, the statistical bounds for the new method scale better with the number of machines, and so dramatically reduce the number of measurements needed. Preliminary evaluation on supercomputer logs shows that the new method is able to correctly predict some failures, while our previous methods completely fail in this setting.
The Wilson–Hilferty transformation is locally saddlepoint
Interest in transform methods for normalising test statistics declined with the advent of computers. More recently, small-sample asymptotic methods have been developed to approximate the distributions of complicated test statistics. We propose a generalisation of a classical symmetrising transform to a similar range of statistics. It is shown to behave comparably in a parametric neighbourhood to methods that exploit exponential tilting. Key words: exponential tilting; normalising transform; small-sample asymptotics.
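For reference, the classical transform in question states that if X ~ chi-squared with k degrees of freedom, then (X/k)^(1/3) is approximately normal with mean 1 - 2/(9k) and variance 2/(9k). The quick numerical check below illustrates the base transform only, not the paper's generalisation; the test points are arbitrary.

```python
import numpy as np
from math import erf, sqrt

def wilson_hilferty_cdf(x, k):
    """Approximate P(chi2_k <= x) via the Wilson-Hilferty cube-root
    normal approximation: (X/k)^(1/3) ~ N(1 - 2/(9k), 2/(9k))."""
    mu = 1.0 - 2.0 / (9.0 * k)
    sigma = sqrt(2.0 / (9.0 * k))
    z = ((x / k) ** (1.0 / 3.0) - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF at z

# Compare against an empirical CDF from simulated chi-squared draws.
rng = np.random.default_rng(2)
k = 5
draws = rng.chisquare(k, size=200_000)
for x in (2.0, 5.0, 11.07):
    approx = wilson_hilferty_cdf(x, k)
    empirical = float(np.mean(draws <= x))
    print(f"x={x}: WH={approx:.4f}  MC={empirical:.4f}")
```

Even for moderate k the cube-root approximation tracks the simulated distribution to a few parts in a thousand, which is why it remains a standard normalising device (for example in PCA residual thresholds).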
Outlier Detection Methods in Multivariate Regression Models
Outlier detection statistics based on two models, the case-deletion model and the mean-shift model, are developed in the context of a multivariate linear regression model. These are generalizations of the univariate Cook’s distance and other diagnostic statistics. Approximate distributions of the proposed statistics are also obtained to get suitable cutoff points for significance tests. In addition, a simulation study has been conducted to examine the performance of these two approximate distributions. The methods are applied to a set of data to illustrate the multiple outlier detection procedure in multivariate linear regression models. Key words: likelihood displacement; likelihood ratio; multivariate regression; outlier detection.
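The univariate Cook's distance that these statistics generalise can be computed as below; the multivariate versions in the paper replace the scalar residual with a vector and the error variance with a covariance matrix, so this sketch (with made-up data) covers only the univariate base case.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation in an OLS fit of y on X.

    D_i = (e_i^2 / (p * s^2)) * h_ii / (1 - h_ii)^2,
    where h_ii is the leverage of observation i and s^2 the usual
    residual variance estimate.
    """
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (projection) matrix
    h = np.diag(H)                         # leverages
    e = y - H @ y                          # OLS residuals
    s2 = e @ e / (n - p)
    return (e**2 / (p * s2)) * h / (1 - h) ** 2

# Simple line fit with one gross outlier injected at the last point.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
X = np.column_stack([np.ones_like(x), x])  # intercept + slope design
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(30)
y[-1] += 2.0                               # inject an outlier
D = cooks_distance(X, y)
print(int(np.argmax(D)))                   # most influential observation
```

A common rule of thumb flags observations with D_i well above those of their peers; here the injected point dominates by orders of magnitude.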