Diagnosing Network-Wide Traffic Anomalies
- In ACM SIGCOMM, 2004
- Cited by 184 (12 self)
Abstract
Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data.
Proactive Network Fault Detection
- IEEE Transactions on Reliability, 1997
- Cited by 52 (3 self)
Abstract
To improve network reliability and management in today's high-speed communication networks, we propose an intelligent system using adaptive statistical approaches. The system learns the normal behavior of the network. Deviations from the norm are detected, and this information is combined in the probabilistic framework of a Bayesian network. The system is therefore able to detect unknown or previously unseen faults. As demonstrated on real network data, the method can detect abnormal behavior before a fault actually occurs, giving the network management system (human or automated) the opportunity to avert a potentially serious problem.
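The abstract does not say how "normal behavior" is learned or how deviations are fed into the Bayesian network. As a rough illustration only, the sketch below assumes an exponentially weighted moving average (EWMA) baseline per metric, flags a sample whose z-score exceeds a threshold, and combines per-metric flags into a fault probability with a naive-Bayes update (the simplest Bayesian network: one fault node with conditionally independent children). `EwmaDetector`, `fault_posterior`, and every parameter value are hypothetical, not the paper's method.

```python
import math


class EwmaDetector:
    """Hypothetical adaptive baseline for one metric (EWMA mean and variance)."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=10):
        self.alpha = alpha          # smoothing factor (assumed value)
        self.threshold = threshold  # z-score treated as a deviation (assumed)
        self.warmup = warmup        # samples observed before any flagging
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Return True if x deviates from the learned norm, then adapt to x."""
        self.n += 1
        if self.mean is None:
            self.mean = x
            return False
        std = math.sqrt(self.var)
        deviant = (self.n > self.warmup and std > 0
                   and abs(x - self.mean) / std > self.threshold)
        # Standard incremental EWMA recursions for mean and variance.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return deviant


def fault_posterior(prior, flags, p_flag_given_fault, p_flag_given_ok):
    """Combine per-metric deviation flags into P(fault | flags).

    Naive-Bayes assumption: flags are conditionally independent given the
    fault state (a Bayesian network with a single parent node).
    """
    like_fault, like_ok = prior, 1.0 - prior
    for flag, pf, po in zip(flags, p_flag_given_fault, p_flag_given_ok):
        like_fault *= pf if flag else 1.0 - pf
        like_ok *= po if flag else 1.0 - po
    return like_fault / (like_fault + like_ok)
```

For a metric cycling between 100 and 104, the detector stays quiet and then flags a sudden jump to 160. Note that this sketch lets flagged samples update the baseline; a real detector would likely exclude them so a fault does not become the new "norm".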
Fault Isolation based on Decision-Theoretic Troubleshooting
- 1996
- Cited by 12 (0 self)
Abstract
A decision-theoretic approach to fault isolation in broadband networks is presented. Our approach considers faults due to software and hardware, as well as performance degradation and configuration problems. Belief networks are used to represent the relationships among network entities. During a troubleshooting session, the network manager iteratively derives a sequence of tests from the conditional probabilities, computed from statistics gathered (via alarms and tests) about the state of the network, and from the costs associated with testing each entity. An online dynamic programming technique is used to obtain the optimal sequence of tests. A prototype system, implemented using data from the XUNET testbed, is also described.
Keywords: Fault Isolation, Fault Management, Decision-Theoretic Troubleshooting, Broadband Networks
February 15, 1996. Center for Telecommunications Research Tech. Rep. CU/CTR/TR 442-96-08. Contact author: Jean-François Huard, CTR/Columbia University ...
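The abstract does not spell out the cost model, so the following is a simplified, hypothetical version of the test-sequencing problem: exactly one entity is faulty with probability `probs[i]`, testing entity `i` costs `costs[i]` and reveals whether it is the faulty one, and testing stops once the fault is found. The expected cost of an order is then the sum, over each test, of its cost times the probability that the fault has not been found yet. The sketch minimizes this with a dynamic program over the set of already-tested entities and compares it with the classic greedy rule (test in decreasing order of `probs[i] / costs[i]`), which an adjacent-swap argument shows is optimal for this simplified model. It is not the paper's online algorithm, which also conditions on alarms and test outcomes.

```python
from functools import lru_cache


def expected_cost(order, probs, costs):
    """Expected cost of testing entities in `order` until the fault is found."""
    remaining, total = sum(probs), 0.0
    for i in order:
        total += costs[i] * remaining  # paid only if the fault is still unfound
        remaining -= probs[i]
    return total


def optimal_test_order(probs, costs):
    """Minimum-expected-cost test order by dynamic programming over subsets.

    Hypothetical model: exactly one entity is faulty, with P(i) = probs[i];
    testing entity i costs costs[i] and tells whether i is the faulty one.
    """
    n = len(probs)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(tested):
        # Returns (expected cost-to-go, order) given the bitmask of tested entities.
        if tested == full:
            return 0.0, ()
        # Probability that the fault has not been found yet.
        remaining = sum(p for i, p in enumerate(probs) if not (tested >> i) & 1)
        options = []
        for i in range(n):
            if not (tested >> i) & 1:
                sub_cost, sub_order = best(tested | (1 << i))
                options.append((costs[i] * remaining + sub_cost, (i,) + sub_order))
        return min(options)

    return best(0)


def ratio_order(probs, costs):
    """Greedy order by decreasing probability-to-cost ratio."""
    return sorted(range(len(probs)), key=lambda i: probs[i] / costs[i], reverse=True)
```

With `probs = [0.5, 0.3, 0.2]` and `costs = [4.0, 1.0, 1.0]`, both methods test entity 1, then 2, then 0, for an expected cost of 3.7: the likeliest entity is tested last because it is expensive. The subset DP is exponential in the number of entities, so it only serves here as a check on the greedy rule.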
A Model For Alarm Correlation in Telecommunications Networks
- 1997
- Cited by 3 (0 self)
Abstract
This thesis proposes a general model for telecommunications networks and, from this model, a model for alarm correlation in the network as a whole. The model is based on a principle named recursive multifocal correlation, also developed in this thesis, under which the telecommunications network is partitioned into several sub-networks, each constituting a correlation focus. Breaking the problem into smaller sub-problems makes it easier to solve and allows each sub-network to use the correlation technique best suited to its peculiarities. The multifocal correlation principle may be applied recursively within each sub-network until the network-element level is reached. The concepts developed were used to implement a prototype for alarm correlation in a canonical telecommunications network. Using a commercial tool for developing and evaluating Bayesian networks, the occurrence of alarms was simulated and the model's behavior was verified, both in identifying the possible causes of the received alarms (diagnostic inference) and in predicting possible effects (predictive inference).
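To make the difference between diagnostic and predictive inference concrete, the sketch below uses a hypothetical three-node network: one cause `C` (say, a link failure) with two alarms, `A1` and `A2`, that each depend only on it. All probabilities are invented for illustration, and the thesis used a commercial Bayesian-network tool rather than code like this. Inference is done by brute-force enumeration of the joint distribution, which is exact but only practical for very small networks.

```python
from itertools import product

P_CAUSE = 0.02  # hypothetical prior P(C = True), e.g. a link failure
P_ALARM = {     # hypothetical P(alarm fires | C)
    "A1": {True: 0.9, False: 0.05},
    "A2": {True: 0.7, False: 0.01},
}


def joint(cause, a1, a2):
    """P(C, A1, A2) under the factorization P(C) P(A1 | C) P(A2 | C)."""
    p = P_CAUSE if cause else 1.0 - P_CAUSE
    for name, fired in (("A1", a1), ("A2", a2)):
        q = P_ALARM[name][cause]
        p *= q if fired else 1.0 - q
    return p


def query(target, evidence):
    """P(target = True | evidence) by enumerating all joint assignments."""
    num = den = 0.0
    for cause, a1, a2 in product((False, True), repeat=3):
        world = {"C": cause, "A1": a1, "A2": a2}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(cause, a1, a2)
        den += p
        if world[target]:
            num += p
    return num / den
```

Diagnostic inference runs from effects to causes: `query("C", {"A1": True})` is about 0.27, and rises to about 0.96 once `A2` also fires. Predictive inference runs from causes to effects: `query("A2", {"C": True})` returns the modeled 0.7.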
A framework for distributed fault management using intelligent software agents
- In CCEC, Montreal, 2003
- Cited by 1 (0 self)
Abstract
This paper proposes a framework for distributed management of network faults by software agents. Intelligent network agents with advanced reasoning capabilities address many of the issues involved in distributing processing and control in network management. The agents detect and correlate the alarms generated in their domain and selectively seek to derive a clear explanation of them. The causal relationships between faults and their effects are represented as a Bayesian network. As evidence (alarms) is gathered, the probability that any particular fault is present is strengthened or weakened. Agents with a narrower view of the network forward their findings to an agent with a much broader view. Depending on the network’s degree of automation, an agent can carry out local recovery actions. A prototype reflecting the ideas discussed in this paper is under implementation.
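The idea that each alarm strengthens or weakens a fault hypothesis can be illustrated with a sequential Bayesian update. Purely for illustration, the sketch assumes a single fault drawn from a few hypothetical candidates (plus "none") and alarms that are conditionally independent given the fault, so each incoming alarm multiplies the current belief by its likelihood and renormalizes. The fault names, alarm names, and probabilities are invented; the agents in the paper reason over a full Bayesian network.

```python
# Hypothetical prior over fault hypotheses (exactly one holds).
PRIOR = {"none": 0.90, "link_down": 0.04, "card_fault": 0.06}

# Hypothetical P(alarm fires | hypothesis).
LIKELIHOOD = {
    "loss_of_signal": {"none": 0.01, "link_down": 0.95, "card_fault": 0.30},
    "crc_errors": {"none": 0.05, "link_down": 0.20, "card_fault": 0.80},
}


def update(belief, alarm, fired=True):
    """Return the posterior belief after observing one alarm (or its absence)."""
    post = {}
    for hypothesis, p in belief.items():
        q = LIKELIHOOD[alarm][hypothesis]
        post[hypothesis] = p * (q if fired else 1.0 - q)
    total = sum(post.values())
    return {h: p / total for h, p in post.items()}
```

Starting from `PRIOR`, a `loss_of_signal` alarm makes `link_down` the leading hypothesis (about 0.58). A subsequent `crc_errors` alarm weakens it to about 0.34 and strengthens `card_fault` to about 0.64. An agent with a narrow view could forward such a belief vector, rather than raw alarms, to an agent with a broader view.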
Probabilistic Reasoning for Fault Management on XUNET
- 1994
Abstract
This document describes a fault management application based on the concept of probabilistic reasoning. The application is the first step toward an automated fault management expert system that uses statistical methods to isolate faults in broadband networks. At its core, the inference engine is based on the Lauritzen-Spiegelhalter algorithm for computing probabilities on graphical structures. This report describes the knowledge engineering steps taken to compose a belief network for fault identification on XUNET, as well as the design of the inference engine and the data acquisition system. Recommendations for future research are outlined at the end.
September 28, 1994. This document reports the work done at AT&T Bell Labs, Murray Hill, NJ, from May 2 to August 19, 1994, as part of the XUNET summer research program. Jean-François Huard, Department of Electrical Engineering and Center fo...
Research Proposal: Characterizing Network Complexity by Means of Fault Diagnosis
Abstract
Systems are often described as complex when they are made of a large number of components, when many types of relationships and interconnections between the elements are possible, and when there is uncertainty about the state of the system. After reviewing complexity concepts, we propose to define the complexity of a network as the minimum expected number of binary queries needed to find the cause of a sudden change in its state. This definition is closely related to the detection, localization, and isolation of faults, all of which are fault management tasks. Fault detection is accomplished using alarms provided by the network, either through active monitoring or through error reporting. For localizing and identifying faults, we consider a probabilistic approach that is "natural" for modeling the network and analytically tractable; more specifically, the network is modeled as a random field whose equilibrium probability has a product-form structure. These models are...
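The proposed complexity measure has a well-known unconstrained special case. If a binary query may be any yes/no question about which cause is active (any split of the remaining candidates), then an optimal questioning strategy corresponds to an optimal prefix code, so the minimum expected number of queries equals the expected codeword length L* of a Huffman code for the cause probabilities, and H ≤ L* < H + 1, where H is their entropy in bits. The proposal restricts queries to what fault management can actually test and uses a random-field model, so the sketch below only computes this idealized case; the cause distributions are hypothetical.

```python
import heapq
import math


def min_expected_queries(probs):
    """Expected number of yes/no queries under an optimal (Huffman) strategy.

    Each merge of the two least likely groups costs one extra query for every
    cause inside the merged group, so the expected count is the sum of all
    merged probabilities.
    """
    if len(probs) <= 1:
        return 0.0  # a single candidate cause needs no queries
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def entropy_bits(probs):
    """Shannon entropy in bits, the lower bound on expected queries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For causes with probabilities `[0.5, 0.25, 0.125, 0.125]` the minimum is 1.75 queries, exactly the entropy; for four equally likely causes it is 2.0. A more skewed distribution such as `[0.7, 0.2, 0.1]` needs 1.3 queries, close to its entropy of about 1.16, which fits the intuition that a network whose failures concentrate on a few causes is "less complex" to diagnose.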

