Model based Bayesian Exploration
- In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999
Cited by 67 (0 self)
Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways to represent and reason about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.
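The action-selection idea in this abstract can be illustrated with a small sketch. It assumes the Q-value distribution of each action in the current state is available as a list of samples, and it scores each action by its mean Q-value plus a myopic estimate of the value of perfect information. The function names and the sample-based representation are assumptions made for illustration, not the paper's implementation.

```python
from statistics import mean

def myopic_vpi(q_samples):
    """Estimate mean Q-values and myopic value of information per action.

    q_samples maps each action to a list of sampled Q-values for one state
    (an assumed representation). At least two actions are required.
    """
    means = {a: mean(s) for a, s in q_samples.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]
    vpi = {}
    for action, samples in q_samples.items():
        if action == best:
            # Learning the true value matters only if the current best
            # action turns out to be worse than the runner-up.
            gains = [max(means[second] - q, 0.0) for q in samples]
        else:
            # Learning the true value matters only if this action turns
            # out to be better than the current best.
            gains = [max(q - means[best], 0.0) for q in samples]
        vpi[action] = mean(gains)
    return means, vpi

def select_action(q_samples):
    """Pick the action with the highest mean Q-value plus myopic VPI."""
    means, vpi = myopic_vpi(q_samples)
    return max(q_samples, key=lambda a: means[a] + vpi[a])

# A well-understood action versus an uncertain one with a lower mean.
samples = {
    "known": [1.0, 1.0, 1.0, 1.0],
    "untested": [-1.0, 0.0, 1.0, 3.0],
}
print(select_action(samples))
```

In this toy case the uncertain action is chosen even though its mean is lower, because its information bonus outweighs the gap in expected value.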
A Bayesian framework for reinforcement learning
- In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
Cited by 48 (1 self)
The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed that the learning process estimates online the full posterior distribution over models. To determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to the hypothesis is obtained by dynamic programming. By using a different hypothesis for each trial, appropriate exploratory and exploitative behavior is obtained. This Bayesian method always converges to the optimal policy for a stationary process with discrete states.
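The sample-then-plan loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: transitions have a Dirichlet posterior stored as counts, rewards are treated as known, and the dynamic programming step is plain value iteration. The names `sample_model`, `greedy_policy`, and `policy_for_trial` are hypothetical and do not come from the paper.

```python
import random

def sample_dirichlet(params, rng):
    """Draw one probability vector from a Dirichlet with positive params."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]

def sample_model(counts, rng):
    """Sample a transition model; counts[s][a][s2] are Dirichlet parameters
    (prior pseudo-counts plus observed transitions, all assumed positive)."""
    return [[sample_dirichlet(row, rng) for row in per_state]
            for per_state in counts]

def greedy_policy(P, R, gamma=0.95, sweeps=200):
    """Value iteration on transition model P and reward table R[s][a],
    followed by greedy action selection in every state."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states

    def q(s, a):
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

    for _ in range(sweeps):
        V = [max(q(s, a) for a in range(n_actions)) for s in range(n_states)]
    return [max(range(n_actions), key=lambda a: q(s, a))
            for s in range(n_states)]

def policy_for_trial(counts, R, rng):
    """One hypothesis per trial: sample a model, then act greedily under it."""
    return greedy_policy(sample_model(counts, rng), R)

# Two states, two actions, uniform prior counts, reward only in state 1.
rng = random.Random(0)
counts = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
rewards = [[0.0, 0.0], [1.0, 1.0]]
print(policy_for_trial(counts, rewards, rng))
```

Calling `policy_for_trial` once per trial, after updating `counts` with the transitions observed so far, gives a different sampled hypothesis each time, which is the source of exploration in this scheme.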
Detecting malicious software by monitoring anomalous windows registry accesses
- In Proceedings of the Fifth International Symposium on Recent Advances in Intrusion Detection (RAID-2002), 2002
Cited by 27 (9 self)
We present a host-based intrusion detection system (IDS) for Microsoft Windows. The core of the system is an algorithm that detects attacks on a host machine by looking for anomalous accesses to the Windows Registry. The key idea is to first train a model of normal registry behavior on a Windows host, and use this model to detect abnormal registry accesses at run-time. The normal model is trained using clean (attack-free) data. At run-time the model is used to check each access to the registry in real time to determine whether or not the behavior is abnormal and (possibly) corresponds to an attack. The system is effective in detecting the actions of malicious software while maintaining a low rate of false alarms.
1 Introduction
Microsoft Windows is one of the most popular operating systems today, and also one of the most often attacked. Malicious software running on the host is often used to perpetrate these attacks. There are two widely deployed first lines of defense against malicious software: virus scanners and security patches. Virus scanners attempt to detect malicious software on the host, and security patches are operating system updates that fix the security holes that malicious software exploits. Both of these methods suffer from the same drawback: they are effective against known attacks but are unable to detect and prevent new types of attacks. Most virus scanners are signature-based, meaning they use byte sequences or embedded strings in software to identify certain programs as malicious [10, 24]. If a virus scanner's signature database does not contain a signature for a specific malicious program, the virus scanner cannot detect or protect against that program. In general, virus scanners require frequent updating of signature databases, otherwise the scanners become useless [29]. Similarly, security patches protect systems only when they have been written, distributed and applied to host systems. Until then, systems remain vulnerable and attacks can and do spread widely.
One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses
- In Proc. of the Workshop on Data Mining for Computer Security, 2003
Cited by 16 (2 self)
We present a new Host-based Intrusion Detection System (IDS) that monitors accesses to the Microsoft Windows Registry using Registry Anomaly Detection (RAD). Our system uses a one class Support Vector Machine (OCSVM) to detect anomalous registry behavior by training on a dataset of normal registry accesses. It then uses this model to detect outliers in new (unclassified) data generated from the same system. Given the success of OCSVMs in other applications, we apply them to the Windows Registry anomaly detection problem. We compare our system to the RAD system using the Probabilistic Anomaly Detection (PAD) algorithm on the same dataset. Surprisingly, we find that PAD outperforms our OCSVM system due to properties of the hierarchical prior incorporated in the PAD algorithm. In the future, these properties may be used to develop an improved kernel and increase the performance of the OCSVM system.
Measures and Applications of Lexical Distributional Similarity
, 2003
Cited by 14 (0 self)
This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, statistics, Information Retrieval (IR) and Information Theory. Our aim is to investigate the properties which make a good measure of lexical distributional similarity. We start by introducing the concept of lexical distributional similarity. We discuss potential applications, which can be roughly divided into distributional or language modelling applications and semantic applications, and methods of evaluation (Chapter 2). We look at existing measures of distributional similarity and carry out an empirical comparison of fifteen of these measures, paying particular attention to the effects of word frequency (Chapter 3). We propose a new general framework for distributional similarity based on the context of lexical substitutability, which we measure using the IR concepts of precision and recall. This framework allows us to investigate the key factors in similarity of asymmetry, the relative influence of different contexts and the extent to which words share a context (Chapter 4). Finally, we consider the application of distributional similarity in language modelling (Chapter 5) and as a predictor of semantic similarity using human judgements of similarity and a spelling correction task (Chapter 6).
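To make the precision and recall reading of distributional similarity concrete, the sketch below treats a neighbour word's contexts as "retrieved" and a target word's contexts as "relevant". The weighting scheme (summing raw context weights over shared contexts) is one plausible choice assumed here for illustration; it is not necessarily the definition used in the thesis, and `precision_recall` is a hypothetical name.

```python
def precision_recall(target, neighbour):
    """Compare two words by the contexts they share.

    target and neighbour map context features to non-negative weights
    (for example co-occurrence counts); both are assumed non-empty with
    a positive total weight.
    """
    shared = set(target) & set(neighbour)
    # Precision: share of the neighbour's context weight also seen with the target.
    precision = sum(neighbour[f] for f in shared) / sum(neighbour.values())
    # Recall: share of the target's context weight also seen with the neighbour.
    recall = sum(target[f] for f in shared) / sum(target.values())
    return precision, recall

# Toy context vectors; the words and counts are made up for illustration.
beer = {"drink": 2, "pour": 1, "cold": 1}
tea = {"drink": 1, "brew": 3}
print(precision_recall(beer, tea))
print(precision_recall(tea, beer))
```

Swapping the arguments swaps precision and recall, which is one simple way the asymmetry mentioned in the abstract can show up.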
On-Line Cumulative Learning of Hierarchical Sparse n-Grams
- Proceedings of the Third International Conference on Development and Learning, 2004
Cited by 7 (0 self)
We present a system for on-line, cumulative learning of hierarchical collections of frequent patterns from unsegmented data streams. Such learning is critical for long-lived intelligent agents in complex worlds. Learned patterns enable prediction of unseen data and serve as building blocks for higher-level knowledge representation. We introduce a novel sparse n-gram model that, unlike pruned n-grams, learns on-line by stochastic search for frequent n-tuple patterns. Adding patterns as data arrives complicates probability calculations. We discuss an EM approach to this problem and introduce hierarchical sparse n-grams, a model that uses a better solution based on a new method for combining information across levels. A second new method for combining information from multiple granularities (n-gram widths) enables these models to more effectively search for frequent patterns (an on-line, stochastic analog of pruning in association rule mining). The result is an example of a rare combination: unsupervised, on-line, cumulative, structure learning. Unlike prediction suffix tree (PST) mixtures, the model learns with no size bound but using less space than the data. It does not repeatedly iterate over data (unlike MaxEnt feature construction). It discovers repeated structure on-line and (unlike PSTs) uses this to learn larger patterns. The type of repeated structure is limited (e.g., compared to hierarchical HMMs) but still useful, and these are important first steps towards learning repeated structure in more expressive representations, which has seen little progress especially in unsupervised, on-line contexts.
A Comparative Evaluation of Two Algorithms for Windows Registry Anomaly Detection
- Journal of Computer Security, 2005
Cited by 5 (3 self)
We present a component anomaly detector for a host-based intrusion detection system (IDS) for Microsoft Windows. The core of the detector is a learning-based anomaly detection algorithm that detects attacks on a host machine by looking for anomalous accesses to the Windows Registry. We present and compare two anomaly detection algorithms for use in our IDS and evaluate their performance. One algorithm, called PAD (Probabilistic Anomaly Detection), is based on probability density estimation, while the second uses the Support Vector Machine framework. The key idea behind the detector is to first train a model of normal Registry behavior on a Windows host, even when noise may be present in the training data, and use this model to detect abnormal Registry accesses. At run-time the model is used to check each access to the Registry in real time to determine whether or not the behavior is abnormal and possibly corresponds to an attack. The system is effective in detecting the actions of malicious software while maintaining a low rate of false alarms. We show that the probabilistic anomaly detection algorithm outperforms the support vector machine implementation, under three different kernel functions, in both accuracy and computational complexity.
Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
Cited by 5 (1 self)
We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the “topics”). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results will show that sparseTMs give better predictive performance with simpler inferred models.
Applying General Bayesian Techniques to Improve TAN Induction
- In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 1999
Cited by 4 (3 self)
Tree Augmented Naive Bayes (TAN) has been shown to be competitive with state-of-the-art machine learning algorithms [9]. However, the TAN induction algorithm that appears in [9] can be improved in several ways. In this paper we identify three weak points in it and introduce two ideas to overcome those problems: the multinomial sampling approach to learning Bayesian networks and local Bayesian model averaging. These ideas are generic and can thus be reused to improve other learning algorithms. We empirically test the new algorithms, and conclude that in many cases they lead to an improvement in classification accuracy and in the quality of the probabilities given as predictions.
PAID: A probabilistic agent-based intrusion detection system
- In Computers & Security, 2005

