On Measuring the Similarity of Network Hosts: Pitfalls, New Metrics, and Empirical Analyses
BibTeX
@MISC{Coull_onmeasuring,
author = {Scott E. Coull and Fabian Monrose and Michael Bailey},
title = {On Measuring the Similarity of Network Hosts: Pitfalls, New Metrics, and Empirical Analyses},
year = {}
}
Abstract
As the scope and scale of network data grow, security practitioners and network operators are increasingly turning to automated data analysis methods to extract meaningful information. Underpinning these methods are distance metrics that represent the similarity between two values or objects. In this paper, we argue that many of the obvious distance metrics used to measure behavioral similarity among network hosts fail to capture the semantic meaning imbued by network protocols. They also tend to ignore the long-term temporal structure of the objects being measured. To explore the role of these semantic and temporal characteristics, we develop a new behavioral distance metric for network hosts and compare its performance to a metric that ignores such information. Specifically, we propose semantically meaningful metrics for common data types found within network data, show how these metrics can be combined to treat network data as a unified metric space, and describe a temporal sequencing algorithm that captures long-term causal relationships. In doing so, we bring to light several challenges inherent in defining behavioral metrics for network data, and put forth a new way of approaching network data analysis problems. Our proposed metric is empirically evaluated on a dataset of over 30 million network flows, with results that underscore the utility of a holistic approach to network data analysis.







