Spam filtering using statistical data compression models
 Journal of Machine Learning Research
, 2006
Cited by 55 (12 self)
Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on character-level or binary sequences. By modeling messages as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updateable. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.
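The idea of using an adaptive sequence model as a classifier can be illustrated with a minimal sketch. The code below is not dynamic Markov compression or prediction by partial matching; it swaps in a much simpler order-k character model with add-one smoothing, and the names CharModel and classify are hypothetical. One model is kept per class, and a message is assigned to the class whose model would encode it in the fewest bits.

```python
import math
from collections import defaultdict


class CharModel:
    """Adaptive order-k character model (a simplified stand-in, not DMC or PPM)."""

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol count for smoothing
        # counts[context][next_char] = times next_char followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, text):
        """Incrementally add one training message to the model."""
        for i, ch in enumerate(text):
            context = text[max(0, i - self.order):i]
            self.counts[context][ch] += 1

    def code_length(self, text):
        """Bits needed to encode text under the current (frozen) counts."""
        bits = 0.0
        for i, ch in enumerate(text):
            context = text[max(0, i - self.order):i]
            seen = self.counts.get(context, {})
            # Add-one smoothing over the assumed alphabet.
            p = (seen.get(ch, 0) + 1) / (sum(seen.values()) + self.alphabet_size)
            bits -= math.log2(p)
        return bits


def classify(models, message):
    """Return the label whose model gives the shortest code length."""
    return min(models, key=lambda label: models[label].code_length(message))
```

Because update only increments counts, user feedback on a single message can be folded in without retraining, and no tokenizer is involved at any point.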
Understanding Complex Network Attack Graphs through Clustered Adjacency Matrices
 Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC)
, 2005
Cited by 15 (2 self)
We apply adjacency matrix clustering to network attack graphs for attack correlation, prediction, and hypothesizing. We self-multiply the clustered adjacency matrices to show attacker reachability across the network for a given number of attack steps, culminating in transitive closure for attack prediction over all possible numbers of steps. This reachability analysis provides a concise summary of the impact of network configuration changes on the attack graph. Using our framework, we also place intrusion alarms in the context of vulnerability-based attack graphs, so that false alarms become apparent and missed detections can be inferred. We introduce a graphical technique that shows multiple-step attacks by matching rows and columns of the clustered adjacency matrix. This allows attack impact/responses to be identified and prioritized according to the number of attack steps to victim machines, and allows attack origins to be determined. Our techniques have quadratic complexity in the size of the attack graph.
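The self-multiplication step can be sketched as follows, assuming the attack graph is given as a Boolean adjacency matrix in which adj[i][j] is True when one attack step leads from machine i to machine j. The function names are hypothetical, the clustering step is left out, and this naive version makes no attempt to match the complexity stated in the abstract.

```python
def bool_matmul(a, b):
    """Boolean matrix product: result[i][j] is True if some k links i to j."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def reach_within(adj, steps):
    """reach[i][j] is True if j is reachable from i in 1..steps attack steps."""
    reach = [row[:] for row in adj]
    power = [row[:] for row in adj]  # adj raised to the current step count
    for _ in range(steps - 1):
        power = bool_matmul(power, adj)
        reach = [
            [r or p for r, p in zip(reach_row, power_row)]
            for reach_row, power_row in zip(reach, power)
        ]
    return reach


def transitive_closure(adj):
    """Reachability over any number of steps (paths never need more than n)."""
    return reach_within(adj, len(adj))
```

For a chain of four machines 0 -> 1 -> 2 -> 3, reach_within(adj, 2) marks machine 2 as reachable from machine 0, and the transitive closure additionally marks machine 3.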
Parameter-free spatial data mining using MDL
 In 5th International Conference on Data Mining (ICDM)
, 2005
Cited by 7 (1 self)
Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatial correlation and feature co-occurrence patterns, without any parameters. In particular, we employ the Minimum Description Length (MDL) principle coupled with a natural way of compressing regions. This defines what “good” means: a feature co-occurrence pattern is good if it helps us better compress the set of locations for these features. Conversely, a spatial correlation is good if it helps us better compress the set of features in the corresponding region. Our approach is scalable for large datasets (both number of locations and of features). We evaluate our method on both real and synthetic datasets.
Adaptive design optimization: A mutual information based approach to model discrimination in cognitive science
 Neural Computation
, 2010
Cited by 5 (3 self)
Discriminating among competing statistical models is a pressing issue for many experimentalists in the field of cognitive science. Resolving this issue begins with designing maximally informative experiments. To this end, the problem to be solved in adaptive design optimization is identifying experimental designs under which one can infer the underlying model in the fewest possible steps. When the models under consideration are nonlinear, as is often the case in cognitive science, this problem can be impossible to solve analytically without simplifying assumptions. However, as we show in this paper, a full solution can be found numerically with the help of a Bayesian computational trick derived from the statistics literature, which recasts the problem as a probability density simulation in which the optimal design is the mode of the density. We use a utility function based on mutual information, and give three intuitive interpretations of the utility function in terms of Bayesian posterior estimates. As a proof of concept, we offer a simple example application to an experiment on memory retention.
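A mutual-information utility of this kind can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: two retention models (power and exponential) with fixed, made-up parameter values, a single binary recall outcome per trial, and a plain search over a handful of candidate retention intervals. The paper's approach instead treats parameters as uncertain and finds the optimum by density simulation, which this sketch does not attempt.

```python
import math


def power_model(t, a=0.9, b=0.5):
    """Assumed recall probability after delay t under a power-law model."""
    return a * (t + 1) ** (-b)


def exp_model(t, a=0.9, b=0.2):
    """Assumed recall probability after delay t under an exponential model."""
    return a * math.exp(-b * t)


def mutual_information(t, models, prior):
    """Mutual information (nats) between model identity and outcome at design t."""
    utility = 0.0
    for y in (0, 1):  # 1 = item recalled, 0 = item forgotten
        lik = [m(t) if y == 1 else 1 - m(t) for m in models]
        marginal = sum(p * l for p, l in zip(prior, lik))
        for p, l in zip(prior, lik):
            if l > 0:
                utility += p * l * math.log(l / marginal)
    return utility


def best_design(candidates, models, prior):
    """Pick the candidate delay with the highest utility (simple grid search)."""
    return max(candidates, key=lambda t: mutual_information(t, models, prior))
```

With these assumed parameters both models predict the same recall probability at t = 0, so that design carries no information about which model is true, while longer delays separate the two predictions.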
MDL denoising revisited
 IEEE Transactions on Signal Processing, 57(9):3347–3360
, 2009
Cited by 5 (2 self)
We refine and extend an earlier MDL denoising criterion for wavelet-based denoising. We start by showing that the denoising problem can be reformulated as a clustering problem, where the goal is to obtain separate clusters for informative and non-informative wavelet coefficients, respectively. This suggests two refinements, adding a codelength for the model index, and extending the model in order to account for subband-dependent coefficient distributions. A third refinement is derivation of soft thresholding inspired by predictive universal coding with weighted mixtures. We propose a practical method incorporating all three refinements, which is shown to achieve good performance and robustness in denoising both artificial and natural signals. Index Terms: Minimum description length (MDL) principle, wavelets, denoising.
Outlier-robust clustering using independent components
 SIGMOD Conf
, 2008
Cited by 4 (3 self)
How can we efficiently find a clustering, i.e. a concise description of the cluster structure, of a given data set which contains an unknown number of clusters of different shape and distribution and is contaminated by noise? Most existing clustering methods are restricted to the Gaussian cluster model and are very sensitive to noise. If the cluster content follows a non-Gaussian distribution and/or the data set contains a few outliers belonging to no cluster, then the computed data distribution does not match the true data distribution well, or an unnaturally high number of clusters is required to represent the true data distribution of the data set. In this paper we propose OCI (Outlier-robust Clustering using Independent Components), a clustering method which overcomes these problems by (1) applying the exponential power distribution (EPD) as cluster model, which is a generalization of Gaussian, uniform, Laplacian and many other distribution functions, (2) applying Independent Component Analysis (ICA) both for determining the main directions inside a cluster and for finding split planes in a top-down clustering approach, and (3) defining an efficient and effective filter for outliers, based on EPD and ICA. Our method is parameter-free and, as a top-down clustering approach, very efficient. An extensive experimental evaluation shows both the accuracy of the obtained clustering result and the efficiency of our method.
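To see how one family can cover Gaussian, Laplacian and near-uniform shapes, the sketch below evaluates a one-dimensional exponential power density in its common textbook parameterization (location mu, scale alpha, shape beta). The parameterization used inside OCI may differ, so the function name epd_pdf and its arguments are assumptions for illustration only.

```python
import math


def epd_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Exponential power density: beta / (2 alpha Gamma(1/beta)) * exp(-(|x-mu|/alpha)^beta).

    beta = 2 gives a Gaussian with standard deviation alpha / sqrt(2),
    beta = 1 gives a Laplacian with scale alpha,
    and large beta approaches a uniform density on [mu - alpha, mu + alpha].
    """
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))
```

Setting alpha to the square root of 2 with beta = 2 recovers the standard normal density, which gives a quick sanity check on the normalizing constant.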
Unsupervised organization of image collections: taxonomies and beyond
Cited by 3 (0 self)
We introduce a nonparametric Bayesian model, called TAX, which can organize image collections into a tree-shaped taxonomy without supervision. The model is inspired by the Nested Chinese Restaurant Process (NCRP) and associates each image with a path through the taxonomy. Similar images share initial segments of their paths and thus share some aspects of their representation. Each internal node in the taxonomy represents information that is common to multiple images. We explore the properties of the taxonomy through experiments on a large (∼10^4) image collection with a number of users trying to locate quickly a given image. We find that the main benefits are easier navigation through image collections and reduced description length. A natural question is whether a taxonomy is the optimal form of organization for natural images. Our experiments indicate that although taxonomies can organize images in a useful manner, more elaborate structures may be even better suited for this task. Index Terms: Taxonomy, hierarchy, clustering.
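How a nested Chinese restaurant process assigns each item a path through a tree can be sketched as below. This is only the generic prior, not the TAX model: image features and inference are left out, and the function name sample_path, the fixed depth, and the concentration parameter gamma are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def sample_path(child_counts, depth, gamma, rng):
    """Draw one root-to-leaf path and record it in child_counts.

    child_counts maps a path prefix (tuple) to a dict {child id: visits}.
    An existing child is chosen with probability proportional to its visits,
    a new child with probability proportional to gamma.
    """
    path = ()
    for _ in range(depth):
        children = child_counts[path]
        total = sum(children.values())
        r = rng.random() * (total + gamma)
        choice = len(children)  # default: open a new child at this node
        for child, visits in children.items():
            if r < visits:
                choice = child
                break
            r -= visits
        children[choice] = children.get(choice, 0) + 1
        path += (choice,)
    return path
```

Calling it repeatedly with tree = defaultdict(dict) and rng = random.Random(0) builds up a shared tree. Items that fall into the same branch share the initial segments of their paths, which is the property the abstract relies on.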
An Efficient, Generic Approach to Extracting Multi-Word Expressions from Dependency Trees
Cited by 3 (0 self)
The Varro toolkit offers an intuitive mechanism for extracting syntactically motivated multi-word expressions (MWEs) from dependency treebanks by looking for recurring connected subtrees instead of subsequences in strings. This approach can find MWEs that are in varying orders and have words inserted into their components. This paper also proposes description length gain as a statistical correlation measure well-suited to tree structures.
Audio Speech Segmentation Without Language-Specific Knowledge
 Proceedings of the 28th Annual Meeting of the Cognitive Science Society
, 2006
Cited by 3 (0 self)
Speech segmentation is the problem of finding word boundaries in spoken language when the underlying vocabulary is still unknown. Here we show that a system with no phonemic knowledge can find word boundaries. The system first subdivides an utterance by recursively clustering similar parts of the signal together until the cepstral coefficient variance is low within each new segment. These segments are then used as inputs to a perceptron-like algorithm that finds repeated segments across utterances. With only a few sample utterances, and no previous linguistic knowledge, the system can find the words that were repeated across utterances and identify new utterances that contain those words. The findings show that the assumption of a phoneme classification module is not necessary for a “minimum description length” (Brent & Cartwright, 1996; de Marcken, 1996) explanation of word segmentation.
TECHNIQUES FOR VISION-BASED HUMAN-COMPUTER INTERACTION
, 2005
Cited by 2 (1 self)
With the ubiquity of powerful, mobile computers and rapid advances in sensing and robot technologies, there exists a great potential for creating advanced, intelligent computing environments. We investigate techniques for integrating passive, vision-based sensing into such environments, which include both conventional interfaces and large-scale environments. We propose a new methodology for vision-based human-computer interaction called the Visual Interaction Cues (VICs) paradigm. VICs fundamentally relies on a shared perceptual space between the user and computer using monocular and stereoscopic video. In this space, we represent each interface component as a localized region in the image(s). By providing a clearly defined interaction locale, it is not necessary to visually track the user. Rather, we model interaction as an expected stream of visual cues corresponding to a gesture. Example interaction cues are motion, as when the finger moves to press a push-button, and 3D hand posture for a communicative gesture like a letter in sign language. We explore both procedurally defined parsers of the low-level visual cues and learning-based