Spam filtering using statistical data compression models
Journal of Machine Learning Research, 2006
Cited by 33 (12 self)
Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on character-level or binary sequences. By modeling messages as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updateable. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.
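The idea of classifying a message by which class model compresses it best can be illustrated with a minimal sketch. It assumes a fixed-order character-level Markov model with add-one smoothing as a simple stand-in; it is not dynamic Markov compression or prediction by partial matching, and the names `CharMarkovModel` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


class CharMarkovModel:
    """Order-k character model with add-one smoothing (a stand-in, not DMC or PPM)."""

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol count for smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def update(self, text):
        """Incrementally add one message to the model."""
        padded = "\0" * self.order + text
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            self.counts[ctx][ch] += 1
            self.totals[ctx] += 1

    def code_length(self, text):
        """Bits an ideal coder would need for text under the current model."""
        padded = "\0" * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            seen = self.counts.get(ctx, {}).get(ch, 0)
            total = self.totals.get(ctx, 0)
            bits -= math.log2((seen + 1) / (total + self.alphabet_size))
        return bits


def classify(message, spam_model, ham_model):
    """Pick the class whose model gives the shorter code length."""
    if spam_model.code_length(message) < ham_model.code_length(message):
        return "spam"
    return "ham"


spam_model, ham_model = CharMarkovModel(), CharMarkovModel()
for msg in ["buy cheap pills now", "cheap pills buy now online"]:
    spam_model.update(msg)
for msg in ["meeting at noon tomorrow", "see you at the meeting"]:
    ham_model.update(msg)

print(classify("cheap pills", spam_model, ham_model))
print(classify("the meeting at noon", spam_model, ham_model))
```

Because `update` only increments counts, a model can absorb user feedback one message at a time, which mirrors the incremental property the abstract emphasizes. No tokenization is involved: the model sees raw character sequences.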
Understanding Complex Network Attack Graphs through Clustered Adjacency Matrices
Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC), 2005
Cited by 6 (1 self)
We apply adjacency matrix clustering to network attack graphs for attack correlation, prediction, and hypothesizing. We self-multiply the clustered adjacency matrices to show attacker reachability across the network for a given number of attack steps, culminating in transitive closure for attack prediction over all possible numbers of steps. This reachability analysis provides a concise summary of the impact of network configuration changes on the attack graph. Using our framework, we also place intrusion alarms in the context of vulnerability-based attack graphs, so that false alarms become apparent and missed detections can be inferred. We introduce a graphical technique that shows multiple-step attacks by matching rows and columns of the clustered adjacency matrix. This allows attack impact/responses to be identified and prioritized according to the number of attack steps to victim machines, and allows attack origins to be determined. Our techniques have quadratic complexity in the size of the attack graph.
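The self-multiplication step can be sketched on a tiny Boolean adjacency matrix. This is an illustration only: the four-machine chain is made up, the clustering step is omitted, and the naive Boolean product used here is cubic per multiplication, so it does not reproduce the complexity claimed in the abstract.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True if some k links i to j."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reachable_within(adj, steps):
    """reach[i][j] is True if j can be reached from i in 1..steps attack steps."""
    power = adj
    reach = [row[:] for row in adj]
    for _ in range(steps - 1):
        power = bool_matmul(power, adj)
        reach = [[r or p for r, p in zip(reach_row, power_row)]
                 for reach_row, power_row in zip(reach, power)]
    return reach


def transitive_closure(adj):
    """Reachability over any number of steps; paths of length <= n suffice."""
    return reachable_within(adj, len(adj))


# Hypothetical attack graph: machine 0 can attack 1, 1 can attack 2, 2 can attack 3.
T, F = True, False
adj = [
    [F, T, F, F],
    [F, F, T, F],
    [F, F, F, T],
    [F, F, F, F],
]

print(reachable_within(adj, 2)[0][2])
print(transitive_closure(adj)[0][3])
```

Reading a row of the closure gives every machine an attacker at that row's machine can eventually reach; reading a column gives the possible origins of an attack on that column's machine.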
Parameter-free spatial data mining using MDL
In 5th International Conference on Data Mining (ICDM), 2005
Cited by 5 (1 self)
Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatial correlation and feature co-occurrence patterns, without any parameters. In particular, we employ the Minimum Description Length (MDL) principle coupled with a natural way of compressing regions. This defines what “good” means: a feature co-occurrence pattern is good if it helps us better compress the set of locations for these features. Conversely, a spatial correlation is good if it helps us better compress the set of features in the corresponding region. Our approach is scalable for large datasets (both number of locations and of features). We evaluate our method on both real and synthetic datasets.
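To make the notion of "helps us better compress" concrete, here is a rough sketch of an MDL-style cost for a region of binary cells. The encoding is an assumption chosen for illustration (send the count of ones, then the cells at their entropy rate); it is not taken from the paper, and `region_cost` is a hypothetical name.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def region_cost(cells):
    """Approximate bits to describe a region of 0/1 cells.

    Assumed encoding: log2(n + 1) bits for the number of ones,
    plus n * H(k / n) bits for the cells themselves.
    """
    n = len(cells)
    if n == 0:
        return 0.0
    k = sum(cells)
    return math.log2(n + 1) + n * binary_entropy(k / n)


# One feature over 16 grid cells: present in the first half, absent in the second.
cells = [1] * 8 + [0] * 8

whole = region_cost(cells)
split = region_cost(cells[:8]) + region_cost(cells[8:])
print(round(whole, 2), round(split, 2))
```

Under this cost, describing the two homogeneous halves separately is cheaper than describing the mixed region as a whole, which is the sense in which a good spatial grouping is one that compresses better.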
Kolmogorov complexity, optimization and hardness
2006
Cited by 2 (2 self)
The Kolmogorov complexity (KC) of a string is defined as the length of the shortest program that prints that string and halts. This measure of complexity is often used in optimization to indicate expected function difficulty. While it is often used, there are known counterexamples. This paper investigates the applicability of KC as an estimator of problem difficulty for optimization in the black box scenario. In particular we address the known counterexamples (e.g., pseudorandom functions, the NIAH) and explore the connection of KC to the NFLTs. We conclude that high KC implies hardness; however, while easy fitness functions have low KC, the reverse is not necessarily true.
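KC itself is not computable, so in practice it is often approximated from above by the output size of a general-purpose compressor. The sketch below uses `zlib` for that purpose; this proxy is an assumption for illustration and is not the estimator studied in the paper.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Upper-bound proxy for KC: size in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))


# A highly regular string: a short program ("print 'ab' 500 times") produces it.
regular = b"ab" * 500

# Pseudorandom bytes from a fixed seed. A short program also produces these,
# so their true KC is low, but a generic compressor cannot exploit that.
rng = random.Random(0)
pseudorandom = bytes(rng.getrandbits(8) for _ in range(1000))

print(compressed_length(regular), compressed_length(pseudorandom))
```

The pseudorandom case shows how loose the proxy can be: both inputs have short generating programs, yet only the regular one compresses well.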
Audio Speech Segmentation Without Language-Specific Knowledge
Proceedings of the 28th Annual Meeting of the Cognitive Science Society, 2006
Cited by 2 (0 self)
Speech segmentation is the problem of finding word boundaries in spoken language when the underlying vocabulary is still unknown. Here we show that a system with no phonemic knowledge can find word boundaries. The system first subdivides an utterance by recursively clustering similar parts of the signal together until the cepstral coefficient variance is low within each new segment. These segments are then used as inputs to a perceptron-like algorithm that finds repeated segments across utterances. With only a few sample utterances, and no previous linguistic knowledge, the system can find the words that were repeated across utterances and identify new utterances that contain those words. The findings show that the assumption of a phoneme classification module is not necessary for a “minimum description length” (Brent & Cartwright, 1996; de Marcken, 1996) explanation of word segmentation.
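The first stage, subdividing until the variance within each segment is low, can be sketched as a recursive split. This is a simplified variant under stated assumptions: each frame is a single number standing in for a cepstral coefficient vector, the split is top-down, and the threshold `max_var` and the function name `segment` are hypothetical rather than the authors' procedure.

```python
from statistics import pvariance


def segment(frames, max_var, start=0):
    """Return (start, end) index pairs whose frame values have variance <= max_var."""
    if len(frames) < 2 or pvariance(frames) <= max_var:
        return [(start, start + len(frames))]

    def cost(i):
        # Summed squared deviation of the two halves if we cut before index i.
        left, right = frames[:i], frames[i:]
        return len(left) * pvariance(left) + len(right) * pvariance(right)

    split = min(range(1, len(frames)), key=cost)
    return (segment(frames[:split], max_var, start)
            + segment(frames[split:], max_var, start + split))


# Toy one-dimensional "signal": a quiet stretch, a loud stretch, then quiet again.
frames = [0.0, 0.1, 0.0, 5.0, 5.1, 5.0, 0.0, 0.1]
print(segment(frames, max_var=0.05))
```

The resulting segments would then serve as the units that a later stage compares across utterances to find repeated material.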
TECHNIQUES FOR VISION-BASED HUMAN-COMPUTER INTERACTION
2005
Cited by 1 (1 self)
With the ubiquity of powerful, mobile computers and rapid advances in sensing and robot technologies, there exists a great potential for creating advanced, intelligent computing environments. We investigate techniques for integrating passive, vision-based sensing into such environments, which include both conventional interfaces and large-scale environments. We propose a new methodology for vision-based human-computer interaction called the Visual Interaction Cues (VICs) paradigm. VICs fundamentally relies on a shared perceptual space between the user and computer using monocular and stereoscopic video. In this space, we represent each interface component as a localized region in the image(s). By providing a clearly defined interaction locale, it is not necessary to visually track the user. Rather we model interaction as an expected stream of visual cues corresponding to a gesture. Example interaction cues are motion as when the finger moves to press a push-button, and 3D hand posture for a communicative gesture like a letter in sign language. We explore both procedurally defined parsers of the low-level visual cues and learning-based
On the Behavior of MDL Denoising
In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS), 2005
We consider wavelet denoising based on the minimum description length (MDL) principle.
The Fourth International Workshop
2006
Distributed data mining (DDM) deals with the problem of analyzing distributed, possibly multi-party data by paying attention to the computing, communication, storage, and human factors-related issues in a distributed environment. Unlike the conventional off-the-shelf centralized data mining products, DDM systems are based on fundamentally distributed algorithms that do not necessarily require centralization of data and other resources. DDM technology is finding an increasing number of applications in many domains. Examples include data-driven pervasive applications for mobile and embedded devices, grid-based large-scale scientific and business data analysis, security and defense related applications involving analysis of multi-party, possibly privacy-sensitive data, and peer-to-peer data stream mining in sensor and file-sharing networks. This talk will focus on peer-to-peer (P2P) distributed data stream mining and monitoring. It will first discuss the foundation of approximate and exact P2P algorithms for data analysis. Then it will present a class of P2P algorithms for eigen-analysis and clustering in detail. The talk will end with a discussion on the future directions of research on P2P data mining.
INCREMENTAL TRAINING AND GROWTH OF ARTIFICIAL NEURAL NETWORKS
© 2008 Yang Li
Training of automatic pattern recognition or function regression systems has been investigated for decades, and it is fairly well understood that the usage of limited amounts of empirical data in the training of such systems necessarily leads to generalization difficulties. The focus of this work is to investigate the generalization issues associated with a particular class of estimators, artificial neural networks, and formulate a novel method to improve the trade-off between performance and generalizability when it comes to training with a limited amount of empirical data. The improvement comes from an effective utilization of prior knowledge: if a network can be trained on a large training corpus sharing certain characteristics with the data from the task at hand, then the network can be “grown” to be adapted to solve the current task. The network carries the structure obtained from its training on the large dataset over to the smaller dataset; if there are similarities in the structure, this preservation of structure across applications expedites training and ensures lower variability. The thesis
No-Free-Lunch and the Minimum Description Length
The No-Free-Lunch theorem (NFL) states that no learning algorithm exists for the complete domain of problems that will outperform any other algorithm. In other words, every learning algorithm will perform equally well when averaged on the complete problem domain [3]. The minimum description length (MDL) is a formalization of Occam’s Razor in which the best hypothesis

