A Survey of Kernels for Structured Data
Cited by 84 (3 self)
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much 'real-world' data, however, is structured: it has no natural representation in a single table. Usually, to apply kernel methods to 'real-world' data, extensive pre-processing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches to defining positive definite kernels on structured instances directly.
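Here is one illustration of a kernel defined directly on structured instances, in this case strings. The following minimal Python sketch implements the k-spectrum kernel, which counts shared length-k substrings. The choice of kernel and the names `spectrum_features` and `spectrum_kernel` are our own and were picked for illustration; the sketch is not taken from the survey. The kernel is an explicit inner product of substring-count vectors, so it is positive definite in the kernel-methods sense. It needs no manual pre-processing of the strings into a fixed table.

```python
from collections import Counter


def spectrum_features(s: str, k: int) -> Counter:
    """Count every contiguous substring of length k (a 'k-mer') in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t.

    Being an explicit inner product of feature vectors, it yields a positive
    semidefinite Gram matrix for any finite set of strings.
    """
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    # Only k-mers present in both strings contribute to the sum.
    return sum(count * ft[kmer] for kmer, count in fs.items() if kmer in ft)


if __name__ == "__main__":
    print(spectrum_kernel("abab", "bab"))  # 3
```

With k = 2, "abab" has the 2-mers {ab: 2, ba: 1} and "bab" has {ba: 1, ab: 1}, so the kernel value is 2·1 + 1·1 = 3.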
Internet Traffic Classification Demystified: The Myths, Caveats and Best Practices
In Proc. ACM CoNEXT, 2008
Cited by 20 (1 self)
Recent research on Internet traffic classification algorithms has yielded a flurry of proposed approaches for distinguishing types of traffic, but no systematic comparison of the various algorithms. This fragmented approach to traffic classification research leaves the operational community with no basis for consensus on what approach to use when, and how to interpret results. In this work we critically revisit traffic classification by conducting a thorough evaluation of three classification approaches, based on transport-layer ports, host behavior, and flow features. A strength of our work is the broad range of data against which we test the three classification approaches: seven traces with payload collected in Japan, Korea, and the US. The diverse geographic locations, link characteristics and application traffic mix in these data allowed us to evaluate the approaches under a wide variety of conditions. We analyze the advantages and limitations of each approach, evaluate methods to overcome the limitations, and extract insights and recommendations for both the study and practical application of traffic classification. We make our software, classifiers, and data available for researchers interested in validating or extending this work.
Provably Fast Training Algorithms for Support Vector Machines
Cited by 18 (3 self)
Support Vector Machines are a family of algorithms for the analysis of data based on convex quadratic programming. We focus on their use for classification, where the SVM algorithms work by maximizing the margin of a classifying hyperplane in a feature space. In this paper, based on a variation of random sampling techniques that have been used successfully for similar problems, we derive a randomized algorithm for training SVMs and formally prove an upper bound on the expected running time which is quasilinear with respect to the number of data points. To our knowledge, this is the first algorithm with a quasilinear bound that can handle SVMs with kernels. (This is a full version of the conference paper [BDW02].)
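The margin that SVM training maximizes can be made concrete with a short sketch. It uses a hypothetical helper, `geometric_margin`, which evaluates the geometric margin of a given hyperplane w·x + b = 0 on points labelled y ∈ {-1, +1}. It only scores a candidate hyperplane. It is not the randomized training algorithm the paper proposes.

```python
import math
from typing import Sequence


def geometric_margin(w: Sequence[float], b: float,
                     X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over all points.

    The value is positive iff the hyperplane w . x + b = 0 separates the
    labelled points; a hard-margin SVM picks w, b that maximize it.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    if norm == 0:
        raise ValueError("w must be nonzero")
    return min(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm
        for xi, yi in zip(X, y)
    )


if __name__ == "__main__":
    X = [(2, 0), (3, 1), (-1, 0), (-2, -1)]
    y = [1, 1, -1, -1]
    print(geometric_margin((1, 0), -0.5, X, y))  # 1.5
    print(geometric_margin((1, 1), 0.0, X, y))   # about 0.707
```

Both hyperplanes separate the four points, but the first has the larger margin (1.5 versus 1/√2 ≈ 0.707). A margin-maximizing classifier would therefore prefer it.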
Incremental Support Vector Machine Construction
In ICDM, 2001
Cited by 15 (0 self)
SVMs suffer from large memory requirements and long CPU times when trained in batch mode on large data sets. We overcome these limitations, and at the same time make SVMs suitable for learning with data streams, by constructing incremental learning algorithms.
One-Class Novelty Detection for Seizure Analysis from Intracranial EEG
Journal of Machine Learning Research, 2006
Cited by 14 (0 self)
This paper describes an application of one-class support vector machine (SVM) novelty detection for detecting seizures in humans. Our technique maps intracranial electroencephalogram (EEG) time series into corresponding novelty sequences by classifying short-time, energy-based statistics computed from one-second windows of data. We train a classifier on epochs of interictal (normal) EEG. During ictal (seizure) epochs of EEG, seizure activity induces distributional changes in feature space that increase the empirical outlier fraction. A hypothesis test determines when the outlier fraction differs significantly from its nominal value, signaling a seizure detection event.
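The abstract does not say which hypothesis test is used. One plausible instance is a one-sided exact binomial test. It assumes that, under normal (interictal) EEG, the per-window novelty flags are independent and each window is an outlier with probability ν, the nominal outlier fraction of the one-class SVM. The test signals when the empirical outlier fraction is significantly too high. The function names, the significance level `alpha`, and the independence assumption are ours. The one-class SVM that produces the flags is not shown.

```python
from math import comb
from typing import Sequence


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def outlier_fraction_alarm(flags: Sequence[bool], nu: float,
                           alpha: float = 0.01) -> bool:
    """Return True if the number of outlier windows is improbably high.

    Null hypothesis (assumed): each window is independently an outlier
    with probability nu. Reject it when the one-sided p-value < alpha.
    """
    n = len(flags)
    k = sum(1 for f in flags if f)
    return binomial_upper_tail(k, n, nu) < alpha
```

For example, with ν = 0.1, 10 outliers in 20 windows trigger an alarm (p-value about 7·10⁻⁶), while 2 outliers in 20 do not (p-value about 0.61).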
What’s the code? Automatic classification of source code archives
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002
Cited by 11 (0 self)
There are various source code archives on the World Wide Web. These archives are usually organized by application categories and programming languages. However, manually organizing source code repositories is not a trivial task since they grow rapidly and are very large (on the order of terabytes). We demonstrate machine learning methods for automatic classification of archived source code into eleven application topics and ten programming languages. For topical classification, we concentrate on C and C++ programs from the Ibiblio and the Sourceforge archives. Support vector machine (SVM) classifiers are trained on examples of a given programming language or programs in a specified category. We show that source code can be accurately and automatically classified into topical categories and can be identified to be in a specific programming language class.
User re-authentication via mouse movements
In ACM Workshop on Visualization and Data Mining for Computer Security, 2004
Cited by 10 (1 self)
We present an approach to user re-authentication based on the data collected from the computer’s mouse device. Our underlying hypothesis is that one can successfully model user behavior on the basis of user-invoked mouse movements. Our implemented system raises an alarm when the current behavior of user X deviates sufficiently from the learned “normal” behavior of user X. We apply a supervised learning method to discriminate among k users. Our empirical results for eleven users show that we can differentiate these individuals based on their mouse movement behavior with a false positive rate of 0.43% and a false negative rate of 1.75%. Nevertheless, we point out that analyzing mouse movements alone is not sufficient for a stand-alone user re-authentication system.
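The two error rates reported above follow the usual confusion-matrix definitions. This minimal sketch assumes that an alarm counts as a positive. Under that convention, a false positive is a legitimate session that is flagged and a false negative is an impostor session that is missed. The function name `error_rates` is hypothetical, and any counts passed to it here are illustrative rather than the paper's data.

```python
def error_rates(fp: int, tn: int, fn: int, tp: int) -> tuple[float, float]:
    """Return (false positive rate, false negative rate).

    FPR = FP / (FP + TN): share of legitimate sessions that raise an alarm.
    FNR = FN / (FN + TP): share of impostor sessions that go undetected.
    """
    return fp / (fp + tn), fn / (fn + tp)
```

For example, `error_rates(fp=1, tn=99, fn=3, tp=97)` returns `(0.01, 0.03)`.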
Support Vector Clustering Through Proximity Graph Modelling
Proceedings, 9th International Conference on Neural Information Processing (ICONIP’02), 2002
Cited by 9 (0 self)
Support Vector Machines (SVMs) have been widely adopted for classification, regression and novelty detection. Recent studies [1, 2] proposed employing them for cluster analysis as well. The basis of this support vector clustering (SVC) is density estimation through SVM training. SVC is a boundary-based clustering method, where the support information is used to construct cluster boundaries. Despite its ability to deal with outliers and to handle high-dimensional data and arbitrary boundaries in data space, SVC has two problems in the cluster labelling step. The first is its low efficiency as the number of free support vectors increases. The second is that it sometimes produces false negatives. In the present paper, we propose a robust cluster assignment method that harvests clustering results efficiently. Our method uses proximity graphs to model the proximity structure of the data. We experimentally analyze and illustrate the performance of this new approach.
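The labelling step can be pictured as finding the connected components of a proximity graph whose edges pass an inside-the-cluster test. The sketch assumes the caller supplies two things: the candidate edges (for example from a k-nearest-neighbour graph) and the edge test `same_cluster`. In SVC that test would typically check points sampled along the segment against the learned boundary. Both are hypothetical placeholders. Union-find is simply one convenient way to collect the components and is not necessarily the paper's procedure.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def label_clusters(n: int, candidate_edges: Sequence[Tuple[int, int]],
                   same_cluster: Callable[[int, int], bool]) -> List[int]:
    """Label n points by the connected components of a filtered proximity graph.

    candidate_edges: edges of some proximity graph over points 0..n-1.
    same_cluster: hypothetical predicate; True keeps the edge (both
        endpoints are judged to lie in the same cluster).
    """
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in candidate_edges:
        if same_cluster(a, b):
            parent[find(a)] = find(b)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For points at 0.0, 0.1, 0.2, 5.0 and 5.1 on a line, take consecutive pairs as candidate edges and keep an edge only when its length is below 1.0. The function then returns `[0, 0, 0, 1, 1]`.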
Robust sparse hyperplane classifiers: application to uncertain molecular profiling data
Journal of Computational Biology, 2004
Cited by 8 (1 self)
Key words: robust sparse hyperplanes; second-order cone program; linear programming; breast cancer; molecular profiling; two-class high-dimensional data
Automatic video classification: A survey of the literature
IEEE Transactions on Systems, Man, and Cybernetics, Part C
Cited by 7 (2 self)
There is much video available today. To help viewers find video of interest, work has begun on methods of automatic video classification. In this paper, we survey the video classification literature. We find that features are drawn from three modalities (text, audio, and visual) and that a large variety of combinations of features and classification methods have been explored. We describe the general features chosen and summarize the research in this area. We conclude with ideas for further research. Index Terms: video classification.

