Support vector machines for text categorization based on latent semantic indexing (2003)

by Y Huang

Results 1 - 3 of 3

Learning the Kernel Matrix with Semi-Definite Programming

by Gert R. G. Lanckriet, Nello Cristianini, Laurent El Ghaoui, Peter Bartlett, Michael I. Jordan , 2002
Cited by 368 (16 self)
Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive definite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space---classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semi-definite programming (SDP) techniques. When applied
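
The kernel matrix described in this abstract can be illustrated with a small sketch. It builds only a fixed Gram matrix from a Gaussian (RBF) kernel and checks its symmetry; it does not implement the SDP-based kernel learning the paper proposes. The function names, the sample points, and the `gamma` value are assumptions chosen for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2). gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(points, kernel=rbf_kernel):
    """Gram matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, v):
    """v^T K v, which is non-negative for every v when K is positive semidefinite."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

# Toy input points (made up for this example).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = kernel_matrix(points)

# Each entry is the inner product of two points in the implicit embedding space.
for row in K:
    print([round(value, 4) for value in row])
```

Each diagonal entry is 1 because every point has zero distance to itself, and the off-diagonal entries shrink as points move apart, which is the "notion of similarity in the input space" the abstract refers to.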

Support Vector Machine for Intrusion Detection Based on LSI Feature Selection

by Qing Yang, Fangmin Li
Cited by 1 (0 self)
Data mining has been widely applied to intrusion detection as a novel approach. Selecting an appropriate representation to extract the most significant features is very important, and the pattern classification algorithm is also crucial. This paper describes a new support vector machine (SVM) approach to anomaly intrusion detection based on Latent Semantic Indexing (LSI). The proposed method generates LSI features by singular value decomposition (SVD). As a preprocessing step, SVD reduces the dimensionality and removes noise from the raw data matrix, which also greatly reduces the computational complexity. SVMs have been shown to perform well for classification. We applied an SVM to the new feature space obtained by SVD, with the Radial Basis Function (RBF) chosen as the kernel function. Our experiments on the DARPA'98 BSM data set show that the approach achieves a higher detection rate and a lower false positive rate, making it an effective and efficient method. In particular, the LSI technique greatly reduces the computational complexity and CPU time. Index Terms: Intrusion detection, Support vector machine, Information retrieval, Latent semantic indexing (LSI).
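
The preprocessing step in this abstract (project the raw data matrix onto its leading singular directions, then compare samples with an RBF kernel) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy matrix, the helper names, `k = 2`, and `gamma` are all made up here, the truncated SVD uses simple power iteration with deflation, and training the SVM itself is left to a library.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    At = [list(col) for col in zip(*A)]
    v = [1.0 / (i + 1) for i in range(len(A[0]))]  # fixed, non-uniform start vector
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av] if sigma > 0.0 else [0.0] * len(A)
    return sigma, u, v

def truncated_svd(A, k):
    """Top-k singular values and right singular vectors, found by deflation."""
    R = [row[:] for row in A]
    sigmas, right_vectors = [], []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(R)
        sigmas.append(sigma)
        right_vectors.append(v)
        # Subtract the rank-one component just found before searching for the next one.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return sigmas, right_vectors

def project(A, right_vectors):
    """Coordinates of each row of A in the k-dimensional LSI space."""
    return [[sum(a * b for a, b in zip(row, v)) for v in right_vectors] for row in A]

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel on the reduced features. gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

# Toy data matrix: rows are samples, columns are raw features (made-up counts).
A = [[2.0, 2.0, 0.0, 0.0],
     [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]

sigmas, right_vectors = truncated_svd(A, k=2)
reduced = project(A, right_vectors)            # 4 samples, 2 features each
same_group = rbf_kernel(reduced[0], reduced[1])
other_group = rbf_kernel(reduced[0], reduced[2])
print(sigmas, same_group, other_group)
```

The reduced rows would then be passed to an SVM trainer with an RBF kernel. Power iteration is used here only to keep the sketch dependency-free; a numerical library's SVD routine would be the usual choice for real data.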

I V E R

by Daniel Kristopher Harvey
The belief of the population is very useful information but is hard to measure. Methods such as voting and polling are both expensive and slow to run. Recently prediction markets have become a popular method to aggregate information and beliefs from the population using the market price as the mean belief. The problem with these is that they have to be used directly by the population which limits their spread and have to be set up for specific questions which limits their application. We propose a novel solution to aggregate the belief of the population indirectly from social media services and overcome these problems. In particular we focus on blogs and Twitter posts which form a very noisy web-scale text collection. We extracted the beliefs by using statistical text analysis on posts which we aggregated using linear regression. We used the recent swine flu outbreak as a novel example to evaluate our models on the belief that it would turn into a pandemic. We found that it was possible to extract the belief of the population from social media and that aggregating the beliefs by linear regression performed comparably to prediction markets. Twitter outperformed blog posts showing it was more informative for this problem. Our forecast model outperformed a strong
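
The aggregation step mentioned in this abstract (combining text-derived belief scores by linear regression) can be sketched with a one-predictor least-squares fit. This is a simplified illustration, not the author's model: the variable names and all numbers below are made up, and a single predictor stands in for whatever features the original work used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical toy data: a daily score extracted from posts, and the belief
# level it is regressed against (for example a prediction-market price).
daily_text_score = [0.10, 0.20, 0.30, 0.40]
observed_belief = [0.30, 0.50, 0.70, 0.90]

intercept, slope = fit_line(daily_text_score, observed_belief)

# Estimate the belief for a new day from its text score alone.
predicted = intercept + slope * 0.25
print(intercept, slope, predicted)
```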