Text Classification using String Kernels

by Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, Chris Watkins
Citations:282 - 6 self

BibTeX

@MISC{Lodhi_textclassification,
    author = {Huma Lodhi and Craig Saunders and John Shawe-Taylor and Nello Cristianini and Chris Watkins},
    title = {Text Classification using String Kernels},
    year = {}
}

Abstract

We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the performance of the kernel compared with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered for comparison with the subsequences kernel with different decay factors. For larger documents and datasets the paper introduces an approximation technique that is shown to deliver good approximations efficiently for large datasets.
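
The kernel described in the abstract can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names `ssk_bruteforce` and `ssk_dp` and the parameter name `lam` (the decay factor) are hypothetical, and the abstract does not spell out the recursion, so the dynamic program below follows the recursion commonly given for this kernel. The brute-force version builds the explicit feature map (one feature per length-k subsequence, weighted by the decay factor raised to the span the occurrence covers) and is practical only for very short strings; the dynamic-programming version computes the same inner product in O(k·|s|·|t|) time.

```python
from collections import defaultdict
from itertools import combinations


def ssk_bruteforce(s, t, k, lam):
    """Inner product via the explicit feature map (exponential cost)."""
    def features(text):
        phi = defaultdict(float)
        # Every increasing index tuple of size k is one subsequence occurrence.
        for idx in combinations(range(len(text)), k):
            span = idx[-1] - idx[0] + 1  # full length covered in the text
            phi["".join(text[i] for i in idx)] += lam ** span
        return phi

    phi_s, phi_t = features(s), features(t)
    return sum(v * phi_t[u] for u, v in phi_s.items() if u in phi_t)


def ssk_dp(s, t, k, lam):
    """Same value by dynamic programming (assumed standard recursion)."""
    n, m = len(s), len(t)
    if k < 1 or min(n, m) < k:
        return 0.0
    # kp[i][a][b] holds the auxiliary kernel K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    for a in range(n + 1):
        for b in range(m + 1):
            kp[0][a][b] = 1.0
    for i in range(1, k):
        for a in range(i, n + 1):
            kpp = 0.0  # running K''_i(s[:a], t[:b]) as b grows
            for b in range(i, m + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Sum over all pairs of matching final characters.
    total = 0.0
    for a in range(k, n + 1):
        for b in range(k, m + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[k - 1][a - 1][b - 1]
    return total
```

As a sanity check, "cat" and "car" share only the length-2 subsequence "ca", which spans two characters in each string, so for k = 2 both functions should return the decay factor raised to the fourth power.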

Citations

6696 The Nature of Statistical Learning Theory - Vapnik - 1995
1368 Text categorization with support vector machines - Joachims - 1998
1086 Making large-scale SVM learning practical - Joachims - 1999
933 A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992
741 An Introduction to Support Vector Machines and other kernel-based learning methods - Cristianini, Shawe-Taylor - 2000
594 A vector space model for automatic indexing - Salton, Wong, et al. - 1975
281 Convolution kernels on discrete structures - Haussler - 1999
221 Structural risk minimization over data-dependent hierarchies - Shawe-Taylor, Bartlett, et al. - 1998
173 Functions of positive and negative type and their connection with the theory of integral equations - Mercer - 1909
139 Sparse greedy matrix approximation for machine learning - Smola, Schölkopf - 2000
109 Dynamic alignment kernels - Watkins - 1999
95 Input Space vs. Feature Space in Kernel-Based Methods - Schölkopf, Mika, et al. - 1999
26 On the extensions of kernel alignment - Kandola, Shawe-Taylor, et al. - 2002
20 Support Vector Learning. R. Oldenbourg Verlag - Schölkopf - 1997
14 Margin distribution and soft margin - Shawe-Taylor, Cristianini - 2000
10 Using an N-Gram based document representation with a vector processing retrieval model - Cavnar - 1994
4 Acquaintance: Language-independent document categorization by n-grams - Huffman - 1995
2 The Kernel-Adatron: a fast and simple training procedure for support vector machines - Friess, Cristianini, et al. - 1998
1 Reuters-21578 collection - Lewis - 1987