U N I V E R S

by Calum Robert, William Clark
Metadata Version 0

Datum | Value | Source
TITLE | U N I V E R S | SVM HeaderParse 0.2
AUTHOR NAME | Calum Robert | SVM HeaderParse 0.2
AUTHOR AFFIL | s1049262; E | SVM HeaderParse 0.2
AUTHOR NAME | William Clark | SVM HeaderParse 0.2
AUTHOR AFFIL | s1049262; E | SVM HeaderParse 0.2
ABSTRACT | This project considers a number of methods for instance (example) selection in training data for language models; the most promising were experimented with and evaluated via hypothesis testing. The most successful, an expansion of the perplexity-based work of Roger Moore, was selected for further development because of its good test results and its ability to locate related sentences. A number of candidate filter methods were produced to improve the performance and results of that method. Each of these filters was tested, returning a decrease in data size of between 2.6% and 75%. The best-performing filter, with a decrease in data of 57%, was then selected and, after some fine-tuning, tested in combination with the original method to gauge its full abilities. The results show that the combination of methods forms a scalable solution to the problem, producing datasets with on average 48% lower perplexity than a baseline approach. The additional optimization features were shown to reduce the running time by between 50% and 60%. Acknowledgements: Many thanks to my supervisor Miles Osbourne for his advice and guidance, and to my colleagues, whose opinions helped me gain a full perspective on my work. Also to my proofreaders for dealing with countless unnecessary commas. Declaration: I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified. | SVM HeaderParse 0.2
CITATIONS | 28 found | ParsCit 1.0
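
The abstract describes perplexity-based selection of training sentences but gives no implementation details. As a rough illustration only, the sketch below uses cross-entropy difference scoring, a common perplexity-based selection technique. Whether the thesis used exactly this scoring is an assumption. The unigram model, the add-one smoothing, the function names (`train_unigram`, `cross_entropy`, `select`) and the `threshold` parameter are all hypothetical choices made for the example and are not taken from the thesis.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a function giving add-one-smoothed unigram log2 probabilities.

    Assumption: a unigram model stands in for whatever language model
    the thesis actually used.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(token):
        return math.log2((counts[token] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy in bits; perplexity is 2 ** cross_entropy."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / len(tokens)


def select(candidates, in_domain, general, threshold=0.0):
    """Keep candidates that look more in-domain than general.

    Score = cross-entropy under the in-domain model minus cross-entropy
    under the general model; lower means closer to the in-domain data.
    Empty candidates are skipped. Results are ordered best score first.
    """
    lp_in = train_unigram(in_domain)
    lp_gen = train_unigram(general)
    scored = [
        (cross_entropy(lp_in, s) - cross_entropy(lp_gen, s), s)
        for s in candidates
        if s.split()
    ]
    return [s for score, s in sorted(scored) if score < threshold]


if __name__ == "__main__":
    in_domain = ["the patient has a fever", "the doctor treats the patient"]
    general = ["the market fell today", "stocks rose in the market"]
    candidates = ["the patient sees the doctor", "the market rose today"]
    print(select(candidates, in_domain, general))
```

With these toy corpora only the medical sentence scores below the threshold, so it is the only one retained. A filter stage of the kind the abstract mentions would presumably run before or after this scoring step, but its design is not described in the abstract and is not sketched here.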