Unsupervised Mining of Lexical Variants from Noisy Text
Abstract
Cited by 1 (0 self)
The amount of data produced in user-generated content continues to grow at a staggering rate. However, the text found in these media can deviate wildly from the standard rules of orthography, syntax and even semantics, and can present significant problems to downstream applications that make use of this noisy data. In this paper, we present a novel unsupervised method for extracting domain-specific lexical variants given a large volume of text. We demonstrate the utility of this method by applying it to normalize text messages found in the online social media service Twitter into their most likely standard English versions. Our method yields a 20% reduction in word error rate over an existing state-of-the-art approach.

