Normalization of Non-Standard Words: WS '99 Final Report
Hopkins University, 1999
Cited by 5 (0 self)
Abstract
All areas of language and speech technology must deal, in one way or another, with real text. Real text is messy: many things one finds in text (numbers, abbreviations, dates, currency amounts, acronyms ...) are not standard words in that one cannot find their properties by looking them up in a dictionary or deriving them morphologically from words that are in a dictionary, nor can one find their pronunciation by an application of "letter-to-sound" rules. For many applications, such non-standard words (NSWs) need to be normalized, or in other words converted into standard words. Since the correct normalization of a given token often depends upon both the local context and the type (genre) of text one is dealing with, "text-normalization" is in general a very hard problem. Typical technology for text-normalization mostly involves sets of ad hoc rules tuned to handle one or two genres of text (often newspaper-style text), with the expected result that the techniques, do...

