Results 1 - 3 of 3
Normalization of Non-Standard Words - WS'99 Final Report, 1999
Cited by 12 (1 self)
In addition to ordinary words and names, real text contains non-standard “words” (NSWs), including numbers, abbreviations, dates, currency amounts and acronyms. Typically, one cannot find NSWs in a dictionary, nor can one find their pronunciation by an application of ordinary “letter-to-sound” rules. Non-standard words also have a greater propensity than ordinary words to be ambiguous with respect to their interpretation or pronunciation. In many applications, it is desirable to “normalize” text by replacing the NSWs with the contextually appropriate ordinary word or sequence of words. Typical technology for text normalization involves sets of ad hoc rules tuned to handle one or two genres of text (often newspaper-style text), with the expected result that the techniques do not usually generalize well to new domains. The purpose of the work reported here is to take some initial steps towards addressing deficiencies in previous approaches to text normalization. We developed a taxonomy of NSWs on the basis of four rather distinct text
Normalization of Non-Standard Words: WS '99 Final Report - Hopkins University, 1999
Cited by 5 (0 self)
All areas of language and speech technology must deal, in one way or another, with real text. Real text is messy: many things one finds in text (numbers, abbreviations, dates, currency amounts, acronyms ...) are not standard words in that one cannot find their properties by looking them up in a dictionary or deriving them morphologically from words that are in a dictionary, nor can one find their pronunciation by an application of “letter-to-sound” rules. For many applications, such non-standard words (NSW's) need to be normalized, or in other words converted into standard words. Since the correct normalization of a given token often depends upon both the local context and the type (genre) of text one is dealing with, “text-normalization” is in general a very hard problem. Typical technology for text-normalization mostly involves sets of ad hoc rules tuned to handle one or two genres of text (often newspaper-style text), with the expected result that the techniques, do...
Why Rose is the Rose: On the use of definite articles in proper names
The goal of this paper is to examine the use of definite articles with proper names, both cross-linguistically and intra-linguistically, and to provide a morpho-syntactic analysis of it. The first question to consider is whether article absence or article presence is the default case. The second question is when and how the alternative arises.

