Results 1 - 10 of 2,866
Text Normalization for the . . .
- G.A. VOUROS AND T. PANAYIOTOPOULOS (EDS.): SETN 2004, LNAI 3025, PP. 390--399, 2004
"... In this paper we present a novel approach, called "Text to Pronunciation (TtP)", for the proper normalization of Non-Standard Words (NSWs) in unrestricted texts. The methodology deals with inflection issues for the consistency of the NSWs with the syntactic structure of the utterances t ..."
Text Normalization system for Bangla
"... This paper describes a process of text normalization system of Bangla language (exonym: Bengali) by identifying the semiotic classes from Bangla text corpus. After identifying the semiotic classes a set of rules were written for tokenization and verbalization. This study is important for Text-To-Spe ..."
Cited by 1 (0 self)
Hindi Text Normalization
"... All areas of language and speech technology, directly or indirectly, require handling of real (unrestricted) text. For example, Text-to-Speech systems directly need to work on real text, whereas Automatic Speech Recognition systems depend on language models that are trained on text. This paper repor ..."
Cited by 5 (0 self)
reports our ongoing effort on Hindi Text Normalization. In that, a novel approach to text normalization, wherein tokenization and initial token classification are combined into one stage followed by a second level of token sense disambiguation, is described. Tokenization and initial token classification
Text Normalization And Speech Recognition In French
- Proc. ESCA Eurospeech'97, 1997
"... In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition we can measure di ..."
Cited by 9 (5 self)
Pivoted Document Length Normalization
- SIGIR'96, 1996
"... Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to fairly retrieve documents of all lengths. In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar c ..."
Cited by 477 (16 self)
Document Centered Approach to Text Normalization
- SIGIR 2000, 2000
"... In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. The main feature of our approa ..."
Cited by 26 (0 self)
A multilingual text normalization approach
- In 2nd Less-Resourced Languages workshop, 5th Language & Technology Conference, Poznań, 2011
"... The creation of text corpora requires a sequence of processing steps in order to constitute, normalize, and then to directly exploit it by a given application. This paper presents a generic approach for text normalization and concentrates on the aspects of methodology and linguistic engineering, whi ..."
Cited by 2 (2 self)
An extensive empirical study of feature selection metrics for text classification
- J. of Machine Learning Research, 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Cited by 496 (15 self)
in different situations. The results reveal that a new feature selection metric we call ‘Bi-Normal Separation’ (BNS), outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly
Adaptive Parser-Centric Text Normalization
"... Text normalization is an important first step towards enabling many Natural Language Processing (NLP) tasks over informal text. While many of these tasks, such as parsing, perform the best over fully grammatically correct text, most existing text normalization approaches narrowly define the task in ..."
Cited by 1 (1 self)
Accurate Methods for the Statistics of Surprise and Coincidence
- COMPUTATIONAL LINGUISTICS, 1993
"... Much work has been done on the statistical analysis of text. In some cases reported in the literature, inappropriate statistical methods have been used, and statistical significance of results have not been addressed. In particular, asymptotic normality assumptions have often been used unjustifiably ..."
Cited by 1057 (1 self)