Results 1 - 10 of 2,866

Text Normalization for the . . .

by Gerasimos Xydas, Georgios Karberis, Georgios Kouroupetroglou - G.A. Vouros and T. Panayiotopoulos (Eds.): SETN 2004, LNAI 3025, pp. 390-399, 2004
In this paper we present a novel approach, called "Text to Pronunciation (TtP)", for the proper normalization of Non-Standard Words (NSWs) in unrestricted texts. The methodology deals with inflection issues for the consistency of the NSWs with the syntactic structure of the utterances ...

Text Normalization system for Bangla

by Firoj Alam, S. M. Murtoza Habib, Mumit Khan
This paper describes a process of text normalization system of Bangla language (exonym: Bengali) by identifying the semiotic classes from Bangla text corpus. After identifying the semiotic classes a set of rules were written for tokenization and verbalization. This study is important for Text-To-Spe ...
Cited by 1 (0 self)

Hindi Text Normalization

by K. Panchapagesan, Partha Pratim Talukdar, N. Sridhar Krishna, Kalika Bali, A. G. Ramakrishnan
All areas of language and speech technology, directly or indirectly, require handling of real (unrestricted) text. For example, Text-to-Speech systems directly need to work on real text, whereas Automatic Speech Recognition systems depend on language models that are trained on text. This paper reports our ongoing effort on Hindi Text Normalization. In that, a novel approach to text normalization, wherein tokenization and initial token classification are combined into one stage followed by a second level of token sense disambiguation, is described. Tokenization and initial token classification ...
Cited by 5 (0 self)

Text Normalization And Speech Recognition In French

by Gilles Adda, Martine Adda-Decker, Jean-Luc Gauvain, Lori Lamel - Proc. ESCA Eurospeech'97, 1997
In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition we can measure ...
Cited by 9 (5 self)

Pivoted Document Length Normalization

by Amit Singhal, Chris Buckley, Mandar Mitra - SIGIR'96, 1996
Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to fairly retrieve documents of all lengths. In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar ...
Cited by 477 (16 self)

Document Centered Approach to Text Normalization

by Andrei Mikheev - SIGIR 2000, 2000
In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. The main feature of our approa ...
Cited by 26 (0 self)

A multilingual text normalization approach

by Brigitte Bigi - In 2nd Less-Resourced Languages workshop, 5th Language & Technology Conference, Poznań, 2011
The creation of text corpora requires a sequence of processing steps in order to constitute, normalize, and then to directly exploit it by a given application. This paper presents a generic approach for text normalization and concentrates on the aspects of methodology and linguistic engineering ...
Cited by 2 (2 self)

An extensive empirical study of feature selection metrics for text classification

by George Forman, Isabelle Guyon, André Elisseeff - J. of Machine Learning Research, 2003
Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ... in different situations. The results reveal that a new feature selection metric we call 'Bi-Normal Separation' (BNS), outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly ...
Cited by 496 (15 self)

Adaptive Parser-Centric Text Normalization

by Congle Zhang, Tyler Baldwin, Howard Ho, Benny Kimelfeld, Yunyao Li
Text normalization is an important first step towards enabling many Natural Language Processing (NLP) tasks over informal text. While many of these tasks, such as parsing, perform the best over fully grammatically correct text, most existing text normalization approaches narrowly define the task ...
Cited by 1 (1 self)

Accurate Methods for the Statistics of Surprise and Coincidence

by Ted Dunning - Computational Linguistics, 1993
Much work has been done on the statistical analysis of text. In some cases reported in the literature, inappropriate statistical methods have been used, and statistical significance of results have not been addressed. In particular, asymptotic normality assumptions have often been used unjustifiably ...
Cited by 1057 (1 self)