Building a Corpus-based Historical Portuguese Dictionary: Challenges and Opportunities
sandra@icmc.usp.br
ABSTRACT: Historical corpora are important resources for many fields, among them Philology, Human Language Technology, Literary Studies, History, and Lexicography. However, compiling historical corpora differs from compiling contemporary ones. Corpus designers have to deal with several characteristics inherent in historical texts, such as the absence of a spelling standard, pervasive use of abbreviations and their spelling variants, missing spaces between words, irregular hyphenation, and nonstandard typographical symbols. This paper addresses the challenges posed in processing the corpus designed for the Historical Dictionary of Brazilian Portuguese (HDBP) project, which comprises texts from the sixteenth through the beginning of the nineteenth century, and the solutions found to support the compilation of a historical Portuguese dictionary based on this corpus.
KEY WORDS: historical corpora, corpora processing, historical dictionaries, Brazilian history

