Unit Selection Without A Phoneme Set
- In Proceedings of the IEEE TTS Workshop, 2002
Cited by 7 (2 self)
With most human languages having fewer than 1 million speakers, it is unlikely that standard commercial systems will be able to justify supporting the vast majority of so-called "minority" languages. In our continuing task of providing tools for building synthetic voices in currently unsupported languages, this paper describes a number of experiments in building synthetic voices without requiring specific phonetic knowledge of the target languages. Even when a language is well studied, defining an appropriate phoneme set is never easy. The work presented here shows the adequacy of unit selection synthesis techniques when no explicit phoneme set is available.
A computational phonetic model for indian language scripts
- In Constraints on Spelling Changes: Fifth International Workshop on Writing Systems, 2006
Cited by 7 (3 self)
In spite of South Asia being one of the richest areas in terms of linguistic diversity, South Asian languages have a lot in common. For example, most of the major Indian languages use scripts which are derived from the ancient Brahmi script, have more or less the same arrangement of alphabet, are highly phonetic in nature and are very well organised. We have used this fact to build a computational phonetic model of Brahmi origin scripts. The phonetic model mainly consists of a model of phonology (including some orthographic features) based on a common alphabet of these scripts, numerical values assigned to these features, a stepped distance function (SDF), and an algorithm for aligning strings of feature vectors. The SDF is used to calculate the phonetic and orthographic similarity of two letters. The model can be used for applications like spell checking, predicting spelling/dialectal variation, text normalization, finding rhyming words, and identifying cognate words across languages. Some initial experiments have been done on this and the results seem encouraging.
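The abstract above names three components: feature vectors per letter, a stepped distance function, and an alignment algorithm. The following is a minimal Python sketch of how such pieces might fit together. Everything specific in it is an assumption made for illustration and not taken from the paper: the feature names, the numeric values, the romanized letter labels, the gap cost, and the reading of "stepped" as "charge the cost of the first, most important feature that differs".

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, int]

# Hypothetical feature vectors for a few romanized letters (illustrative only).
LETTERS: Dict[str, Features] = {
    "ka":  {"place": 1, "voiced": 0, "aspirated": 0},
    "kha": {"place": 1, "voiced": 0, "aspirated": 1},
    "ga":  {"place": 1, "voiced": 1, "aspirated": 0},
    "ta":  {"place": 3, "voiced": 0, "aspirated": 0},
}

# Assumed comparison order (most important feature first) and the cost
# charged at the first feature on which two letters disagree.
STEPS: List[Tuple[str, float]] = [("place", 3.0), ("voiced", 2.0), ("aspirated", 1.0)]

# Assumed cost of inserting or deleting one letter during alignment.
GAP_COST = 3.0


def stepped_distance(a: Features, b: Features) -> float:
    """Return the cost of the first feature in STEPS on which a and b differ."""
    for name, cost in STEPS:
        if a[name] != b[name]:
            return cost
    return 0.0


def align_cost(xs: Sequence[str], ys: Sequence[str]) -> float:
    """Edit-distance style alignment using stepped_distance for substitutions."""
    rows, cols = len(xs) + 1, len(ys) + 1
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * GAP_COST
    for j in range(1, cols):
        table[0][j] = j * GAP_COST
    for i in range(1, rows):
        for j in range(1, cols):
            sub = stepped_distance(LETTERS[xs[i - 1]], LETTERS[ys[j - 1]])
            table[i][j] = min(
                table[i - 1][j - 1] + sub,   # substitute or match
                table[i - 1][j] + GAP_COST,  # delete from xs
                table[i][j - 1] + GAP_COST,  # insert from ys
            )
    return table[-1][-1]
```

Under these assumed costs, two spellings that differ only in aspiration score as closer than two that differ in place of articulation, which is the kind of graded similarity that spelling-variant and cognate detection would need.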
Normalization of Non-Standard Words: WS '99 Final Report
- Hopkins University, 1999
Cited by 5 (0 self)
All areas of language and speech technology must deal, in one way or another, with real text. Real text is messy: many things one finds in text (numbers, abbreviations, dates, currency amounts, acronyms ...) are not standard words in that one cannot find their properties by looking them up in a dictionary or deriving them morphologically from words that are in a dictionary, nor can one find their pronunciation by an application of "letter-to-sound" rules. For many applications, such non-standard words (NSWs) need to be normalized, or in other words converted into standard words. Since the correct normalization of a given token often depends upon both the local context and the type (genre) of text one is dealing with, "text-normalization" is in general a very hard problem. Typical technology for text-normalization mostly involves sets of ad hoc rules tuned to handle one or two genres of text (often newspaper-style text), with the expected result that the techniques, do...
A formal computational analysis of indic scripts
- In International Symposium on Indic Scripts: Past and Future, 2003
Cited by 5 (0 self)
The Brahmi-derived Indic scripts occupy a special place in the study of writing systems. They are alphasyllabic scripts (Bright, 1996a) (though Daniels (1996) prefers the term abugida), meaning that they are basically segmental in that almost all segments are represented in
More Accurate Fuzzy Text Search for Languages Using Abugida Scripts
- Improving Non-English Web Searching (iNEWS07) SIGIR07 Workshop, 2007
Cited by 3 (1 self)
Text search is a key step in any kind of information access. To do it effectively, we can use knowledge about the writing systems concerned. Methods based on such knowledge can give significantly better results for searching text, at least for some languages. This can improve information retrieval in particular and information access in general. In this paper, we present a method for fuzzy text search for languages which use Abugida scripts, e.g. Hindi, Bengali, Telugu, Amharic, Thai, etc. We use characteristics of a writing system for fuzzy search and are able to take care of spelling variation, which is very common in these languages. Our method shows an improvement in F-measure of up to 30% over scaled edit distance.
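For orientation, the baseline mentioned in the abstract above, scaled edit distance, can be sketched in a few lines of Python. This sketch assumes that "scaled" means the Levenshtein distance divided by the length of the longer string; the abstract does not define the scaling, and the function names and the 0.25 threshold are hypothetical. It is not the paper's own method, which uses writing-system knowledge instead.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance over code points, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def scaled_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed scaling), in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def fuzzy_search(query: str, vocabulary: Iterable[str], threshold: float = 0.25) -> List[str]:
    """Return vocabulary items whose scaled distance to query is within threshold."""
    return [w for w in vocabulary if scaled_edit_distance(query, w) <= threshold]
```

A baseline like this treats every code point as equally different from every other, so in an abugida a changed vowel sign costs as much as a changed consonant. That uniform cost is the kind of weakness a script-aware method would try to address.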
Hanzi, Concept and Computation: A Preliminary Survey of Chinese Characters as a Knowledge Resource in NLP
There are many people to whom I owe a debt of thanks for their support, for the completion of my thesis and supported me in science as well in privacy during this time. First, I would like to sincerely thank my advisor, Prof. Dr. Erhard Hinrichs, under whose influence the work here was initiated during my fruitful stay in Germany. Without his continuous and invaluable support, this work could not have been completed. I would also like to thank Prof. Dr. Eschbach-Szabo for reading this thesis and offering constructive comments. Besides my advisors, I am deeply grateful to the rest of my thesis committee: Frank Richter and Fritz Hamm, for their kindly support and interesting questions. A special thanks goes to Lothar Lemnitzer, who proofread the thesis carefully and gave insightful comments. I would like to thank my parents for their life-long love and support. Last but not least, I also owe a lot of thanks to my lovely wife Hsiao-Wen, my
Is Isomorphy the Missing Link Between Phonology and Orthography?
- 2002
This paper examines the nature of orthography (spelling conventions) and how the different kinds of linguistic information associated with a word, principally its phonological and morphological content, contribute to its written form. A summary is given of various arguments presented in the literature as to why certain languages' orthographies do not transparently reflect pronunciation. A specific set of data (French verb inflection) is presented, and a theory of how morphological features affect the orthographic structure of the data is developed. A small lexicon is implemented to test these ideas, and the combined use of finite state transduction and default inheritance in the lexicon is discussed.
Phonological Reconstruction of a Dead Language Using the Gradual Learning Algorithm
This paper discusses the reconstruction of the Elamite language’s phonology from its orthography using the Gradual Learning Algorithm, which was re-purposed to “learn” underlying phonological forms from surface orthography. Practical issues are raised regarding the difficulty of mapping between orthography and phonology, and Optimality Theory’s neglected Lexicon Optimization module is highlighted.
Efficient Morphological Parsing with a Weighted Finite State Transducer
- 2008
This article describes a highly optimized algorithm and implementation of a deterministic weighted finite state transducer for morphological analysis. We show how various functionalities can be integrated into one machine, without sacrificing performance or flexibility, while still maintaining applicability to various languages. The annotation schema used in this implementation maximizes interoperability and compatibility by using a direct mapping of tags from the GOLD ontology of linguistic concepts and features, providing possible extended processing scenarios. Summary: Morphological analysis for the Croatian language is described.
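As a rough illustration of the idea of a deterministic weighted finite state transducer for morphological analysis, here is a toy Python sketch. It is not the article's implementation: the transition table, the English example words, the weights, and the tag strings (`+N`, `+SG`, `+PL`) are all hypothetical and are not drawn from the GOLD ontology.

```python
from typing import Dict, Optional, Tuple

# (state, input symbol) -> (next state, output string, weight).
# Deterministic: at most one transition per (state, symbol) pair.
TRANSITIONS: Dict[Tuple[int, str], Tuple[int, str, float]] = {
    (0, "c"): (1, "c", 0.0),
    (1, "a"): (2, "a", 0.0),
    (2, "t"): (3, "t+N", 0.0),
    (3, "s"): (4, "+PL", 0.5),
}

# Final state -> (output emitted on acceptance, final weight).
FINALS: Dict[int, Tuple[str, float]] = {
    3: ("+SG", 0.0),
    4: ("", 0.0),
}


def analyze(word: str) -> Optional[Tuple[str, float]]:
    """Return (analysis, total weight), or None if the word is not accepted."""
    state, outputs, total = 0, [], 0.0
    for symbol in word:
        step = TRANSITIONS.get((state, symbol))
        if step is None:
            return None  # no transition: the word is outside the toy lexicon
        state, emitted, weight = step
        outputs.append(emitted)
        total += weight
    if state not in FINALS:
        return None  # input ended in a non-final state
    final_output, final_weight = FINALS[state]
    return "".join(outputs) + final_output, total + final_weight
```

Because the machine is deterministic, analysis takes one table lookup per input symbol, so running time grows linearly with word length regardless of lexicon size.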

