Empirical properties of multilingual phone-to-word transduction, 2007
This paper explores the error-robustness of phone-to-word transduction across a variety of languages. We implement a noisy channel model in which a phonetic input stream is corrupted by an error model, and then transduced back to words using the inverse error model and linguistic constraints. By controlling the error level, we are able to measure the sensitivity of different languages to degradation in the phonetic input stream. This analysis is carried further to measure the importance of each phone in each language individually. We study Arabic, Chinese, English, German and Spanish, and find that they behave similarly in this paradigm: in each case, a phone error produces about 1.4 word errors, and frequently incorrect phones matter slightly less than others. In the absence of phone errors, transduced word errors are still present, and we use the conditional entropy of words given phones to explain the observed behavior. Index Terms: speech recognition, phonetic decoding, transduction, multilingual, ASR
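The role of conditional entropy here can be illustrated with a small sketch. It is not the paper's model: it assumes isolated words drawn from a unigram distribution with exactly one pronunciation each, and the names `conditional_entropy`, `word_probs` and `pronunciations` are hypothetical. Under those assumptions, H(W | P) is nonzero only when several words share a phone string, which is why word errors can remain even with error-free phones.

```python
import math
from collections import defaultdict


def conditional_entropy(word_probs, pronunciations):
    """Return H(W | P) in bits.

    Assumptions (not from the paper): words are independent unigram draws
    and each word has exactly one pronunciation (a phone string).
    """
    # Probability mass of each phone string, summed over its homophones.
    phone_mass = defaultdict(float)
    for word, p in word_probs.items():
        phone_mass[pronunciations[word]] += p

    entropy = 0.0
    for word, p in word_probs.items():
        if p > 0:
            # P(word | phones) = P(word) / P(phones) under the assumptions above.
            entropy -= p * math.log2(p / phone_mass[pronunciations[word]])
    return entropy


# Toy lexicon: "two" and "too" are homophones, "cat" is unambiguous.
word_probs = {"two": 0.25, "too": 0.25, "cat": 0.5}
pronunciations = {"two": "t uw", "too": "t uw", "cat": "k ae t"}
h = conditional_entropy(word_probs, pronunciations)
print(f"H(W | P) = {h:.2f} bits")
```

In this toy lexicon the homophone pair contributes all of the remaining uncertainty; a lexicon without homophones would give zero.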
Evaluation of Phone Lattice Based Speech Decoding
Previously, we proposed a flexible two-layered speech recogniser architecture, called FLaVoR. In the first layer, an unconstrained, task-independent phone recogniser generates a phone lattice. Only in the second layer are the task-specific lexicon and language model applied to decode the phone lattice and produce a word-level recognition result. In this paper, we present a further evaluation of the FLaVoR architecture. The performance of a classical single-layered architecture and the FLaVoR architecture are compared on two recognition tasks, using the same acoustic, lexical and language models. On the large vocabulary Wall Street Journal 5k and 20k benchmark tasks, the two-layered architecture resulted in slightly but not significantly better word error rates. On a reading error detection task for a reading tutor for children, the FLaVoR architecture clearly outperformed the single-layered architecture. Index Terms: ASR architecture, phone lattice decoding, system assessment
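For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between hypothesis and reference, divided by the reference length. The sketch below shows that standard definition; it is not the authors' scoring tool, and the function name `word_error_rate` and the whitespace tokenisation are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both inputs are plain strings tokenised on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {wer:.3f}")
```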
Automatic Assessment of Children’s Reading Level
In this paper, an automatic system for the assessment of reading in children is described and evaluated. The assessment is based on a reading test with 40 words, presented one by one to the child by means of a computerized reading tutor. The score that expresses the child’s reading performance is calculated as the total time needed to read the 40 words divided by the number of correctly read words. In each grade, children are classified in 5 groups based on their score as provided by human annotators. We show that when the score for a child is assessed automatically using a speech recognizer, a classification can be obtained with a substantial agreement (Cohen’s Kappa over 0.6) with the human classification. As all children in the experiments were classified either correctly or in an adjoining group, we can conclude that the proposed system can provide large time gains in current manual classification procedures. Index Terms: computer aided language learning, reading assessment, ASR for children.
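The score and the agreement measure described above can be sketched as follows. The score formula (total reading time divided by the number of correctly read words) comes from the abstract; the function names and the toy group labels are hypothetical, and the kappa shown is the standard unweighted Cohen's kappa, which may differ in detail from what the authors computed.

```python
from collections import Counter


def reading_score(total_time_s, n_correct):
    """Total time for the word list divided by the number of correctly read words."""
    if n_correct <= 0:
        raise ValueError("score is undefined when no word was read correctly")
    return total_time_s / n_correct


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy data (invented): group labels 1..5 from a human and an automatic scorer.
human = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
automatic = [1, 1, 2, 3, 3, 3, 4, 4, 5, 4]
score = reading_score(80.0, 32)
kappa = cohens_kappa(human, automatic)
print(f"score = {score} s per correct word, kappa = {kappa:.2f}")
```

In the toy data every disagreement falls in an adjoining group, mirroring the situation the abstract reports.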
Automatic Voice Onset Time Estimation from Reassignment Spectra
We describe an algorithm to automatically estimate the voice onset time (VOT) of plosives. The VOT is the time delay between the burst onset and the start of periodicity when it is followed by a voiced sound. Since the VOT is affected by factors like place of articulation and voicing it can be used for inference of these factors. The algorithm uses the reassignment spectrum of the speech signal, a high resolution time-frequency representation which simplifies the detection of the acoustic events in a plosive. The performance of our algorithm is evaluated on a subset of the TIMIT database by comparison with manual VOT measurements. On average, the difference is smaller than 10 ms for 76.1% and smaller than 20 ms for 91.4% of the plosive segments. We also provide analysis statistics of the VOT of /b/, /d/, /g/, /p/, /t/ and /k/ and experimentally verify some sources of variability. Finally, to illustrate possible applications, we integrate the automatic VOT estimates as an additional feature in an HMM-based speech recognition system and show a small but statistically significant improvement in phone recognition rate.
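The tolerance-based evaluation reported above (share of segments whose automatic and manual VOT differ by less than 10 ms or 20 ms) can be sketched as below. The function name `fraction_within` and the measurements are invented for illustration; this shows only the evaluation statistic, not the VOT estimation algorithm itself.

```python
def fraction_within(auto_ms, manual_ms, tolerance_ms):
    """Fraction of segments whose absolute VOT difference is below the tolerance."""
    if not auto_ms or len(auto_ms) != len(manual_ms):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - m) < tolerance_ms for a, m in zip(auto_ms, manual_ms))
    return hits / len(auto_ms)


# Invented VOT values in milliseconds for four plosive segments.
manual_vot = [15, 60, 25, 70]
auto_vot = [18, 75, 20, 95]
within_10 = fraction_within(auto_vot, manual_vot, 10)
within_20 = fraction_within(auto_vot, manual_vot, 20)
print(f"<10 ms: {within_10:.0%}, <20 ms: {within_20:.0%}")
```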

