Developing an automatic assessment tool for children’s oral reading
In Proc. ICSLP, 2006
Cited by 4 (3 self)
Abstract
Automation of oral reading assessment and of feedback in a reading tutor is a very challenging task. This paper describes our research aimed at developing such automated systems. The first topic is the recording and annotation of CHOREC, the Flemish database of children’s oral reading that we are developing in order to characterize oral reading processes statistically. Next, we propose a classification of both oral reading strategies and errors, which provides the basis for the envisaged assessment and feedback. Finally, experimental results show that our two-layered recognition system achieves high reading miscue detection rates, while only a few correctly read words are erroneously tagged as miscues. Index Terms: reading assessment, database annotation, speech technology, education.
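The two figures of merit in the last result can be computed from word-level labels: the detection rate is the fraction of annotated miscues that the system flags, and the false alarm rate is the fraction of correctly read words that it flags. The sketch below assumes that simple definition; the function name `miscue_rates`, the boolean label format and the example labels are hypothetical, and the paper's exact scoring protocol is not given in the abstract.

```python
from typing import Sequence, Tuple

def miscue_rates(reference: Sequence[bool], detected: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate) for word-level miscue tagging.

    reference[i] is True if a human annotator marked word i as a miscue;
    detected[i] is True if the recognizer tagged word i as a miscue.
    """
    if len(reference) != len(detected):
        raise ValueError("label sequences must have equal length")
    miscues = sum(reference)
    correct_words = len(reference) - miscues
    hits = sum(1 for r, d in zip(reference, detected) if r and d)
    false_alarms = sum(1 for r, d in zip(reference, detected) if not r and d)
    detection_rate = hits / miscues if miscues else 0.0
    false_alarm_rate = false_alarms / correct_words if correct_words else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical labels for ten words: three miscues, two of them detected,
# plus one correctly read word that is wrongly flagged.
ref = [False, True, False, False, True, False, False, True, False, False]
hyp = [False, True, False, True, True, False, False, False, False, False]
print(miscue_rates(ref, hyp))  # detection rate 2/3, false alarm rate 1/7
```

Reporting both numbers together matters: a system that flags every word trivially reaches a detection rate of 1.0, which the false alarm rate exposes.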
Recording Speech of Children, Non-Natives and Elderly People for HLT Applications: the JASMIN-CGN Corpus
Cited by 2 (2 self)
Abstract
Within the framework of the Dutch-Flemish programme STEVIN, the JASMIN-CGN (Jongeren, Anderstaligen en Senioren in Mens-machine Interactie – Corpus Gesproken Nederlands) project was carried out with the aim of collecting speech from children, non-natives and elderly people. The JASMIN-CGN project extends the Spoken Dutch Corpus (CGN) along three dimensions. First, by collecting a corpus of contemporary Dutch as spoken by children of different age groups, elderly people and non-natives with different mother tongues, the corpus was extended along the age and mother-tongue dimensions. In addition, we collected speech material in a communication setting that was not envisaged in the CGN: human-machine interaction. One third of the data was collected in Flanders and two thirds in the Netherlands. In this paper we report on our experiences in collecting this corpus and describe some of the important decisions we made in attempting to combine efficiency with high quality.
Evaluation of Phone Lattice Based Speech Decoding
Abstract
Previously, we proposed a flexible two-layered speech recogniser architecture called FLaVoR. In the first layer, an unconstrained, task-independent phone recogniser generates a phone lattice. Only in the second layer are the task-specific lexicon and language model applied to decode the phone lattice and produce a word-level recognition result. In this paper, we present a further evaluation of the FLaVoR architecture. The performance of a classical single-layered architecture and of the FLaVoR architecture is compared on two recognition tasks, using the same acoustic, lexical and language models. On the large-vocabulary Wall Street Journal 5k and 20k benchmark tasks, the two-layered architecture yielded slightly, but not significantly, lower word error rates. On a reading error detection task for a children’s reading tutor, the FLaVoR architecture clearly outperformed the single-layered architecture. Index Terms: ASR architecture, phone lattice decoding, system assessment.
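The division of labour between the two layers can be illustrated with a toy second layer: given a small hand-made phone lattice (standing in for the output of the first-layer phone recogniser), a pronunciation lexicon is used to find the cheapest word sequence along a lattice path. This is a sketch under stated assumptions, not FLaVoR itself: the arc format, the function `decode_lattice`, the costs, and the constant word penalty used in place of a real language model are all hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical lattice format: arcs (from_node, to_node, phone, cost), nodes 0..final,
# with every arc going from a lower to a higher node number.
Arc = Tuple[int, int, str, float]

def decode_lattice(arcs: List[Arc], final: int,
                   lexicon: Dict[str, Tuple[str, ...]],
                   word_penalty: float = 0.0) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, words) of the cheapest path from node 0 to `final` that spells
    a sequence of lexicon words, or None if no such path exists."""
    outgoing: Dict[int, List[Arc]] = {}
    for arc in arcs:
        outgoing.setdefault(arc[0], []).append(arc)

    def match(node: int, phones: Tuple[str, ...], cost: float) -> Iterator[Tuple[int, float]]:
        # Follow every lattice path from `node` whose phone labels spell `phones`.
        if not phones:
            yield node, cost
            return
        for _, nxt, phone, arc_cost in outgoing.get(node, []):
            if phone == phones[0]:
                yield from match(nxt, phones[1:], cost + arc_cost)

    # best[n]: cheapest word-aligned hypothesis ending at node n.
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for node in range(final + 1):  # forward arcs, so increasing order is topological
        if node not in best:
            continue
        base, words = best[node]
        for word, phones in lexicon.items():
            for end, cost in match(node, phones, 0.0):
                total = base + cost + word_penalty  # penalty stands in for a language model
                if end not in best or total < best[end][0]:
                    best[end] = (total, words + [word])
    return best.get(final)


lexicon = {"the": ("dh", "ax"), "a": ("ax",), "cat": ("k", "ae", "t"), "cap": ("k", "ae", "p")}
arcs = [(0, 1, "dh", 1.0), (1, 2, "ax", 0.5), (0, 2, "ax", 2.0),
        (2, 3, "k", 0.2), (3, 4, "ae", 0.3), (4, 5, "t", 0.4), (4, 5, "p", 0.9)]
print(decode_lattice(arcs, 5, lexicon))  # best word sequence: ['the', 'cat']
```

Because the lattice is produced without any lexical constraint, swapping in a different lexicon (for example, the words of a reading prompt plus likely miscues) only changes the second layer, which is the flexibility the two-layered design is meant to provide.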
Fast Speaker Adaptation Using Non-Negative Matrix Factorization
Abstract
This paper describes a new method for fast speaker adaptation in large-vocabulary recognition systems. As in most HMM-based recognizers, the observation densities are modeled as a weighted sum of Gaussian densities. Instead of adapting the means of the Gaussian densities, as is typically done, the proposed method adapts the weights of the Gaussian densities in each state. By applying non-negative matrix factorization (NMF), very fast adaptation is achieved. Experiments on the Wall Street Journal benchmark recognition task show relative improvements between 5% and 15%, while the adaptation converges within 0.2 seconds. Analysis of the latent speakers found by NMF shows that they most prominently reflect the speaker’s gender, even when vocal tract length normalization is used, and that they reflect the speaker’s age more clearly than the speaker’s regional influences or dialect. Index Terms: Speech recognition, adaptive systems, speaker adaptation, matrix decomposition, non-negative matrix factorization.
Fig. 1. Types of adaptation in acoustic modeling (panels: orig., spec., move, move+spec.; axes: feature1, feature2).
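One way to picture weight adaptation with NMF is that each speaker’s Gaussian weight vector is approximated as a non-negative combination of a few "latent speaker" weight vectors, so adapting to a new speaker only requires estimating a handful of combination coefficients. The sketch below assumes the latent-speaker matrix `W` is already trained and estimates the coefficients `h` from Gaussian occupation counts with the standard KL-divergence multiplicative update of Lee and Seung, keeping `W` fixed. This is an illustration of the general idea, not necessarily the paper’s exact procedure; `W`, the function names and the toy counts are hypothetical.

```python
from typing import List

def estimate_latent_weights(W: List[List[float]], v: List[float],
                            iterations: int = 200) -> List[float]:
    """Find h >= 0 minimizing the KL divergence between v and W h, with W held fixed.

    W[i][k]: weight of Gaussian i for latent speaker k (assumed pre-trained, columns > 0).
    v[i]:    soft occupation count of Gaussian i in the new speaker's adaptation data.
    """
    n, K = len(W), len(W[0])
    col_sums = [sum(W[i][k] for i in range(n)) for k in range(K)]
    h = [1.0] * K
    for _ in range(iterations):
        approx = [sum(W[i][k] * h[k] for k in range(K)) for i in range(n)]
        ratio = [v[i] / approx[i] if approx[i] > 0 else 0.0 for i in range(n)]
        # Multiplicative update (keeps h non-negative):
        # h_k <- h_k * sum_i W_ik * v_i / (W h)_i / sum_i W_ik
        h = [h[k] * sum(W[i][k] * ratio[i] for i in range(n)) / col_sums[k]
             for k in range(K)]
    return h

def adapted_weights(W: List[List[float]], h: List[float]) -> List[float]:
    """Speaker-adapted Gaussian weights: W h renormalized to sum to one."""
    mix = [sum(W[i][k] * h[k] for k in range(len(h))) for i in range(len(W))]
    total = sum(mix)
    return [m / total for m in mix]


# Toy example: one HMM state with 3 Gaussians and 2 latent speakers (columns of W).
W = [[0.6, 0.1],
     [0.3, 0.3],
     [0.1, 0.6]]
v = [19.0, 12.0, 9.0]  # counts consistent with h = [30, 10]
h = estimate_latent_weights(W, v)
print(h)                      # close to [30.0, 10.0]
print(adapted_weights(W, h))  # close to [0.475, 0.3, 0.225]
```

Only `K` numbers are estimated per speaker, which is one plausible reason such an adaptation can converge quickly from little data.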
Automatic Assessment of Children’s Reading Level
Abstract
In this paper, an automatic system for the assessment of reading in children is described and evaluated. The assessment is based on a reading test with 40 words, presented one by one to the child by means of a computerized reading tutor. The score that expresses the child’s reading performance is calculated as the total time needed to read the 40 words divided by the number of correctly read words. Within each grade, children are classified into 5 groups based on the score provided by human annotators. We show that when a child’s score is assessed automatically using a speech recognizer, the resulting classification agrees substantially (Cohen’s kappa above 0.6) with the human classification. As all children in the experiments were classified either correctly or into an adjoining group, we conclude that the proposed system can substantially reduce the time required by current manual classification procedures. Index Terms: computer-aided language learning, reading assessment, ASR for children.
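The score definition and the agreement measure in this abstract are simple enough to compute directly. The sketch below implements the stated score (total reading time divided by the number of correctly read words), standard Cohen’s kappa, and the "correct or adjoining group" check. The function names, the convention for zero correct words, and the example group labels are hypothetical; the group boundaries used in the paper are not given here.

```python
from collections import Counter
from typing import Sequence

def reading_score(total_time_s: float, n_correct: int) -> float:
    """Seconds per correctly read word: total time for the 40-word test / correct words."""
    if n_correct == 0:
        return float("inf")  # hypothetical convention: no correct words -> worst score
    return total_time_s / n_correct

def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two labelings of the same children."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each rater's marginal class frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters put every child in the same single class
    return (observed - expected) / (1.0 - expected)


# Hypothetical group labels (1 = fastest ... 5 = slowest) for ten children.
human = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
auto = [1, 2, 2, 2, 3, 3, 4, 5, 5, 5]
print(reading_score(80.0, 40))    # 2.0 seconds per correct word
print(cohens_kappa(human, auto))  # about 0.75
print(all(abs(x - y) <= 1 for x, y in zip(human, auto)))  # True: at most one group off
```

In this toy example, 8 of 10 children agree (observed 0.8) against a chance agreement of 0.2, giving kappa = (0.8 - 0.2) / (1 - 0.2) = 0.75, and every disagreement is off by exactly one group.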
Speaker normalization for template based speech recognition
Abstract
Vocal Tract Length Normalization (VTLN) has been shown to be an efficient speaker normalization tool for HMM-based systems. In this paper we show that it is equally efficient for a template-based recognition system. Template-based systems, while promising, have the potential drawback that templates retain all non-phonetic details in addition to the essential phonemic properties, i.e. they retain information on the speaker and the acoustic recording conditions. This may lead to very inefficient use of the database. We show that after VTLN, significantly more speakers, including speakers of the opposite gender, contribute templates to the matching sequence than in the non-normalized case. In experiments on the Wall Street Journal database, this leads to a relative word error rate reduction of 10%. Index Terms: template based speech recognition, speaker normalization,
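VTLN is usually realized by rescaling the frequency axis of the filterbank with a per-speaker warp factor α chosen from a small grid. The sketch below shows one common warping function (piecewise linear, with the Nyquist frequency mapped onto itself) and a grid search over α. The 8 kHz Nyquist frequency, the 0.85 knee ratio, the grid and the scoring callback are all assumptions, since the abstract does not say which warping function or selection criterion the authors used.

```python
from typing import Callable, Sequence

# Hypothetical grid of warp factors 0.88, 0.90, ..., 1.12.
WARP_FACTORS = tuple(round(0.88 + 0.02 * i, 2) for i in range(13))

def warp_frequency(f: float, alpha: float, f_nyquist: float = 8000.0,
                   cutoff_ratio: float = 0.85) -> float:
    """Piecewise-linear VTLN warp: f -> alpha * f below a knee, then a straight line
    that maps the Nyquist frequency onto itself so the band edge is preserved."""
    # Choosing the knee this way keeps alpha * knee below Nyquist for any alpha.
    knee = cutoff_ratio * f_nyquist * min(1.0, 1.0 / alpha)
    if f <= knee:
        return alpha * f
    slope = (f_nyquist - alpha * knee) / (f_nyquist - knee)
    return alpha * knee + slope * (f - knee)

def pick_warp_factor(score: Callable[[float], float],
                     factors: Sequence[float] = WARP_FACTORS) -> float:
    """Grid search: return the warp factor with the highest caller-supplied score,
    e.g. the likelihood of the speaker's warped features under a reference model."""
    return max(factors, key=score)


print(warp_frequency(1000.0, 1.1))  # about 1100 (below the knee: plain scaling)
print(warp_frequency(8000.0, 1.1))  # about 8000 (Nyquist is preserved)
print(pick_warp_factor(lambda a: -(a - 1.04) ** 2))  # 1.04 with this toy score
```

After warping, features from speakers with different vocal tract lengths lie closer together, which is consistent with the observation that more speakers, including speakers of the opposite gender, end up contributing templates.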

