Phoneme confusions in human and automatic speech recognition
- in Proc. Interspeech, 2007
Abstract - Cited by 2 (2 self)
A comparison between automatic speech recognition (ASR) and human speech recognition (HSR) is performed as a prerequisite for identifying sources of errors and improving feature extraction in ASR. HSR and ASR experiments are carried out with the same logatome database, which consists of nonsense syllables. Two kinds of signals are presented to human listeners. First, noisy speech samples are converted to Mel-frequency cepstral coefficients (MFCCs), which are resynthesized to speech, with information about voicing and fundamental frequency being discarded. Second, the original signals with added noise are presented, which makes it possible to evaluate the loss of information caused by resynthesis. The analysis also covers the degradation of ASR caused by dialect or accent and shows that different error patterns emerge for ASR and HSR. The information loss induced by the calculation of ASR features has the same effect as a deterioration of the SNR by 10 dB.
Index Terms: human speech recognition, automatic speech recognition, dialect, accent, phoneme confusions, MFCC
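To make the "10 dB" figure in this abstract concrete, the following is a minimal sketch of mixing noise into a signal at a chosen signal-to-noise ratio. It is not the authors' procedure: the sine-wave "speech", the Gaussian noise, the 16 kHz length, the example SNR values, and the function names `power`, `mix_at_snr`, and `measured_snr_db` are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean power (average squared sample value) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Returns the mixture and the noise gain used."""
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between a clean signal and the noise actually added to it."""
    return 10 * math.log10(power(speech) / power(scaled_noise))


random.seed(0)
# Toy stand-ins (assumptions): a 200 Hz tone as "speech", white Gaussian noise.
speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]

# Two hypothetical conditions that differ by 10 dB.
_, gain_0 = mix_at_snr(speech, noise, 0.0)
_, gain_m10 = mix_at_snr(speech, noise, -10.0)

# A 10 dB drop in SNR multiplies noise power by 10, i.e. amplitude by sqrt(10).
amplitude_ratio = gain_m10 / gain_0
```

In this sketch, lowering the SNR by 10 dB means the added noise carries ten times the power, which gives a sense of the scale of the information loss the abstract attributes to ASR feature calculation.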
Voorwoord (Preface)
Abstract
Writing a thesis is not easy. Anyone who has ever had to write one will be able to confirm this to some extent. Over the past months I have spent many an afternoon starting processes, interpreting results, and writing and rewriting text. It is therefore with a fitting sense of relief that I finally add the finishing touches in the form of this preface. The result of all my efforts is the thesis now before you. I hope it will be as instructive for the reader as it has been for me. Now that this thesis and, indeed, my time as a student are gradually coming to an end, this seems a fitting moment to look back and send a small thank-you to those who have supported me over time. To begin with, a big thank-you to my parents for their unwavering moral support during the harder moments of student life. A big thank-you also to the rest of my family and to my friends here in Leuven, with whom partying was always a pleasure. And finally, to everyone I have forgotten to mention here: a big thank-you as well.
WebVoice: A Toolkit for Perceptual Insights into Speech Processing
Abstract
Feature extraction and modeling techniques for speech processing are often complex. Understanding a new technique theoretically can be difficult for a novice, just as it is difficult for a practitioner to find the best parameter settings and/or combination of methods for a new task or data set. In this paper, a novel approach and a corresponding software toolkit for facilitating both education and experimentation in speech processing are presented: listening to the results of feature extraction and modeling is made possible via resynthesis of intermediate pattern recognition results. The software is made publicly available as a web service called WebVoice, with accompanying user interfaces for ease of use.
Unfolding Speaker Clustering Potential: A Biomimetic Approach
Abstract
Speaker clustering is the task of grouping a set of speech utterances into speaker-specific classes. The basic techniques for solving this task are similar to those used for speaker verification and identification. The hypothesis of this paper is that the techniques originally developed for speaker verification and identification are not sufficiently discriminative for speaker clustering. However, the processing chain for speaker clustering is long, so there are many potential areas for improvement. The question is which stage should be improved to benefit the final result most. To answer this question, this paper takes a biomimetic approach based on a study in which human participants act as an automatic speaker clustering system. Our findings are twofold: the modeling stage has the highest potential, and information about the temporal succession of frames is crucially missing. Experimental results with our implementation of a speaker clustering system that incorporates these findings, applied to TIMIT data, show the validity of our approach.
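The remark about the temporal succession of frames can be illustrated with a small sketch. It assumes, purely for illustration, a "bag of frames" model that summarizes an utterance by the mean and variance of a one-dimensional feature track; the abstract does not say which models the paper uses, and the names `bag_of_frames_stats` and `mean_abs_delta` are hypothetical.

```python
import random
import statistics


def bag_of_frames_stats(frames):
    """Mean and variance of a 1-D feature track; frame order plays no role."""
    return statistics.fmean(frames), statistics.pvariance(frames)


def mean_abs_delta(frames):
    """Average absolute frame-to-frame difference; depends on frame order."""
    return statistics.fmean(abs(b - a) for a, b in zip(frames, frames[1:]))


random.seed(0)
# Toy feature track (assumption): a smoothly rising value over 100 frames.
frames = [0.1 * i for i in range(100)]

# The same frames in a random order: the temporal succession is destroyed.
shuffled = frames[:]
random.shuffle(shuffled)

stats_original = bag_of_frames_stats(frames)
stats_shuffled = bag_of_frames_stats(shuffled)
delta_original = mean_abs_delta(frames)
delta_shuffled = mean_abs_delta(shuffled)
```

Under these assumptions the order-free statistics are the same for both tracks, while the frame-to-frame differences change sharply after shuffling. A model built only on the former cannot tell the two tracks apart, which is one way to read the claim that temporal information is missing.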

