Heterogeneous Acoustic Measurement And Multiple Classifiers For Speech Recognition
, 1998
Cited by 29 (1 self)
The acoustic-phonetic modeling component of most current speech recognition systems calculates a small set of homogeneous frame-based measurements at a single, fixed time-frequency resolution. This thesis presents evidence indicating that recognition performance can be significantly improved through a contrasting approach using more detailed and more diverse acoustic measurements, which we refer to as heterogeneous measurements.
The Use of Speaker Correlation Information for Automatic Speech Recognition
, 1998
Cited by 7 (3 self)
This dissertation addresses the independence of observations assumption which is typically made by today's automatic speech recognition systems. This assumption ignores within-speaker correlations which are known to exist. The assumption clearly damages the recognition ability of standard speaker independent systems, as can be seen by the severe drop in performance exhibited by systems between their speaker dependent mode and their speaker independent mode. The typical solution to this problem is to apply speaker adaptation to the models of the speaker independent system. This approach is examined in this thesis with the explicit goal of improving the rapid adaptation capabilities of the system by incorporating within-speaker correlation information into the adaptation process. This is achieved through the creation of an adaptation technique called reference speaker weighting and the development of a speaker clustering technique called speaker cluster weighting. However, speaker adaptation is just one way in which the independence assumption can be attacked. This dissertation also introduces a novel speech recognition technique called consistency modeling. This technique utilizes a priori knowledge about the within-speaker correlations which exist between different phonetic events for the purpose of incorporating speaker constraint into a speech recognition system without explicitly applying speaker adaptation. These new techniques are implemented within a segment-based speech recognition system and evaluation results are reported on the DARPA Resource Management recognition task.
Techniques for modelling Phonological Processes in Automatic Speech Recognition
, 2001
Cited by 6 (0 self)
Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of
Experiments in automatic meeting transcription using JRTK
- In Proceedings of ICASSP'98
Cited by 6 (5 self)
In this paper we describe our early exploration of automatic recognition of conversational speech in meetings for use in automatic summarizers and browsers to produce meeting minutes effectively and rapidly. To achieve optimal performance we started from two different baseline English recognizers adapted to meeting conditions and tested resulting performance. The data were found to be highly disfluent (conversational human to human speech), noisy (due to lapel microphones and environment), and overlapped with background noise, resulting in error rates comparable so far to those on the CallHome conversational database (40-50% WER). A meeting browser is presented that allows the user to search and skim through highlights from a meeting efficiently despite the recognition errors.
Speaker-Based Segmentation and Adaptation in Automatic Speech Recognition
, 2007
in the projects New adaptive and learning methods in speech recognition and New methods and applications for speech technology. I thank professor Erkki Oja for supervising the thesis. I thank my instructor docent Mikko Kurimo for the opportunity to work in the speech group and for the valuable advice he has given. This work would not have been possible without the prior work done in the speech group, and thus, I have the current and former speech group members to thank. I would like to take this opportunity to especially thank Janne Pylkkönen, Teemu Hirsimäki and Vesa Siivola who have helped me with all those various problems that I have encountered during my time in the laboratory. Also, Kalle Palomäki is to thank for the time he has taken to read this thesis and for his comments that helped to improve the work. I thank Tommi, and I thank my friends who shared a cup of coffee with me when I needed their kind words for encouragement.

