Results 1 - 4 of 4
Statistical Trajectory Models for Phonetic Recognition
, 1994
Abstract - Cited by 27 (3 self)
The main goal of this work is to develop an alternative methodology for acoustic-phonetic modelling of speech sounds. The approach utilizes a segment-based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio-temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of a hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
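The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes each phonetic model is a short template track that is linearly interpolated to the length of the hypothesized segment, and it scores the error with an independent (diagonal) Gaussian rather than the full spatio-temporal correlation structure the abstract mentions. The names `resample_track`, `segment_score`, and `classify` are hypothetical.

```python
import math

def resample_track(track, n_frames):
    """Linearly interpolate a template track (list of feature vectors) to n_frames frames."""
    m = len(track)
    if n_frames == 1:
        return [list(track[0])]
    out = []
    for t in range(n_frames):
        pos = t * (m - 1) / (n_frames - 1)   # fractional position along the template
        lo = int(math.floor(pos))
        hi = min(lo + 1, m - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(track[lo], track[hi])])
    return out

def segment_score(segment, track, variance):
    """Log-likelihood of the error between the measured segment and the synthetic one.

    Assumption: errors are independent Gaussians with a per-dimension variance.
    """
    synthetic = resample_track(track, len(segment))
    score = 0.0
    for obs, syn in zip(segment, synthetic):
        for x, s, v in zip(obs, syn, variance):
            err = x - s
            score += -0.5 * (math.log(2 * math.pi * v) + err * err / v)
    return score

def classify(segment, models):
    """Return the name of the model whose synthetic segment best matches the measurement."""
    return max(models, key=lambda name: segment_score(segment, *models[name]))

# Toy one-dimensional example: a rising track versus a falling track.
models = {
    "rising": ([[0.0], [1.0]], [1.0]),
    "falling": ([[1.0], [0.0]], [1.0]),
}
print(classify([[0.0], [0.5], [1.0]], models))
```

Replacing the diagonal variance with a full covariance over the stacked error vector would move the sketch closer to the correlation modelling described in the abstract.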
Is N-Best Dead
- In Proceedings of the Human Language Technology Workshop
, 1994
Abstract - Cited by 15 (2 self)
We developed a faster search algorithm that avoids the use of the N-Best paradigm until after more powerful knowledge sources have been used. We found, however, that there was little or no decrease in word errors. We then showed that the use of the N-Best paradigm is still essential for the use of still more powerful knowledge sources, and for several other purposes that are outlined in the paper.
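As a rough illustration of how an N-Best list lets a more powerful knowledge source be applied after the initial search, the sketch below reorders a list of hypotheses by adding a weighted extra score. It is a generic rescoring sketch, not the algorithm from the paper; `rescore_nbest`, `extra_score`, and `weight` are hypothetical names, and the scores are assumed to be log-domain values where higher is better.

```python
def rescore_nbest(nbest, extra_score, weight=1.0):
    """Rescore (hypothesis, score) pairs with an additional knowledge source.

    nbest: list of (hypothesis, first_pass_log_score) pairs.
    extra_score: function mapping a hypothesis to a log-domain score.
    Returns the list sorted from best to worst combined score.
    """
    rescored = [(hyp, score + weight * extra_score(hyp)) for hyp, score in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy example: the first pass prefers the wrong hypothesis; the extra
# knowledge source (here a hand-written penalty) corrects the ranking.
nbest = [("wreck a nice beach", -10.0), ("recognize speech", -10.5)]
penalty = {"wreck a nice beach": -3.0, "recognize speech": -0.5}
best_hyp, best_score = rescore_nbest(nbest, lambda h: penalty[h])[0]
print(best_hyp, best_score)
```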
A Public Domain Decoder For Large Vocabulary Conversational Speech Recognition
, 1999
Abstract - Cited by 2 (0 self)
The high cost of the infrastructure required to conduct state-of-the-art speech recognition research prevents many small research groups from evaluating new ideas on large-scale tasks. To overcome this barrier, we are developing an Internet-based speech-to-text (STT) toolkit. In this paper, we present the core component of this system: a decoder that uses a one-pass time-synchronous Viterbi-based search algorithm called trace projection. This decoder can support efficient lattice rescoring using cross-word triphones, lexical trees and n-gram grammars. The decoder performance in terms of CPU and memory usage is on par with commercial systems of its kind. Preliminary evaluations on the SWITCHBOARD (SWB) corpus have yielded a word error rate of 39%. 1. INTRODUCTION A speech-to-text (STT) system conceptually consists of three subsystems: an acoustic processor which converts the speech signal into a sequence of feature vectors modeled using Hidden Markov Models (HMMs); a linguistic proc...
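For readers unfamiliar with time-synchronous Viterbi search, the following is a minimal textbook sketch over a small HMM in the log domain. It is not the trace projection algorithm named in the abstract, whose details are not given here; the function name `viterbi` and its argument layout are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")

def viterbi(obs_loglik, log_trans, log_init):
    """Find the best state path, advancing all states one frame at a time.

    obs_loglik[t][s]: log-likelihood of frame t under state s.
    log_trans[p][s]:  log transition probability from state p to state s.
    log_init[s]:      log initial probability of state s.
    Returns (best_log_score, best_state_path).
    """
    n_states = len(log_init)
    scores = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(obs_loglik)):
        new_scores, ptrs = [], []
        for s in range(n_states):
            # Keep only the best-scoring predecessor for each state.
            best_prev = max(range(n_states), key=lambda p: scores[p] + log_trans[p][s])
            new_scores.append(scores[best_prev] + log_trans[best_prev][s] + obs_loglik[t][s])
            ptrs.append(best_prev)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: scores[s])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return scores[last], path

# Toy two-state left-to-right model over three frames.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
obs_loglik = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
    [math.log(0.1), math.log(0.9)],
]
print(viterbi(obs_loglik, log_trans, log_init))
```

A large-vocabulary decoder would add beam pruning, lexical trees, and language model scores on top of this recursion; none of that is shown here.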
Automatic Speech Recognition of Native Dutch Speakers with Different Age and Gender
, 2007
Abstract
Humans are capable of estimating speakers' ages just by hearing them speak. It is also well known from the field of phonetics that speaker age influences the speech signal. This has, however, not yet been studied for the Dutch language. In this research, the influence of age on speech is examined for both genders separately and compared with the gender differences, using Perceptual Minimum Variance Distortionless Response features. The influences of age are minimal for these features but greater than the differences between speech from different genders. Different spectral features are influenced for different phonemes. It seems unlikely that adapting speech recognizers using Perceptual Minimum Variance Distortionless Response features will lead to much improvement. Furthermore, this thesis describes the process of creating a Dutch automated speech recognition system, using the Sonic large vocabulary continuous speech recognition system as a basis. The system achieves a recognition rate of 64.6% on the broadcast news task from the N-Best project. The porting process is

