Results 1–10 of 18
Automatic Person Verification Using Speech and Face Information, 2003
Abstract

Cited by 30 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one prescribed to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone- and internet-based banking, airline reservations and check-in, as well as forensic work and law enforcement applications. Biometric systems ...
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
Proc. IEEE, 2000
Abstract

Cited by 27 (4 self)
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine ...
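The Bayesian adaptive learning this abstract refers to is commonly illustrated by the MAP update of a Gaussian mean under a conjugate Gaussian prior; a minimal sketch (the prior weight `tau` and the toy samples are illustrative assumptions, not values from the paper):

```python
# MAP (maximum a posteriori) update of a Gaussian mean with a conjugate
# Gaussian prior: mu_map = (tau * mu_prior + n * xbar) / (tau + n).
# With little adaptation data the estimate stays near the prior mean;
# with much data it approaches the sample mean.

def map_mean(mu_prior, tau, samples):
    """MAP estimate of a Gaussian mean; tau acts as a prior pseudo-count."""
    n = len(samples)
    xbar = sum(samples) / n
    return (tau * mu_prior + n * xbar) / (tau + n)

# 4 adaptation samples with mean 1.0 against a prior mean of 0.0:
adapted = map_mean(mu_prior=0.0, tau=10.0, samples=[1.0, 1.2, 0.8, 1.0])
```

With only four samples against a prior weight of ten, the adapted mean moves only part of the way from the prior toward the sample mean, which is exactly the small-adaptation-data behavior the abstract motivates.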
On-Line Adaptive Learning of the Correlated Continuous Density Hidden Markov Models for Speech Recognition
 IEEE Trans. on Speech and Audio Processing
Abstract

Cited by 21 (4 self)
We extend our previously proposed quasi-Bayes adaptive learning framework to cope with correlated continuous density hidden Markov models with Gaussian mixture state observation densities, in which all mean vectors are assumed to be correlated and to have a joint prior distribution. A successive approximation algorithm is proposed to implement the updating of the correlated mean vectors. As an example, by applying the method to an online speaker adaptation application, the algorithm is experimentally shown to be asymptotically convergent as well as able to enhance the efficiency and effectiveness of Bayes learning by taking into account the correlation information between different models. The technique can be used to cope with the time-varying nature of some acoustic and environmental variabilities, including mismatches caused by changing speakers, channels, transducers, environments and so on.
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models, 1997
Abstract

Cited by 20 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals, and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
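The SOM-based mixture initialization the abstract describes can be sketched in a heavily simplified one-dimensional form; the unit count, learning-rate schedule, and neighborhood schedule below are illustrative assumptions, not the paper's settings:

```python
import math
import random

def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organizing map and return sorted unit prototypes.
    The trained prototypes can seed the mean values of a Gaussian mixture,
    giving a smooth initial coverage of the training-data space."""
    rng = random.Random(seed)
    units = [rng.choice(data) for _ in range(n_units)]
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                   # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.3)   # shrinking neighborhood
        for x in data:
            # best-matching unit: the prototype closest to the sample
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                # neighborhood kernel pulls the BMU and its map neighbors
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                units[i] += lr * h * (x - units[i])
    return sorted(units)

# two well-separated clusters; prototypes should end up covering both
protos = train_som([0.0, 0.1, -0.1, 10.0, 10.1, 9.9])
```

After training, prototypes straddle both clusters rather than collapsing onto one, which is the "smooth and faithful presentation of the feature vector space" the abstract asks of the initialization.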
Online Bayesian Estimation of Transition Probabilities for Markovian Jump Systems
IEEE Transactions on Signal Processing, 2004
Abstract

Cited by 11 (2 self)
Markovian jump systems (MJSs) evolve in a jump-wise manner by switching among simpler models, according to a finite Markov chain whose parameters are commonly assumed known. This paper addresses the problem of state estimation of MJSs with an unknown transition probability matrix (TPM) of the embedded Markov chain governing the jumps. Under the assumption of a time-invariant but random TPM, an approximate recursion for the TPM's posterior probability density function (PDF) within the Bayesian framework is obtained. Based on this recursion, four algorithms for online minimum mean-square error (MMSE) estimation of the TPM are derived. The first algorithm (for the case of a two-state Markov chain) computes the MMSE estimate exactly, if the likelihood of the TPM is linear in the transition probabilities. Its computational load, however, increases with the data length. To limit the computational cost, three alternative algorithms are further developed based on different approximation techniques: truncation of high-order moments, quasi-Bayesian approximation, and numerical integration, respectively. The proposed ...
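The flavor of recursive TPM estimation can be conveyed with a much simpler sketch than the paper's algorithms: keep Dirichlet pseudo-counts per TPM row and update them with soft mode-assignment probabilities. This is an illustrative stand-in, not the paper's exact recursion, and the independence assumption in the count update is a simplification:

```python
# Illustrative quasi-Bayesian-style TPM estimation: Dirichlet pseudo-counts
# c[i][j] for each row of the TPM, updated with soft mode posteriors; the
# running estimate is the posterior mean of each Dirichlet row.

def update_counts(counts, w_prev, w_curr):
    """Fractionally credit each possible i -> j transition with
    w_prev[i] * w_curr[j] (joint soft assignment, independence assumed
    here for simplicity)."""
    n = len(counts)
    for i in range(n):
        for j in range(n):
            counts[i][j] += w_prev[i] * w_curr[j]
    return counts

def tpm_estimate(counts):
    """Posterior-mean estimate: normalize each row of pseudo-counts."""
    return [[c / sum(row) for c in row] for row in counts]

# uniform Dirichlet(1, 1) prior on each row of a two-state chain
counts = [[1.0, 1.0], [1.0, 1.0]]
# a run of soft mode posteriors strongly favoring mode 0 throughout
posteriors = [[0.9, 0.1]] * 4
for w_prev, w_curr in zip(posteriors, posteriors[1:]):
    update_counts(counts, w_prev, w_curr)
est = tpm_estimate(counts)
```

Because the data consistently favor staying in mode 0, the estimated self-transition probability of that mode grows above the uniform prior value, while each row of the estimate remains a valid probability distribution.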
Temporal and spatial data mining with second-order hidden Markov models
Sigayret (eds), Fourth International Conference on Knowledge Discovery and Discrete Mathematics, Journées de l'informatique Messine, JIM'2003, 2003
Abstract

Cited by 8 (1 self)
In the frame of designing a knowledge discovery system, we have developed stochastic models based on high-order hidden Markov models. These models are capable of mapping sequences of data into a Markov chain in which the transitions between the states depend on the n previous states, according to the order of the model. We study the process of achieving information extraction from spatial and temporal data by means of an unsupervised classification. We therefore use a French national database related to the land use of a region, named Ter Uti, which describes the land use in both the spatial and temporal domains. Land-use categories (wheat, corn, forest, ...) are logged every year at sites regularly spaced across the region. They constitute a temporal sequence of images in which we look for spatial and temporal dependencies. The temporal segmentation of the data is done by means of a second-order hidden Markov model (HMM2) that appears to have very good capabilities for locating stationary segments, as shown in our previous work in speech recognition. The spatial classification is performed by defining a fractal scanning of the images with the help of a Hilbert-Peano curve that introduces a total order on the sites while preserving the neighborhood relation between the sites. We show that the HMM2 performs a classification that is meaningful for the agronomists. Spatial and temporal classification may be achieved simultaneously by means of a two-level HMM2 that measures the a posteriori probability of mapping a temporal sequence of images onto a set of hidden classes.
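The Hilbert-Peano scanning the abstract relies on can be reproduced with the standard distance-to-coordinate conversion for a Hilbert curve (a generic implementation, independent of the paper's system): walking the curve index visits every grid cell exactly once while consecutive cells stay adjacent, which is the neighborhood-preserving total order the spatial classifier needs.

```python
def hilbert_d2xy(order, d):
    """Map a distance d along a Hilbert curve of the given order (grid side
    2**order) to (x, y) cell coordinates. Iterating d = 0 .. side*side - 1
    yields a scan that covers the grid with consecutive cells adjacent."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:              # rotate/reflect the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x          # swap x and y
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# scan order for a 4x4 grid (order-2 curve)
scan = [hilbert_d2xy(2, d) for d in range(16)]
```

Sites indexed in this order can be fed to the HMM2 as a one-dimensional sequence without destroying spatial locality.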
Enhancements to Transformation-Based Speaker Adaptation: Principal Component and Inter-Class Maximum Likelihood Linear Regression, 2000
Abstract

Cited by 5 (1 self)
In this thesis we improve speech recognition accuracy by obtaining better estimates of linear transformation functions from a small amount of adaptation data in speaker adaptation. The major contributions of this thesis are the development of two new adaptation algorithms that improve maximum likelihood linear regression (MLLR). The first is called principal component MLLR (PCMLLR), and it reduces the variance of the estimate of the MLLR matrix using principal component analysis. The second is called inter-class MLLR, and it utilizes relationships among different transformation functions to achieve more reliable estimates of MLLR parameters across multiple classes. The main idea of PCMLLR is that if we estimate the MLLR matrix in the eigendomain, the variances of the components of the estimates are inversely proportional to their eigenvalues. Therefore we can select the more reliable components to reduce the variances of the resulting estimates and to improve speech recognition accuracy. PCMLLR eliminates highly variable components and chooses the principal components corresponding to the largest eigenvalues. If all the components are used, PCMLLR becomes the same as conventional MLLR. Choosing fewer principal components increases the bias of the estimates, which can reduce recognition accuracy. To compensate for this problem, we developed weighted principal component MLLR (WPCMLLR). Instead of eliminating some of the components, all the components in WPCMLLR are used after applying weights that minimize the mean square error. The component corresponding to a larger eigenvalue has a larger weight than the component corresponding to a smaller eigenvalue. As more adaptation data become available, the benefits from these methods may become smaller because ...
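The truncation-versus-weighting contrast between PCMLLR and WPCMLLR can be illustrated generically. This is not the thesis's estimator: the unit-variance prior on the true coefficients and the isotropic noise model behind the Wiener-style weight are assumptions made only for the sketch.

```python
import numpy as np

# A parameter vector is estimated in an eigen-basis where the estimate of
# component i has noise variance proportional to 1 / eigvals[i], so
# small-eigenvalue components are the least reliable.

def truncate_components(coeffs, eigvals, k):
    """PCMLLR-style: keep only the k components with the largest
    eigenvalues (the most reliable estimates) and zero out the rest."""
    order = np.argsort(eigvals)[::-1]        # indices, largest eigval first
    out = np.zeros_like(coeffs)
    keep = order[:k]
    out[keep] = coeffs[keep]
    return out

def weight_components(coeffs, eigvals, noise_var):
    """WPCMLLR-style: shrink component i by w_i = lam_i / (lam_i + noise_var),
    an MMSE (Wiener) weight under a unit-variance prior on the true
    coefficient. Larger eigenvalue -> weight closer to 1, so reliable
    components are kept nearly intact and noisy ones are damped, not dropped."""
    w = eigvals / (eigvals + noise_var)
    return w * coeffs
```

Truncation zeroes the unreliable components outright (lower variance, higher bias); weighting shades every component smoothly, which is the bias-variance compromise the thesis motivates.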
Use of model transformations for distributed speech recognition
in ISCA ITR Workshop 2001 (Adaptation Methods for Speech Recognition), Sophia-Antipolis, 2001
Abstract

Cited by 4 (2 self)
Due to bandwidth limitations, the speech recognizer in distributed speech recognition (DSR) applications has to use encoded speech – either traditional speech encoding or speech encoding optimized for recognition. The penalty incurred in reducing the bit rate is a degradation in speech recognition performance. The diversity of the applications using DSR implies that a variety of speech encoders can be used to compress speech. By treating the encoder variability as a mismatch, we propose using model transformation to reduce the speech recognition performance degradation. The advantage of using model transformation is that only a single model set needs to be trained at the server, which can be adapted on the fly to the input speech data. We were able to reduce the word error rate by 61.9%, 63.3% and 56.3% for MELP-, GSM- and MFCC-encoded data, respectively, by using MAP adaptation, which shows the generality of our proposed scheme.
Cross-lingual adaptation of semi-continuous HMMs using acoustic regression classes and sub-simplex projection
 COST278 and ISCA Tutorial and Research Workshop (ITRW) On Applied Spoken Language Interaction in Distributed Environments
Abstract

Cited by 4 (1 self)
With the demand for providing automatic speech recognition (ASR) systems for many markets, the question of porting an ASR system to a new language is of practical interest. Transferring already existing hidden Markov models (HMMs) from a source to the target language is seen as a key step in coping with this task. Typically, such a cross-lingual model adaptation task consists of a three-step procedure. It starts with polyphone decision tree specialisation (PDTS), specialising the phonetic-acoustic decision tree of the source models to the target language. In a second step, initial target language models are predicted from the adjusted decision tree. Finally, the predicted acoustic models are adapted to the target language using a limited amount of target data. In this work we focus on the final model adaptation step in the case of a system architecture employing semi-continuous HMMs (SCHMMs). In contrast to continuous density HMMs (CDHMMs), adaptation techniques for SCHMMs are not as well developed. In particular, no powerful transformation-based adaptation method for adjusting the information-bearing mixture weights of the common prototype densities is at hand. To overcome this problem we introduce a novel adaptation scheme for SCHMMs. The method relies on the projection of retrained model parameters onto a solution sub-simplex which is obtained through acoustic regression classes derived from the decision tree of the source models. The performance of the procedure is demonstrated by the transfer of multilingual Spanish-English-German models to Slovenian and to French. In the full paper, reference results for a standard maximum likelihood linear regression (MLLR) approach are given too.
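The paper projects onto a solution sub-simplex defined by acoustic regression classes; as a generic building block for such schemes, the Euclidean projection of a retrained weight vector back onto the full probability simplex can be sketched with the standard sort-based algorithm (not the authors' specific procedure):

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}, so that an
    unconstrained update of mixture weights is mapped back to a valid
    probability distribution."""
    u = sorted(v, reverse=True)
    css = 0.0                    # running sum of the largest entries
    rho = -1
    rho_css = 0.0
    for i, ui in enumerate(u):
        css += ui
        # largest prefix whose entries stay positive after the shift
        if ui + (1.0 - css) / (i + 1) > 0:
            rho = i
            rho_css = css
    theta = (1.0 - rho_css) / (rho + 1)
    return [max(x + theta, 0.0) for x in v]
```

A vector already on the simplex is left unchanged, and an out-of-range update such as `[2, 0, 0]` is pulled back to the nearest valid weight vector `[1, 0, 0]`.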
Bayesian Adaptation of Speech Recognizers to Field Speech Data
In: Proc. ICSLP'96, 1996
Abstract

Cited by 2 (1 self)
This work studies a Bayesian (or Maximum A Posteriori, MAP) approach to the adaptation of Continuous Density Hidden Markov Models (CDHMMs) to a specific condition of a speech recognition application. In order to improve model robustness, CDHMMs formerly trained from laboratory data are adapted using context-dependent field utterances. Two specific problems have to be faced when using the MAP approach: the estimation of the a priori distribution parameters and the lack of field adaptation data for some distributions of the CDHMM.