
## Uncertain LDA: Including observation uncertainties in discriminative transforms

Manuscript, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

### Citations

5875 | A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner
- 1989
Citation Context: ...rices are treated as between-class and within-class scatters. 5 EXPERIMENTS FOR AUTOMATIC SPEECH RECOGNITION For automatic speech recognition, hidden Markov models have proven to be highly successful [38], [39]. In HMM-based speech recognition, all relevant phonetic or subphonetic units are described by left-to-right Markov models, which describe the evolution of acoustic observations within the phonet...

3777 |
Introduction to Statistical Pattern Recognition.
- Fukunaga
- 1990
Citation Context: ...he classes in a dataset, the selection of less than K − 1 dimensions in LDA for data projection does not guarantee to preserve the distance between classes from a classification perspective for K > 2 [18]. The application of the direct distance matrix (DDM) as a generalization of the between-class scatter matrix has been suggested to address this issue [19]. In LDA, the distance function used for obta...

1568 |
The use of multiple measurements in taxonomic problems
- Fisher
- 1936
Citation Context: ...rms its conventional LDA counterpart. 1 INTRODUCTION Linear discriminant analysis (LDA) is one of the simplest and most used transforms to enhance class separability for multidimensional observations [1], [2]. Conventional LDA assumes that each class follows a normal distribution and classes share the same covariance structure (are homoscedastic) [3]. Although these assumptions do not generally hold ...
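The excerpt above introduces Fisher LDA, which finds directions maximizing between-class scatter relative to within-class scatter. A minimal numerical sketch (illustrative only, not the paper's implementation; the function and variable names are assumptions):

```python
import numpy as np

def lda_directions(X, y, d):
    """Fisher LDA: project onto the d leading eigenvectors of Sw^-1 Sb.

    X: (N, D) data, y: (N,) integer class labels, d: target dimension.
    A minimal sketch; assumes Sw is invertible (enough samples per class).
    """
    D = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((D, D))  # within-class scatter
    Sb = np.zeros((D, D))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sb v = lambda Sw v
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:d]].real  # (D, d) projection matrix
```

For two well-separated Gaussian classes, projecting onto the single leading direction keeps the classes apart in one dimension.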

1010 | Speaker verification using adapted gaussian mixture models. Digital signal processing
- Reynolds, Quatieri, et al.
- 2000
Citation Context: ...linear discriminant analysis (PLDA) [36] to obtain a likelihood ratio in comparing an enrolment and test utterance. The acoustic feature distribution is captured by a universal background model (UBM) [52] and the subspace modeling techniques developed in the joint factor analysis approach [60] are utilized. A schematic block diagram of the state-of-the-art speaker recognition system as used for experim...

640 |
RASTA processing of speech
- Hermansky, Morgan
- 1994
Citation Context: ...rs [51], [71]. The speech activity detector in [72] is employed to discard nonspeech frames and quantile-based cepstral dynamics normalization [73] with low-pass temporal filtering adopted from RASTA [74] is applied on the final features. A gender-dependent universal background model (UBM) with 1024 components is trained using a subset of NIST SRE 2004–2006, Switchboard I and II, Switchboard cellular ...

465 | Regularized discriminant analysis.
- Friedman
- 1989
Citation Context: ...mes unstable, a problem known as small sample size [8]. Dealing with ill-posed covariance estimation in finding discriminant directions has been addressed as a challenging problem [9]. Regularization [10] and Bayesian estimation [11] of covariance models have been discussed in the literature to overcome this issue. It is also possible to obtain non-linear class separation using subclass discriminant a...

335 | Generalized Discriminant Analysis Using a Kernel Approach
- Baudat, Anouar
- 2000
Citation Context: ...ariance models have been discussed in the literature to overcome this issue. It is also possible to obtain non-linear class separation using subclass discriminant analysis and the kernel trick in LDA [12]. When each class is composed of several partitions, subclass discriminant analysis [13] aims to maximize the distance between class means and the subclass means in the same class at the same time. Co...

315 | Front-End Factor Analysis For Speaker Verification
- Dehak, Kenny, et al.
- 2010
Citation Context: ...rcularly symmetric complex Gaussian distributed, it is possible to use for example Wiener filtering to arrive at an uncertain spectral representation [30], [31]. In modern speaker recognition systems [32], each speaker is represented by a so-called i-vector, which is the mean of the a posteriori distribution considering the available speech material for a given speaker. Following the same principle as...

262 | Semi-tied covariance matrices for hidden Markov models - Gales - 1999 |

214 | Discriminant analysis by gaussian mixtures
- Hastie, Tibshirani
- 1996
Citation Context: ...iminant analysis to unequal covariance matrices and non-normal distributions for classes leads to heteroscedastic LDA [14], quadratic discriminant analysis [2], [15] and mixture discriminant analysis [16]. A distance preserving dimensionality reduction transform maps the D-dimensional data samples to a d-dimensional space (d < D) subject to the constraint that nearby data samples are mapped to nearby l...

207 |
Small sample size effects in statistical pattern recognition: Recommendations for practitioners.
- Raudys, Jain
- 1991
Citation Context: ...is with the Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany (email: dorothea.kolossa@rub.de). sample covariance estimation becomes unstable, a problem known as small sample size [8]. Dealing with ill-posed covariance estimation in finding discriminant directions has been addressed as a challenging problem [9]. Regularization [10] and Bayesian estimation [11] of covariance models...

156 | An Overview of Text-independent Speaker Recognition: From Features to Supervectors
- Kinnunen, Li
- 2010
Citation Context: ...UTOMATIC SPEAKER VERIFICATION Automatic speaker verification, the task of accepting or rejecting an identity claim given an utterance of a speaker, has received lots of attention in the last 20 years [55]. One of the main reasons is the support of the National Institute of Standards and Technology (NIST) by organizing a series of benchmarks, the speaker recognition evaluations (SREs) [56] starting in ...

131 | Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, - Kumar, Andreou - 1998 |

121 | Probabilistic Linear Discriminant Analysis for Inferences About Identity
- Prince
Citation Context: ...ikelihood of Gaussian classes and is equivalent to linear regression using the class labels. Probabilistic LDA (PLDA) is a general method that can accomplish a wide variety of recognition tasks [35], [36]. PLDA is formulated by a generative model, where the j-th observation from the i-th class is expressed as x_ij = m + F h_i + G c_ij + ε_ij. (15) In Equation 15, m stands for the expected value of all data. Th...
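The PLDA generative model quoted above, x_ij = m + F h_i + G c_ij + ε_ij, can be sketched as a sampler (dimensions, function name, and prior choices are illustrative assumptions; h_i and c_ij are taken standard normal as is conventional):

```python
import numpy as np

def sample_plda(m, F, G, Sigma, n_classes, n_per_class, rng):
    """Draw samples from the PLDA generative model
    x_ij = m + F h_i + G c_ij + eps_ij,
    with h_i, c_ij standard normal and eps_ij ~ N(0, Sigma).
    A sketch under conventional prior assumptions, not the paper's code.
    """
    D, q = F.shape
    _, p = G.shape
    X, labels = [], []
    for i in range(n_classes):
        h = rng.standard_normal(q)          # shared latent identity variable
        for _ in range(n_per_class):
            c = rng.standard_normal(p)      # per-observation channel variable
            eps = rng.multivariate_normal(np.zeros(D), Sigma)  # residual noise
            X.append(m + F @ h + G @ c + eps)
            labels.append(i)
    return np.array(X), np.array(labels)
```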

113 | I.: Channel Compensation for SVM Speaker Recognition
- Solomonoff, Quillen, et al.
- 2004
Citation Context: ...ognition task [57], [58]. Most of the research in the speaker recognition area was devoted to finding robust modeling techniques, capable of handling channel and inter-session variability [32], [53], [59]. The state-of-the-art method is now using a low-rank vector, the so-called i-vector, to represent an utterance based on total variability subspace modeling [32] and probabilistic linear discriminant ...

108 |
Linear discriminant analysis for improved large vocabulary continuous speech recognition,”
- Haeb-Umbach, Ney
- 1992
Citation Context: ... generally hold in practice, this conventional approach, briefly reviewed in Section 2, and its variants have been found useful in many applications including automatic speech and speaker recognition [4]–[7]. Dimensionality reduction is a usual pre-processing stage to make the input data more suitable for the modeling stage. Though statistical modeling techniques, like Gaussian mixture models, are q...

106 | Multiclass linear dimension reduction by weighted pairwise Fisher criteria.
- Loog, Duin, et al.
- 2001
Citation Context: ... from a classification perspective for K > 2 [18]. The application of the direct distance matrix (DDM) as a generalization of the between-class scatter matrix has been suggested to address this issue [19]. In LDA, the distance function used for obtaining the DDM could be chosen as the Chernoff distance [20] or as its multi-class generalization, Matusita's separability measure [21], [22]. In this paper...

104 | Analysis of I-vector Length Normalization in Speaker Recognition Systems
- Garcia-Romero, Espy-Wilson
- 2011
Citation Context: [Figure 5 block-diagram labels: Speech Signal, Claimed Identity, Feature Extraction [51], UBM [52], factorize GMM mean supervectors [32], variable-length utterance → fixed-length low-rank i-vector, LDA/ULDA, Centering, WCCN [53], Length Normalization [54], PLDA [36], recognition score, i-vector, uncertainty of i-vector.] Fig. 5: Schematic block diagram of a typical state-of-the-art speaker verification system [51], [55]. Abbreviations stand for; UBM: universa...

95 | Linear dimensionality reduction via a heteroscedastic extension of LDA: the Chernoff criterion,
- Loog, Duin
- 2004
Citation Context: ...ted to aid in modeling the classes more effectively. The extension of linear discriminant analysis to unequal covariance matrices and non-normal distributions for classes leads to heteroscedastic LDA [14], quadratic discriminant analysis [2], [15] and mixture discriminant analysis [16]. A distance preserving dimensionality reduction transform maps the D-dimensional data samples to a d-dimensional space...

95 |
The HTK Book (for HTK Version 3.4
- Young, Evermann, et al.
- 2006
Citation Context: ...is uncertainty is then transformed into the feature domain using STFT uncertainty propagation (STFT-UP) [25], yielding the desired uncertain feature description. The hidden Markov model toolkit (HTK) [48] is used to implement the ASR experiments. The training and test scripts were provided for the CHiME challenge [46], and these were also used for the presented experiments. Multiconditional training [...

87 |
The utilization of multiple measurements in problems of biological classification.
- Rao
- 1948
Citation Context: ...ass separability for multidimensional observations [1], [2]. Conventional LDA assumes that each class follows a normal distribution and classes share the same covariance structure (are homoscedastic) [3]. Although these assumptions do not generally hold in practice, this conventional approach, briefly reviewed in Section 2, and its variants have been found useful in many applications including automa...

84 | Joint Factor Analysis of Speaker and Session Variability : Theory and Algorithms
- Kenny
- 2005
Citation Context: ...e can explain the UBM as Σ_{m=1}^M w_m N(x_l; µ_m, Σ_m). Let us denote the stacked µ_m as the mean supervector u and form Σ̃ as a block-diagonal matrix with the Σ_m as its entries. The factor analysis approach [32], [68] represents an utterance with respective acoustic features denoted by X by a location in the high-dimensional space of u + Tφ. The rectangular matrix T characterizes the low-rank subspace including inter- ...

81 | Within-Class Covariance Normalization for SVM-Based Speaker Recognition
- Hatch, Kajarekar, et al.
- 2006
Citation Context: ...er recognition task [57], [58]. Most of the research in the speaker recognition area was devoted to finding robust modeling techniques, capable of handling channel and inter-session variability [32], [53], [59]. The state-of-the-art method is now using a low-rank vector, the so-called i-vector, to represent an utterance based on total variability subspace modeling [32] and probabilistic linear discrim...

70 | The second CHiME speech separation and recognition challenge: Datasets, tasks and baselines
- Vincent, Barker, et al.
- 2013
Citation Context: [Figure 3 task-grammar table: Command: Bin, Lay, Place, Set; Color: blue, green, white, red; Preposition: with, at, on, by; Letter: A–Z; Number: 1–9, zero; Adverb: soon, again, now, please.] Fig. 3: Task grammar defined in the CHiME speech recognition challenge [43]. An example sentence would be Lay blue at A five please. 5.1 Experimental Setup Experiments were carried out using the CHiME multichannel robust ASR task [46]. This task simulates human-machine inter...

59 | Subclass discriminant analysis
- Zhu, Martinez
Citation Context: ...possible to obtain non-linear class separation using subclass discriminant analysis and the kernel trick in LDA [12]. When each class is composed of several partitions, subclass discriminant analysis [13] aims to maximize the distance between class means and the subclass means in the same class at the same time. Compared to principal component analysis, class-dependent dimensionality reduction is expec...

58 | Uncertainty decoding with SPLICE for noise robust speech recognition
- Droppo, Deng, et al.
- 2002
Citation Context: ... be directly observed but a probabilistic model relating x and some observed variable y is available. This model can be in the form of a likelihood distribution p(y|x) as e.g. in uncertainty decoding [24], a posterior distribution p(x|y) as e.g. in uncertainty propagation [25], or a joint distribution p(x, y) as e.g. in joint uncertainty decoding [26]. See [27] for a review of the topic. As it is shown ...

55 | Speaker and Session Variability in GMM-Based Speaker Verification
- Kenny, Boulianne, et al.
- 2007
Citation Context: ...lment and test utterance. The acoustic feature distribution is captured by a universal background model (UBM) [52] and the subspace modeling techniques developed in the joint factor analysis approach [60] are utilized. A schematic block diagram of the state-of-the-art speaker recognition system as used for experiments in this paper is shown in Figure 5.

54 | The Application of Hidden Markov Models in Speech Recognition. Foundations and Trends in Signal Processing.
- Gales, Young
- 2008
Citation Context: ...are treated as between-class and within-class scatters. 5 EXPERIMENTS FOR AUTOMATIC SPEECH RECOGNITION For automatic speech recognition, hidden Markov models have proven to be highly successful [38], [39]. In HMM-based speech recognition, all relevant phonetic or subphonetic units are described by left-to-right Markov models, which describe the evolution of acoustic observations within the phonetic uni...
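As a small illustration of the left-to-right topology described above, a per-unit transition matrix allows only self-loops and forward steps; the self-loop probability and function name here are arbitrary assumptions, not values from the paper:

```python
import numpy as np

def left_to_right_transitions(n_states, p_stay=0.6):
    """Transition matrix of a left-to-right HMM, one per phonetic unit:
    each state may only loop on itself or advance to the next state.
    p_stay is an illustrative self-loop probability.
    """
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s] = p_stay          # self-loop: stay in the current state
        A[s, s + 1] = 1 - p_stay  # advance to the next state
    A[-1, -1] = 1.0               # absorbing final state
    return A
```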

50 | Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametri model of speech distortion
- Deng, Droppo, et al.
- 2005
Citation Context: [Figure 4 block-diagram labels: Speech Signal → Feature Extraction [25] → LDA/ULDA → Modified Imputation [42] → HMM [44] → Words; Speech Signal → Feature Extraction [25] → LDA/ULDA → HMM with uncertainty decoding [44], [45] → Words; each with uncertainty of observations as a side input.] Fig. 4: A block diagram of a typical state-of-the-art automatic speech recognition (ASR) system. Three flows indicate three different configurations for employ...

47 |
Analysis of Feature Extraction and Channel Compensation in a GMM Speaker Recognition System
- Burget, Matejka, et al.
- 2007
Citation Context: ...rally hold in practice, this conventional approach, briefly reviewed in Section 2, and its variants have been found useful in many applications including automatic speech and speaker recognition [4]–[7]. Dimensionality reduction is a usual pre-processing stage to make the input data more suitable for the modeling stage. Though statistical modeling techniques, like Gaussian mixture models, are quite ...

42 |
Bayesian Speaker Verification with Heavy-Tailed Priors
- Kenny
- 2010
Citation Context: ...ionality. On the other hand, PLDA is mostly targeted to accomplish the recognition tasks taking place in the latent variable space. In particular cases of face recognition [36] or speaker recognition [37], the PLDA model parameters are learned from a disjoint set of training data, and the trained system is forwarded to deal with matching examples of novel classes. As Equations 13 and 14 are implying,...

37 |
The elements of statistical learning, volume 1
- Hastie, Tibshirani, et al.
- 2009
Citation Context: ...ts conventional LDA counterpart. 1 INTRODUCTION Linear discriminant analysis (LDA) is one of the simplest and most used transforms to enhance class separability for multidimensional observations [1], [2]. Conventional LDA assumes that each class follows a normal distribution and classes share the same covariance structure (are homoscedastic) [3]. Although these assumptions do not generally hold in pr...

32 | Adaptive training with joint uncertainty decoding for robust recognition of noisy data
- Liao, Gales
- 2007
Citation Context: ...istribution p(y|x) as e.g. in uncertainty decoding [24], a posterior distribution p(x|y) as e.g. in uncertainty propagation [25], or a joint distribution p(x, y) as e.g. in joint uncertainty decoding [26]. See [27] for a review of the topic. As it is shown in Figure 1, the uncertainty in an observation could be a result of several factors which collectively result in deviating from optimal representatio...

26 |
Probabilistic linear discriminant analysis.
- Ioffe
- 2006
Citation Context: ... the likelihood of Gaussian classes and is equivalent to linear regression using the class labels. Probabilistic LDA (PLDA) is a general method that can accomplish a wide variety of recognition tasks [35], [36]. PLDA is formulated by a generative model, where the j-th observation from the i-th class is expressed as x_ij = m + F h_i + G c_ij + ε_ij. (15) In Equation 15, m stands for the expected value of all da...

22 | Unsupervised equalization of Lombard effect for speech recognition in noisy adverse environment
- Bořil, Hansen
- 2009
Citation Context: ...a coefficients, resulting in 60-dimensional feature vectors [51], [71]. The speech activity detector in [72] is employed to discard nonspeech frames and quantile-based cepstral dynamics normalization [73] with low-pass temporal filtering adopted from RASTA [74] is applied on the final features. A gender-dependent universal background model (UBM) with 1024 components is trained using a subset of NIST S...

20 | Bayesian quadratic discriminant analysis
- Srivastava, Gupta, et al.
- 2007
Citation Context: ... as small sample size [8]. Dealing with ill-posed covariance estimation in finding discriminant directions has been addressed as a challenging problem [9]. Regularization [10] and Bayesian estimation [11] of covariance models have been discussed in the literature to overcome this issue. It is also possible to obtain non-linear class separation using subclass discriminant analysis and the kernel trick ...

18 |
PLDA for speaker verification with utterances of arbitrary duration
- Kenny, Stafylakis, et al.
- 2013
Citation Context: ...aker is represented by a so-called i-vector, which is the mean of the a posteriori distribution considering the available speech material for a given speaker. Following the same principle as in [29], [33], [34], we carry out speaker recognition experiments in which we take into account the uncertainty in mean statistics estimation.

18 |
Speaker Verification in noise using a stochastic version of the weighted Viterbi algorithm”.
- Yoma, Villar
- 2002
Citation Context: ...proposed [40]. One of the best-known of these uncertainty-of-observation methods is uncertainty decoding (UD). This method replaces the likelihood at each state by the expected likelihood according to [41]: b_q^UD(x_l) = E[b_q(x_l)] = Σ_{m=1}^M w_qm · N(µ_l; µ_qm, Σ_qm + Σ_l). (18) This is equivalent to adding the variance of the feature posterior to the variance of the currently considered state output probabilit...
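Equation 18 above replaces the state likelihood by its expectation under the feature posterior N(µ_l, Σ_l): each component covariance is simply inflated by the observation uncertainty. A direct sketch (function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ud_likelihood(mu_l, Sigma_l, weights, means, covs):
    """Uncertainty decoding: expected state likelihood
    b_q^UD(x_l) = sum_m w_qm N(mu_l; mu_qm, Sigma_qm + Sigma_l).
    The feature posterior N(mu_l, Sigma_l) replaces the point observation.
    """
    return sum(
        w * multivariate_normal.pdf(mu_l, mean=mu, cov=cov + Sigma_l)
        for w, mu, cov in zip(weights, means, covs)
    )
```

With Σ_l = 0 this reduces to the ordinary GMM state likelihood; growing Σ_l flattens the effective densities.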

17 |
On information and distance measures, error bounds, and feature selection
- Chen
- 1976
Citation Context: ...s a generalization of the between-class scatter matrix has been suggested to address this issue [19]. In LDA, the distance function used for obtaining the DDM could be chosen as the Chernoff distance [20] or as its multi-class generalization, Matusita's separability measure [21], [22]. In this paper we are addressing the task of finding linear discriminant directions when instead of a point estimate f...

16 |
Eds., Techniques for Noise Robustness in Automatic Speech Recognition,
- Virtanen, Singh, et al.
- 2012
Citation Context: ...he Fourier coefficients of speech and noise are circularly symmetric complex Gaussian distributed, it is possible to use for example Wiener filtering to arrive at an uncertain spectral representation [30], [31]. In modern speaker recognition systems [32], each speaker is represented by a so-called i-vector, which is the mean of the a posteriori distribution considering the available speech material fo...

16 | Duration Mismatch Compensation for I-vector based Speaker Recognition Systems,” in
- Hasan, Saeidi, et al.
- 2013
Citation Context: ...gnition is posed as dealing with variable (short) utterance duration and several techniques have been proposed recently to compensate for this factor in the context of i-vector extraction [33], [34], [64]–[67]. The i-vectors extracted using a sufficient amount of speech follow a standard normal distribution. However, as studied recently in [33], [34], this is not the case any more when a considerable ...

12 |
Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques
- Kolossa, Klimas, et al.
- 2005
Citation Context: ... variance of the feature posterior to the variance of the currently considered state output probability distribution. An alternative uncertainty-of-observation method, termed modified imputation (MI) [42], in effect splits the likelihood computation into two steps: In the first step, the most likely value of the hidden variable x*_l for the m-th Gaussian of state q is found via x̂^MI_qml = (Σ_l^{-1} + Σ^{-1}...
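The excerpt truncates the modified-imputation equation mid-expression. The standard precision-weighted form of the MI point estimate, combining the feature posterior with the Gaussian component, can be sketched as follows; treat it as an assumption, since the paper's exact expression is cut off:

```python
import numpy as np

def modified_imputation(mu_l, Sigma_l, mu_qm, Sigma_qm):
    """Modified imputation point estimate: precision-weighted combination
    of the feature posterior N(mu_l, Sigma_l) and the Gaussian component
    N(mu_qm, Sigma_qm). Standard MI form, sketched here as an assumption.
    """
    P_l = np.linalg.inv(Sigma_l)    # observation precision
    P_qm = np.linalg.inv(Sigma_qm)  # model precision
    return np.linalg.solve(P_l + P_qm, P_l @ mu_l + P_qm @ mu_qm)
```

Equal precisions yield the midpoint between observation and model means; a very certain observation pulls the estimate toward µ_l.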

12 | Incorporating auditory feature uncertainties in robust speaker identification - Shao, Srinivasan, et al. - 2007 |

11 | Modelling non-stationary noise with spectral factorisation in automatic speech recognition,” Computer Speech & Language
- Hurmalainen, Gemmeke, et al.
- 2012
Citation Context: ...LDA transform before using uncertainty decoding or modified imputation in the test stage [49]. In line with the HTK models, we use 250 speech states (4–10 states per word) to label speech basis atoms [50]. In estimating the LDA and ULDA transforms, we associate the training acoustic features of each speaker to 250 classes to arrive at speaker-dependent discriminative transforms. 5.2 Experimental Resul...

10 |
I4U submission to NIST SRE 2012: a large-scale collaborative effort for noise-robust speaker verification
- Saeidi, Lee, et al.
- 2013
Citation Context: ...as used for experiments in this paper is shown in Figure 5. [Figure 5 block-diagram labels: Speech Signal, Claimed Identity, Feature Extraction [51], UBM [52], factorize GMM mean supervectors [32], variable-length utterance → fixed-length low-rank i-vector, LDA/ULDA, Centering, WCCN [53], Length Normalization [54], PLDA [36], recognition score, i-vector, uncerta...]

10 |
Knowing the non-target speakers: the effect of the i-vector population for PLDA training in speaker recognition,” in
- Leeuwen, Saeidi
- 2013
Citation Context: ...workshop. Before NIST SRE'12, the task was solely defined as speaker detection, whereas in SRE'12, the performance metric and evaluation condition resembles an open set speaker recognition task [57], [58]. Most of the research in the speaker recognition area was devoted to finding robust modeling techniques, capable of handling channel and inter-session variability [32], [53], [59]. The state-of-the-a...

9 |
Audiovisual speech recognition with missing or unreliable data
- Kolossa, Zeiler, et al.
- 2009
Citation Context: [Figure 4 block-diagram labels, three configurations: Conventional (Conv): Speech Signal → Feature Extraction [25] → LDA/ULDA → HMM [44] → Words; Modified Imputation (MI): Speech Signal → Feature Extraction [25] → LDA/ULDA → Modified Imputation [42] → HMM [44] → Words; Uncertainty Decoding (UD): Speech Signal → Feature Extraction [25] → LDA/ULDA → HMM...; each with uncertainty of observations as a side input.]

9 |
Quality measure functions for calibration of speaker recognition system in various duration conditions
- Mandasari, Saeidi, et al.
- 2013
Citation Context: ... extracted from frames of 30 ms windowed speech every 15 ms, appended with the frame energy and concatenated with delta and delta-delta coefficients, resulting in 60-dimensional feature vectors [51], [71]. The speech activity detector in [72] is employed to discard nonspeech frames and quantile-based cepstral dynamics normalization [73] with low-pass temporal filtering adopted from RASTA [74] is appli...

8 | A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data
- Kinnunen, Rajan
- 2013
Citation Context: ...ed speech every 15 ms, appended with the frame energy and concatenated with delta and delta-delta coefficients, resulting in 60-dimensional feature vectors [51], [71]. The speech activity detector in [72] is employed to discard nonspeech frames and quantile-based cepstral dynamics normalization [73] with low-pass temporal filtering adopted from RASTA [74] is applied on the final features. A gender-dep...

8 | Covariance modelling for noise-robust speech recognition
- Dalen, Gales
- 2008
Citation Context: ...ass labels. Additionally, we have recently proposed a noise-adaptive LDA (NALDA) [75], a computationally less expensive approximation to the predictive heteroscedastic LDA (HLDA) approach described in [76]. NALDA can take the uncertainty of the acoustic features into account in the feature extraction process. This is achieved by ...

7 |
Text-dependent speaker recognition using PLDA with uncertainty propagation
- Stafylakis, Kenny, et al.
- 2013
Citation Context: ...s represented by a so-called i-vector, which is the mean of the a posteriori distribution considering the available speech material for a given speaker. Following the same principle as in [29], [33], [34], we carry out speaker recognition experiments in which we take into account the uncertainty in mean statistics estimation.

7 |
Integration of Short-Time Fourier Domain Speech Enhancement and Observation Uncertainty Techniques for Robust Automatic Speech Recognition,
- Astudillo
- 2010
Citation Context: ... filter was used. The noise was estimated from a fixed blocking matrix, nulling the broadside direction [31]. To further improve performance, an uncertainty-propagation-based MMSE-MFCC estimator [25], [47] was used for the feature extraction. Cepstral mean subtraction was applied as a final pre-processing stage. The posterior distribution associated to a Wiener filter in the short-time Fourier transfor...

6 |
Robust speech recognition of uncertain or missing data
- Kolossa, Haeb-Umbach
- 2011
Citation Context: ...n p(y|x) as e.g. in uncertainty decoding [24], a posterior distribution p(x|y) as e.g. in uncertainty propagation [25], or a joint distribution p(x, y) as e.g. in joint uncertainty decoding [26]. See [27] for a review of the topic. As it is shown in Figure 1, the uncertainty in an observation could be a result of several factors which collectively result in deviating from optimal representation. The foc...

6 |
Uncertainty propagation in front end factor analysis for noise robust speaker recognition
- Yu, Liu, et al.
- 2014
Citation Context: ...ort speech signals, to name a few important ones. The effect of noise on acoustic features can be modeled by an uncertainty of short-time features and then propagated to the following modeling stages [61]–[63]. The problem of incomplete observations in speaker recognition is posed as dealing with variable (short) utterance duration and several techniques have been proposed recently to compensate for t...

6 |
Robust Speaker Identification in Noisy and Reverberant Conditions
- Zhao, Wang, et al.
- 2014
Citation Context: ...peech signals, to name a few important ones. The effect of noise on acoustic features can be modeled by an uncertainty of short-time features and then propagated to the following modeling stages [61]–[63]. The problem of incomplete observations in speaker recognition is posed as dealing with variable (short) utterance duration and several techniques have been proposed recently to compensate for this f...

6 | Improving short utterance based i-vector speaker recognition using source and utterance-duration normalization techniques, in: Interspeech 2013 - Kanagasundaram, Dean, et al. - 2013 |

6 |
Improving short utterance i-vector speaker verification using utterance variance modelling and compensation techniques
- Kanagasundaram, Dean, et al.
Citation Context: ...on is posed as dealing with variable (short) utterance duration and several techniques have been proposed recently to compensate for this factor in the context of i-vector extraction [33], [34], [64]–[67]. The i-vectors extracted using a sufficient amount of speech follow a standard normal distribution. However, as studied recently in [33], [34], this is not the case any more when a considerable amoun...

6 |
Probabilistic linear discriminant analysis of i-vector posterior distributions
- Cumani, Plchot, et al.
- 2013
Citation Context: ...ization of latent variable Φ. The zero- and first-order statistics N_X^(m) and f_X^(m) for an utterance can be calculated with respect to the m-th component of the UBM and, as it is shown in [33], [34], [68], [69], assuming a normal distribution for the i-vector φ ~ N(µ_φ, Σ_φ), the point estimate µ_φ and related uncertainty Σ_φ are given by µ_φ = Σ_φ T^T Σ̃^{-1} f_X. (21) TABLE 2: Number of speakers, speech segments and ...
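Equation 21 above gives the i-vector point estimate; a sketch pairing it with the standard posterior-covariance expression (the covariance line is an assumption here, since only Eq. 21 survives in the excerpt; names and shapes are illustrative):

```python
import numpy as np

def ivector_posterior(T, Sigma_tilde, N_X, f_X):
    """Posterior of the i-vector phi ~ N(mu_phi, Sigma_phi), cf. Eq. 21:
        mu_phi    = Sigma_phi @ T.T @ inv(Sigma_tilde) @ f_X
        Sigma_phi = inv(I + T.T @ inv(Sigma_tilde) @ N_X @ T)
    The covariance line is the standard companion form (an assumption).
    T: (CD, R) total variability matrix; Sigma_tilde: (CD, CD) block-diagonal
    UBM covariance; N_X: (CD, CD) expanded zero-order statistics;
    f_X: (CD,) centered first-order statistics.
    """
    R = T.shape[1]
    Sigma_phi = np.linalg.inv(np.eye(R) + T.T @ np.linalg.solve(Sigma_tilde, N_X @ T))
    mu_phi = Sigma_phi @ T.T @ np.linalg.solve(Sigma_tilde, f_X)
    return mu_phi, Sigma_phi
```

With no accumulated statistics the posterior covariance stays at the prior (identity); more frames shrink the uncertainty, matching the short-utterance discussion in the surrounding entries.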

6 | Probabilistic linear discriminant analysis for acoustic modeling,”
- Lu, Renals
- 2014
Citation Context: ...-vectors as in Equation 22 for finding ULDA directions. We carried out uncertainty propagation through i-vector post-processing steps and included uncertainty decoding in PLDA [33], [34], [49], [69], [70]. 6.2 Experimental Setup In the feature extraction stage, 19 Mel-frequency cepstral coefficients are extracted from frames of 30 ms windowed speech every 15 ms, appended with the frame energy and concaten...

5 | Minimax i-vector extractor for short duration speaker verification - Hautamäki, Cheng, et al. - 2013 |

4 |
A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation
- Astudillo, Orglmeister
Citation Context: ...ved variable y is available. This model can be in the form of a likelihood distribution p(y|x) as e.g. in uncertainty decoding [24], a posterior distribution p(x|y) as e.g. in uncertainty propagation [25], or a joint distribution p(x, y) as e.g. in joint uncertainty decoding [26]. See [27] for a review of the topic. As it is shown in Figure 1, the uncertainty in an observation could be a result of sever...

2 |
Accounting for the uncertainty of speech estimates in the complex domain for minimummean square error speech enhancement
- Astudillo, Kolossa, et al.
Citation Context ...a sequence of independent random variables each described by a Gaussian posterior distribution. In real-world conditions, these posteriors can be attained e.g. from a previous signal enhancement step [28] or errors in estimation [29]. In this work we perform robust automatic speech recognition and speaker recognition. In noise robust speech recognition, the additive noise is considered as the source o... |

2 |
On the use of i-vector posterior distributions in probabilistic linear discriminant analysis
- Cumani, Plchot, et al.
- 2014
Citation Context ...dom variables each described by a Gaussian posterior distribution. In real-world conditions, these posteriors can be attained e.g. from a previous signal enhancement step [28] or errors in estimation [29]. In this work we perform robust automatic speech recognition and speaker recognition. In noise robust speech recognition, the additive noise is considered as the source of the uncertainty. By assumin... |

2 |
Speaker recognition evaluation. http://www.nist.gov/itl/iad/mig/sre.cfm
- NIST
Citation Context ...ast 20 years [55]. One of the main reasons is the support of the National Institute of Standards and Technology (NIST) by organizing a series of benchmarks, the speaker recognition evaluations (SREs) [56] starting in 1996. For each SRE, the task, the data and the evaluation metrics are supplied by NIST and after submission of recognition scores by participating sites, researchers share thoughts in a f... |

1 |
The elements of statistical learning, volume 10
- Hastie, Tibshirani, et al.
- 2013
Citation Context ...ance estimation becomes unstable, a problem known as small sample size [8]. Dealing with ill-posed covariance estimation in finding discriminant directions has been addressed as a challenging problem [9]. Regularization [10] and Bayesian estimation [11] of covariance models have been discussed in the literature to overcome this issue. It is also possible to obtain non-linear class separation using su... |

1 | Distance Preserving Dimension Reduction for Manifold Learning, chapter 56
- Park, Zha, et al.
Citation Context ...ionality reduction transform maps the D-dimensional data samples to a d-dimensional space (d < D) subject to the constraint that nearby data samples are mapped to nearby low-dimensional representations [17]. Considering K as the number of classes in a dataset, the selection of less than K − 1 dimensions in LDA for data projection does not guarantee to preserve the distance between classes from a cla... |
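The K − 1 limit discussed above comes from the rank of the between-class scatter matrix: with K class means centred around the overall mean, S_B has rank at most K − 1, so plain LDA cannot yield more than K − 1 meaningful directions. A toy numpy check, with randomly shifted classes (all values are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)

K, D, n = 4, 10, 50  # classes, dimensions, samples per class (toy sizes)
# Each class: standard-normal samples around a random class mean.
X = [rng.standard_normal((n, D)) + 3 * rng.standard_normal(D) for _ in range(K)]

means = [x.mean(axis=0) for x in X]
overall = np.mean(means, axis=0)

# Between-class scatter S_B = sum_k n_k (m_k - m)(m_k - m)^T.
# The centred class means sum to zero, so rank(S_B) <= K - 1.
S_B = sum(n * np.outer(m - overall, m - overall) for m in means)
```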

1 |
Chernoff distance and relief feature selection
- Peng, Seetharaman
- 2012
Citation Context ...o address this issue [19]. In LDA, the distance function used for obtaining the DDM could be chosen as the Chernoff distance [20] or as its multi-class generalization, Matusita’s separability measure [21], [22]. In this paper we are addressing the task of finding linear discriminant directions when instead of a point estimate for an observation, a probabilistic description is available. We achieve suc... |

1 |
A heteroscedastic extension of LDA based on multi-class Matusita affinity
- Mahanta, Plataniotis
Citation Context ...ess this issue [19]. In LDA, the distance function used for obtaining the DDM could be chosen as the Chernoff distance [20] or as its multi-class generalization, Matusita’s separability measure [21], [22]. In this paper we are addressing the task of finding linear discriminant directions when instead of a point estimate for an observation, a probabilistic description is available. We achieve such a pr... |
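The Chernoff distance referred to above reduces, at the symmetric operating point s = 1/2, to the Bhattacharyya distance between two Gaussians, which has a closed form. A small sketch (toy means and covariances; the general Chernoff case would replace the 1/2 weighting with s and 1 − s):

```python
import numpy as np

def bhattacharyya(mu1, S1, mu2, S2):
    """Chernoff distance at s = 1/2 (Bhattacharyya) between N(mu1,S1) and N(mu2,S2)."""
    S = 0.5 * (S1 + S2)                      # average covariance
    d = mu1 - mu2
    term_mean = 0.125 * d @ np.linalg.solve(S, d)
    term_cov = 0.5 * np.log(np.linalg.det(S) /
                            np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term_mean + term_cov

# Toy homoscedastic pair: the covariance term vanishes and only the
# Mahalanobis-style mean term remains.
mu1, S1 = np.zeros(2), np.eye(2)
mu2, S2 = np.array([2.0, 0.0]), np.eye(2)
```

For this homoscedastic pair the distance is (1/8)·‖µ1 − µ2‖² = 0.5; heteroscedastic pairs additionally pick up the log-determinant term, which is what motivates Chernoff-type criteria over the plain between-class scatter.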

1 |
Dealing with uncertainty: A survey of theories and practices
- Li, Chen, et al.
- 2013
Citation Context ...presented as xl ∼ N (µl,Σl). 3 UNCERTAINTY-OF-OBSERVATION TECHNIQUES The concept of uncertainty exists in many branches of science and there are different techniques employed to deal with uncertainty [23]. Uncertainty-of-observation techniques concern the application of machine learning algorithms to situations in which the input signal x cannot be directly observed but a probabilistic model relating... |

1 |
Integration of beamforming and uncertainty-of-observation techniques for robust ASR in multi-source environments
- Astudillo et al.
- 2013
Citation Context ...rier coefficients of speech and noise are circularly symmetric complex Gaussian distributed, it is possible to use for example Wiener filtering to arrive at an uncertain spectral representation [30], [31]. In modern speaker recognition systems [32], each speaker is represented by a so-called i-vector, which is the mean of the a posteriori distribution considering the available speech material for a gi... |
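Under the circularly symmetric complex-Gaussian model mentioned above, Y = X + N with X ∼ CN(0, λx) and N ∼ CN(0, λn), the Wiener estimate of a clean STFT coefficient comes with a closed-form residual variance, which is exactly the "uncertain spectral representation". The variances below are hypothetical per-bin PSD estimates, not values from the cited work:

```python
# Per-bin Wiener posterior for one STFT coefficient (toy values).
lx, ln = 4.0, 1.0      # assumed speech and noise PSD estimates
y = 1.0 + 2.0j         # observed noisy STFT coefficient

W = lx / (lx + ln)     # Wiener gain
x_mean = W * y         # point estimate of the clean coefficient
x_var = W * ln         # residual uncertainty, lx*ln/(lx+ln), attached to it
```

The pair (x_mean, x_var) is what a downstream uncertainty-of-observation technique consumes instead of a plain point estimate.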

1 | Accounting for uncertainty of i-vectors in speaker recognition using uncertainty propagation and modified imputation
- Saeidi, Alku
- 2015
Citation Context ...ord recognition error rates for each configuration using the LDA or the ULDA transform before using uncertainty decoding or modified imputation in the test stage [49]. In line with the HTK models, we use 250 speech states (4–10 states per word) to label speech basis atoms [50]. In estimating the LDA and ULDA transforms, we associate the training acoustic features ... |

1 |
Noise-adaptive LDA: A new approach for speech recognition under observation uncertainty
- Kolossa, Zeiler, et al.
Citation Context ...iability Ŝ_T = Ŝ_B + Ŝ_W is used to find eigenvectors representing the directions of largest variance irrespective of class labels. Additionally, we have recently proposed a noise-adaptive LDA (NALDA) [75], a computationally less expensive approximation to the predictive heteroscedastic LDA (HLDA) approach described in [76]. NALDA can take the uncertainty of the acoustic features into account in the fe... |
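Using the total scatter Ŝ_T = Ŝ_B + Ŝ_W irrespective of class labels, as described above, amounts to a PCA-style eigendecomposition. A toy numpy sketch (random data with anisotropic scales; all values are placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data with decreasing per-axis scales, so the dominant directions are known.
X = rng.standard_normal((100, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
Xc = X - X.mean(axis=0)

# Total scatter S_T = S_B + S_W reduces to the (unnormalised) covariance of
# the pooled, centred data; its eigenvectors ignore class labels entirely.
S_T = Xc.T @ Xc
evals, evecs = np.linalg.eigh(S_T)   # eigenvalues in ascending order
pca_dirs = evecs[:, ::-1]            # largest-variance directions first
```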