## Model-Based Approaches to Speaker and Environment Adaptation

### BibTeX

@MISC{Gales_modelbased,

author = {Mark Gales},

title = {Model-Based Approaches to Speaker and Environment Adaptation},

year = {2009}

}

### Abstract

– linear transform-based adaptation / adaptive training

### Citations

599 | Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language
- Leggetter, Woodland
- 1995
Citation Context: ...April 2009 — Model-Based Approaches to Speaker and Environment Adaptation. Form of the Adaptation Transform • Dominant forms for LVCSR are ML-based linear transformations – MLLR adaptation of the means [5]: µ^(s) = A^(s) µ + b^(s) – MLLR adaptation of the covariance matrices [6, 7]: Σ^(s) = H^(s) Σ H^(s)ᵀ – Constrained MLLR adaptation [7]: µ^(s) = A^(s) µ + b^(s); Σ^(s) = A^(s) Σ A^(s)ᵀ • Forms may be comb...
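As a rough numerical illustration of the transforms in the context above, this sketch applies an MLLR mean transform and the constrained-MLLR covariance transform to a single Gaussian component; all matrix values here are hypothetical, not from the source:

```python
import numpy as np

# Hypothetical 2-D speaker transform parameters
A = np.array([[1.1, 0.0],
              [0.05, 0.9]])
b = np.array([0.2, -0.1])

# Canonical Gaussian component
mu = np.array([1.0, 2.0])
Sigma = np.diag([0.5, 0.8])

# MLLR mean adaptation: mu' = A mu + b
mu_adapted = A @ mu + b

# Constrained MLLR also transforms the covariance: Sigma' = A Sigma A^T
Sigma_adapted = A @ Sigma @ A.T
```

In the constrained (CMLLR) case the same A acts on means and covariances, which is what lets the transform be moved onto the features at decode time.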

409 | Maximum likelihood linear transformations for HMM-based speech recognition
- Gales
- 1998
Citation Context: ...Form of the Adaptation Transform • Dominant forms for LVCSR are ML-based linear transformations – MLLR adaptation of the means [5]: µ^(s) = A^(s) µ + b^(s) – MLLR adaptation of the covariance matrices [6, 7]: Σ^(s) = H^(s) Σ H^(s)ᵀ – Constrained MLLR adaptation [7]: µ^(s) = A^(s) µ + b^(s); Σ^(s) = A^(s) Σ A^(s)ᵀ • Forms may be combined into a hierarchy [8], e.g. CMLLR → MLLRMEAN ...

173 | Speaker-independent isolated word recognition using dynamic features of speech spectrum
- Furui
- 1986
Citation Context: ...Delta and Delta-Delta Parameters • Aim to 'reduce' HMM conditional independence assumptions – standard to add delta and delta-delta parameters [26]: y_t = [y^s_t; ∆y^s_t; ∆²y^s_t]; ∆y^s_t = Σ_{i=1}^{n} w_i (y^s_{t+i} − y^s_{t−i}) / (2 Σ_{i=1}^{n} w_i²) • Two versions used to represent the impact of noise on these [27, 28]: ∆y^s_t ≈ ∂y^s_t/∂t OR ∆y^s_t ...
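A minimal sketch of the delta computation in the formula above, assuming the common convention w_i = i with edge frames repeated for padding (the padding choice is an assumption, not stated in the source):

```python
import numpy as np

def delta(feats, n=2):
    # Delta parameters: d_t = sum_i w_i (y_{t+i} - y_{t-i}) / (2 sum_i w_i^2),
    # with weights w_i = i and edge frames repeated for padding.
    T = len(feats)
    denom = 2 * sum(i * i for i in range(1, n + 1))
    padded = np.vstack([feats[:1]] * n + [feats] + [feats[-1:]] * n)
    out = np.zeros_like(feats, dtype=float)
    for t in range(T):
        for i in range(1, n + 1):
            # frame t of the original sits at padded[t + n]
            out[t] += i * (padded[t + n + i] - padded[t + n - i])
    return out / denom

# Stack statics with delta and delta-delta: y_t = [y; dy; ddy]
y = np.arange(10.0).reshape(-1, 1)   # toy 1-D "static" features
d = delta(y)
dd = delta(d)
full = np.hstack([y, d, dd])
```

On a linear ramp the interior delta values come out as the slope, which is a quick sanity check on the weighting.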

168 | Hidden Markov Model Decomposition of Speech and Noise
- Varga, Moore
- 1990
Citation Context: ...Clean Speech HMM • Each speech/noise pair considered – yields final component • Also multiple states possible (speech state: N components; noise state: M components) – 3-D Viterbi decoding [31] • Iterative schemes also possible: – iterative PMC [29] – Algonquin [22] (corrupted-speech state: N×M components) Model Combination • Commonly used configuration: – single state – si...

145 | A compact model for speaker-adaptive training
- Anastasakos
- 1996
Citation Context: ...t forms of canonical model: (b) Adaptive System • Multi-Style: adaptation converts a general system to a specific condition • Adaptive: adaptation converts a "neutral" system to a specific condition [10, 7] — Cambridge University Engineering Department, Tsinghua University, April 2009 — Adaptive Training ...

91 | Speech Recognition in Noisy Environments
- Moreno
- 1996
Citation Context: ...Vector Taylor Series • Vector Taylor Series (VTS) is one popular approximation [32, 30] – Taylor series expansion about "current" parameter values – for these expressions the impact of convolutional distortion is ignored – mismatch function approximated using a first-order series: y^s_t ≈ µ^s_x + f...
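A scalar sketch of the first-order VTS idea under the standard log-domain mismatch function y = x + log(1 + exp(n − x)); the channel term is dropped and the numerical values are hypothetical, so this illustrates the linearisation only, not the source's exact parameterisation:

```python
import math

def mismatch(x, n):
    # Log-domain mismatch function (channel term ignored):
    #   y = x + log(1 + exp(n - x))
    return x + math.log1p(math.exp(n - x))

def vts_first_order(mu_x, mu_n, x, n):
    # First-order Taylor expansion of y = f(x, n) about (mu_x, mu_n).
    s = 1.0 / (1.0 + math.exp(mu_n - mu_x))   # df/dx at the expansion point
    dfdx, dfdn = s, 1.0 - s                    # note df/dx + df/dn = 1
    return (mismatch(mu_x, mu_n)
            + dfdx * (x - mu_x)
            + dfdn * (n - mu_n))

# Near the expansion point the linearisation tracks the true value closely
exact = mismatch(2.1, 0.05)
approx = vts_first_order(2.0, 0.0, 2.1, 0.05)
```

The same Jacobians are what propagate the clean-speech and noise means/covariances through the mismatch function in vector form.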

82 | Model-Based Techniques for Noise Robust Speech Recognition
- Gales
- 1995
Citation Context: ...Model-Based Compensation • Could retrain system using noise-corrupted training data – need to have all training data available and corrupt it with noise – slow; single-pass retraining [29] is a faster approximation • Model-based compensation approximates SPR [29]: µ^(m)_y = E{y|m}; Σ^(m)_y = diag(E{y yᵀ|m} − µ^(m)_y µ^(m)ᵀ_y) • Due to non-linearities no closed-form solution – approxima...

80 | HMM adaptation using vector Taylor series for noisy speech recognition
- Acero, Deng, et al.
- 2000
Citation Context: ..."noise" observations and combine – Log-Add: only transform the mean – Log-Normal: sum of two log-normal variables is approximately log-normal – Vector Taylor series: first- or higher-order expansions used [30] • Referred to here as predictive schemes – model parameters implicitly found – contrast to adaptive speaker transforms – explicit parameter estimation ...

66 | Maximum a-posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- Gauvain, Lee
- 1994
Citation Context: ...Form of the Adaptation Transform • There are a number of standard forms in the literature [1] • Maximum A-Posteriori (MAP) [2] adaptation: general "robust" estimation – in simplest form only adapts "seen" components • Speaker Clustering: gender-dependent (GD) models are the simplest form – often estimated using MAP adaptati...

58 | Cluster adaptive training of hidden Markov models
- Gales
- 2000
Citation Context: ...form only adapts "seen" components • Speaker Clustering: gender-dependent (GD) models are the simplest form – often estimated using MAP adaptation with speaker-independent priors; EigenVoices [3] and CAT [4] are more complex forms • Vocal Tract Length Normalisation: motivated from a physiological perspective • Linear Transform Adaptation: dominant form for LVCSR – will be the focus of this part of the tal...

51 | The HTK Book, version 3.4
- Young, Evermann, et al.
- 2006
Citation Context: ...s) – MLLR adaptation of the covariance matrices [6, 7]: Σ^(s) = H^(s) Σ H^(s)ᵀ – Constrained MLLR adaptation [7]: µ^(s) = A^(s) µ + b^(s); Σ^(s) = A^(s) Σ A^(s)ᵀ • Forms may be combined into a hierarchy [8], e.g. CMLLR → MLLRMEAN — ML and MAP Linear Transforms • Transfor...

43 | Probabilistic optimum filtering for robust speech recognition
- Neumeyer, Weintraub
- 1994
Citation Context: ...(y_t − µ^(r)_y) = A^(r) y_t + b^(r) – joint distribution estimated using stereo data, or can be estimated using model-based compensation schemes [32, 35] – various forms/variants possible: SPLICE [36], POF [37], VTS-based [32, 38] — Uncertainty Decoding ...

40 | Mixture-model adaptation for
- Foster, Kuhn
- 2007
Citation Context: ...simplest form only adapts "seen" components • Speaker Clustering: gender-dependent (GD) models are the simplest form – often estimated using MAP adaptation with speaker-independent priors; EigenVoices [3] and CAT [4] are more complex forms • Vocal Tract Length Normalisation: motivated from a physiological perspective • Linear Transform Adaptation: dominant form for LVCSR – will be the focus of this part o...

38 | Uncertainty decoding with SPLICE for noise robust speech recognition
- Droppo, Acero, et al.
- 2002
Citation Context: ... • All the model-based approaches are computationally expensive – scale linearly with # components (100K+ for LVCSR systems) • Need to model the conditional distribution p(y_t|x_t, n_t) [39, 22, 33] – select form to allow efficient compensation/decoding (if possible) ...

36 | Uncertainty decoding for noise robust speech recognition
- Liao
- 2007
Citation Context: ...e the noise model parameters µ_n, µ_h, Σ_n are not known – need to be estimated from test data – simplest approach: use VAD and start/end frames to estimate noise • Also possible to use ML estimation [32, 33, 24]: {µ̂_n, µ̂_h, Σ̂_n} = argmax_{µ_n, µ_h, Σ_n} p(y_1, ..., y_T | µ_n, µ_h, Σ_n; λ_x) • VTS approximation yields simple approach to find µ_n, µ_h – first/second-order approaches to find Σ_n – simple statistics for a...

33 | Speaker adaptation for continuous density HMMs: a review
- Woodland
Citation Context: ...Form of the Adaptation Transform • There are a number of standard forms in the literature [1] • Maximum A-Posteriori (MAP) [2] adaptation: general "robust" estimation – in simplest form only adapts "seen" components • Speaker Clustering: gender-dependent (GD) models are the simplest form – of...

28 | Iterative Unsupervised Adaptation Using Maximum Likelihood Linear Regression
- Woodland, Pye, et al.
- 1996
Citation Context: ... • Two iterative loops for estimation: 1. estimate hypothesis given transform 2. update complete-dataset given transform and hypothesis – referred to as Iterative MLLR [11] • For supervised training the hypothesis is known • Confidence scores can also be used – confidence-based MLLR [12] ...

26 | Adaptive Training with Joint Uncertainty Decoding for Robust Recognition of Noise Data
- Liao, Gales
- 2007
Citation Context: ...stigated – generic transforms: MLLR, CMLLR, CAT – noise-targeted transform: Noisy CMLLR • Perform adaptive training with VTS and JUD – Joint Adaptive Training examined on Broadcast News transcription [43] – interested in applying VTS adaptive training/JAT in lower SNR conditions ...

23 | Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation
- Tsakalidis, Doumpiotis, et al.
- 2005
Citation Context: ...Discriminative Linear Transforms • Linear transforms can be trained using discriminative criteria [14, 15] – estimation using minimum phone error (MPE) training: W^(s)_d = argmin_W Σ_H P(H|O^(s); W) L(H, H^(s)) • For unsupervised adaptation discriminative linear transforms (DLTs) are not used – estima...

22 | Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise
- Deng, Droppo, et al.
- 2004
Citation Context: ...not possible to get a simple expression for all parameterisations • This has assumed sufficient smoothing to remove all "cross" terms – some sites use interaction likelihoods or phase-sensitive functions [22, 23] – given x^s_t, h^s and n^s_t there is a distribution y^s_t ∼ N(x^s_t + h^s + f(x^s_t, n^s_t, h^s), Φ) ...

19 | Predictive linear transforms for noise robust speech recognition
- Gales, Dalen
Citation Context: ...matched-bound approximation (K terms independent of A, b): KL(p||p̃) ≤ −Σ_{m=1}^{M} c^(m)_y ∫ N(y; µ^(m)_y, Σ^(m)_y) log N(Ay + b; µ^(m)_x, Σ^(m)_x) dy + K – a framework for estimating "predictive" linear transforms [41] — Predictive CMLLR • For schemes like CML...

18 | Factor analysis invariant to linear transformations of data
- Gopinath, Ramabhadran, et al.
Citation Context: ...Noisy CMLLR and Factor Analysis • The estimation/adaptive training of NCMLLR is related to: – shared factor analysis approach for covariance matrix modelling [20] – EM-based VTS adaptive training for canonical model estimation [21] • All treat "clean" speech as a latent variable – posterior distribution depends on the form being examined – update for canonical...

18 | Issues with uncertainty decoding for noise robust automatic speech recognition
- Liao, Gales
Citation Context: ...stereo data can also be used) • Product of Gaussians is an un-normalised Gaussian, so p(y_t|m, r) = |A^(r)| N(A^(r) y_t + b^(r); µ^(m), Σ^(m) + Σ^(r)_b) – r is normally determined by the component m [40] – contrast to MMSE where a GMM is built in the acoustic space to determine r ...
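The product-of-Gaussians likelihood in the context above can be evaluated directly; a minimal sketch (the reduction check with A = I, b = 0, Σ_b = 0 is my own sanity test, not from the source):

```python
import math
import numpy as np

def jud_likelihood(y, A, b, mu_m, Sigma_m, Sigma_b):
    # Joint-uncertainty-decoding likelihood:
    #   p(y | m, r) = |A| N(A y + b; mu_m, Sigma_m + Sigma_b)
    d = len(y)
    cov = Sigma_m + Sigma_b              # model variance plus bias variance
    z = A @ y + b - mu_m
    quad = float(z @ np.linalg.solve(cov, z))
    norm = math.sqrt(((2 * math.pi) ** d) * np.linalg.det(cov))
    return abs(np.linalg.det(A)) / norm * math.exp(-0.5 * quad)

# With A = I, b = 0 and Sigma_b = 0 this reduces to a plain Gaussian density
p = jud_likelihood(np.zeros(1), np.eye(1), np.zeros(1),
                   np.zeros(1), np.eye(1), np.zeros((1, 1)))
```

Because Σ^(r)_b only inflates the component covariance, compensation stays cheap: one transform and one bias variance per regression class r, shared across all components in it.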

15 | Speaker Adaptation Using Lattice-based MLLR
- Uebel, Woodland
- 2001
Citation Context: ...2. update complete-dataset given transform and hypothesis – referred to as Iterative MLLR [11] • For supervised training the hypothesis is known • Confidence scores can also be used – confidence-based MLLR [12] — Lattice-Based MLLR • For unsupervised adaptation the hypothesis ...

15 | Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions
- Huo, Hu
- 2007
Citation Context: ...ysis • The estimation/adaptive training of NCMLLR is related to: – shared factor analysis approach for covariance matrix modelling [20] – EM-based VTS adaptive training for canonical model estimation [21] • All treat "clean" speech as a latent variable – posterior distribution depends on the form being examined – update for canonical models: µ̂^(m) = Σ_{h=1}^{H} Σ_{t=1}^{T} γ^(mh)_t E{s_t|o...} / (Σ_{h=1}^{H} Σ_{t=1}^{T} γ^(mh)_t) ...

15 | Robust speech recognition in noise - performance of the IBM continuous speech recognizer on the ARPA noise spoke task
- Gopinath
- 1995
Citation Context: ...rd to add delta and delta-delta parameters [26]: y_t = [y^s_t; ∆y^s_t; ∆²y^s_t]; ∆y^s_t = Σ_{i=1}^{n} w_i (y^s_{t+i} − y^s_{t−i}) / (2 Σ_{i=1}^{n} w_i²) • Two versions used to represent the impact of noise on these [27, 28]: ∆y^s_t ≈ ∂y^s_t/∂t OR ∆y^s_t = D [y^s_{t−1}; y^s_t; y^s_{t+1}] – the second is more accurate, but more statistics are required to be stored – need to compensate all model parameters for best performance • For e...

14 | Speech Recognition in Adverse Environments: A Probabilistic Approach
- Kristjansson
- 2002
Citation Context: ...not possible to get a simple expression for all parameterisations • This has assumed sufficient smoothing to remove all "cross" terms – some sites use interaction likelihoods or phase-sensitive functions [22, 23] – given x^s_t, h^s and n^s_t there is a distribution y^s_t ∼ N(x^s_t + h^s + f(x^s_t, n^s_t, h^s), Φ) ...

12 | Extended VTS for noise-robust speech recognition
- Dalen, Gales
- 2009
Citation Context: ...rd to add delta and delta-delta parameters [26]: y_t = [y^s_t; ∆y^s_t; ∆²y^s_t]; ∆y^s_t = Σ_{i=1}^{n} w_i (y^s_{t+i} − y^s_{t−i}) / (2 Σ_{i=1}^{n} w_i²) • Two versions used to represent the impact of noise on these [27, 28]: ∆y^s_t ≈ ∂y^s_t/∂t OR ∆y^s_t = D [y^s_{t−1}; y^s_t; y^s_{t+1}] – the second is more accurate, but more statistics are required to be stored – need to compensate all model parameters for best performance • For e...

9 | Maximum a-posteriori linear regression with elliptical symmetric matrix variate priors
- Chou
- 1999
Citation Context: ...with hypothesis H): W^(s)_ml = argmax_W p(O^(s)|H; W) – where W^(s)_ml = [A^(s)_ml b^(s)_ml] – however not robust to limited training data • Including a transform prior p(W) gives the MAP estimate [9]: W^(s)_map = argmax_W p(O^(s)|H; W) p(W) – for MLLR a Gaussian prior yields a Gaussian auxiliary function – CMLLR prior more challenging ... • Both approaches rely on expectation-maximisation (...

9 | Bayesian adaptive inference and adaptive training
- Yu, Gales
- 2007
Citation Context: ...Adaptive Training From a Bayesian Perspective — (e) Standard HMM (f) Adaptive HMM • Observation additionally dependent on transform W_t [13] – transform same for each homogeneous block (W_t = W_{t+1}) – adaptation integrated into acoustic model – instantaneous adaptation • Need to know the prior transform distribution p(W) (as in the MAP scheme)...

8 | Accounting for the uncertainty of speech estimates in the context of model-based feature enhancement
- Stouten, Hamme, et al.
Citation Context: ... E{x_t|y_t, r} = µ^(r)_x + Σ^(r)_xy (Σ^(r)_yy)⁻¹ (y_t − µ^(r)_y) = A^(r) y_t + b^(r) – joint distribution estimated using stereo data, or can be estimated using model-based compensation schemes [32, 35] – various forms/variants possible: SPLICE [36], POF [37], VTS-based [32, 38] ...
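The conditional-Gaussian MMSE estimate in the context above is a one-liner once the joint statistics are known; a minimal sketch with hypothetical 1-D values (the function name and example numbers are mine, not from the source):

```python
import numpy as np

def mmse_clean_speech(y, mu_x, mu_y, Sigma_xy, Sigma_yy):
    # MMSE clean-speech estimate from the joint (clean, corrupted) Gaussian:
    #   E{x | y, r} = mu_x + Sigma_xy Sigma_yy^{-1} (y - mu_y) = A y + b
    A = Sigma_xy @ np.linalg.inv(Sigma_yy)
    b = mu_x - A @ mu_y
    return A @ y + b

# Toy 1-D example: cross-covariance 0.5 shrinks the observation halfway
# back towards the clean-speech mean
x_hat = mmse_clean_speech(np.array([2.0]), np.array([0.0]), np.array([0.0]),
                          np.array([[0.5]]), np.array([[1.0]]))
```

Since A and b depend only on the region r, they can be precomputed per region and the enhancement applied as a cheap affine feature transform.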

7 | Unsupervised discriminative adaptation using discriminative mapping transforms
- Yu, Gales, et al.
- 2008
Citation Context: ...m without the problems: – train all speaker-specific parameters using ML training – train speaker-independent parameters using MPE training • Applying this to linear transforms yields (as one option) [17]: µ^(s) = A_d (A^(s)_ml µ + b^(s)_ml) + b_d = A_d µ^(s)_ml + b_d – W^(s)_ml = [A^(s)_ml b^(s)_ml] – the speaker-specific ML transform – W_d = [A_d b_d] – the speaker-independent MPE transform • Yields a composite ...
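The composite discriminative-mapping form in the context above is just two stacked affine transforms; a sketch with hypothetical transform values (none of the numbers come from the source):

```python
import numpy as np

# Speaker-specific ML transform W_ml = [A_ml b_ml] (hypothetical values)
A_ml = np.array([[1.05, 0.0],
                 [0.0, 0.95]])
b_ml = np.array([0.1, -0.2])

# Speaker-independent MPE mapping W_d = [A_d b_d] (hypothetical values)
A_d = np.array([[0.98, 0.01],
                [0.0, 1.02]])
b_d = np.array([0.05, 0.0])

mu = np.array([1.0, -1.0])          # canonical mean

# mu^(s) = A_d (A_ml mu + b_ml) + b_d
mu_ml = A_ml @ mu + b_ml            # ML-adapted mean
mu_s = A_d @ mu_ml + b_d            # discriminatively mapped mean
```

The point of the factorisation is that only W_ml is re-estimated per speaker (robustly, with ML), while the discriminative part W_d is trained once on all training data.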

7 | Incremental predictive and adaptive noise compensation
- Flego, Gales
- 2009
Citation Context: ...may be inaccurate: transform parameters estimated (large numbers of parameters) vs noise model estimated (small number of parameters) • Obvious approach is to combine the two in a fashion similar to MAP [42]: – limited data: predictive approaches used – increased data: adaptive approaches used • Count smoothing is a simple approach to use (parent transforms also possible): k^(r)_pai = k^(r)_pci Σ_{m∈r} γ^(m)_x + τ...

6 | Mean and variance adaptation within
- Gales, Woodland
- 1996
Citation Context: ...Form of the Adaptation Transform • Dominant forms for LVCSR are ML-based linear transformations – MLLR adaptation of the means [5]: µ^(s) = A^(s) µ + b^(s) – MLLR adaptation of the covariance matrices [6, 7]: Σ^(s) = H^(s) Σ H^(s)ᵀ – Constrained MLLR adaptation [7]: µ^(s) = A^(s) µ + b^(s); Σ^(s) = A^(s) Σ A^(s)ᵀ • Forms may be combined into a hierarchy [8], e.g. CMLLR → MLLRMEAN ...

6 | Discriminative Adaptive Training Using The
- Wang, Woodland
- 2003
Citation Context: ...scriminative linear transforms (DLTs) not used – estimation highly sensitive to errors in the supervision hypothesis – more costly to estimate transform than ML training • Not used for discriminative SAT [16]; standard procedure: 1. perform standard ML training (ML-SI) 2. perform ML SAT training updating models and transforms (ML-SAT) 3. estimate MPE models given the ML transforms (MPE-SAT) ...

5 | Adaptive training using discriminative mapping transforms
- Raut, Yu, et al.
- 2008
Citation Context: ... • Quantity of training data vast compared to available speaker-specific data – use a large number of base-classes – in these experiments 1000 base-classes used • Can also be used for adaptive training [18] – closer to full discriminative adaptive training ...

5 | HMM adaptation with joint compensation of additive and convolutive distortions via vector Taylor series
- Li, Deng, Acero, et al.
- 2007
Citation Context: ...o of corrupted-speech magnitude to clean-speech magnitude [plot: magnitude ratio vs Signal-to-Noise Ratio (dB)] • magnitude (α = 1, γ = 1) • power (α = 0, γ = 2) • α = 2.5 (AURORA-tuned [24]) • γ = 0.75 (AURORA-tuned [25]) • γ = 1.0 used in this work ...

5 | Discriminative classifiers with generative kernels for noise robust speech recognition
- Gales, Flego
Citation Context: ...to clean-speech magnitude [plot: magnitude ratio vs Signal-to-Noise Ratio (dB)] • magnitude (α = 1, γ = 1) • power (α = 0, γ = 2) • α = 2.5 (AURORA-tuned [24]) • γ = 0.75 (AURORA-tuned [25]) • γ = 1.0 used in this work ...

2 | Robust speech recognition in time-varying environments
- Stouten
- 2006
Citation Context: ...= A^(r) y_t + b^(r) – joint distribution estimated using stereo data, or can be estimated using model-based compensation schemes [32, 35] – various forms/variants possible: SPLICE [36], POF [37], VTS-based [32, 38] — Uncertainty Decoding ... p(o|x...

1 | Noisy CMLLR for noise-robust speech recognition
- Kim, Gales
Citation Context: ...MLLR • Linear transforms described are general – hierarchies allow very complex forms to be used – interesting to examine forms aimed at particular tasks • Noisy CMLLR is aimed at noise-robust speech recognition [19]: p(o_t; µ^(m), Σ^(m), A, b, Σ_b) = |A| N(A o_t + b; µ^(m), Σ^(m) + Σ_b) – has the same form as a model-based compensation scheme (JUD) • Similar to CMLLR, but with an additional bias on the v...