## Regularized All-Pole Models for Speaker Verification Under Noisy Environments (manuscript submitted to IEEE Signal Processing Letters)

### BibTeX

@MISC{Hanilçi_manuscript,

author = {Cemal Hanilçi and Tomi Kinnunen and Figen Ertas and Rahim Saeidi and Jouni Pohjalainen and Paavo Alku},

title = {Regularized All-Pole Models for Speaker Verification Under Noisy Environments},

note = {Manuscript submitted to IEEE Signal Processing Letters},

year = {}

}

### Abstract

Regularization of linear prediction based mel-frequency cepstral coefficient (MFCC) extraction in speaker verification is considered. Commonly, MFCCs are extracted from the discrete Fourier transform (DFT) spectrum of speech frames. In this paper, the DFT spectrum estimate is replaced with the recently proposed regularized linear prediction (RLP) method. Regularization of the temporally weighted variants, weighted LP (WLP) and stabilized WLP (SWLP), which have earlier shown success in speech and speaker recognition, is also introduced. A novel type of double autocorrelation (DAC) lag windowing is also proposed to enhance robustness. Experiments on the NIST 2002 corpus indicate that the regularized all-pole methods (RLP, RWLP and RSWLP) yield large improvements in recognition accuracy under additive factory and babble noise conditions, in terms of both equal error rate (EER) and minimum detection cost function (MinDCF).

Index Terms: Speaker verification, spectrum estimation, linear prediction, regularized linear prediction.

### Citations

625 | Speaker verification using adapted Gaussian mixture models, Digital Signal Processing
- Reynolds, Quatieri, et al.
- 2000
Citation Context ...e extraction (front-end) and pattern matching (back-end). In pattern matching, features extracted from a given speech input are compared to the claimed speaker’s model. Gaussian mixture models (GMMs) [2] and support vector machines (SVMs) are two popular back-ends, while mel-frequency cepstral coefficients (MFCCs) are commonly used as acoustic features. MFCCs are generally obtained from the discrete F...

413 | Linear prediction: A tutorial review
- Makhoul
- 1975
Citation Context ...mpensation of speaker models [3] and score normalization [4] are commonly applied. In [5], the present authors extracted MFCCs from parametric all-pole spectral models based on linear prediction (LP) [6] and its temporally weighted extensions [7]. This led to increased speaker verification accuracy over the standard DFT method under additive noise contamination. A possible explanation for this is tha...

199 | Score normalization for text-independent speaker verification systems
- Auckenthaler, Carey, et al.
- 2000
Citation Context ...he work of Rahim Saeidi was funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 238803. compensation of speaker models [3] and score normalization [4] are commonly applied. In [5], the present authors extracted MFCCs from parametric all-pole spectral models based on linear prediction (LP) [6] and its temporally weighted extensions [7]. This led to ...

76 | Speech Enhancement: Theory and Practice
- Loizou
- 2007
Citation Context ... for score normalization. Two gender-dependent background models and cohort models for Tnorm with 512 Gaussians are trained using the NIST 2001 SRE corpus. Power spectral subtraction (as described in [15]) is used as a pre-processing step in the signal domain to suppress additive noise. The MFCC features are extracted from 30 ms Hamming windowed speech frames every 15 ms. Magnitude spectrum estimation...
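
The framing step mentioned in this excerpt (30 ms Hamming-windowed frames taken every 15 ms) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `frame_signal`, the 8 kHz sampling rate, and the synthetic input are assumptions made here for the example.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=30.0, hop_ms=15.0):
    """Split x into Hamming-windowed frames, one frame per row."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i of idx covers samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

fs = 8000  # assumed sampling rate; not stated in the excerpt
x = np.random.default_rng(0).standard_normal(fs)  # 1 s of synthetic noise
frames = frame_signal(x, fs)
print(frames.shape)
```

Each row of `frames` would then be passed to a spectrum estimator (DFT or an all-pole model) before the mel filterbank.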

58 | An Overview of Text-Independent Speaker Recognition: From Features to Supervectors
- Kinnunen, Li
Citation Context ...ex Terms: Speaker verification, spectrum estimation, linear prediction, regularized linear prediction. I. INTRODUCTION. Speaker verification aims to verify a speaker’s identity from a given speech signal [1]. A speaker verification system consists of two modules: feature extraction (front-end) and pattern matching (back-end). In pattern matching, features extracted from a given speech input are compared ...

52 | Joint factor analysis versus eigenchannels in speaker recognition
- Kenny, Boulianne, et al.
- 2007
Citation Context ...ojects 132129 and 127345). The work of Rahim Saeidi was funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 238803. compensation of speaker models [3] and score normalization [4] are commonly applied. In [5], the present authors extracted MFCCs from parametric all-pole spectral models based on linear prediction (LP) [6] and its temporally weighted ...

39 | The Elements of Statistical Learning, Springer Series in Statistics
- Hastie, Tibshirani, et al.
- 2001
Citation Context ...zation of these all-pole models. In the field of pattern recognition, regularization techniques are commonly used for trading off between training and test errors to enhance classifier generalization [8] but they have been much less studied for feature extraction and speech parameterization [9]. Regularized LP (RLP) [9] is a parametric spectral modeling method motivated from a speech coding point of ...

12 | Temporally weighted linear prediction features for tackling additive noise in speaker verification
- Saeidi, Pohjalainen, et al.
- 2010
Citation Context ...unded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 238803. compensation of speaker models [3] and score normalization [4] are commonly applied. In [5], the present authors extracted MFCCs from parametric all-pole spectral models based on linear prediction (LP) [6] and its temporally weighted extensions [7]. This led to increased speaker verificatio...

12 | The Short-Time Modified Coherence Representation and Noisy Speech Recognition
- Mansour, Juang
- 1989
Citation Context ...x. In [11] and [9] the authors used, respectively, Blackman and boxcar windows to compute the F matrix. We compare these two windows and, additionally, also the Hamming window in speaker verification. In [12], [13], [14], it was shown that the so-called double autocorrelation (DAC) sequence can be used for robust estimation of spectral envelope in the presence of additive noise. Thus, besides the differen...

9 | Robust signal selection for linear prediction analysis of voiced speech
- Ma, Kamp, et al.
- 1993
Citation Context ...Given the predictor coefficients, $a_k$, the LP spectrum is obtained by $S_{\mathrm{LP}}(f) = \frac{1}{\left|1 + \sum_{k=1}^{p} a_k e^{-j2\pi fk}\right|^{2}}$. (3) B. Temporally Weighted All-pole Models. In contrast to LP, weighted linear prediction (WLP) [10] determines the predictor coefficients by minimizing a temporally weighted energy of the prediction error, $E = \sum_n e^2(n)\Psi_n = \sum_n \left(x(n) + \sum_{k=1}^{p} b_k x(n-k)\right)^2 \Psi_n$, where $\Psi_n$ is a time-domain weighting function...
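
The all-pole spectrum of Eq. (3) and the weighted error criterion quoted above can be illustrated with a short NumPy sketch. It is an illustration under stated assumptions rather than the paper's implementation: the function names are hypothetical, the WLP sum is taken over the samples inside the frame only (covariance-style), and the weighting function is left as a user-supplied array because the excerpt does not show how $\Psi_n$ is chosen.

```python
import numpy as np

def lp_coefficients(x, p):
    """Autocorrelation-method LP for e(n) = x(n) + sum_k a_k x(n-k)."""
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, -r[1:])

def wlp_coefficients(x, p, psi):
    """Weighted LP: minimise sum_n psi[n] * e(n)^2 over n = p..N-1 (assumed range)."""
    n = len(x)
    # Column k-1 holds the delayed samples x(n-k) for n = p..N-1.
    X = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    w = psi[p:]
    A = X.T @ (w[:, None] * X)
    b = X.T @ (w * x[p:])
    return np.linalg.solve(A, -b)

def allpole_spectrum(a, n_freqs=257):
    """S(f) = 1 / |1 + sum_k a_k exp(-j 2 pi f k)|^2 for normalised f in [0, 0.5]."""
    f = np.linspace(0.0, 0.5, n_freqs)
    k = np.arange(1, len(a) + 1)
    denom = 1.0 + np.exp(-2j * np.pi * np.outer(f, k)) @ a
    return 1.0 / np.abs(denom) ** 2

# Synthetic second-order autoregressive signal with known coefficients.
rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, 4000):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + e[n]

a_lp = lp_coefficients(x, 2)
a_wlp = wlp_coefficients(x, 2, np.ones(len(x)))  # uniform placeholder weights
S = allpole_spectrum(a_lp)
```

With uniform weights the weighted criterion reduces to ordinary least-squares prediction, so both estimates should land near the generating values $a_1 = -1.5$, $a_2 = 0.7$ in the sign convention of the formulas above.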

6 | Stabilized weighted linear prediction
- Magi, Pohjalainen, et al.
- 2009
Citation Context ...normalization [4] are commonly applied. In [5], the present authors extracted MFCCs from parametric all-pole spectral models based on linear prediction (LP) [6] and its temporally weighted extensions [7]. This led to increased speaker verification accuracy over the standard DFT method under additive noise contamination. A possible explanation for this is that low-order all-pole models, due to smaller...

4 | Regularized linear prediction of speech
- Ekman, Kleijn, et al.
- 2008
Citation Context ...ues are commonly used for trading off between training and test errors to enhance classifier generalization [8] but they have been much less studied for feature extraction and speech parameterization [9]. Regularized LP (RLP) [9] is a parametric spectral modeling method motivated from a speech coding point of view for tackling a known problem in that field, over-sharpening of formants. RLP penalizes ...

3 | Regularized Linear Prediction All-Pole Models
- Murthi, Kleijn
- 2000
Citation Context ...ral envelope gets smoother and as λ → 0, it reduces to conventional LP, WLP or SWLP depending on the way the autocorrelation is computed. We consider different window functions to compute the F matrix. In [11] and [9] the authors used, respectively, Blackman and boxcar windows to compute the F matrix. We compare these two windows and, additionally, also the Hamming window in speaker verification. In [12], [13]...
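
The limiting behaviour described in this excerpt (the regularized solution approaches the unregularized one as λ → 0) can be checked numerically with a generic penalized form of the LP normal equations. The sketch below is an assumption-laden stand-in: it uses an identity (ridge-style) penalty matrix as a placeholder and does not reproduce the F-matrix penalty of [9] or [11], whose construction is not shown here. The names `autocorr` and `penalized_lp` are hypothetical.

```python
import numpy as np

def autocorr(x, p):
    """Autocorrelation lags r(0)..r(p) of a frame."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(p + 1)])

def penalized_lp(r, lam, penalty=None):
    """Solve (R + lam * P) a = -r[1:]; P defaults to the identity (placeholder)."""
    p = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    P = np.eye(p) if penalty is None else penalty
    return np.linalg.solve(R + lam * P, -r[1:])

rng = np.random.default_rng(0)
x = rng.standard_normal(240) * np.hamming(240)  # one synthetic windowed frame
r = autocorr(x, 10)
r = r / r[0]  # normalise so lam is relative to the frame energy

a_plain = penalized_lp(r, 0.0)   # conventional LP
a_tiny = penalized_lp(r, 1e-12)  # lam close to zero
a_reg = penalized_lp(r, 0.1)     # noticeably regularized
```

With this placeholder penalty, increasing `lam` shrinks the coefficient vector and flattens the resulting envelope. A different penalty matrix can be passed through the `penalty` argument to experiment with other smoothing behaviour.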

2 | Autocorrelation and double autocorrelation based spectral representations for a noisy word recognition system
- Shimamura, Nguyen
Citation Context ...[11] and [9] the authors used, respectively, Blackman and boxcar windows to compute the F matrix. We compare these two windows and, additionally, also the Hamming window in speaker verification. In [12], [13], [14], it was shown that the so-called double autocorrelation (DAC) sequence can be used for robust estimation of spectral envelope in the presence of additive noise. Thus, besides the different wind...

1 | Degraded word recognition based on segmental signal-to-noise ratio weighting
- Kobatake, Matsunoo
- 1994
Citation Context ...nd [9] the authors used, respectively, Blackman and boxcar windows to compute the F matrix. We compare these two windows and, additionally, also the Hamming window in speaker verification. In [12], [13], [14], it was shown that the so-called double autocorrelation (DAC) sequence can be used for robust estimation of spectral envelope in the presence of additive noise. Thus, besides the different window fun...