Accessing the Spoken Word
2005
Cited by 4 (3 self)
Spoken word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access and preservation of such data are stimulated by political, economic, cultural and educational needs. This paper outlines the major issues in the field, reviews the current state of technology, examines the rapidly changing policy issues relating to privacy and copyright, and presents issues relating to the collection and preservation of spoken audio content.
Robust Feature Extraction and Acoustic Modeling at Multitel: Experiments on the Aurora Databases
In Proc. of Interspeech’03, 2003
Cited by 2 (1 self)
This paper intends to summarize some of the robust feature extraction and acoustic modeling technologies used at Multitel, together with their assessment on some of the ETSI Aurora reference tasks. Ongoing work and directions for further research are also presented. For feature extraction (FE), we are using PLP coefficients. Additive and convolutional noise are addressed using a cascade of spectral subtraction and temporal trajectory filtering. For acoustic modeling (AM), artificial neural networks (ANNs) are used for estimating the HMM state probabilities. At the junction of FE and AM, the multi-band structure provides a way to address the needs of robustness by targeting both processing levels. Robust features within sub-bands can be extracted using a form of discriminant analysis. In this work, this is obtained using sub-band ANN acoustic models. The robust sub-band features are then used for the estimation of state probabilities. These systems are evaluated on the Aurora tasks in comparison to the existing ETSI features. Our baseline system performs similarly to the ETSI advanced features coupled with the HTK back-end. On the Aurora 3 tasks, the multi-band system outperforms the best ETSI results with an average reduction of the word error rate of about 62% with respect to the baseline ETSI system and of about 18% with respect to the advanced ETSI system. This confirms previous positive experience with the multi-band architecture on other databases.
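The abstract does not say which variant of spectral subtraction is used, so the following is only a minimal sketch of the general idea: subtract an estimated noise magnitude from each frequency bin of a frame and clamp the result to a small spectral floor. The function name `spectral_subtract`, the `floor` parameter, and its default value are assumptions made for illustration, and the magnitude spectra are assumed to have been computed elsewhere.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from one frame's magnitude spectrum.

    noisy_mag : magnitudes of the noisy frame, one value per frequency bin
    noise_mag : estimated noise magnitudes for the same bins (assumed given)
    floor     : hypothetical spectral floor, as a fraction of the noisy magnitude
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    # Clamp each bin so the result never drops below floor * noisy magnitude.
    return [max(m - n, floor * m) for m, n in zip(noisy_mag, noise_mag)]
```

The floor keeps bins from going negative when the noise estimate exceeds the observed magnitude, which is a common design choice in this family of methods rather than something the abstract confirms.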
Single Trial Estimation of Evoked Potentials using Gaussian Mixture Models with Integrated Noise Component
In Dorffner G., et al. (eds.), Artificial Neural Networks - ICANN 2001, International Conference, 2001
Cited by 1 (0 self)
Gaussian Mixture Models with integrated noise component are a method developed for speech analysis to estimate signals hidden in background noise. We apply this technique to estimate single trial evoked potentials which are buried in noise up to five times stronger than the signal. An empirical study using artificial data is presented and results are shown to compare favourably to other techniques for single trial estimation.
Segmentation Of A Speech Waveform According To Glottal Timing Information Using A Standard Autoregressive-HMM
2000
Cited by 1 (0 self)
This report presents a method to segment voiced speech according to the underlying behaviour of the glottis, using the time-domain waveform alone, from which statistics concerning the glottis and pitch can be determined. Segmentation utilises spectral changepoints in the speech waveform at the sub-pitch-period level, which demarcate, for example, glottal closure and opening instants. Spectral changepoints are identified by first modelling voiced speech using a 3-state autoregressive hidden Markov model (AR-HMM), where each state corresponds to a different glottal phase, and then determining the optimal state sequence, and hence the segmentation, using the bounded state duration (BSD) Viterbi algorithm. The algorithm has the Liljencrants-Fant (LF) glottal model as a theoretical basis. AR-HMM parameters are estimated using two techniques which are compared: expectation-maximisation (EM) and Gibbs sampler-based Markov chain Monte Carlo (MCMC). The validity and robustness of the algorithm is tested...
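To illustrate how an optimal state sequence yields a segmentation, here is a minimal sketch of the plain log-domain Viterbi algorithm. It is not the bounded state duration (BSD) variant the report uses, and it assumes the per-frame log-likelihoods under each state's autoregressive model have already been computed. The function name `viterbi` and its argument layout are assumptions made for illustration.

```python
def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path through an HMM (standard Viterbi).

    log_init[s]     : log probability of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emit[t][s]  : log likelihood of observation t under state s
                      (assumed precomputed, e.g. from per-state AR models)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(log_emit)):
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Segment boundaries are the time indices where consecutive entries of the returned path differ. A duration-bounded variant would additionally constrain how long the path may stay in each state.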

