Robust speaker recognition in noisy conditions
- IEEE TRANS. AUDIO, SPEECH LANG. PROCESS
, 2007
Model Transformation For Robust Speaker Recognition From Telephone Data
- in ICASSP-97
, 1997
Cited by 11 (0 self)
In the context of automatic speaker recognition, we propose a model transformation technique that renders speaker models more robust to acoustic mismatches and to data scarcity by appropriately increasing their variances. We use a stereo database containing speech recorded simultaneously under different acoustic conditions to derive a synthetic variance distribution. This distribution is then used to modify the variances of other speaker models from other telephone databases. The technique is illustrated with experiments conducted on a locally collected database and on the NIST'95 and '96 subsets of the Switchboard Corpus.
1. INTRODUCTION
Many applications of speaker identification systems (speaker-ID for short) assume that the users access the system remotely. Typically, the channel involved in the communication is that of the telephone. Because the handset and the line can vary from call to call, there is often an acoustic mismatch between the data collected to train the speaker mo...
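As a rough illustration of why increasing model variances can help under acoustic mismatch, the sketch below scores a frame against a single diagonal-covariance Gaussian before and after its variances are inflated. It is not the paper's transformation: the paper derives a synthetic variance distribution from stereo data, whereas the per-dimension factors in `scale`, the function names, and all numbers here are assumptions chosen for illustration.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def inflate_variances(var, scale):
    """Hypothetical transformation: multiply each variance by a factor >= 1."""
    return [v * s for v, s in zip(var, scale)]

# Toy speaker model (one Gaussian component; values are made up).
mean = [0.0, 1.0]
var = [0.1, 0.2]
scale = [4.0, 4.0]  # assumed inflation factors, not taken from the paper
wide_var = inflate_variances(var, scale)

matched = [0.0, 1.0]     # frame recorded under the training condition
mismatched = [0.9, 2.0]  # same speaker, frame shifted by a different channel

for name, frame in (("matched", matched), ("mismatched", mismatched)):
    before = diag_gauss_loglik(frame, mean, var)
    after = diag_gauss_loglik(frame, mean, wide_var)
    print(f"{name}: original={before:.3f} inflated={after:.3f}")
```

The inflated model penalizes the mismatched frame less, at the cost of a lower score on matched data, which is the trade-off the variance increase is meant to manage.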
A Real-Time Text-Independent Speaker Identification System
- Proc of the 12th Intl Conference on Image Analysis and Processing
, 2003
Cited by 8 (0 self)
The paper presents a real-time speaker identification system based on the analysis of the audio track of a video stream. The system has been employed in the context of automatic video segmentation. It uses features evaluated in both the time and frequency domains. Their combined use significantly improved the performance of the system.
Speaker identification in the presence of room reverberation
- in Proc. IEEE Biometrics Symp
, 2007
Cited by 2 (1 self)
Speaker identification (SI) systems based on Gaussian Mixture Models (GMMs) have demonstrated high levels of accuracy when both training and testing signals are acquired in near-ideal conditions. These same systems, when trained and tested with signals acquired over non-ideal channels such as the telephone, have been shown to have markedly lower accuracy levels. In this paper, we consider a reverberant test environment and its impact on SI. We measure the degradation in SI accuracy when the system is trained with clean signals but tested with reverberant signals. Next, we propose a method whereby training signals are first filtered with a family of reverberation filters prior to construction of speaker models; the reverberation filters are designed to approximate expected test room reverberation. Reverberant test signals are then scored against the family of speaker models and identification is made. Our research demonstrates that by approximating test room reverberation in the training signals, the channel mismatch problem can be reduced and SI accuracy increased.
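The train-with-reverberation idea described above can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation: the exponentially decaying impulse response stands in for their reverberation filters, the score table is made up, and picking the speaker with the single best-scoring model is an assumed decision rule (the abstract does not say how scores across the family are combined).

```python
import math

def decaying_impulse_response(rt60_s, sample_rate, length):
    """Toy reverberation filter: amplitude falls by 60 dB after rt60_s seconds."""
    decay = math.log(1000.0) / (rt60_s * sample_rate)
    return [math.exp(-decay * n) for n in range(length)]

def convolve(signal, kernel):
    """Direct-form convolution, used to 'reverberate' a training signal."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def identify(scores):
    """scores maps (speaker, filter_name) to the test signal's log-likelihood.

    Assumed rule: return the speaker owning the best-scoring model in the family.
    """
    best_speaker, _best_filter = max(scores, key=scores.get)
    return best_speaker

# Filter a training signal before model construction (model training omitted).
h = decaying_impulse_response(rt60_s=0.5, sample_rate=8000, length=4001)
reverberant_training = convolve([1.0, 0.0, -0.5], h[:4])

# Hypothetical log-likelihoods of one reverberant test utterance.
scores = {
    ("alice", "clean"): -120.0,
    ("alice", "rt60_0.5"): -95.0,
    ("bob", "clean"): -110.0,
    ("bob", "rt60_0.5"): -101.0,
}
print(identify(scores))
```

In this made-up table the clean models alone would pick "bob", while the reverberant family picks "alice", which mirrors the mismatch reduction the abstract reports.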
New Filter Structure based on Admissible Wavelet Packet Transform for Text-Independent Speaker Identification
Cited by 2 (0 self)
Identical acoustic features such as Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) are widely used for different tasks such as speech recognition and speaker recognition, although the requirements of speaker recognition differ from those of speech recognition. In the MFCC feature representation, the Mel frequency scale is used to get high resolution in the low-frequency region and low resolution in the high-frequency region. This kind of processing is good for obtaining stable phonetic information, but it is not suitable for speaker features that are located in high-frequency regions. Further, MFCC uses the short-time Fourier transform (STFT), which has fixed time-frequency resolution. Considering the above facts, in this paper we propose a new filter structure based on the admissible wavelet packet transform for text-independent speaker identification. The multiresolution capabilities of the wavelet packet transform are used to derive the new features. The performance of the proposed features is evaluated using the most commonly used Gaussian mixture model (GMM) as well as continuous density hidden Markov model (CDHMM) classifiers. An improved speaker identification rate is obtained using the proposed features compared to the MFCC and other wavelet-transform-based features. Further, the results show that CDHMM works better than the GMM for a small number of mixture densities. An identification accuracy of 99.76% is achieved in experiments on the TIMIT database.
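The resolution trade-off of the Mel scale mentioned above can be made concrete with a small calculation. The sketch assumes the common 2595·log10(1 + f/700) form of the Mel mapping and an 8-band layout over 0 to 8000 Hz; the abstract does not say which Mel variant or band count the authors used, so both are assumptions.

```python
import math

def hz_to_mel(f_hz):
    """One widely used Mel mapping (assumed here, not specified by the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_low, f_high, n_bands):
    """Band edges in Hz that are equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]

edges = mel_band_edges(0.0, 8000.0, 8)
widths = [upper - lower for lower, upper in zip(edges, edges[1:])]

for lower, width in zip(edges, widths):
    print(f"band starting at {lower:8.1f} Hz is {width:7.1f} Hz wide")
```

Each successive band is wider in Hz than the one before it, so frequency resolution is finest at low frequencies and coarsest at high frequencies, where the abstract argues speaker-specific information is located.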
Foreign accent conversion through voice morphing
Cited by 2 (1 self)
We present a voice morphing strategy that can be used to generate a continuum of accent transformations between a foreign speaker and a native speaker. The approach performs a cepstral decomposition of speech into spectral slope and spectral detail. Accent conversions are then generated by combining the spectral slope of the foreign speaker with a morph of the spectral detail of the native speaker. Spectral morphing is achieved by representing the spectral detail through pulse density modulation and averaging pulses in a pair-wise fashion. The technique is evaluated on parallel recordings from two ARCTIC speakers using objective measures of acoustic quality, speaker identity and foreign accent that have been recently shown to correlate with perceptual results from listening tests. Index Terms: voice morphing, accent conversion.
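The slope/detail combination described above can be illustrated on a single frame of cepstral coefficients. This sketch makes two loud assumptions: the first `n_slope` coefficients are treated as the spectral slope, and the morph is a plain linear interpolation of the remaining coefficients, which replaces the pulse density modulation and pair-wise pulse averaging the authors actually use. All names and values are hypothetical.

```python
def split_cepstrum(cepstrum, n_slope):
    """Split a cepstral vector into (slope, detail) parts of the same length.

    Assumption: the lowest n_slope coefficients carry the spectral slope.
    """
    slope = cepstrum[:n_slope] + [0.0] * (len(cepstrum) - n_slope)
    detail = [0.0] * n_slope + cepstrum[n_slope:]
    return slope, detail

def convert_frame(foreign_cep, native_cep, alpha, n_slope=2):
    """Keep the foreign speaker's slope; morph detail toward the native speaker.

    alpha = 0 keeps the foreign detail, alpha = 1 uses the native detail.
    Linear interpolation is a stand-in for the paper's pulse-based morphing.
    """
    foreign_slope, foreign_detail = split_cepstrum(foreign_cep, n_slope)
    _native_slope, native_detail = split_cepstrum(native_cep, n_slope)
    morphed = [
        (1.0 - alpha) * f + alpha * n
        for f, n in zip(foreign_detail, native_detail)
    ]
    return [s + d for s, d in zip(foreign_slope, morphed)]

# Made-up, time-aligned cepstral frames from the two speakers.
foreign = [1.0, 2.0, 3.0, 4.0]
native = [5.0, 6.0, 7.0, 8.0]
halfway = convert_frame(foreign, native, alpha=0.5)
print(halfway)
```

Sweeping `alpha` from 0 to 1 gives a continuum of outputs between the two speakers' spectral detail while the foreign speaker's slope is held fixed.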
Text Dependent and Text Independent Speaker Verification Systems. Technology and Applications
, 2003
Cited by 1 (0 self)
This paper discusses the differences between Text Dependent and Text Independent Speaker Verification Systems. It shows the basic principles behind these technologies. Some of the most common applications are reviewed.
Robust Speaker Recognition in Unknown Noisy Conditions
, 2005
Cited by 1 (0 self)
This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method, named universal compensation (UC), that combines multi-condition training and the missing-feature method to model noises with unknown temporal-spectral characteristics. Multi-condition training is conducted using simulated noisy data with limited noise varieties, providing a "coarse" compensation for the noise, and the missing-feature method refines the compensation by ignoring noise variations outside the given training conditions, thereby reducing the training and testing mismatch. This paper focuses on several issues relating to the implementation of the UC model for real-world applications. These include the generation
FOREIGN ACCENT CONVERSION THROUGH VOICE MORPHING
Cited by 1 (1 self)
We present a voice morphing strategy that can be used to generate a continuum of accent transformations between a foreign speaker and a native speaker. The approach performs a cepstral decomposition of speech into spectral slope and spectral detail. Accent conversions are then generated by combining the spectral slope of the foreign speaker with a morph of the spectral detail of the native speaker. Spectral morphing is achieved by representing the spectral detail through pulse density modulation and averaging pulses in a pair-wise fashion. The technique is validated on parallel recordings from two ARCTIC speakers using both objective and subjective measures of acoustic quality, speaker identity and foreign accent. Index Terms: voice morphing, accent conversion.