Results 1 - 8 of 8
Articulatory Feature-Based Conditional Pronunciation Modeling for Speaker Verification
- Speech Communication, 2006
"... Because of the differences in education background, accents, etc., different persons have their unique way of pronunciation. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. The proposed ..."
Abstract - Cited by 6 (3 self)
Because of differences in educational background, accent, etc., different persons have their own unique ways of pronunciation. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. The proposed technique aims to establish a link between articulatory properties (e.g., manners and places of articulation) and the phoneme sequences produced by a speaker. This is achieved by aligning two articulatory feature (AF) streams with a phoneme sequence determined by a phoneme recognizer, and formulating the probabilities of articulatory classes conditioned on the phonemes as speaker-dependent probabilistic models. The scores obtained from the AF-based pronunciation models are then fused with those obtained from a spectral-based speaker verification system, with the frame-by-frame fused scores weighted by the confidence of the pronunciation models. Evaluations based on the SPIDRE corpus demonstrate that AF-based CPM systems can recognize speakers even with short utterances and are readily combined with spectral-based systems to further enhance the reliability of speaker verification.
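To make the conditioning step concrete, the sketch below builds a speaker-dependent discrete distribution P(AF class | phoneme) for one AF stream from aligned frames and scores a test utterance as a frame-averaged log-likelihood ratio against a background model. The function names, smoothing scheme, and scoring rule are illustrative assumptions, not the paper's exact formulation; the confidence-weighted frame-by-frame fusion is omitted.

```python
# Minimal sketch of conditional pronunciation scoring for one AF stream.
# Names, smoothing, and the frame-averaged log-likelihood-ratio score are
# illustrative assumptions; the confidence-weighted fusion is omitted.
from collections import defaultdict
import math

def train_cpm(frames, smoothing=1e-3):
    """Estimate P(AF class | phoneme) from aligned (phoneme, af_class) frames."""
    counts = defaultdict(lambda: defaultdict(float))
    for phoneme, af_class in frames:
        counts[phoneme][af_class] += 1.0
    model = {}
    for phoneme, c in counts.items():
        total = sum(c.values())
        model[phoneme] = {k: (v + smoothing) / (total + smoothing * len(c))
                          for k, v in c.items()}
    return model

def cpm_llr(test_frames, speaker_model, background_model, floor=1e-6):
    """Frame-averaged log-likelihood ratio of the speaker vs. background model."""
    llr, n = 0.0, 0
    for phoneme, af_class in test_frames:
        p_spk = speaker_model.get(phoneme, {}).get(af_class, floor)
        p_bkg = background_model.get(phoneme, {}).get(af_class, floor)
        llr += math.log(p_spk) - math.log(p_bkg)
        n += 1
    return llr / max(n, 1)
```

In a full system this would be run once per AF stream (manner and place), and the resulting scores fused with the output of the spectral system.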
Speaker identification in the presence of room reverberation
- in Proc. IEEE Biometrics Symp., 2007
"... ABSTRACT Speaker identification (SI) systems based on Gaussian Mixture Models (GMMs) have demonstrated high levels of accuracy when both training and testing signals are acquired in near ideal conditions. These same systems when trained and tested with signals acquired under non-ideal channels such ..."
Abstract - Cited by 2 (1 self)
Speaker identification (SI) systems based on Gaussian mixture models (GMMs) have demonstrated high levels of accuracy when both training and testing signals are acquired in near-ideal conditions. These same systems, when trained and tested with signals acquired over non-ideal channels such as the telephone, have been shown to have markedly lower accuracy. In this paper, we consider a reverberant test environment and its impact on SI. We measure the degradation in SI accuracy when the system is trained with clean signals but tested with reverberant signals. Next, we propose a method whereby training signals are first filtered with a family of reverberation filters prior to construction of the speaker models; the reverberation filters are designed to approximate the expected test room reverberation. Reverberant test signals are then scored against the family of speaker models and an identification decision is made. Our research demonstrates that by approximating the test room reverberation in the training signals, the channel mismatch problem can be reduced and SI accuracy increased.
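A minimal sketch of the multi-condition training idea described above, assuming waveform convolution with a small family of room impulse responses (RIRs) and diagonal-covariance GMM speaker models; `extract_features` is a hypothetical front-end (e.g. MFCC extraction) supplied by the caller, and the mixture size is arbitrary rather than taken from the paper.

```python
# Sketch of multi-condition training against a family of room impulse
# responses (RIRs). `extract_features` is a hypothetical front-end (e.g.
# MFCC extraction) supplied by the caller; the mixture size is arbitrary.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_reverb_models(clean_signals, rirs, extract_features, n_mix=32):
    """clean_signals: {speaker_id: 1-D waveform}; rirs: list of 1-D impulse responses."""
    models = {}
    for spk, x in clean_signals.items():
        for i, h in enumerate(rirs):
            x_rev = np.convolve(x, h)        # simulate the expected room reverberation
            feats = extract_features(x_rev)  # frames x dims
            gmm = GaussianMixture(n_components=n_mix, covariance_type='diag')
            models[(spk, i)] = gmm.fit(feats)
    return models

def identify(test_signal, models, extract_features):
    """Score the reverberant test signal against every (speaker, RIR) model
    and return the speaker whose best-matching model scores highest."""
    feats = extract_features(test_signal)
    scores = {key: gmm.score(feats) for key, gmm in models.items()}
    return max(scores, key=scores.get)[0]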
Speaker Verification Using Adapted Articulatory Feature-based Conditional Pronunciation Modeling
- in Proc. ICASSP, 2005
"... This paper proposes a speaker verification system based on articulatory feature-based conditional pronunciation modeling (AFCPM). The system captures the pronunciation characteristics of speakers by modeling the linkage between the actual phones produced by the speakers and the state of articulation ..."
Abstract - Cited by 2 (1 self)
This paper proposes a speaker verification system based on articulatory feature-based conditional pronunciation modeling (AFCPM). The system captures the pronunciation characteristics of speakers by modeling the linkage between the actual phones produced by the speakers and the state of articulation during speech production. The speaker models, which consist of conditional probabilities of two articulatory classes, are adapted from a set of universal background models (UBMs) via MAP adaptation. This creates a direct coupling between the speaker and background models, which prevents over-fitting the speaker models when the amount of speaker data is limited. Experimental results demonstrate that MAP adaptation not only enhances the discriminative power of the speaker models but also improves their robustness against handset mismatches. Results also show that fusing the scores derived from an AFCPM-based system and a conventional spectral-based system achieves an error rate significantly lower than that achieved by either system individually. This suggests that AFCPM and spectral features are complementary to each other.
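One way to picture the MAP adaptation of the discrete conditional-probability tables is the standard relevance-factor interpolation between the speaker's maximum-likelihood estimate and the UBM; the sketch below assumes that rule, since the paper's exact update is not reproduced here.

```python
# Hedged sketch of MAP adaptation for discrete P(AF class | phoneme) tables.
# The relevance-factor interpolation below is the standard GMM-UBM-style
# update and is an assumption; the paper's exact formula may differ.
def map_adapt(counts, ubm, relevance=16.0):
    """counts: {phoneme: {af_class: count}} from the speaker's enrolment data.
    ubm: {phoneme: {af_class: prob}} universal background model."""
    adapted = {}
    for phoneme, ubm_dist in ubm.items():
        spk_counts = counts.get(phoneme, {})
        n = sum(spk_counts.values())
        alpha = n / (n + relevance)  # data-dependent adaptation weight
        adapted[phoneme] = {
            c: alpha * (spk_counts.get(c, 0.0) / n if n > 0 else 0.0)
               + (1.0 - alpha) * p_ubm
            for c, p_ubm in ubm_dist.items()
        }
    return adapted
```

With little enrolment data, alpha stays small and the adapted table falls back to the UBM, which is what keeps the speaker models from over-fitting.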
SPEAKER IDENTIFICATION IN ROOM REVERBERATION USING GMM-UBM
"... Speaker recognition systems tend to degrade if the training and testing conditions differ significantly. Such situations may arise due to the use of different microphones, telephone and mobile handsets or different acoustic conditions. Recently, the effect of the room acoustics on speaker identifica ..."
Abstract
Speaker recognition systems tend to degrade if the training and testing conditions differ significantly. Such situations may arise from the use of different microphones, telephone and mobile handsets, or different acoustic conditions. Recently, the effect of room acoustics on speaker identification (SI) has been investigated, and it has been shown that a loss in accuracy results when clean training signals are used with reverberated testing signals. Various techniques, such as dereverberation, the use of multiple microphones, and channel compensation, have been proposed to reduce the mismatch and thereby increase SI accuracy. In this paper, we propose to use a Gaussian mixture model-universal background model (GMM-UBM), together with the previously proposed multiple speaker model approach, to compensate for the acoustical mismatch. With this approach, SI accuracy improves over conventional GMM-based SI systems in the presence of room reverberation. Index Terms — Speaker recognition, identification
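For reference, the GMM-UBM decision amounts to normalising each speaker model's log-likelihood by the UBM's log-likelihood before choosing the best speaker. A hedged sketch follows, assuming the speaker models are scikit-learn GaussianMixture objects MAP-adapted from the UBM elsewhere (e.g. one per reverberation condition, as in the multiple speaker model approach).

```python
# Hedged sketch of GMM-UBM identification: each candidate model's average
# log-likelihood is normalised by the UBM before the decision. Speaker
# models are assumed to be MAP-adapted GaussianMixture objects fitted
# elsewhere (e.g. one per reverberation condition).
def gmm_ubm_identify(feats, speaker_gmms, ubm):
    """feats: frames x dims array; speaker_gmms: {speaker_id: GaussianMixture}."""
    ubm_ll = ubm.score(feats)  # average per-frame log-likelihood under the UBM
    llr = {spk: gmm.score(feats) - ubm_ll for spk, gmm in speaker_gmms.items()}
    return max(llr, key=llr.get)  # identified speaker
```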
Adaptive articulatory feature-based conditional pronunciation modeling for speaker verification
, 2005
ADAPTIVE CONDITIONAL PRONUNCIATION MODELING USING ARTICULATORY FEATURES FOR SPEAKER VERIFICATION
"... This paper proposes an articulatory feature-based conditional pronunciation modeling (AFCPM) technique for speaker verification. The technique models the pronunciation behaviors of speakers by creating a link between the actual phones produced by the speakers and the state of articulations during sp ..."
Abstract
This paper proposes an articulatory feature-based conditional pronunciation modeling (AFCPM) technique for speaker verification. The technique models the pronunciation behaviors of speakers by creating a link between the actual phones produced by the speakers and the state of articulation during speech production. Speaker models consisting of conditional probabilities of two articulatory classes are adapted from a set of universal background models (UBMs) using the MAP adaptation technique. This adaptation approach aims to prevent over-fitting the speaker models when the amount of speaker data is insufficient for direct estimation. Experimental results show that the adaptation technique can enhance the discriminating power of the speaker models by establishing a tighter coupling between the speaker models and the UBMs. Results also show that fusing the scores derived from an AFCPM-based system and a conventional spectral-based system achieves a significantly lower error rate than either system achieves individually. This suggests that AFCPM and spectral features are complementary to each other.
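The score fusion mentioned in this and the preceding abstract can be as simple as a fixed-weight linear combination of the two systems' outputs; the sketch below assumes such a rule with an arbitrary weight, since the actual fusion used in the paper is not specified here.

```python
# A minimal score-level fusion sketch; the fixed weight and the linear rule
# are illustrative assumptions, not the fusion actually used in the paper.
def fused_decision(afcpm_score, spectral_score, threshold, w=0.3):
    """Accept the claimed identity if the fused verification score exceeds the threshold."""
    fused = w * afcpm_score + (1.0 - w) * spectral_score
    return fused >= threshold
```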
Speaker Verification via Articulatory Feature-based Conditional Pronunciation Modeling with Vowel and Consonant Mixture Models
"... Articulatory feature-based conditional pronunciation modeling (AFCPM) aims to capture the pronunciation characteristics of speakers by modeling the linkage between the states of articulation during speech production and the actual phones produced by a speaker. Previous AFCPM systems use one discrete ..."
Abstract
Articulatory feature-based conditional pronunciation modeling (AFCPM) aims to capture the pronunciation characteristics of speakers by modeling the linkage between the states of articulation during speech production and the actual phones produced by a speaker. Previous AFCPM systems use one discrete density function per phoneme to model the pronunciation characteristics of speakers. This paper proposes using a mixture of discrete density functions for AFCPM. In particular, the pronunciation characteristics of each phoneme are modeled by two density functions: one responsible for describing the articulatory features that are more relevant to vowels and the other for those more relevant to consonants. Verification scores are the weighted sum of the outputs of the two models. To enhance the resolution of the pronunciation models, four articulatory properties (front-back, lip-rounding, place of articulation, and manner of articulation) are used for pronunciation modeling. The proposed AFCPM is applied to a speaker verification task. Results show that using four articulatory features achieves a lower error rate compared to using only two features (manner and place of articulation). It was also found that dividing the articulatory properties into two groups is an effective means of solving the data-sparseness problem encountered in the training phase of AFCPM systems.
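A sketch of the two-component mixture described above, assuming the four articulatory properties are split into a vowel-oriented group (front-back, lip-rounding) and a consonant-oriented group (place, manner) and mixed with a per-phoneme weight; the grouping, names, and weighting scheme are illustrative assumptions rather than the paper's exact design.

```python
# Illustrative two-component AFCPM score; the vowel/consonant grouping of the
# four articulatory properties and the per-phoneme weights are assumptions.
import math

def mixture_cpm_score(frames, vowel_model, consonant_model, weights, floor=1e-6):
    """frames: iterable of (phoneme, vowel_afs, consonant_afs) tuples, where
    vowel_afs bundles the front-back and lip-rounding values and consonant_afs
    bundles the place and manner values.
    vowel_model / consonant_model: {phoneme: {af_values: prob}}.
    weights: {phoneme: w}, the mixing weight of the vowel-oriented model."""
    total, n = 0.0, 0
    for phoneme, v_afs, c_afs in frames:
        w = weights.get(phoneme, 0.5)
        p_v = vowel_model.get(phoneme, {}).get(v_afs, floor)
        p_c = consonant_model.get(phoneme, {}).get(c_afs, floor)
        total += math.log(w * p_v + (1.0 - w) * p_c)
        n += 1
    return total / max(n, 1)
```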
Speaker verification based on the fusion of speech acoustics and inverted articulatory signals