Results 1 - 10 of 23
Spoofing countermeasures for the protection of automatic speaker recognition from attacks with artificial signals
- in Proc. 13th Interspeech, 2012
Abstract - Cited by 17 (5 self)
Certain short intervals in a speech signal X, e.g. those corresponding to voiced regions, give rise to higher scores or likelihoods than others, and the chances of a spoofing attack succeeding can thus be increased by concentrating on the short interval or sequence of frames in X = {x1,..., xm} which gives rise to the highest score. Let T = {t1,..., tn} be such an interval, short enough that all frames in the interval provoke high scores, but long enough that relevant dynamic information (e.g. delta and acceleration coefficients) can be captured and/or modeled. In order to produce a sample of significant duration, T can be replicated and concatenated any number of times to produce an audio signal of arbitrary length. In practice, the resulting concatenated ...
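The replicate-and-concatenate construction described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the signal, interval position, and lengths below are hypothetical placeholders.

```python
import numpy as np

def build_artificial_signal(x, start, length, target_len):
    """Replicate a high-scoring interval T = x[start:start+length] and
    concatenate copies until the signal reaches target_len samples."""
    t = x[start:start + length]           # the short interval T
    reps = int(np.ceil(target_len / len(t)))
    return np.tile(t, reps)[:target_len]  # artificial signal of arbitrary length

# Usage: take a 0.5 s interval of a 16 kHz signal and stretch it to 5 s.
x = np.random.randn(16000 * 3)            # placeholder "speech" signal
y = build_artificial_signal(x, start=8000, length=8000, target_len=16000 * 5)
```

In a real attack the interval would be chosen by scoring candidate frame sequences against the target model rather than picked at a fixed offset.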
On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals
, 2012
Cited by 17 (6 self)
A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case
- in APSIPA ASC, 2012
Abstract - Cited by 13 (6 self)
Voice conversion, a technique which modifies one speaker's (source) voice to sound like another speaker's (target), presents a threat to automatic speaker verification. In this paper, we first present new results of evaluating the vulnerability of current state-of-the-art speaker verification systems, Gaussian mixture model with joint factor analysis (GMM-JFA) and probabilistic linear discriminant analysis (PLDA) systems, against spoofing attacks. The spoofing attacks are simulated by two voice conversion techniques: Gaussian mixture model based conversion and unit selection based conversion. To reduce the false acceptance rate caused by spoofing attacks, we propose a general anti-spoofing framework for speaker verification systems, in which a converted speech detector is adopted as a post-processing module for the speaker verification system's acceptance decision. The detector decides whether the accepted claim is human speech or converted speech. A subset of the core task in the NIST SRE 2006 corpus is used to evaluate the vulnerability of the speaker verification systems and the performance of the converted speech detector. The results indicate that both conversion techniques increase the false acceptance rate of the GMM-JFA and PLDA systems, while the converted speech detector reduces the false acceptance rate.
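The post-processing arrangement this abstract describes is a simple cascade: the converted-speech detector only examines claims the verification system has already accepted. A minimal sketch, with hypothetical score values and thresholds:

```python
def verify_with_antispoofing(asv_score, cm_score, asv_threshold, cm_threshold):
    """Cascade: the converted-speech detector post-processes only claims
    that the speaker verification (ASV) stage has already accepted."""
    if asv_score < asv_threshold:
        return "reject"                    # ASV rejects the claim outright
    if cm_score < cm_threshold:
        return "reject: converted speech"  # detector flags spoofed speech
    return "accept"                        # human speech, claimed speaker

# A genuine trial passes both stages; a successful voice-conversion spoof
# passes the ASV stage but should fail the countermeasure stage.
print(verify_with_antispoofing(2.1, 0.8, asv_threshold=0.0, cm_threshold=0.5))
```

The design choice is that the detector can only lower the false acceptance rate, never raise it, since it is applied after acceptance.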
I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry
Abstract - Cited by 9 (2 self)
Voice imitation is mimicry of another speaker's voice characteristics and speech behavior. Professional voice mimicry can create entertaining, yet realistic-sounding target speaker renditions. As mimicry tends to exaggerate prosodic, idiosyncratic and lexical behavior, it is unclear how modern spectral-feature automatic speaker verification systems respond to mimicry "attacks". We study the vulnerability of two well-known speaker recognition systems: a traditional Gaussian mixture model - universal background model (GMM-UBM) and a state-of-the-art i-vector classifier with cosine scoring. The material consists of one professional Finnish imitator impersonating five well-known Finnish public figures. In a carefully controlled setting, the mimicry attack slightly increases the false acceptance rate for the i-vector system, but the effect is generally not alarmingly large in comparison to voice conversion or playback attacks. Index Terms: voice imitation, speaker recognition, mimicry attack
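Cosine scoring, the back-end the i-vector system above uses, reduces verification to a normalized inner product between an enrollment i-vector and a test i-vector. A minimal sketch with hypothetical 400-dimensional vectors (real systems typically apply channel compensation such as LDA/WCCN before scoring):

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine similarity between an enrollment i-vector and a test
    i-vector; the accept/reject decision thresholds this score."""
    num = np.dot(w_enroll, w_test)
    den = np.linalg.norm(w_enroll) * np.linalg.norm(w_test)
    return num / den

rng = np.random.default_rng(0)
w_target = rng.standard_normal(400)                       # enrollment i-vector
w_trial = w_target + 0.1 * rng.standard_normal(400)       # similar test i-vector
score = cosine_score(w_target, w_trial)                   # close to 1.0
```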
Spoofing and countermeasures for automatic speaker verification
Abstract - Cited by 8 (4 self)
It is widely acknowledged that most biometric systems are vulnerable to spoofing, also known as imposture. While vulnerabilities and countermeasures for other biometric modalities, e.g. face verification, have been widely studied, speaker verification systems remain vulnerable. This paper describes some specific vulnerabilities studied in the literature and presents a brief survey of recent work to develop spoofing countermeasures. The paper concludes with a discussion of the standard datasets, metrics and formal evaluations needed to assess vulnerabilities to spoofing in realistic scenarios without prior knowledge. Index Terms: spoofing, imposture, automatic speaker verification
Synthetic speech detection using temporal modulation feature
- in Proceedings of ICASSP, 2013
Abstract - Cited by 7 (2 self)
Voice conversion and speaker adaptation techniques present a threat to current state-of-the-art speaker verification systems. To prevent such spoofing attacks and enhance the security of speaker verification systems, the development of anti-spoofing techniques to distinguish synthetic from human speech is necessary. In this study, we continue the quest to discriminate synthetic from human speech. Motivated by the fact that current analysis-synthesis techniques operate at the frame level and make a frame-by-frame independence assumption, we propose to adopt magnitude/phase modulation features to detect synthetic speech. Modulation features derived from the magnitude/phase spectrum carry long-term temporal information of speech, and may be able to detect temporal artifacts caused by the frame-by-frame processing in the synthesis of the speech signal. Our synthetic speech detection results show that the modulation features provide information complementary to magnitude/phase features. The best detection performance is obtained by fusing phase modulation features and phase features, yielding an equal error rate of 0.89%, which is significantly lower than the 1.25% of phase features and the 10.98% of MFCC features.
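The core idea of a modulation feature, as this abstract uses the term, is to analyze each frequency bin's trajectory across frames, capturing long-term temporal structure that frame-independent synthesis can disturb. The sketch below is a simplified illustration of that idea, not the paper's feature pipeline; the spectrogram shape and modulation FFT length are arbitrary placeholders.

```python
import numpy as np

def modulation_spectrum(frames, mod_points=64):
    """Given a (num_frames x num_bins) magnitude spectrogram, take an
    FFT along the *time* axis of each frequency bin's trajectory.
    Frame-by-frame synthesis artifacts would surface as anomalies in
    these long-term modulation trajectories."""
    traj = frames - frames.mean(axis=0)             # remove per-bin DC offset
    mod = np.abs(np.fft.rfft(traj, n=mod_points, axis=0))
    return mod                                      # (mod_points//2 + 1, num_bins)

# Hypothetical spectrogram: 200 frames x 257 frequency bins.
spec = np.abs(np.random.randn(200, 257))
feat = modulation_spectrum(spec)
```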
Comparison of Human Listeners and Speaker Verification Systems Using Voice Mimicry Data
Abstract - Cited by 5 (0 self)
In this work, we compare the performance of human listeners and two well-known speaker verification systems in the presence of voice mimicry. Our focus is to gain insight into how well human listeners recognize speakers when mimicry data is included, and to compare this to the overall performance of state-of-the-art speaker verification systems: a traditional Gaussian mixture model - universal background model (GMM-UBM) and an i-vector based classifier with cosine scoring. We found that for the studied material in the Finnish language, the mimicry attack was able to slightly increase the error rate, within a range acceptable for the general performance of the system (EER from 9 to 11%). Our data reveal that enhancing the audio material by minimizing the differences between data collected in different environments improves the accuracy of the speaker verification systems even in the presence of mimicked speech. The performance of the human listening panel shows that successfully imitated speech is difficult to recognize, and it is even more difficult to recognize a person who is intentionally trying to modify his or her own voice. The average listener made 8 errors on 34 selected trials, while the automatic systems made 6 errors on the same set.
Joint speaker verification and anti-spoofing in the i-vector space
- IEEE Trans. on Information Forensics and Security, 2015
Abstract - Cited by 2 (2 self)
Any biometric recognizer is vulnerable to spoofing attacks, and voice biometrics, also called automatic speaker verification (ASV), is no exception; replay, synthesis and conversion attacks all provoke false acceptances unless countermeasures are used. We focus on voice conversion (VC) attacks, considered among the most challenging for modern recognition systems. To detect spoofing, most existing countermeasures assume explicit or implicit knowledge of a particular VC system and focus on designing discriminative features. In this work, we explore back-end generative models for more generalized countermeasures. Specifically, we model the synthesis-channel subspace to perform speaker verification and anti-spoofing jointly in the i-vector space, a well-established technique for speaker modeling. This enables us to integrate the speaker verification and anti-spoofing tasks into one system without any fusion techniques. To validate the proposed approach, we study vocoder-matched and vocoder-mismatched ASV and VC spoofing detection on the NIST 2006 speaker recognition evaluation dataset. Promising results are obtained for standalone countermeasures as well as for their combination with ASV systems using score fusion and the joint approach. Index Terms: speaker recognition, spoofing, voice conversion attack, i-vector, joint verification and anti-spoofing
Classifiers for Synthetic Speech Detection: A Comparison
Abstract - Cited by 1 (1 self)
Automatic speaker verification (ASV) systems are highly vulnerable to spoofing attacks, also known as imposture. With recent developments in speech synthesis and voice conversion technology, it has become important to detect synthesized or voice-converted speech for the security of ASV systems. In this paper, we compare five different classifiers used in speaker recognition for detecting synthetic speech. Experimental results on the ASVspoof 2015 dataset show that support vector machines with the generalized linear discriminant kernel (GLDS-SVM) yield the best performance on the development set, with an EER of 0.12%, whereas a Gaussian mixture model (GMM) trained using the maximum likelihood (ML) criterion, with an EER of 3.01%, is superior on the evaluation set. Index Terms: spoof detection, countermeasures, speaker recognition
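Several abstracts in this listing report results as an equal error rate (EER): the operating point where the false acceptance rate equals the false rejection rate. A minimal threshold-scanning sketch, using synthetic score distributions rather than any of the reported systems:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: scan candidate thresholds and return the point where the
    false acceptance rate (impostors scoring at or above threshold)
    best matches the false rejection rate (genuine scores below it)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2, thresholds[i]

# Well-separated synthetic score distributions give a low EER.
rng = np.random.default_rng(1)
eer, thr = equal_error_rate(rng.normal(2, 1, 1000), rng.normal(-2, 1, 1000))
```

Production evaluations interpolate the ROC/DET curve instead of picking the nearest observed threshold, but the principle is the same.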