@MISC{Uzan_iknow, author = {Lior Uzan and Lior Wolf}, title = {I Know That Voice: Identifying the Voice Actor Behind the Voice}, year = {} }
Abstract
Intentional voice modifications by electronic or non-electronic means challenge automatic speaker recognition systems. Previous work focused on detecting the act of disguise or identifying everyday speakers disguising their voices. Here, we propose a benchmark for the study of voice disguise, by studying the voice variability of professional voice actors. A dataset of 114 actors playing 647 characters is created. It contains 19 hours of captured speech, divided into 29,733 utterances tagged by character and actor names, which is then further sampled. Text-independent speaker identification of the actors is evaluated on a novel benchmark that trains on a subset of the characters they play and tests on new, unseen characters. A Convolutional Neural Network trained on spectrograms generated from the utterances achieves an EER of 17.1%, an HTER of 15.9%, and a rank-1 recognition rate of 63.5% per utterance. An I-Vector based system was trained and tested on the same data, resulting in 39.7% EER, 39.4% HTER, and a rank-1 recognition rate of 13.6%.
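The three reported metrics (EER, HTER, and rank-1 recognition rate) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the toy scores, and the threshold sweep over observed scores are all assumptions made for illustration, and the abstract does not state how the HTER threshold was chosen. A common convention, assumed here, is to fix the threshold on held-out data and then average the false accept and false reject rates.

```python
import numpy as np


def far_frr(genuine, impostor, threshold):
    """False accept rate and false reject rate at a given threshold.

    A trial is accepted when its score is >= threshold (assumed convention).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = float(np.mean(impostor >= threshold))  # impostor trials wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine trials wrongly rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over the observed scores.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest.
    """
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        far, frr = far_frr(genuine, impostor, thr)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, float(thr)
    return best_eer, best_thr


def half_total_error_rate(genuine, impostor, threshold):
    """HTER: mean of FAR and FRR at a threshold fixed in advance."""
    far, frr = far_frr(genuine, impostor, threshold)
    return (far + frr) / 2


def rank1_rate(score_matrix, true_labels):
    """Fraction of utterances whose top-scoring actor is the true actor.

    score_matrix has shape (n_utterances, n_actors).
    """
    predicted = np.argmax(np.asarray(score_matrix), axis=1)
    return float(np.mean(predicted == np.asarray(true_labels)))


if __name__ == "__main__":
    # Toy verification scores (hypothetical, not from the dataset).
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.2, 0.3, 0.6]
    eer, thr = equal_error_rate(genuine, impostor)
    print("EER:", eer, "at threshold", thr)
    print("HTER:", half_total_error_rate(genuine, impostor, thr))

    # Toy identification scores: 3 utterances, 2 candidate actors.
    scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    print("rank-1:", rank1_rate(scores, [0, 1, 1]))
```

In this sketch the EER is taken on the same scores used to pick the threshold, whereas the HTER function expects a threshold supplied from elsewhere, which is why the two figures can differ in practice.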