Speaker recognition: A tutorial
Cited by 121 (1 self)
A tutorial on the design and development of automatic speaker-recognition systems is presented. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. These systems can operate in two modes: to identify a particular person or to verify a person’s claimed identity. Speech processing and the basic components of automatic speaker-recognition systems are shown and design tradeoffs are discussed. Then, a new automatic speaker-recognition system is given. This recognizer performs with 98.9% correct identification. Last, the performances of various systems are compared.
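The two operating modes named in this abstract can be sketched in a few lines. This is only an illustration, not the tutorial's recognizer: the speaker names, the match scores, and the threshold are hypothetical, and the sketch assumes that some front end has already produced one score per enrolled speaker, with higher meaning a closer match.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Closed-set identification: pick the enrolled speaker with the best score."""
    return max(scores, key=scores.get)


def verify(scores: Dict[str, float], claimed_id: str, threshold: float) -> bool:
    """Verification: accept the identity claim only if its score clears the threshold."""
    return scores[claimed_id] >= threshold


# Hypothetical match scores for one test utterance (higher = closer match).
scores = {"alice": -41.2, "bob": -38.7, "carol": -45.0}

print(identify(scores))                # bob
print(verify(scores, "alice", -40.0))  # False
```

Identification compares the utterance against every enrolled speaker, while verification makes a single accept/reject decision about one claimed identity, so its behavior depends on where the threshold is set.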
Acoustic-labial speaker verification
- Audio and Video based Person Authentication - AVBPA97, volume LNCS-1206, 1997
"... defined), by Ben Gold and Nelson Morgan. ..."
Identifying Non-Linguistic Speech Features
- Proc Eurospeech
Cited by 24 (13 self)
Over the last decade technological advances have been made which enable us to envision real-world applications of speech technologies. It is possible to foresee applications, for example, information centers in public places such as train stations and airports, where the spoken query is to be recognized without even prior knowledge of the language being spoken. Other applications may require accurate identification of the speaker for security reasons, including control of access to confidential information or for telephone-based transactions.
Automatic speaker recognition using gaussian mixture speaker models
- The Lincoln Laboratory Journal, 1995
Cited by 22 (1 self)
Speech conveys several levels of information. On a primary level, speech conveys the words or message being spoken, but on a secondary level, speech also reveals information about the speaker. The Speech Systems Technology group at Lincoln Laboratory has developed and experimented with approaches for automatically recognizing the words being spoken, the language being spoken, and the topic of a conversation. In this article we present an overview of our research efforts in a fourth area: automatic speaker recognition. We base our approach on a statistical speaker-modeling technique that represents the underlying characteristic sounds of a person's voice. Using these models, we build speaker recognizers that are computationally inexpensive and capable of recognizing a speaker regardless of what is being said. Performance of the systems is evaluated for a wide range of speech quality, from clean speech to telephone speech, by using several standard speech corpora. Tasks that are easily performed by humans, such as face or speech recognition, prove difficult ...
Automatic Person Recognition by Using Acoustic and Geometric Features
, 1993
Cited by 19 (1 self)
The paper describes a multisensorial person identification system: visual and acoustic cues are used jointly for person identification. A simple approach, based on the fusion of the lists of scores produced independently by a speaker recognition system and a face recognition system, is presented. Experiments are reported which show that integration of visual and acoustic information enhances both performance and reliability of the separate systems. Finally two network architectures, based on radial basis function theory, are proposed to describe integration at different levels of abstraction. Keywords: face recognition, speaker identification, classification. 1. Introduction. This paper describes an automatic person recognition system which uses both acoustic features, derived from the analysis of a given speech signal, and visual ones, related to distinctive parameters of the face of the person who uttered that speech signal. Visual and acoustic cues are used jointly for person id...
A Phone-based Approach to Non-Linguistic Speech Feature Identification
- Computer Speech and Language, 1995
Cited by 14 (9 self)
In this paper we present a general approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. The basic idea is to process the unknown speech signal by feature-specific phone model sets in parallel, and to hypothesize the feature value associated with the model set having the highest likelihood. This technique is shown to be effective for text-independent gender, speaker, and language identification. Text-independent speaker identification accuracies of 98.8% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with 2 utterances for both corpora. Experiments in which speaker-specific models were estimated without using the phonetic transcriptions for the TIMIT speakers had the same identification accuracies as those obtained with the use of the transcriptions. French/English language identification is better than 99% with 2 s of read, laboratory speech. On spontaneous teleph...
Speaker recognition using hidden Markov models, dynamic time warping and vector quantisation
, 1995
Cited by 11 (1 self)
This paper evaluates continuous density hidden Markov models (CDHMM), dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition, emphasising the performance of each model structure across incremental amounts of training data. Text-independent (TI) experiments are performed with VQ and CDHMMs, and text-dependent (TD) experiments are performed with DTW, VQ and CDHMMs. We show for TI speaker recognition, VQ performs better than an equivalent CDHMM with one training version, but is outperformed by CDHMM when trained with ten training versions. For TD experiments we show that DTW outperforms VQ and CDHMMs for sparse amounts of training data, but with more data, the performance of each model is indistinguishable. The performance of the TD procedures is consistently superior to TI, which is attributed to subdividing the speaker recognition problem into smaller speaker-word problems. We also show a large variation in performance across the differen...
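As background for the DTW method compared in this abstract, the following is a minimal sketch of the classic dynamic-time-warping distance between two sequences. It is not the paper's system: it assumes scalar frames and an absolute-difference local cost for brevity, whereas a speaker recogniser would typically align sequences of spectral feature vectors, and the function name `dtw_distance` is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum cumulative cost of aligning sequence a to sequence b."""
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i frames of a with the first j of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of b is repeated
                cost[i][j - 1],      # frame of a is repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# The same contour spoken more slowly aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

In a text-dependent setting, a test utterance of a known word would be scored against each stored template of that word, and the smallest distance would decide the match.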
Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts
- Cryptologia, 1993
Cited by 10 (2 self)
We explain how to apply statistical techniques to solve several language-recognition problems that arise in cryptanalysis and other domains. Language recognition is important in cryptanalysis because, among other applications, an exhaustive key search of any cryptosystem from ciphertext alone requires a test that recognizes valid plaintext. Written for cryptanalysts, this guide should also be helpful to others as an introduction to statistical inference on Markov chains. Modeling language as a finite stationary Markov process, we adapt a statistical model of pattern recognition to language recognition. Within this framework we consider four well-defined language-recognition problems: 1) recognizing a known language, 2) distinguishing a known language from uniform noise, 3) distinguishing unknown 0th-order noise from unknown 1st-order language, and 4) detecting non-uniform unknown language. For the second problem we give a most powerful test based on the Neyman-Pearson Lemma. For the oth...
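The second problem in this abstract (known language versus uniform noise) can be illustrated with a log-likelihood ratio, which is the statistic that a Neyman-Pearson test thresholds. The sketch below is an assumption-laden toy, not the guide's procedure: it uses a first-order Markov (bigram) model with add-one smoothing, conditions on the first symbol, and the two-letter alphabet, the training string, and the function names are all hypothetical.

```python
import math
from typing import Dict


def train_bigram(text: str, alphabet: str, smoothing: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {a: {b: smoothing for b in alphabet} for a in alphabet}
    for x, y in zip(text, text[1:]):
        counts[x][y] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in alphabet}
        for a in alphabet
    }


def log_likelihood_ratio(sample: str, probs: Dict[str, Dict[str, float]], alphabet: str) -> float:
    """Sum of log P_language(y | x) - log P_uniform(y) over adjacent symbol pairs."""
    log_uniform = math.log(1.0 / len(alphabet))
    return sum(math.log(probs[x][y]) - log_uniform for x, y in zip(sample, sample[1:]))


alphabet = "ab"
model = train_bigram("abababababab", alphabet)

# Positive values favor the language model, negative values favor uniform noise.
print(log_likelihood_ratio("abab", model, alphabet) > 0)  # True
print(log_likelihood_ratio("aabb", model, alphabet) > 0)  # False
```

The decision threshold of zero used here is only for illustration. Under the Neyman-Pearson approach the threshold would instead be set to achieve a chosen false-alarm probability.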
Experiments With Speaker Verification Over The Telephone
- Proc. Eurospeech’95
Cited by 7 (2 self)
In this paper we present a study on speaker verification showing achievable performance levels for both high quality speech and telephone speech and for two operational modes, i.e. text-dependent and text-independent speaker verification. A statistical modeling approach is taken, where for text-independent verification the talker is viewed as a source of phones, modeled by a fully connected Markov chain, where the lexical and syntactic structures of the language are approximated by local phonotactic constraints. A first series of experiments was carried out on high quality speech from the BREF corpus to validate this approach and resulted in an a posteriori equal error rate of 0.3% in text-dependent as well as in text-independent mode. A second series of experiments was carried out on a telephone corpus recorded specifically for speaker verification algorithm development. On this data, the lowest equal error rate is 2.9% for the text-dependent mode when 2 trials are allowed per attempt...
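The equal error rate quoted in this abstract is the operating point at which the false-acceptance rate and the false-rejection rate coincide. A minimal sketch of how it could be estimated from two lists of verification scores is given below. The scores are made up, the function name `equal_error_rate` is hypothetical, and the threshold is chosen after the fact on the same scores, which is how an a posteriori figure is usually understood.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (EER estimate, threshold) where false accepts and false rejects are closest."""
    best_gap, best_eer, best_threshold = None, None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold


# Hypothetical scores: higher means the utterance matches the claimed speaker better.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]

print(equal_error_rate(genuine, impostor))  # (0.25, 0.6)
```

With only a handful of trials the two error curves rarely cross exactly, so the sketch reports the average of the two rates at the closest threshold.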

