Results 11 - 20 of 53,575

Table 5: Speaker verification results using baseline, MLLR, and combined systems. The MLLR-SVM systems use 8+8 transforms (same as the last row in Table 3) or 2+2 transforms (same as the last row in Table 4). The last row represents a three-way combined system. Columns: SWB-II, Fisher, SRE-04, SRE-05.

in Improvements in MLLRtransform-based speaker recognition
by Andreas Stolcke, Luciana Ferrer, Sachin Kajarekar 2006
"... In PAGE 4: ...3. Baseline system comparison and combination The top part of Table5 gives complete results for our two cep- stral baseline systems, as well as the MLLR systems using 2+2 or 8+8 transforms. We observe that the results across all data sets are quite consistent, and, in particular, SRE-05 results are very similar to those on SRE-04.... In PAGE 4: ... Interestingly, the 2+2-transform MLLR system is competitive with the MFCC GMM system, and beats it in the 8-side condition. The middle and bottom parts of Table5 shows results with combinations of the two MFCC baseline systems with the MLLR systems, using a neural network for combining the sys- tem output scores. The combiner is trained to minimize DCF on the SRE-04 data sets.... In PAGE 4: ... A three-way com- bination does, however, improve over the best two-way system, yielding 22% (for 1-side training) and 10% (for 8-side training) relative EER reduction over the MLLR system by itself. The middle row in Table5 shows that even the 2+2- transform MLLR systems can boost the accuracy of a GMM baseline system signficantly when combine with the latter. This might be of interest if full word recognition is not an option, as... ..."
Cited by 3
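
The excerpt above reports results as EERs and minimum DCF values, with a neural-network combiner trained to minimize DCF on held-out scores. As a minimal sketch of how those two metrics come out of raw trial scores (the cost parameters c_miss = 10, c_fa = 1, p_target = 0.01 are an assumption, matching the values commonly used in the NIST SRE evaluations cited here):

    import numpy as np

    def eer_and_min_dcf(target_scores, impostor_scores,
                        c_miss=10.0, c_fa=1.0, p_target=0.01):
        # Pool all trial scores and sweep a decision threshold over them.
        scores = np.concatenate([target_scores, impostor_scores])
        labels = np.concatenate([np.ones(len(target_scores)),
                                 np.zeros(len(impostor_scores))])
        labels = labels[np.argsort(scores)]
        n_tgt = labels.sum()
        n_imp = len(labels) - n_tgt
        # Rejecting the i lowest-scoring trials misses the targets among
        # them and falsely accepts the impostors among the rest.
        misses = np.concatenate([[0.0], np.cumsum(labels)])
        false_alarms = n_imp - np.concatenate([[0.0], np.cumsum(1.0 - labels)])
        p_miss = misses / n_tgt
        p_fa = false_alarms / n_imp
        # EER: operating point where miss and false-alarm rates cross.
        eer = p_miss[np.argmin(np.abs(p_miss - p_fa))]
        # DCF: cost-weighted error, minimized over the same threshold sweep.
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        return eer, dcf.min()

Sweeping the threshold over every observed score is O(n log n) and yields both the EER crossing point and the minimum-DCF operating point in a single pass.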

Table 1: The performance of the SpkVDep and the SpkVUni systems in percentage (100 male and 100 female speakers)

in Impostor Modelling Techniques For Speaker Verification Based On Probabilistic Neural Networks
by Todor Ganchev, Nikos Fakotakis, George Kokkinakis
"... In PAGE 4: ...The gender-dependent performance results, for the two systems shown in the Table1 , were computed at the EER decision point. The expected superiority of the SpkVDep performance, in comparison with the SpkVUni one, was not observed.... In PAGE 4: ... On the other hand, the idea of using speaker-dependent background codebooks is useful in closed-set scenario, when only a limited number of enrolled speakers (and/or a small number of non-users) are available for constructing the background models. The gap between the performance of the male and fe- male speakers (Figure 2 and Table1 ) could be explained by the traditional difficulties with the female speech the speaker verification systems have, in opposition to the human listeners which have difficulties with the male voices [13]. Individual speech habits are essential cues for speaker verification performed by humans, and there- fore, factors like: pronunciation, prosodic style and word choice, should be taken into account when the speaker models for ASV are build.... ..."

TABLE III SPEAKER VERIFICATION RESULTS USING BASELINE, MLLR-SVM, AND COMBINED SYSTEMS. THE TOP VALUE IN EACH CELL IS THE EER; BELOW IT, THE MINIMUM DCF VALUE APPEARS IN NORMAL FONT. FOR SRE-05, THE ACTUAL DCF VALUES USING THRESHOLDS OPTIMIZED ON SRE-04 ARE SHOWN IN boldface. THE MLLR-SVM SYSTEM USES 8+8 TRANSFORMS (SAME AS LAST ROW IN TABLE II). THE LAST ROW REPRESENTS A THREE-WAY COMBINED SYSTEM

in Speaker recognition with session variability normalization based on MLLR adaptation transforms
by Andreas Stolcke, Sachin S. Kajarekar, Luciana Ferrer, Elizabeth Shriberg 2007
Cited by 2

Table 1. Speaker verification results

in A Lognormal Tied Mixture Model Of Pitch For Prosody-Based Speaker Recognition
by M. Kemal Sönmez, Larry Heck, Mitchel Weintraub, Elizabeth Shriberg

Table 3: Speaker verification and identification results for the experiments using the GSM EFR encoded parameters.

in Influence Of GSM Speech Coding On The Performance Of Text-Independent Speaker Recognition
by S. Grassi, L. Besacier, A. Dufaux, M. Ansorge, F. Pellandini
"... In PAGE 2: ... Results are given in Tables 3 and 4. In Table3 , line (1) corresponds to the base- line (TIMIT EFR experiment reported from Tables 1 and 2). When extracting features from encoded parameters, we have a frame rate (imposed by the EFR coder) of 20 ms.... In PAGE 3: ... 5.2 Use of Higher Order LPC In Table3 it is observed that increasing the LPC order improves the performance, but only 10-th order LPC is available in the EFR encoded parameters. Different experiments we have carried out let us assume that higher order LPC information leaks in other encoded parameters (LTP lags and gain, and stochastic pulses and gain) and is thus available in the decoded speech, improving recognition.... In PAGE 3: ... We investigated the use of this higher order LPC informa- tion. The goal is to improve upon (14) in Table3 , the best... In PAGE 4: ... Naturally, the performance is still better when extracting features from the original speech. We have improved upon line (14) in Table3 , got close to the baseline for speaker identification and improved upon the baseline for verification. 6.... In PAGE 4: ... The performance is also improved by using LSP parameters instead of cepstral coefficients. The best result we have obtained (line 14 in Table3 ) is slightly worse than the baseline in performance (line 1 in Table 3), but computationally more efficient (amount of feature vectors is halved, and vector dimension is reduced from 16 to 11). Future work should include finding ways of improving the baseline, varying either the speaker recognition system, or the feature extraction.... In PAGE 4: ... The performance is also improved by using LSP parameters instead of cepstral coefficients. The best result we have obtained (line 14 in Table 3) is slightly worse than the baseline in performance (line 1 in Table3 ), but computationally more efficient (amount of feature vectors is halved, and vector dimension is reduced from 16 to 11). Future work should include finding ways of improving the baseline, varying either the speaker recognition system, or the feature extraction.... ..."

Table 1: EERs when the training data is speech recorded through PSTN

in A Bayesian network approach combining pitch and spectral envelope features to reduce channel mismatch in speaker verification and forensic speaker recognition
by Mijail Arcienega, Anil Alexander, Philipp Zimmerman, Andrzej Drygajlo
"... In PAGE 3: ... Mismatched channel conditions were simulated using a) speech recorded through a PSTN, b) speech recorded through a cellular-telephone (GSM) and c) speech recorded in the calling room (Room). Figure 2 and Table1 show the equal error rates (EERs) of a classical GMM-UBM based speaker verification system compared to the Bayesian network system. Table 1: EERs when the training data is speech recorded through PSTN... ..."

Table 3: EERs (in %) and error reduction (error red.) obtained from the MFCC system, AFCPM system, and fusion of the two systems. MFCC + AFCPM denotes the fusion of frame-weighted MFCC and AFCPM scores given in (10). Matched (Mismatched) means the enrollment handset is identical to (different from) the verification handsets. The test data from non-target speakers under Matched and Mismatched are identical. All represents the overall EERs obtained from gathering all test data from the target speakers using both matched and mismatched handsets. Note that the MFCC system does not require phoneme alignments.

in Articulatory Feature-Based Conditional Pronunciation Modeling for Speaker Verification
by Ka-Yee Leung, Man-Wai Mak 2006
"... In PAGE 4: ... 5. Results Table3 lists two sets of experimental results. The two experiments are different in the way the phoneme alignments for CPM were ob- tained.... In PAGE 4: ... The two experiments are different in the way the phoneme alignments for CPM were ob- tained. One set of results is summarized under Recognized Align- ment and the other under Forced Alignment in Table3 . The table is divided into three columns: Matched , Mismatched and All .... In PAGE 4: ... This was achieved by forced aligning the phoneme sequences of all enrollment and verification utterances with the transcribed word sequences and lexicon obtained from [8]. The results of using forced phoneme alignments are summarized under the heading Forced Alignment in Table3 . The overall EER is reduced to 22.... In PAGE 4: ...educed to 22.69%. The reduction from 25.83% to 22.69% suggests that the accuracy of phoneme alignments is critical to the verifica- tion performance of the AFCPM system. The experimental results of the MFCC system and the fusion systems are also summarized in Table3 . The fusion weights of the systems were determined from a four-fold cross validation using the test data from target speakers and non-target speakers.... In PAGE 4: ... The results demonstrate the effectiveness of adjusting the contribution of individual frames according to the score confidence in each frame. From the results corresponding to matched handsets and mismatched handsets in Table3 , it is clear that fusion of the spectral-based scores and AFCPM scores plays an important role in reducing the EER under handset mismatched conditions. As the spectral features are less reliable under handset mismatched condi- tions, the speaker information from AFCPM becomes more impor- tant.... ..."
Cited by 4
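
The fusion step described in the excerpt, with weights tuned by four-fold cross-validation, is in essence a weighted sum of the two systems' scores with the weight picked on held-out trials. A rough sketch under that reading (the frame-level confidence weighting of the paper's Eq. (10) is not reproduced here; `eer_fn(target_scores, impostor_scores)` can be any routine returning the EER, e.g. a wrapper around the sketch given earlier on this page):

    import numpy as np

    def fuse(mfcc_scores, afcpm_scores, alpha):
        # Linear score-level fusion; alpha is the weight on the MFCC stream.
        return alpha * np.asarray(mfcc_scores) + (1.0 - alpha) * np.asarray(afcpm_scores)

    def tune_alpha(dev_tgt, dev_imp, eer_fn, grid=np.linspace(0.0, 1.0, 101)):
        # dev_tgt / dev_imp are (mfcc, afcpm) score pairs for target and
        # impostor development trials; pick the weight minimizing EER.
        return min(grid, key=lambda a: eer_fn(fuse(*dev_tgt, a), fuse(*dev_imp, a)))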

Table 1. User verification results expressed as equal error rates (%), when forcing the face detector to output a detected face, on three systems (face only, speaker only, and multi-modal fusion) under two impostor conditions (known in-set impostors vs. unknown out-of-set impostors).

in Multi-Modal Face And Speaker Identification On A Handheld Device
by Timothy J. Hazen, Eugene Weinstein, Ryan Kabir, Alex Park, Bernd Heisele 2003
"... In PAGE 6: ....6.1. Forced Face Detection Results Table1 shows our user verification results for three systems (face ID only, speaker ID only, and our full multi-modal sys- tem) under two different impostor conditions (using only known in-set impostors vs. using only unknown out-of- set impostors).... ..."
Cited by 4


Table 1: Performance on the test set of different unimodal verification systems

in Confidence Measures for Multimodal Identity Verification
by Sebastien Marcel, Johnny Mariethoz, Samy Bengio, Christine Marcel
"... In PAGE 10: ...2 Baseline Results In this section, we present two different baseline results. First in Table1 , we show the performance of each modality alone, namely speaker verification (Voice) and face verification (Face). For both, we give results for configurations I and II, and for each configuration, FAR represents the false acceptance rate, FRR is the false rejection rate, while HTER is the half total error rate.... ..."