Results 1 - 10 of 51,936
Table 1: The best recognition results for the clean audio-visual database
2002
"... In PAGE 2: ...or any other triphone HMM, we xed A and V at 1.0 and 0.0 respectively. Table1 shows digit recognition results for the audio-visual data arti cially corrupted by an audio white noise, and for the clean data. These results show that our multi-modal ASR system achieves better performance than the audio-only ASR in all environments.... ..."
Cited by 3
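The excerpt above describes the standard stream-weighting scheme for multi-stream audio-visual HMMs, where per-stream log-likelihoods are combined using exponents λA and λV (setting λA = 1.0 and λV = 0.0 reduces to audio-only decoding). Below is a minimal sketch of that combination rule; all function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def fused_log_likelihood(log_p_audio, log_p_video, lam_a=1.0, lam_v=0.0):
    """Stream-weighted combination of per-class log-likelihoods.

    log_p_audio, log_p_video: per-digit log-likelihoods from the audio
    and visual HMMs. lam_a/lam_v are the stream weights; lam_a=1.0 and
    lam_v=0.0 reproduces audio-only decoding, as in the excerpt above.
    """
    return lam_a * np.asarray(log_p_audio) + lam_v * np.asarray(log_p_video)

# Recognize a digit by picking the class with the highest fused score.
log_p_a = np.log([0.2, 0.5, 0.3])   # toy audio HMM likelihoods for 3 digits
log_p_v = np.log([0.4, 0.3, 0.3])   # toy visual HMM likelihoods
print(np.argmax(fused_log_likelihood(log_p_a, log_p_v, 0.7, 0.3)))
```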
Table 1. Highlight extraction results with the test data set (column headers: Audio Feature Only, Visual Feature Only, Audio-Visual Combination, Sequence, Ground ...)
"... In PAGE 13: ... The results of audio-based method, motion-based methods and our proposed integrated framework performed on test data set are shown in the left, middle and right columns of Table 1, respectively. As can be seen from Table1 , the proposed integrated framework outperforms the other two methods on almost all the test sequences in terms of both the numbers of false positives and false negatives. This experimental result also clarifies our statements that the noises (e.... In PAGE 13: ... C. Trade-off between False Positives and False Negatives The false positive values shown in Table1 could be further reduced if additional post-processes are adopted. When observing the falsely detected highlights (which will be discussed later in Section 4.... In PAGE 14: ... Comparison between Symmetric Combination and Visual-Centric Framework Here, we compare the proposed symmetric audio-visual combination framework with the visual-centric method, which is further refined using audio information. The motion-based method shown in the leftmost column of Table 3 is the same as in Table1 . In this experiment, we consider that if the number of cheering clips in one segment exceeds a certain threshold, this segment is more probable to attract the audience and reflects higher degree of excitement.... ..."
Cited by 1
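The excerpt sketches a simple post-process for reducing false positives: a candidate segment is kept as a highlight only if the number of detected cheering clips it contains exceeds a threshold. Here is a hypothetical sketch of that audio-side filter; the threshold value and all names are assumptions, not the paper's.

```python
def filter_highlights(segments, cheer_clips, min_cheers=2):
    """Keep only candidate highlight segments that contain enough audio
    'cheering' clips, as described in the excerpt above. Segments and
    clips are (start, end) tuples in seconds; min_cheers is an assumed
    threshold, not a value from the paper.
    """
    kept = []
    for seg_start, seg_end in segments:
        n = sum(1 for c_start, c_end in cheer_clips
                if c_start >= seg_start and c_end <= seg_end)
        if n >= min_cheers:
            kept.append((seg_start, seg_end))
    return kept

# Toy example: two candidate segments, cheering detected mostly in the first.
print(filter_highlights([(0, 30), (60, 90)], [(5, 8), (12, 15), (70, 72)]))
# -> [(0, 30)]
```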
Table 2. The automatic fusion accuracies are higher than either of the audio and visual modalities, at all degradation levels. This highlights the complementary nature of the audio and visual speech signals and the fusion robustness. At the most severe mismatch levels tested (SNR 21dB, QF 2), the audio, visual, and audio-visual accuracies are 37.1%, 48%, and 71.4% respectively, giving a relative improvement of 92.5% on the audio and 49% on the visual accuracies.
"... In PAGE 7: ...ixtures per state. The performance w.r.t. audio degradation is given in the second row of Table2 .... In PAGE 9: ...Table2 . Automatic audio-visual fusion accuracies for ten levels of audio/visual degradation dB 48 45 42 39 36 33 30 27 24 21 QF V A 97.... ..."
Table 5. Audio-visual Emotion Recognition
"... In PAGE 13: ...15 7.3 Audio-visual Fusion The emotion recognition performance of audio-visual fusion is shown in Table5 . In this table, two combination schemes (weighting and training) are used to fuse the component HMMs from audio and visual channels.... In PAGE 14: ...stream fusion as a multi-class classification problem, there are a variety of methods that can be used to build the fusion. In addition to Adaboost MHMM, we used LDC and KNN (K=3 for female and K=5 for male) to build this audio-visual fusion, which are Ldc MHMM and Knn MHMM in Table5 . The performance comparison of these fusion methods is as follows: Adaboost MHMM gt; Knn MHMM gt; Acc MHMM gt; Ldc MHMM The results demonstrate that training combination outperforms weighting combination, except Ldc MHMM that is a linear fusion.... ..."
Cited by 1
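The excerpt treats stream fusion as a multi-class classification problem over the component HMM outputs. Below is a minimal sketch of one such "training combination", a KNN classifier over concatenated per-class HMM scores, analogous to the Knn MHMM entry above; the data, dimensions, and feature layout are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Invented training data: each row concatenates the audio-HMM and
# visual-HMM per-emotion log-likelihoods for one utterance (4 emotion
# classes -> 8 score features); labels are the true emotion classes.
X_train = rng.normal(size=(200, 8))
y_train = rng.integers(0, 4, size=200)

# Training combination: learn the fusion rule from data (K=3, as for
# the female subset in the excerpt) instead of fixing stream weights.
fusion = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

x_test = rng.normal(size=(1, 8))   # HMM scores for a new utterance
print(fusion.predict(x_test))      # fused emotion label
```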
Table 4: Audio-visual feature list
2004
Cited by 20
TABLE III AUDIO-VISUAL FEATURE LIST
2006
Cited by 7
Table 1. Audio-visual speaker ID
"... In PAGE 3: ... Noise mismatchwas created by adding speech noise to the audio signal at a signal-to-noise ra- tio of about 10 dB. Table1 shows the recognition accuracy for di#0Berent testing conditions and fusion techniques. The #0Crst two rows give the accuracy of audio-only ID and video-only ID.... ..."
Table 1. Description of audio-visual signals used in tests
"... In PAGE 4: ...ig. 2. Comparison of answers for two types of experiments: sound source angle localization shift caused by the image appearance (left-hand), sound source distance localization shift caused by the image appearance (right-hand); loudspeaker No. 4 was the closest to the screen The list of audio-visual signals used in experiments is presented in Table1 . They are both low- and high-level.... ..."