Results 1 - 10 of 51,936

Table 1: The best recognition results for the clean audio-visual database

in A Robust Multi-Modal Speech Recognition Method Using Optical-Flow Analysis
by Satoshi Tamura, Koji Iwano, Sadaoki Furui 2002
"... In PAGE 2: ...or any other triphone HMM, we xed A and V at 1.0 and 0.0 respectively. Table1 shows digit recognition results for the audio-visual data arti cially corrupted by an audio white noise, and for the clean data. These results show that our multi-modal ASR system achieves better performance than the audio-only ASR in all environments.... ..."
Cited by 3
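
The fixed weights mentioned in the excerpt above correspond to a standard stream-weighted log-likelihood combination for multi-stream HMMs. A minimal sketch of that scoring rule, assuming hypothetical per-candidate log-likelihoods (the function, weights, and values below are illustrative, not from the paper):

```python
import numpy as np

def fuse_stream_scores(log_p_audio, log_p_visual, lam_audio=1.0, lam_visual=0.0):
    """Weighted log-likelihood combination for a multi-stream HMM.

    With lam_audio=1.0 and lam_visual=0.0 (the fixed weights the excerpt
    mentions for triphone HMMs), the score reduces to audio-only decoding.
    """
    return lam_audio * log_p_audio + lam_visual * log_p_visual

# Hypothetical log-likelihoods for three candidate digit hypotheses.
log_p_audio = np.array([-120.3, -131.7, -118.9])
log_p_visual = np.array([-95.2, -88.4, -101.6])

# A non-degenerate weighting of the two modalities.
fused = fuse_stream_scores(log_p_audio, log_p_visual, lam_audio=0.7, lam_visual=0.3)
print(int(np.argmax(fused)))  # index of the best-scoring hypothesis
```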

Table 1. Highlight extraction results with test data set (columns: Sequence, Ground Truth, Audio Feature Only, Visual Feature Only, Audio-Visual Combination)

in Fusion of audio and motion information on HMM-based highlight extraction for baseball games
by Chih-Chieh Cheng, Chiou-Ting Hsu
"... In PAGE 13: ... The results of audio-based method, motion-based methods and our proposed integrated framework performed on test data set are shown in the left, middle and right columns of Table 1, respectively. As can be seen from Table1 , the proposed integrated framework outperforms the other two methods on almost all the test sequences in terms of both the numbers of false positives and false negatives. This experimental result also clarifies our statements that the noises (e.... In PAGE 13: ... C. Trade-off between False Positives and False Negatives The false positive values shown in Table1 could be further reduced if additional post-processes are adopted. When observing the falsely detected highlights (which will be discussed later in Section 4.... In PAGE 14: ... Comparison between Symmetric Combination and Visual-Centric Framework Here, we compare the proposed symmetric audio-visual combination framework with the visual-centric method, which is further refined using audio information. The motion-based method shown in the leftmost column of Table 3 is the same as in Table1 . In this experiment, we consider that if the number of cheering clips in one segment exceeds a certain threshold, this segment is more probable to attract the audience and reflects higher degree of excitement.... ..."
Cited by 1
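
The cheering-clip rule described at the end of the excerpt reduces to a simple per-segment count and threshold. A sketch under assumed inputs; the clip-level audio classifier, segment layout, and threshold value are all hypothetical, as the excerpt does not specify them:

```python
def is_highlight(cheer_flags, threshold=3):
    """Keep a candidate segment as a highlight only if it contains at
    least `threshold` clips classified as cheering by an (assumed)
    audio classifier. The threshold of 3 is illustrative."""
    return sum(cheer_flags) >= threshold

# Two hypothetical candidate segments from the motion-based detector.
segment_a = [True, True, False, True, True]     # sustained cheering -> kept
segment_b = [False, True, False, False, False]  # isolated cheer -> rejected
print(is_highlight(segment_a), is_highlight(segment_b))  # True False
```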

Table 2. The automatic fusion accuracies are higher than those of either the audio or the visual modality, at all degradation levels. This highlights the complementary nature of the audio and visual speech signals and the robustness of the fusion. At the most severe mismatch levels tested (SNR 21 dB, QF 2), the audio, visual, and audio-visual accuracies are 37.1%, 48%, and 71.4% respectively, giving a relative improvement of 92.5% over the audio and 49% over the visual accuracy.

in Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities
by Niall A. Fox, Brian A. O’Mullane, Richard B. Reilly
"... In PAGE 7: ...ixtures per state. The performance w.r.t. audio degradation is given in the second row of Table2 .... In PAGE 9: ...Table2 . Automatic audio-visual fusion accuracies for ten levels of audio/visual degradation dB 48 45 42 39 36 33 30 27 24 21 QF V A 97.... ..."

Table 5. Audio-visual Emotion Recognition

in Audio-visual spontaneous emotion recognition
by Zhihong Zeng, Yuxiao Hu, Glenn I. Roisman, Zhen Wen, Yun Fu, S. Huang
"... In PAGE 13: ...15 7.3 Audio-visual Fusion The emotion recognition performance of audio-visual fusion is shown in Table5 . In this table, two combination schemes (weighting and training) are used to fuse the component HMMs from audio and visual channels.... In PAGE 14: ...stream fusion as a multi-class classification problem, there are a variety of methods that can be used to build the fusion. In addition to Adaboost MHMM, we used LDC and KNN (K=3 for female and K=5 for male) to build this audio-visual fusion, which are Ldc MHMM and Knn MHMM in Table5 . The performance comparison of these fusion methods is as follows: Adaboost MHMM gt; Knn MHMM gt; Acc MHMM gt; Ldc MHMM The results demonstrate that training combination outperforms weighting combination, except Ldc MHMM that is a linear fusion.... ..."
Cited by 1
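
The "training combination" the excerpt describes treats fusion as an ordinary multi-class classification problem over the component HMM scores. A sketch using scikit-learn's KNN in the role of the paper's Knn MHMM; the feature layout, class count, and data are synthetic assumptions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_classes = 4    # hypothetical number of emotion categories
n_train = 200

# Feature vector per utterance: per-class scores from the audio HMMs
# concatenated with per-class scores from the visual HMMs.
X_train = rng.normal(size=(n_train, 2 * n_classes))
y_train = rng.integers(0, n_classes, size=n_train)  # synthetic labels

fusion = KNeighborsClassifier(n_neighbors=3)  # K=3, as quoted for female subjects
fusion.fit(X_train, y_train)

x_test = rng.normal(size=(1, 2 * n_classes))  # scores for one new utterance
print(fusion.predict(x_test))                 # fused emotion decision
```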

Table 4: Audio-visual feature list

in Modeling Individual and Group Actions in Meetings: a Two-Layer HMM Framework
by Dong Zhang, Daniel Gatica-Perez, Samy Bengio, Iain McCowan, Guillaume Lathoud 2004
Cited by 20

Table III. Audio-visual feature list

in Modeling individual and group actions in meetings with layered HMMs
by Dong Zhang, Daniel Gatica-Perez, Samy Bengio, Iain McCowan 2006
Cited by 7

Table 7. Audio-Visual Bandwidth Allocation

in Universities Space Research Association (USRA)
by Richard F. Haines 1990

Table 1. Audio-visual speaker ID

in On The Use Of Visual Information For Improving Audio-Based Speaker Recognition
by Andrew Senior, Chalapathy V. Neti, Benoît Maison
"... In PAGE 3: ... Noise mismatchwas created by adding speech noise to the audio signal at a signal-to-noise ra- tio of about 10 dB. Table1 shows the recognition accuracy for di#0Berent testing conditions and fusion techniques. The #0Crst two rows give the accuracy of audio-only ID and video-only ID.... ..."

Table III. Audio-visual feature list

in Modeling Individual and Group Actions in Meetings With Layered HMMs (IEEE Transactions on Multimedia)
by Dong Zhang, Daniel Gatica-Perez, Samy Bengio, Iain McCowan

Table 1. Description of audio-visual signals used in tests

in Rough-Neuro Approach to Testing Influence of Visual Cues on Surround Sound Perception (Lecture Notes in Computer Science)
by Bozena Kostek
"... In PAGE 4: ...ig. 2. Comparison of answers for two types of experiments: sound source angle localization shift caused by the image appearance (left-hand), sound source distance localization shift caused by the image appearance (right-hand); loudspeaker No. 4 was the closest to the screen The list of audio-visual signals used in experiments is presented in Table1 . They are both low- and high-level.... ..."