Results 1 - 10 of 437

Audio-Visual Clustering for Multiple Speaker Localization

by Vasil Khalidov, Florence Forbes, Miles Hansard, Elise Arnaud, Radu Horaud
"... We address the issue of identifying and localizing individuals in a scene that contains several people engaged in conversation. We use a human-like configuration of sensors (binaural and binocular) to gather both auditory and visual observations. We show that the identification and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data to a representation of the common 3D scene-space, via a pair ..."
Cited by 2 (0 self)

Detection and Localization of 3D Audio-Visual Objects Using Unsupervised Clustering

by Vasil Khalidov, Florence Forbes, Miles Hansard, Elise Arnaud, Radu Horaud
"... This paper addresses the issues of detecting and localizing objects in a scene that are both seen and heard. We explain the benefits of a human-like configuration of sensors (binaural and binocular) for gathering auditory and visual observations. It is shown that the detection and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data into a common audio-visual 3D representation via a pair of mixture ..."
Cited by 10 (6 self)
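The clustering formulation above can be sketched in miniature: if each speaker corresponds to one shared 3D position, then audio and visual observations are both explained by projecting that same position into each modality's observation space, and assignment to speakers becomes mixture-model clustering. The sketch below is a toy illustration under assumed names and numbers; the forward maps, positions, and variances are hypothetical stand-ins, not the paper's actual sensor model.

```python
import math

def gauss(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical forward maps from a 3D position into each modality's
# observation space (the real maps depend on the binaural/binocular geometry).
def audio_map(s):   # e.g. an interaural cue growing with lateral position
    return 0.5 * s[0]

def visual_map(s):  # e.g. binocular disparity shrinking with depth
    return 1.0 / s[2]

def responsibilities(obs, project, centers, var=0.01):
    """Posterior probability that each observation belongs to each cluster,
    where every cluster is the SAME 3D position projected into the
    observation space of the relevant modality."""
    k = len(centers)
    out = []
    for x in obs:
        lik = [gauss(x, project(c), var) / k for c in centers]
        z = sum(lik)
        out.append([l / z for l in lik])
    return out

# Two speakers at hypothetical 3D positions; audio and visual observations
# are clustered against projections of the same shared scene positions.
speakers = [(-1.0, 0.0, 2.0), (1.0, 0.0, 4.0)]
audio_r = responsibilities([-0.5, 0.5], audio_map, speakers)
visual_r = responsibilities([0.5, 0.25], visual_map, speakers)
```

Each observation ends up assigned with high responsibility to the speaker whose projected position it matches, which is the clustering step the abstract describes.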

Autonomous learning of perceptual categories and symbolic protocols from audio-visual input (submitted to the British Machine Vision Conference 2004)

by C. J. Needham, D. R. Magee, V. Devin, P. Santos, A. G. Cohn, D. C. Hogg
"... The development of cognitive vision systems that autonomously learn how to interact with their environment through input from sensors is a major challenge for the Computer Vision, Machine Learning and Artificial Intelligence communities. This paper presents a framework in which a symbolic inference engine is integrated with a perceptual system. The rules of the inference engine are learned from audio-visual observation of the world, to form an interactive perceptual agent that can respond suitably to its environment. From the video and audio input streams, interesting objects are identified using ..."

A coupled HMM for audio-visual speech recognition

by Ara V. Nefian, Luhong Liang, Xiaobo Pi, Liu Xiaoxiang, Crusoe Mao, Kevin Murphy - in International Conference on Acoustics, Speech and Signal Processing (ICASSP'02), 2002
"... In recent years several speech recognition systems that use visual together with audio information showed significant increase in performance over the standard speech recognition systems. The use of visual features is justified by both the bimodality of the speech generation and by the need of features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM). The statistical properties of the coupled HMM allow us to model the state asynchrony ..."
Cited by 42 (2 self)
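A coupled HMM lets the next state of each chain (audio, visual) depend on the previous *joint* state of both chains, which is what allows the two streams to drift out of step while remaining statistically tied. The following is a minimal, self-contained forward pass over the product state space; all transition and emission tables are invented toy numbers, not the paper's trained model.

```python
# Coupled transitions: P(a_t | a_{t-1}, v_{t-1}) and P(v_t | a_{t-1}, v_{t-1}),
# indexed [(prev_a, prev_v)]; each row sums to 1. Each chain tends to stay
# put but is also pulled toward agreement with the OTHER chain (toy values).
TA = {(0, 0): [0.9, 0.1], (0, 1): [0.6, 0.4], (1, 0): [0.4, 0.6], (1, 1): [0.1, 0.9]}
TV = {(0, 0): [0.9, 0.1], (0, 1): [0.4, 0.6], (1, 0): [0.6, 0.4], (1, 1): [0.1, 0.9]}
# Per-chain discrete emissions: P(obs | state) for observations 0/1.
EA = {0: [0.8, 0.2], 1: [0.2, 0.8]}
EV = {0: [0.7, 0.3], 1: [0.3, 0.7]}

def forward(audio_obs, visual_obs):
    """Likelihood of paired observation streams under the coupled HMM,
    computed with the forward algorithm on the product state space (a, v)."""
    states = [(a, v) for a in (0, 1) for v in (0, 1)]
    # Uniform initial distribution over joint states, times first emissions.
    alpha = {(a, v): 0.25 * EA[a][audio_obs[0]] * EV[v][visual_obs[0]]
             for (a, v) in states}
    for oa, ov in zip(audio_obs[1:], visual_obs[1:]):
        alpha = {
            (a, v): sum(alpha[(pa, pv)] * TA[(pa, pv)][a] * TV[(pa, pv)][v]
                        for (pa, pv) in states) * EA[a][oa] * EV[v][ov]
            for (a, v) in states
        }
    return sum(alpha.values())
```

Because each chain keeps its own state sequence, the audio and visual streams need not change state on the same frame; the coupling only makes asynchronous excursions less probable rather than impossible.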

Detection and Localization of 3D Audio-Visual Objects Using Unsupervised Clustering (author manuscript, published in ACM/IEEE International Conference on Multimodal Interfaces, 2008)

by Vasil Khalidov, Florence Forbes, Miles Hansard, Elise Arnaud, Radu Horaud , 2009
"... This paper addresses the issues of detecting and localizing objects in a scene that are both seen and heard. We explain the benefits of a human-like configuration of sensors (binaural and binocular) for gathering auditory and visual observations. It is shown that the detection and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data into a common audio-visual 3D representation via a pair of mixture ..."

Asynchrony Modeling for Audio-Visual Speech Recognition

by Guillaume Gravier, Gerasimos Potamianos, Chalapathy Neti , 2002
"... We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In ..."
Cited by 33 (2 self)

Stream Confidence Estimation For Audio-Visual Speech Recognition

by Gerasimos Potamianos, Chalapathy Neti - in Proc. of ICSLP 2000, Beijing, 2000
"... We investigate the use of single modality confidence measures as a means of estimating adaptive, local weights for improved audio-visual automatic speech recognition. We limit our work to the toy problem of audio-visual phonetic classification by means of a two-stream Gaussian mixture model (GMM), ..."
Cited by 28 (3 self)
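The stream-weighting idea in two-stream models combines per-stream log-likelihoods as lam * log p_audio + (1 - lam) * log p_visual, where the weight lam would be set from a confidence estimate (e.g. lowered when the audio is noisy). A minimal sketch, assuming 1-D Gaussian class models and two invented phone classes "p" and "b"; all means, variances, and weights here are illustrative, not trained values.

```python
import math

def gauss_ll(x, mu, var):
    """Log-likelihood of x under a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

# Hypothetical per-stream class models: (mean, variance) of a 1-D feature.
AUDIO_MODELS = {"p": (0.0, 1.0), "b": (1.0, 1.0)}
VISUAL_MODELS = {"p": (0.0, 1.0), "b": (1.0, 1.0)}

def classify(xa, xv, lam):
    """Pick the class maximizing the weighted two-stream score:
    lam * audio log-lik + (1 - lam) * visual log-lik.
    lam would come from a per-frame confidence measure; here it is given."""
    def score(c):
        return (lam * gauss_ll(xa, *AUDIO_MODELS[c])
                + (1 - lam) * gauss_ll(xv, *VISUAL_MODELS[c]))
    return max(AUDIO_MODELS, key=score)
```

With an acoustically corrupted frame (audio feature pointing to "b", visual feature clearly "p"), a high audio weight follows the noisy audio, while a low audio weight lets the visual stream correct the decision; that sensitivity is why local, confidence-driven weights help.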

Audio-Visual Speaker Tracking with Importance Particle Filters

by Daniel Gatica-Perez, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez, Darren Moore - in Proc. IEEE Int. Conf. on Image Processing (ICIP), 2003
"... We present a probabilistic method for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for the asymmetrical integration of AV information in ..."
Cited by 29 (4 self)
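An importance particle filter proposes new particles from an auxiliary distribution (here, around an audio localization estimate) rather than from the state dynamics, then corrects each weight by likelihood x dynamics / proposal so the posterior stays valid. The toy 1-D step below (state = speaker azimuth) is a sketch under assumed densities and variances, not the paper's tracker; resampling is omitted for brevity.

```python
import math
import random

random.seed(0)

def gauss(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ipf_step(particles, audio_loc, visual_obs, dyn_var=0.5, prop_var=0.2):
    """One importance-particle-filter step. Particles are drawn around the
    audio localization estimate (the importance function), then reweighted
    by visual likelihood * transition prior / proposal density."""
    new, weights = [], []
    for xp in particles:
        x = random.gauss(audio_loc, math.sqrt(prop_var))  # audio-driven proposal
        lik = gauss(visual_obs, x, 0.1)                   # visual likelihood
        dyn = gauss(x, xp, dyn_var)                       # transition prior
        prop = gauss(x, audio_loc, prop_var)              # proposal density
        new.append(x)
        weights.append(lik * dyn / prop)                  # importance correction
    z = sum(weights)
    weights = [w / z for w in weights]
    # Weighted-mean state estimate (a resampling step would normally follow).
    return sum(w * x for w, x in zip(weights, new))

particles = [random.gauss(0.0, 1.0) for _ in range(500)]
estimate = ipf_step(particles, audio_loc=1.0, visual_obs=1.2)
```

The asymmetry the abstract mentions shows up here naturally: audio steers where particles are placed, while vision (and the dynamics) decide how much each particle counts.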

Speaker independent audio-visual continuous speech recognition

by Luhong Liang, Xiaoxing Liu, Yibao Zhao, Xiaobo Pi, Ara V. Nefian - in International Conference on Multimedia and Expo, 2002
"... The increase in the number of multimedia applications that require robust speech recognition systems determined a large interest in the study of audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified by both the audio and visual modality of the speech generation ..."
Cited by 16 (3 self)

Detection of documentary scene changes by audio-visual fusion

by Atulya Velivelli, Chong-wah Ngo, Thomas S. Huang - In Proc. CIVR , 2004
"... The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the amount of information from the visual component alone was not enough to convey a semantic context to most portions of these videos, but a joint obs ..."
Cited by 3 (0 self)

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University