Results 1-10 of 42
Sound-Source Recognition: A Theory and Computational Model
, 1999
Cited by 61 (0 self)
Abstract
The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process. This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about
Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 47 (2 self)
Abstract
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Social Signal Processing: Survey of an Emerging Domain
, 2008
Cited by 32 (10 self)
Abstract
The ability to understand and manage the social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence (the ability to recognize human social signals and social behaviours like turn-taking, politeness, and disagreement) in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, the design and development of automated systems for Social Signal Processing (SSP) remain rather difficult. This paper surveys past efforts to solve these problems by computer, summarizes the relevant findings in social psychology, and proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.
Spectral Features for Automatic Text-Independent Speaker Recognition
, 2003
Cited by 9 (2 self)
Abstract
The front-end, or feature extractor, is the first component in an automatic speaker recognition system. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components (speaker modeling and pattern matching) is strongly determined by the quality of the front-end. In other words, classification can be at most as accurate as the features.
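To make the front-end idea concrete, here is a minimal sketch of one common spectral front-end, a short-time log-power spectrum. It is not the specific feature set studied in the report; the function name `log_power_spectrum_frames`, the Hamming window, and the frame and hop sizes are illustrative assumptions, and a naive DFT keeps the sketch dependency-free.

```python
import cmath
import math


def log_power_spectrum_frames(signal, frame_len=256, hop=128, eps=1e-10):
    """Hypothetical front-end: split `signal` into overlapping Hamming-windowed
    frames and return, per frame, the log power at DFT bins 0..frame_len//2.

    A naive O(N^2) DFT is used so only the standard library is needed.
    """
    # Hamming window reduces spectral leakage at the frame edges.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # DFT coefficient at bin k.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            # Log compression of the power; eps avoids log(0).
            row.append(math.log(abs(coeff) ** 2 + eps))
        features.append(row)
    return features
```

For a 1 kHz sine sampled at 8 kHz with 64-sample frames, the largest value in every row falls at bin 8 (1000 / (8000 / 64)). A practical front-end would use an FFT and typically add further stages such as mel filtering and a cepstral transform.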
For a recent report
- Core Experiment ROI 1: ROI refinement", ISO/IEC JTC1/SC29/WG1 N990
, 1995
Cited by 7 (0 self)
Abstract
This report describes the methods and techniques developed for the multimodal recognizers used in the M4 domain. This includes recognizers in the auditory domain, such as phoneme recognition and localization; in the video domain, represented by gesture recognition, person identification, person tracking and gaze tracking; and multimodal approaches for tracking and localization of people. The outputs of these approaches provide sufficient input for the higher-level approaches in WP3 for efficient meeting analysis and multimodal access. M4 Deliverable D2.2
A Biomimetic Platform to Study Perception in Bats
, 2000
Cited by 6 (4 self)
Abstract
Echolocating bats achieve a surprising amount of autonomy based primarily on sonar sensing. In order to use insights from biosonar function to improve technical designs, it is necessary to understand the biosonar tasks (e.g., obstacle avoidance, prey capture, navigation) that provide the context for this function. To facilitate the study of these tasks, a system was designed that combines the following aspects: it allows for interaction with the real world and mobility by mounting a sonar head with 6 rotational degrees of freedom on a mobile platform. At the same time, the system is capable of displaying the output of a parsimonious auditory model at an appropriate update rate. This allows for interactive exploration of the echoes associated with a particular echolocation scenario. The use of the system in exploring biosonar tasks is demonstrated by several examples, namely continuous estimation of Doppler shifts as part of an acoustic flow analysis, two-target resolution with fm-sign...
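For a sonar moving at constant speed toward a stationary reflector, Doppler shifts of the kind mentioned above follow the standard two-way relation f_echo = f_emit * (c + v) / (c - v). The sketch below is not the platform's estimator; the function names, the 343 m/s speed of sound, and the cosine projection of platform speed onto each line of sight are assumptions for illustration.

```python
import math


def echo_frequency(f_emit, radial_speed, c=343.0):
    """Echo frequency for a sonar closing on a stationary reflector.

    Two-way Doppler: f_echo = f_emit * (c + v) / (c - v), with v > 0 when
    the sonar approaches the reflector; c is the assumed speed of sound (m/s).
    """
    return f_emit * (c + radial_speed) / (c - radial_speed)


def radial_speed_from_echo(f_emit, f_echo, c=343.0):
    """Invert echo_frequency: v = c * (f_echo - f_emit) / (f_echo + f_emit)."""
    return c * (f_echo - f_emit) / (f_echo + f_emit)


def radial_speed(platform_speed, bearing_rad):
    """Hypothetical helper: component of the platform velocity along a line
    of sight at `bearing_rad` from the heading."""
    return platform_speed * math.cos(bearing_rad)
```

For example, a 40 kHz call emitted at 5 m/s toward a reflector dead ahead returns at roughly 41.18 kHz. The cosine term makes the shift vary with bearing, which is the sort of pattern a flow analysis over many echoes might exploit.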
An Analysis/Synthesis Auditory Filterbank Based on an IIR Implementation of the Gammachirp
- in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing
, 1998
Cited by 6 (1 self)
Abstract
This paper proposes a new auditory filterbank that enables signal resynthesis from dynamic representations produced by a level-dependent auditory filterbank. The filterbank is based on a new IIR implementation of the gammachirp, which has been shown to be an excellent candidate for asymmetric, level-dependent auditory filters. First, the gammachirp filter is shown to be decomposable into a combination of a gammatone filter and an asymmetric function. The asymmetric function is accurately simulated by a minimum-phase IIR filter, named the "asymmetric compensation filter". Then, two filterbank structures are presented, each based on the combination of a gammatone filterbank and a bank of asymmetric compensation filters controlled by a signal level estimation mechanism. The inverse filter of the asymmetric compensation filter is always stable because the minimum-phase condition is satisfied. When a bank of inverse filters is utilized after the gammachirp analysis filterbank and the id...
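For reference, the gammachirp impulse response is commonly written as g(t) = t^(n-1) exp(-2π b ERB(f_r) t) cos(2π f_r t + c ln t + φ), which reduces to a gammatone when c = 0. The sketch below samples that expression directly; it is not the IIR implementation with asymmetric compensation filters proposed in the paper. The defaults n = 4 and b = 1.019, the linear ERB approximation, and the example chirp value c = -2 are assumptions.

```python
import math


def erb_hz(f_hz):
    """Approximate equivalent rectangular bandwidth in Hz (assumed linear form)."""
    return 24.7 + 0.108 * f_hz


def gammachirp_ir(fr, fs, n=4, b=1.019, c=0.0, phi=0.0, duration=0.025):
    """Directly sampled gammachirp impulse response, peak-normalized to 1.

    g(t) = t**(n-1) * exp(-2*pi*b*ERB(fr)*t) * cos(2*pi*fr*t + c*ln(t) + phi)
    With c = 0 the chirp term vanishes and g(t) is an ordinary gammatone.
    """
    num = int(round(duration * fs))
    out = []
    for i in range(1, num + 1):  # start at t = 1/fs so that ln(t) is defined
        t = i / fs
        env = t ** (n - 1) * math.exp(-2 * math.pi * b * erb_hz(fr) * t)
        out.append(env * math.cos(2 * math.pi * fr * t + c * math.log(t) + phi))
    peak = max(abs(x) for x in out)
    return [x / peak for x in out]
```

Calling `gammachirp_ir(1000, 16000)` gives a peak-normalized gammatone centred at 1 kHz. A nonzero `c` only alters the phase term `c * ln(t)`, so the temporal envelope is unchanged (up to the normalization) while the instantaneous frequency glides, which is what makes the gammachirp's magnitude response asymmetric.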
Biologically-based Auditory Signal Processing in Analog VLSI
- IEEE Asilomar Conference on Signals, Systems, and Computers
Cited by 4 (2 self)
Abstract
This paper reviews many recent analog silicon implementations of computational models of neural auditory processing. These implementations are based on physiological and psychophysical knowledge about biological auditory systems. The questions "How do we hear?" and "How can we improve artificial hearing systems?" are rarely asked by the same researcher. Why is research that holistically addresses hearing systems so rare? In the vision community, a sizable group of researchers use both biological and artificial systems as motivations for their research. Biological motion control and robotic motion control are also considered in a unified framework by many researchers of movement. The different computational environments of biological auditory systems and artificial hearing systems are a key reason for the separation between the disciplines. Artificial hearing systems are usually specified as sampled digital algorithms, and are usually implemented as programs on a small n...
Three-Dimensional Localization of a Close-Range Acoustic Source Using Binaural Cues
, 1998
Visualization And Calculation Of The Roughness Of Acoustical Musical Signals Using The Synchronization Index Model (SIM)
, 2000
Cited by 3 (1 self)
Abstract
The synchronization index model of sensory dissonance and roughness accounts for the degree of phase-locking to a particular frequency that is present in the neural patterns. Sensory dissonance (roughness) is defined as the energy of the relevant beating frequencies in the auditory channels relative to the total energy. The model takes rate-code patterns at the level of the auditory nerve as input and outputs a sensory dissonance (roughness) value. The synchronization index model yields a straightforward visualization of the principles underlying sensory dissonance and roughness, in particular in terms of (i) roughness contributions with respect to cochlear mechanical filtering (on a Critical Band scale), and (ii) roughness contributions with respect to phase-locking synchrony (i.e., the synchronization index for the relevant beating frequencies on a frequency scale). This paper presents the concept and implementation of the synchronization index model and its application to musical scales.
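The definition of roughness given above, beating-frequency energy relative to total energy, can be sketched per channel as a spectral energy ratio. This is a deliberate simplification, not the published synchronization index model: the function names, the 20 to 150 Hz beating band, and the summation across channels are assumptions, and a naive DFT stands in for whatever analysis the model actually uses.

```python
import cmath
import math


def band_energy_ratio(channel, fs, f_lo=20.0, f_hi=150.0):
    """Hypothetical: fraction of a channel's one-sided spectral energy lying
    in [f_lo, f_hi] Hz. `channel` is a rate-code-like pattern sampled at fs;
    the band limits are illustrative assumptions, not the model's values."""
    n_samp = len(channel)
    total = 0.0
    band = 0.0
    for k in range(n_samp // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n_samp)
                    for i, x in enumerate(channel))
        energy = abs(coeff) ** 2
        total += energy  # total energy includes the DC bin
        if f_lo <= k * fs / n_samp <= f_hi:
            band += energy  # energy at assumed "beating" frequencies
    return band / total if total > 0 else 0.0


def roughness(channels, fs, f_lo=20.0, f_hi=150.0):
    """Hypothetical roughness value: sum of per-channel band energy ratios."""
    return sum(band_energy_ratio(ch, fs, f_lo, f_hi) for ch in channels)
```

With `fs = 400` and a 400-sample channel `1 + cos(2*pi*50*t)`, the DC bin carries energy 400² and the 50 Hz bin 200², so the ratio is 0.2; the same modulation at 180 Hz falls outside the assumed band and contributes almost nothing.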

