Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array
Cited by 19 (0 self)
Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms
In IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2007
Cited by 6 (1 self)
The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The solution is based on computing the long-term modulation spectrum and examining a specific frequency range for dominant speech components, in order to classify a given audio signal into speech and non-speech. For comparison, manually segmented speech, short-term energy based segmentation, short-term energy and zero-crossing based segmentation, and a recently proposed Multi-Layer Perceptron (MLP) classifier system are also tested. The segmentation methods are evaluated by speech recognition on a standard database, under conditions where the signal-to-noise ratio (SNR) varies considerably: close-talking headset, lapel, distant microphone array output, and distant microphone. The results reveal that the proposed method is more reliable and less sensitive to the mode of signal acquisition and to unforeseen conditions. (IDIAP–RR 06-57)
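To make one of the comparison baselines named in this abstract concrete, here is a minimal sketch of frame-level segmentation using short-term energy and zero-crossing rate. It is not the paper's modulation-spectrum method. The function names, frame sizes, and both thresholds are illustrative assumptions and are not taken from the paper.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_speech(samples, frame_len=400, hop=160,
                  energy_thresh=0.01, zcr_thresh=0.25):
    """Label each frame True when energy is high and ZCR is low.

    All default values are hypothetical (25 ms frames and a 10 ms hop
    at an assumed 16 kHz sampling rate); they would need tuning per channel.
    """
    return [short_term_energy(f) > energy_thresh
            and zero_crossing_rate(f) < zcr_thresh
            for f in frame_signal(samples, frame_len, hop)]
```

Fixed thresholds like these tend to be sensitive to the recording channel, which is the kind of weakness the abstract says the proposed method is less prone to.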
Fusing Asynchronous Feature Streams for On-line Writer Identification
In this paper, we present a new approach to improving the performance of a writer identification system by fusing asynchronous feature streams. Different feature streams are extracted from on-line handwritten text acquired from a whiteboard. The feature streams are used to train a text- and language-independent writer identification system based on Gaussian Mixture Models (GMMs). From a stroke consisting of n points, n point-based feature vectors and one stroke-based feature vector are extracted. The resulting feature streams thus have an unequal number of feature vectors. We evaluate different methods to directly fuse the feature streams and show that, by means of feature fusion, we can improve the performance of the writer identification system on a data set produced by 200 different writers.
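The abstract does not say which fusion methods were evaluated. As a sketch of what "directly fusing" streams of unequal length could look like, one possible scheme is to repeat the single stroke-based vector alongside each of the n point-based vectors. The function names and the replication strategy are assumptions made for illustration only.

```python
def fuse_by_replication(point_features, stroke_feature):
    """Fuse one stroke: append the stroke-level vector to every point-level vector.

    point_features: list of n point-based feature vectors (lists of floats)
    stroke_feature: the single stroke-based feature vector for the same stroke
    Returns n fused vectors, so both streams end up with the same length.
    """
    return [list(p) + list(stroke_feature) for p in point_features]

def fuse_document(strokes):
    """Fuse a sequence of (point_features, stroke_feature) pairs into one stream."""
    fused = []
    for point_features, stroke_feature in strokes:
        fused.extend(fuse_by_replication(point_features, stroke_feature))
    return fused
```

The fused stream could then be used to train a single GMM per writer, instead of modelling the two streams separately.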
A Posterior Approach for Microphone Array Based Speech Recognition
Automatic speech recognition (ASR) is difficult in environments such as multiparty meetings because of adverse acoustic conditions: background noise, reverberation and cross-talk. Microphone arrays can increase ASR accuracy dramatically in such situations. However, most existing beamforming techniques use time-domain signal processing theory and are based on a geometric analysis of the relationship between sources and microphones. This limits their application, and leads to performance degradation when the geometric properties are unavailable or heterogeneous channels are used. We present a new posterior-based approach for microphone array speech recognition. Instead of enhancing speech signals, we enhance posterior phone probabilities, which are used in a tandem ANN-HMM system. Significant improvements were achieved over a single-channel baseline. Combining beamforming and our method is significantly better than beamforming alone, especially in a moving-speaker scenario.
Index Terms: speech recognition, microphone array, beamforming, tandem approach
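The abstract does not describe how the posterior phone probabilities are enhanced. As a rough illustration of working in the posterior domain rather than on the signal, the sketch below averages the per-frame phone posteriors obtained from several microphone channels and renormalizes them. The averaging rule and the function name are assumptions, not the paper's method.

```python
def combine_posteriors(channel_posteriors):
    """Average phone posteriors across channels, frame by frame.

    channel_posteriors[c][t][k] is the posterior of phone k at frame t,
    as estimated from microphone channel c. All channels are assumed to
    be frame-synchronous and to share the same phone set.
    """
    n_channels = len(channel_posteriors)
    n_frames = len(channel_posteriors[0])
    combined = []
    for t in range(n_frames):
        n_phones = len(channel_posteriors[0][t])
        avg = [sum(ch[t][k] for ch in channel_posteriors) / n_channels
               for k in range(n_phones)]
        total = sum(avg)
        # Renormalize so each frame is again a probability distribution.
        combined.append([p / total for p in avg])
    return combined
```

Because this operates only on classifier outputs, it needs no knowledge of the array geometry, which is the limitation of beamforming that the abstract points out.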