Automatic speech recognition and speech activity detection in the CHIL smart room, in MLMI (2005)

by S M Chu, E Marcheret

Results 1 - 5 of 5

Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array

by Hari Krishna Maganti, Daniel Gatica-Perez, Iain McCowan
Abstract - Cited by 19 (0 self)
reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained

Citation Context

...ilable online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TASL.2007.906197 ... environments [1], instrumented meeting rooms [44], [54], and seminar halls [10] facilitating remote collaboration. The current article examines the use of multimodal sensor arrays in the context of instrumented meeting rooms. Meetings consist of natural, complex interaction betw...

Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms

by Hari Krishna Maganti, Petr Motlicek, Daniel Gatica-Perez - In IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2007
Abstract - Cited by 6 (1 self)
The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The solution is based on computing the long-term modulation spectrum and examining a specific frequency range for dominant speech components, in order to classify a given audio signal into speech and non-speech segments. For comparison, manually segmented speech, short-term energy based segmentation, short-term energy and zero-crossing based segmentation, and a recently proposed Multi-Layer Perceptron (MLP) classifier system are also tested. Speech recognition evaluations of the segmentation methods are performed on a standard database under conditions where the signal-to-noise ratio (SNR) varies considerably: close-talking headset, lapel, distant microphone array output, and distant microphone. The results reveal that the proposed method is more reliable and less sensitive to the mode of signal acquisition and to unforeseen conditions. (IDIAP–RR 06-57)
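As a rough illustration of the short-term energy and zero-crossing baseline mentioned in the abstract (not the proposed modulation-spectrum method), the sketch below labels fixed-length frames as speech when their energy is high and their zero-crossing rate is low. The function names, frame sizes, and both thresholds are assumptions chosen for the example; the abstract does not specify them.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_speech(samples, frame_len=200, hop=100,
                  energy_thresh=0.01, zcr_thresh=0.25):
    """Return one boolean per frame: True if the frame looks like speech.

    Both thresholds are hypothetical values for this sketch, not values
    taken from the paper.
    """
    return [short_term_energy(f) > energy_thresh and zero_crossing_rate(f) < zcr_thresh
            for f in frame_signal(samples, frame_len, hop)]

# Usage: 0.1 s of silence followed by 0.1 s of a 200 Hz tone sampled at 8 kHz.
signal = [0.0] * 800 + [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
labels = detect_speech(signal)
```

Fixed thresholds like these are what tends to break down at low SNR, which is the limitation of threshold-based detection that the citation context for this entry points out.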

Citation Context

...nergy features generated directly from the signal, and the acoustic phonetic features derived from observations generated by ASR acoustic models were used as input to the GMM classification framework [8]. The existing methods are limited by two common drawbacks. On one hand, threshold based detection techniques fail under low SNR conditions, and on the other hand, pattern-matching techniques require ...

Fusing Asynchronous Feature Streams for On-line Writer Identification

by Andreas Schlapbach, Horst Bunke
Abstract
In this paper, we present a new approach to improving the performance of a writer identification system by fusing asynchronous feature streams. Different feature streams are extracted from on-line handwritten text acquired from a whiteboard. The feature streams are used to train a text- and language-independent writer identification system based on Gaussian Mixture Models (GMMs). From a stroke consisting of n points, n point-based feature vectors and one stroke-based feature vector are extracted. The resulting feature streams thus have an unequal number of feature vectors. We evaluate different methods to directly fuse the feature streams and show that, by means of feature fusion, we can improve the performance of the writer identification system on a data set produced by 200 different writers.
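To make the stream mismatch concrete, the sketch below extracts n point-based vectors and one stroke-based vector from a stroke, then fuses them by replicating the stroke vector onto every point vector. This replication scheme is only one plausible direct-fusion method and is an assumption here; the abstract does not say which methods were evaluated. The specific features (position, deltas, path length, point count) are likewise hypothetical.

```python
import math

def point_features(stroke):
    """One vector per point: [x, y, dx, dy] (hypothetical point-based features)."""
    feats = []
    for i, (x, y) in enumerate(stroke):
        px, py = stroke[i - 1] if i > 0 else (x, y)
        feats.append([x, y, x - px, y - py])
    return feats

def stroke_feature(stroke):
    """One vector per stroke: [path length, number of points] (hypothetical)."""
    length = sum(math.dist(a, b) for a, b in zip(stroke, stroke[1:]))
    return [length, float(len(stroke))]

def fuse_by_replication(stroke):
    """Append the single stroke vector to each of the n point vectors.

    The result is one synchronous stream of n fused vectors, which could
    then be passed to a single model such as a GMM.
    """
    sf = stroke_feature(stroke)
    return [pf + sf for pf in point_features(stroke)]

# Usage: a three-point stroke yields three fused 6-dimensional vectors.
fused = fuse_by_replication([(0, 0), (3, 4), (3, 8)])
```

Replication keeps every point-level observation at the cost of repeating the stroke-level values n times; other choices, such as summarizing the point vectors per stroke, trade that the other way.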

Citation Context

...work described in this paper has been conducted in the context of research on Smart Meeting Rooms [12]. The aim of this research is to automate standard tasks usually performed by humans in a meeting [1, 11]. To record a meeting, Smart Meeting Rooms are equipped with synchronized recording interfaces to capture audio, video, and handwritten notes. An important task in a Smart Meeting Room is to capture t...

A Posterior Approach for Microphone Array Based Speech Recognition

by Dong Wang, Ivan Himawan, Joe Frankel, Simon King
Abstract
Automatic speech recognition (ASR) is difficult in environments such as multiparty meetings because of adverse acoustic conditions: background noise, reverberation and cross-talk. Microphone arrays can increase ASR accuracy dramatically in such situations. However, most existing beamforming techniques use time-domain signal processing theory and are based on a geometric analysis of the relationship between sources and microphones. This limits their application, and leads to performance degradation when the geometric properties are unavailable, or heterogeneous channels are used. We present a new posterior-based approach for microphone array speech recognition. Instead of enhancing speech signals, we enhance posterior phone probabilities which are used in a tandem ANN-HMM system. Significant improvements were achieved over a single channel baseline. Combining beamforming and our method is significantly better than beamforming alone, especially in a moving speakers scenario. Index Terms: speech recognition, microphone array, beamforming, tandem approach
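The idea of working on posteriors rather than waveforms can be sketched with a deliberately simple stand-in: frame-wise averaging of per-channel phone posteriors, followed by renormalization. This averaging rule is an assumption made for illustration and is not claimed to be the authors' enhancement method, which the abstract does not detail. The function name and the nested-list layout are also hypothetical.

```python
def combine_posteriors(channel_posteriors):
    """Average phone posteriors across microphone channels, frame by frame.

    channel_posteriors[c][t][k] is P(phone k | frame t) as estimated from
    channel c. All channels are assumed to have the same number of frames
    and phones. Returns a list indexed [t][k] whose rows sum to 1.
    """
    n_channels = len(channel_posteriors)
    n_frames = len(channel_posteriors[0])
    combined = []
    for t in range(n_frames):
        n_phones = len(channel_posteriors[0][t])
        avg = [sum(ch[t][k] for ch in channel_posteriors) / n_channels
               for k in range(n_phones)]
        total = sum(avg)
        combined.append([p / total for p in avg])  # renormalize against rounding drift
    return combined

# Usage: two channels, one frame, two phone classes.
merged = combine_posteriors([[[0.6, 0.4]], [[0.2, 0.8]]])
```

Because the combination happens after each channel's classifier, a rule like this needs no knowledge of array geometry, which is the property the abstract highlights for the posterior-based approach.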

Citation Context

...nditions. Headset microphones can alleviate this problem, but are inconvenient for users. Microphone arrays are less intrusive and can significantly improve recognition accuracy, via noise suppression [1, 2, 3, 4, 5] and directionality enforcement [6, 7]. Currently, most array processing methods operate on the acoustic signals, and assume that SNR enhancement will lead to ASR error reductions. Classical array pro...

Published In:

by unknown authors
Abstract
Peer reviewed version

Citation Context

...nditions. Headset microphones can alleviate this problem, but are inconvenient for users. Microphone arrays are less intrusive and can significantly improve recognition accuracy, via noise suppression [1, 2, 3, 4, 5] and directionality enforcement [6, 7]. Currently, most array processing methods operate on the acoustic signals, and assume that SNR enhancement will lead to ASR error reductions. Classical array pro...
