Multi Stream Speech Recognition, 1996
Cited by 113 (16 self)
In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a different characteristic of the signal. If the streams are entirely synchronous, they may be accommodated simply (as they usually are in state-of-the-art systems). However, as discussed in the paper, it may be required to permit some degree of asynchrony between streams. This paper introduces the basic framework of a statistical structure that can accommodate multiple (asynchronous) observation streams (possibly exhibiting different frame rates). This approach will then be applied to the particular case of multi-band speech recognition and will be shown to yield significantly better noise robustness. (IDIAP-RR 96-07)
Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions
in Adaptive Processing of Sequences and Data Structures, ser. Lecture Notes in Artificial Intelligence (1387), 1998
Robust Speech Recognition Based on Multi-Stream Features, 1997
Cited by 16 (1 self)
In this paper, we discuss a new automatic speech recognition (ASR) approach based on the independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a different characteristic of the signal. If the streams are entirely synchronous, they may be accommodated simply. However, as discussed in the paper, it may be required to permit some degree of asynchrony between streams, which are then forced to recombine at some temporal "anchor points" associated with some (pre-defined) speech unit levels. We start by introducing the basic framework of a statistical structure that can accommodate multiple observation streams. This approach was initially applied to the case of subband-based speech recognition and was shown to yield significantly better noise robustness. After having summarized these results, the multi-stream approach will be used to combine ...
Speech Recognition Using Augmented Conditional Random Fields
Cited by 11 (0 self)
Abstract—Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT
IEEE Signal Processing Magazine [81], September 2005 (1053-5888/05/$20.00 © 2005 IEEE)
[Beyond the spectral envelope as the fundamental representation for speech recognition]
HIGHLIGHTS DETECTION IN SPORTS VIDEOS BASED ON AUDIO ANALYSIS
While it is very hard to achieve automatic sports competition key moments detection only based on visual analysis, we propose in this paper automatic highlights detection based on an audio classifier. The audio classifier is based on a new modeling technique of the audio spectrum called Piecewise Gaussian Modeling (PGM) and Neural Networks. The proposed approach was evaluated on soccer and tennis videos, though our technique has no restriction on the sports' type. It is shown that audio-based highlights detection can be effective for tennis segmentation since 97.5% of end-of-serves were correctly classified. Goals can be detected in soccer videos using audio analysis as well. An intelligent sports-videos player is proposed based on the audio analysis permitting the user to navigate through key moments in a sports video.

