Results 1 - 5 of 5
Graceful Degradation of Speech Recognition Performance over Packet-Erasure Networks
- IEEE Trans. on Speech and Audio Processing, 2002
Cited by 9 (0 self)
This paper explores packet loss recovery for automatic speech recognition (ASR) in spoken dialog systems, assuming an architecture in which a lightweight client communicates with a remote ASR server. Speech is transmitted with source and channel codes optimized for the ASR application, i.e., to minimize word error rate. Unequal amounts of forward error correction, depending on the data's effect on ASR performance, are assigned to protect against packet loss. Experiments with simulated packet loss in a range of loss conditions are conducted on the DARPA Communicator (air travel information) task. Results show that the approach provides robust ASR performance which degrades gracefully as packet loss rates increase. Transmitting at 5.2 kbps with up to 200 ms of added delay leads to only a 7% relative degradation in word error rate, even under extremely adverse network conditions.
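The abstract does not say which channel code is used, so the following is only a minimal sketch of the unequal-protection idea, with a plain XOR parity packet standing in for the paper's FEC. The function names (`add_parity`, `recover`), the equal-length packets, and the group sizes are all assumptions made for illustration. A stream that matters more to recognition gets a smaller parity group, and therefore more redundancy.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets, group_size):
    """Split packets into groups and attach one XOR parity packet per group.

    A smaller group_size means more parity packets, i.e. stronger protection.
    Returns a list of (group, parity) pairs.
    """
    protected = []
    for start in range(0, len(packets), group_size):
        group = packets[start:start + group_size]
        protected.append((group, reduce(xor_bytes, group)))
    return protected


def recover(group, parity):
    """Restore a single lost packet (marked None) from the group's parity.

    With zero losses, or with more than one loss, the group is returned
    unchanged: one XOR parity packet can repair only one erasure.
    """
    missing = [i for i, p in enumerate(group) if p is None]
    if len(missing) != 1:
        return group
    received = [p for p in group if p is not None]
    restored = list(group)
    restored[missing[0]] = reduce(xor_bytes, received, parity)
    return restored


# Hypothetical streams: the "important" one gets a parity packet for every
# 2 data packets, the less important one for every 4.
important = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
minor = [b"\x10\x20", b"\x30\x40", b"\x50\x60", b"\x70\x80"]
strong = add_parity(important, 2)
weak = add_parity(minor, 4)
```

In this sketch the strongly protected stream tolerates one loss in every group of two, while the weakly protected stream tolerates one loss in every group of four, at half the parity overhead.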
A High-Speed, Low-Resource ASR Back-End Based on Custom Arithmetic
With the skyrocketing popularity of mobile devices, new processing methods tailored to a specific application have become necessary for low-resource systems. This work presents a high-speed, low-resource speech recognition system using custom arithmetic units, where all system variables are represented by integer indices and all arithmetic operations are replaced by hardware-based table lookups. To this end, several reordering and rescaling techniques, including two accumulation structures for Gaussian evaluation and a novel method for the normalization of Viterbi search scores, are proposed to ensure low entropy for all variables. Furthermore, a discriminatively inspired distortion measure is investigated for scalar quantization of forward probabilities to maximize the recognition rate. Finally, heuristic algorithms are explored to optimize system-wide resource allocation. Our best bit-width allocation scheme requires only 59 kB of ROM to hold the lookup tables, and its recognition performance with various vocabulary sizes in both clean and noisy conditions is nearly as good as that of a system using a 32-bit floating-point unit. Simulations on various architectures show that, on most modern processor designs, we can expect a cycle-count speedup of at least three times over systems with floating-point units. Additionally, the memory bandwidth is reduced by over 70% and the offline storage for model parameters is reduced by 80%.
Index Terms: Alpha recursion, bit-width allocation, custom arithmetic, discriminative distortion measure, forward probability normalization and scaling, high speed, low resource, normalization, quantization, speech recognition.
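To make the "integer indices plus table lookups" idea concrete, here is a small sketch of how an addition could be replaced by a precomputed table. It is not the paper's design: the uniform codebook, its range, and the helper names (`make_codebook`, `quantize`, `build_add_table`) are assumptions chosen for illustration, and the paper's own quantizers and table layouts are not described in the abstract.

```python
def make_codebook(lo, hi, levels):
    """Uniform scalar codebook with `levels` codewords spanning [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    return [lo + k * step for k in range(levels)]


def quantize(value, codebook):
    """Index of the codeword nearest to `value` (saturates at the ends)."""
    return min(range(len(codebook)), key=lambda k: abs(codebook[k] - value))


def build_add_table(codebook):
    """Precompute table[i][j] = index of quantize(codebook[i] + codebook[j]).

    Once built, adding two quantized variables is a single lookup on their
    integer indices; no arithmetic is performed at run time.
    """
    n = len(codebook)
    return [[quantize(codebook[i] + codebook[j], codebook) for j in range(n)]
            for i in range(n)]


# Hypothetical 17-level codebook for log-domain scores in [-8, 0].
codebook = make_codebook(-8.0, 0.0, 17)
add_table = build_add_table(codebook)

i = quantize(-1.0, codebook)
j = quantize(-2.5, codebook)
k = add_table[i][j]          # index of the quantized sum
approx_sum = codebook[k]     # decoded only when a real value is needed
```

The table size grows with the product of the two operands' codebook sizes, which is why the bit width assigned to each variable becomes a system-wide resource allocation question.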
Feature Pruning for Low-Power ASR Systems in Clean and Noisy Environments
Likelihood evaluation can substantially affect the total computational load for continuous hidden Markov model (HMM)-based speech-recognition systems with small vocabularies. This letter presents feature pruning, a simple yet effective technique to reduce computation and, hence, power consumption of likelihood evaluation. Our technique, under certain conditions, only evaluates the likelihoods of a fraction of feature elements and approximates those of the remaining (pruned) ones by a simple function. The order in which feature elements are evaluated is obtained by a data-driven approach to minimize computation. With this order, feature pruning can speed up the likelihood evaluation by a factor of 1.3–1.8 and reduce its power consumption by 27%–43% for various recognition tasks, including those in noisy environments.
Index Terms: Gaussian evaluation, high speed, low power, speech recognition.
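The abstract leaves both the pruning condition and the "simple function" unspecified, so the sketch below fills them with stand-ins that are purely assumptions: a diagonal-covariance Gaussian, a single check after `check_after` elements against a `threshold`, and a constant per-element `fill` value for the pruned elements. The function name `pruned_loglik` is likewise hypothetical, and the evaluation `order` is supplied by the caller rather than learned from data.

```python
import math


def pruned_loglik(x, mean, var, order=None, check_after=None,
                  threshold=None, fill=0.0):
    """Diagonal-Gaussian log-likelihood with optional feature pruning.

    Elements are accumulated in `order`. If, after `check_after` elements,
    the partial sum is below `threshold`, the remaining elements are not
    evaluated and each contributes the constant `fill` instead.
    Returns (log_likelihood, number_of_elements_actually_evaluated).
    """
    dim = len(x)
    order = list(range(dim)) if order is None else order
    total = 0.0
    for count, i in enumerate(order, start=1):
        diff = x[i] - mean[i]
        total += -0.5 * (math.log(2.0 * math.pi * var[i]) + diff * diff / var[i])
        if check_after is not None and count == check_after and total < threshold:
            # Prune: approximate the unevaluated elements by a constant.
            return total + fill * (dim - count), count
    return total, dim


# Hypothetical 4-dimensional example.
x = [0.3, -1.2, 4.0, 0.1]
mean = [0.0, 0.0, 0.0, 0.0]
var = [1.0, 1.0, 1.0, 1.0]

exact, n_exact = pruned_loglik(x, mean, var)
approx, n_approx = pruned_loglik(x, mean, var, order=[2, 1, 0, 3],
                                 check_after=2, threshold=-5.0, fill=-1.0)
```

Putting the most discriminative elements first in `order` makes an unlikely Gaussian fall below the threshold sooner, which is where the saving in computation comes from.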
Graceful Degradation of Speech Recognition Performance
- In: Proc. Eurospeech01, 2001
This paper explores packet loss recovery in client-server Automatic Speech Recognition (ASR) systems. A forward error correction (FEC) system is designed and tested over several channel loss models, at variable amounts of data acquisition delay. In experiments with simulated packet loss, the FEC system provides robust ASR performance which degrades gracefully as packet loss rates increase. Comparing this scheme to several alternatives under low- and medium-loss channel conditions, we found one approach (multiple transmission plus interpolation) that yielded similar performance, but the FEC system should scale better to lower bit rates.
Improved Vector Quantization Approach for Discrete HMM Speech Recognition System
- 2006
The paper presents an improved Vector Quantization (VQ) approach for discrete Hidden Markov Models (HMMs). This improved VQ approach performs an optimal distribution of VQ codebook components on HMM states. This technique, which we named the Distributed Vector Quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure when the estimation of HMM parameters is performed. The DVQ technique is implemented through two variants: the first variant uses the K-means algorithm (K-means-DVQ) to optimize the VQ, while the second variant exploits the benefits of the classification behavior of Neural Networks (NN-DVQ) for the same purpose. The proposed variants are compared with the HMM-based baseline system in experiments on the recognition of specific Arabic consonants. The results show that the distributed vector quantization technique increases the performance of the discrete HMM system while maintaining the decoding speed of the models.
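For readers unfamiliar with the VQ front end of a discrete HMM, the sketch below shows only the conventional building block that the K-means variant starts from: training a codebook with plain K-means and mapping feature vectors to codeword indices. It does not implement the distribution of codebook components over HMM states that defines DVQ, which the abstract does not detail. The function names and the deterministic initialization from the first `k` vectors are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, codebook):
    """Index of the codeword closest to `vector` (the discrete HMM symbol)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def kmeans_codebook(vectors, k, iters=20):
    """Train a k-entry VQ codebook with plain K-means.

    Initialization uses the first k training vectors so that the result is
    deterministic; empty cells keep their previous codeword.
    """
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in vectors:
            cells[nearest(v, codebook)].append(v)
        for j, cell in enumerate(cells):
            if cell:
                dim = len(cell[0])
                codebook[j] = [sum(v[d] for v in cell) / len(cell)
                               for d in range(dim)]
    return codebook


# Hypothetical 2-D feature vectors forming two well-separated clusters.
training = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0), (10.2, 10.0)]
codebook = kmeans_codebook(training, k=2)
symbols = [nearest(v, codebook) for v in training]
```

The resulting `symbols` sequence is what a discrete HMM would consume as its observation stream in place of the original feature vectors.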

