## A High-Speed, Low-Resource ASR Back-End Based on Custom Arithmetic

### BibTeX

@MISC{Li_ahigh-speed,
  author = {Xiao Li and Jonathan Malkin and Jeff A. Bilmes},
  title = {A High-Speed, Low-Resource ASR Back-End Based on Custom Arithmetic},
  year = {}
}

### Abstract

With the skyrocketing popularity of mobile devices, new processing methods tailored to a specific application have become necessary for low-resource systems. This work presents a high-speed, low-resource speech recognition system using custom arithmetic units, where all system variables are represented by integer indices and all arithmetic operations are replaced by hardware-based table lookups. To this end, several reordering and rescaling techniques, including two accumulation structures for Gaussian evaluation and a novel method for the normalization of Viterbi search scores, are proposed to ensure low entropy for all variables. Furthermore, a discriminatively inspired distortion measure is investigated for scalar quantization of forward probabilities to maximize the recognition rate. Finally, heuristic algorithms are explored to optimize system-wide resource allocation. Our best bit-width allocation scheme requires only 59 kB of ROM to hold the lookup tables, and its recognition performance with various vocabulary sizes in both clean and noisy conditions is nearly as good as that of a system using a 32-bit floating-point unit. Simulations on various architectures show that, on most modern processor designs, we can expect a cycle-count speedup of at least three times over systems with floating-point units. Additionally, the memory bandwidth is reduced by over 70% and the offline storage for model parameters is reduced by 80%.

Index Terms: Alpha recursion, bit-width allocation, custom arithmetic, discriminative distortion measure, forward probability normalization and scaling, high speed, low resource, normalization, quantization, speech recognition.
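The core idea of the abstract, replacing arithmetic on quantized variables with precomputed lookup tables indexed by integer codes, can be illustrated with a small sketch. The codebook sizes and value ranges below are invented for illustration and are not the paper's actual bit-width allocation:

```python
import numpy as np

# Illustrative sketch: replace a floating-point multiply with a ROM lookup
# over quantized operand indices. Bit-widths and ranges are hypothetical.
BITS_A, BITS_B = 4, 4                            # operand bit-widths
codebook_a = np.linspace(0.0, 1.0, 2**BITS_A)    # quantizer output levels
codebook_b = np.linspace(0.0, 2.0, 2**BITS_B)
out_codebook = np.linspace(0.0, 2.0, 2**BITS_A)  # result quantizer levels

def quantize(x, cb):
    # Nearest-codeword index for scalar x.
    return int(np.abs(cb - x).argmin())

# Offline: precompute table[i, j] = index of the quantized product a*b.
table = np.empty((2**BITS_A, 2**BITS_B), dtype=np.uint8)
for i, a in enumerate(codebook_a):
    for j, b in enumerate(codebook_b):
        table[i, j] = quantize(a * b, out_codebook)

# At run time, "multiplication" is a single table lookup on integer indices.
i, j = quantize(0.5, codebook_a), quantize(1.5, codebook_b)
approx = out_codebook[table[i, j]]   # approximates 0.5 * 1.5 up to quantization error
```

With realistic bit-widths chosen per variable, such tables stay small enough to hold in ROM, which is the resource the abstract's 59 kB figure refers to.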

### Citations

1219 | An algorithm for vector quantizer design
- Linde, Buzo, et al.
- 1980

Citation Context: ...thmetic operations (as we employ in this work) will yield the best tradeoff between ASR accuracy and power consumption reduction. This section first reports the results of system development. The LBG [26] algorithm was used in all single variable quantization experiments. These experiments were performed on the 8 development subsets mentioned in the previous section. The WERs reported were an average ...
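The LBG design loop referenced here alternates nearest-codeword assignment with centroid updates. A minimal scalar version might look like the following (quantile initialization is used here instead of LBG's codebook-splitting step, and the stopping rule is simplified):

```python
import numpy as np

# Minimal LBG/Lloyd-style scalar quantizer design sketch.
def lbg_quantizer(samples, n_levels, n_iters=50):
    # Initialize codewords from sample quantiles (simplification of
    # LBG's splitting initialization).
    codebook = np.quantile(samples, np.linspace(0, 1, n_levels))
    for _ in range(n_iters):
        # Assign each sample to its nearest codeword.
        idx = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
        # Move each codeword to the centroid of its cell.
        for k in range(n_levels):
            if np.any(idx == k):
                codebook[k] = samples[idx == k].mean()
    return np.sort(codebook)

rng = np.random.default_rng(0)
data = rng.normal(size=10_000)
cb = lbg_quantizer(data, n_levels=8)
# Mean absolute quantization error after training.
err = np.abs(data[:, None] - cb[None, :]).min(axis=1).mean()
```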

414 | Evaluating future microprocessors: The SimpleScalar tool set
- Burger, Austin, et al.
- 1996

Citation Context: ...cision boundaries are probably slightly less susceptible to minor perturbations when variables are so coarsely quantized in the custom arithmetic case. C. CPU Time Simulation: We utilized SimpleScalar [27], an architecture-level execution-driven simulator, for CPU time simulation. All tables were precalculated by SimpleScalar at runtime. We extended the instruction set and modified the assembler to sup...

206 | Automatic Speech Recognition: The Development of the SPHINX System
- Lee
- 1989

Citation Context: ...In the literature, the scaled variable has commonly served as a normalized forward probability to solve the underflow problem that occurs in the Baum–Welch algorithm using fixed-precision floating-point representation [21], [22]. This is equivalent to rescaling by a per-frame factor, producing a quantity with a representable numerical range. The recursion then becomes (2)–(5)...
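The per-frame rescaling described in this context can be sketched as follows, using toy HMM parameters rather than anything from the paper. Each frame's scale factor is accumulated in the log domain, so the total likelihood survives arbitrarily long utterances while the forward variables themselves stay in a representable range:

```python
import numpy as np

# Scaled alpha recursion sketch: toy 2-state HMM with discrete emissions.
A = np.array([[0.7, 0.3], [0.4, 0.6]])   # transition matrix, A[i, j] = P(j | i)
pi = np.array([0.6, 0.4])                # initial state distribution
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # B[state, symbol] emission probs
obs = [0, 1, 0, 0, 1] * 40               # long sequence: raw alphas would shrink badly

alpha = pi * B[:, obs[0]]
log_likelihood = 0.0
for t in range(1, len(obs)):
    # Normalize alpha to sum to 1; accumulate the scale in log domain.
    c = alpha.sum()
    log_likelihood += np.log(c)
    alpha = (alpha / c) @ A * B[:, obs[t]]
log_likelihood += np.log(alpha.sum())
```

The normalized alpha vector is also the quantity with low dynamic range, which is what makes it a good candidate for coarse quantization in the custom-arithmetic setting.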

158 | The Chimaera Reconfigurable Functional Unit
- Hauck, Fry, et al.
- 1997

Citation Context: ...f simple table accesses. The physical device realization of the LUTs is beyond the scope of this work. The implementation of custom arithmetic units can be simplified using reconfigurable logic [19], [20], although at the expense of increased power consumption. B. Design Issues for ASR: In spite of the attractiveness of custom arithmetic, such a system becomes unrealistic if the table size gets too lar...

65 | Tied Mixture Continuous Parameter Modeling for Speech Recognition - Bellegarda, Nahamoo - 1990

60 | PhoneBook: A phonetically-rich isolated-word telephone-speech database
- Pitrelli, Fong, et al.
- 1995

Citation Context: ... case. The downside is that this algorithm takes substantially longer to complete. VII. System Organization, A. Baseline System Configuration: The database used for system evaluation is NYNEX PhoneBook [24], a phonetically-rich speech database designed for isolated-word recognition tasks. It consists of isolated-word utterances recorded via telephone channels with an 8000-Hz sampling rate. Each sample i...

51 | Vector quantization for the efficient computation of continuous density likelihoods
- Bocchieri
- 1993

Citation Context: ...ctor quantization to the observations [1], [2]. Even in a CHMM, the computational load can be greatly reduced by restricting the precise likelihood computation to the most relevant Gaussians using VQ [3], [4]. Second, quantization techniques also contribute to a compact representation of model parameters, which not only saves memory but also reduces computational cost [5]–[9]. The problem can also be...
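The Gaussian-selection idea referenced here, using VQ to shortlist the relevant Gaussians before exact likelihood evaluation, can be sketched as follows. All sizes, the random codebooks, and the unit-covariance assumption are purely illustrative:

```python
import numpy as np

# Gaussian selection via VQ sketch: quantize the observation with a coarse
# codebook, then evaluate exact likelihoods only for a precomputed shortlist
# of Gaussians associated with that codeword.
rng = np.random.default_rng(0)
D, n_gauss, n_codewords, shortlist = 4, 64, 8, 12

means = rng.normal(size=(n_gauss, D))          # Gaussian means (unit covariance)
codebook = rng.normal(size=(n_codewords, D))   # coarse VQ codebook

# Offline: for each codeword, keep the `shortlist` nearest Gaussians.
d = ((codebook[:, None, :] - means[None, :, :]) ** 2).sum(-1)
shortlists = np.argsort(d, axis=1)[:, :shortlist]

def selected_log_likelihoods(x):
    cw = ((codebook - x) ** 2).sum(-1).argmin()    # quantize the observation
    idx = shortlists[cw]                           # relevant Gaussians only
    diff = x - means[idx]
    return idx, -0.5 * (diff ** 2).sum(-1)         # log-likelihood up to a constant

idx, ll = selected_log_likelihoods(rng.normal(size=D))
```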

27 | Accelerating multimedia processing by implementing memoing in multiplication and division units
- Citron, Feitelson, et al.
- 1998

Citation Context: ...termediate values) with very low entropy. It would be beneficial to “record” these computation results so that they may be reused many times in the future, thereby amortizing the cost of computation. [15] uses cache-like structures they call memo-tables to store the outputs of particular instruction types. It performs a table lookup in parallel with conventional computation which...
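A memo-table in this spirit can be sketched as a small direct-mapped cache keyed on operand pairs; the table size and indexing scheme below are illustrative, not the hardware design of [15]:

```python
# Memo-table sketch: a direct-mapped cache that returns a recorded result
# on a hit instead of recomputing the operation.
class MemoTable:
    def __init__(self, op, size=256):
        self.op = op
        self.size = size
        self.tags = [None] * size      # stored operand pair per slot
        self.vals = [0.0] * size       # stored result per slot
        self.hits = self.misses = 0

    def compute(self, a, b):
        slot = hash((a, b)) % self.size    # direct-mapped indexing
        if self.tags[slot] == (a, b):      # hit: reuse the recorded result
            self.hits += 1
            return self.vals[slot]
        self.misses += 1                   # miss: compute and record
        result = self.op(a, b)
        self.tags[slot] = (a, b)
        self.vals[slot] = result
        return result

mul = MemoTable(lambda a, b: a * b)
# Low-entropy operand stream: the same few pairs recur many times.
for a, b in [(3, 5), (2, 7), (3, 5), (3, 5), (2, 7)]:
    mul.compute(a, b)
```

The payoff comes exactly from the low-entropy property the text describes: when few distinct operand pairs dominate the workload, most operations become lookups.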

23 | State-based Gaussian selection in large vocabulary continuous speech recognition using HMMs
- Gales, Knill, et al.
- 1999

Citation Context: ...quantization to the observations [1], [2]. Even in a CHMM, the computational load can be greatly reduced by restricting the precise likelihood computation to the most relevant Gaussians using VQ [3], [4]. Second, quantization techniques also contribute to a compact representation of model parameters, which not only saves memory but also reduces computational cost [5]–[9]. The problem can also be appr...

23 | A low-power accelerator for the Sphinx 3 speech recognition system
- Mathew, Davis, et al.

Citation Context: ...mance. With 32-bit computing having reached the embedded market and after years of finding ways to make general-purpose chips more powerful, the use of custom logic might seem a rather curious choice [14]. Many signal processing applications produce system variables (system inputs, outputs, and all intermediate values) with very low entropy. It would be beneficial to “record” these computation results...

22 | Subspace distribution clustering hidden Markov model - Bocchieri, Mak - 2001

20 | Efficient speech recognition using subvector quantization and discrete-mixture HMMs
- Tsakalidis, Digalakis, et al.
- 1999

Citation Context: ...rovement, a discrete mixture HMM assumes discrete distributions at the scalar or subvector level of a mixture model, and applies scalar quantization or subvector quantization to the observations [1], [2]. Even in a CHMM, the computational load can be greatly reduced by restricting the precise likelihood computation to the most relevant Gaussians using VQ [3], [4]. Second, quantization techniques also...

12 | Floating-point bitwidth optimization for low-power signal processing applications
- Fang, Chen, et al.
- 2002

Citation Context: ...provide no information if bit-widths are too low) and greedily increase the bit-widths of one or a small group of variables until we find an acceptable solution. This is similar to the method used in [23] to optimize floating-point bit-widths. A. Single-Variable Quantization: Finding a reasonable starting point is an important part of these algorithms. In general, it is expected that the noise introduc...
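The greedy strategy described here, starting every variable at a low bit-width and repeatedly growing the one whose extra bit helps most until the solution is acceptable, can be sketched with a hypothetical cost model standing in for a real recognition run (`evaluate_wer` and its sensitivities are invented for illustration):

```python
# Greedy bit-width allocation sketch. `evaluate_wer` is a stand-in for an
# actual WER measurement on development data.
def evaluate_wer(widths):
    # Hypothetical cost model: error falls as bits are added, with
    # diminishing returns and different sensitivity per variable.
    sensitivity = {"obs": 3.0, "mean": 2.0, "alpha": 4.0}
    return sum(s / (2 ** widths[v]) for v, s in sensitivity.items())

def greedy_allocate(variables, target_wer, start_bits=2, max_bits=16):
    widths = {v: start_bits for v in variables}
    while evaluate_wer(widths) > target_wer:
        # Try one extra bit for each variable; keep the best single change.
        best_v = min(
            (v for v in variables if widths[v] < max_bits),
            key=lambda v: evaluate_wer({**widths, v: widths[v] + 1}),
        )
        widths[best_v] += 1
    return widths

widths = greedy_allocate(["obs", "mean", "alpha"], target_wer=0.05)
```

Because each step evaluates only one-bit increments, the search is cheap, but (as the surrounding text notes) the quality of the starting point matters: a too-low start gives the greedy criterion no signal to work with.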

11 | Hamming distance approximation for a fast log-likelihood computation for mixture densities - Beyerlein, Ullrich - 1995

10 | Speech recognition using HMMs with quantized parameters - Vasilache - 1999

9 | Subvector clustering to improve memory and speed performance of acoustic likelihood computation
- Ravishankar, Bisiani, et al.
- 1997

Citation Context: ... relevant Gaussians using VQ [3], [4]. Second, quantization techniques also contribute to a compact representation of model parameters, which not only saves memory but also reduces computational cost [5]–[9]. The problem can also be approached from the hardware side. A floating-point unit is power hungry and requires a rather large chip area when implemented. Software implementation of floating-point...

8 | The HTK Book (for HTK Version 3.1)
- Young
- 2000

Citation Context: ... while still accurately detecting speech when noise conditions are stationary. Feature extraction is triggered immediately when active speech is detected. It follows a standard procedure described in [25]. We then add the first-order dynamic features followed by mean subtraction and variance normalization. The feature vectors obtained are fed into the back-end, where the pattern matching takes place. ...
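The feature post-processing steps mentioned here (appending first-order dynamic features, then mean subtraction and variance normalization) can be sketched as follows. The regression-based delta formula and window size are standard choices, not necessarily the paper's exact settings:

```python
import numpy as np

# Append first-order deltas, then per-utterance mean/variance normalization.
def add_deltas_and_normalize(feats, delta_win=2):
    T, D = feats.shape
    # Regression-based deltas over a +/- delta_win frame window,
    # with edge padding at the utterance boundaries.
    padded = np.pad(feats, ((delta_win, delta_win), (0, 0)), mode="edge")
    num = sum(k * (padded[delta_win + k : delta_win + k + T]
                   - padded[delta_win - k : delta_win - k + T])
              for k in range(1, delta_win + 1))
    deltas = num / (2 * sum(k * k for k in range(1, delta_win + 1)))
    full = np.hstack([feats, deltas])
    # Per-utterance mean subtraction and variance normalization.
    return (full - full.mean(axis=0)) / (full.std(axis=0) + 1e-8)

rng = np.random.default_rng(1)
out = add_deltas_and_normalize(rng.normal(size=(50, 13)))  # 50 frames, 13 coefficients
```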

7 | Data-driven vector clustering for low-memory footprint ASR - Filali, Bilmes - 2002

6 | Discrete mixture HMM
- Takahashi, Aikawa, et al.
- 1997

Citation Context: ...r improvement, a discrete mixture HMM assumes discrete distributions at the scalar or subvector level of a mixture model, and applies scalar quantization or subvector quantization to the observations [1], [2]. Even in a CHMM, the computational load can be greatly reduced by restricting the precise likelihood computation to the most relevant Gaussians using VQ [3], [4]. Second, quantization techniques...

6 | Implementing a high accuracy speaker-independent continuous speech recognizer on a fixed-point DSP
- Gong, Kao

Citation Context: ...sually do not use the precision of a floating-point representation efficiently. Fixed-point arithmetic offers only a partial solution. Operations can be much faster using a fixed-point implementation [11]–[13], but this method often cuts the available dynamic range without having its representational precision fully utilized. Additionally, some operations can still take numerous processor cycles to co...
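The fixed-point tradeoff described in this context is easy to demonstrate with a toy Q1.14 format: 16 bits give fine resolution inside [-2, 2) but saturate outside it, which is the dynamic-range cut the text refers to. The format choice below is illustrative, not taken from the cited work:

```python
# Q1.14 fixed-point sketch: 14 fractional bits in a 16-bit signed word.
FRAC_BITS = 14
SCALE = 1 << FRAC_BITS
LO, HI = -(1 << 15), (1 << 15) - 1          # 16-bit signed limits

def to_fixed(x):
    # Quantize to the grid, saturating at the representable limits.
    return max(LO, min(HI, int(round(x * SCALE))))

def fixed_mul(a, b):
    # Multiply then shift back to Q1.14, saturating on overflow.
    return max(LO, min(HI, (a * b) >> FRAC_BITS))

x = fixed_mul(to_fixed(0.5), to_fixed(0.25)) / SCALE   # exactly representable
y = to_fixed(100.0) / SCALE                            # saturates near 2.0
```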

6 | A low-power, fixed-point front-end feature extraction for a distributed speech recognition system
- Jayant, Hans, et al.
- 2002

Citation Context: ...ariables and operations which would quickly complicate the LUT design. In addition, fixed-point arithmetic for feature extraction has been well studied and can be implemented by DSPs very efficiently [18]. Therefore, we envision a chip using a combination of both standard fixed-point arithmetic for the front-end, and custom arithmetic for the back-end. The rest of the paper is organized as follows. Se...

5 | An ultra low power, ultra miniature voice command system based on hidden Markov models - Cornu, Destrez, et al. - 2002

5 | Custom arithmetic for high-speed, low-resource ASR systems
- Malkin, Li, Bilmes
- 2004

Citation Context: ...ld ideally be consistent with minimizing the degradation in recognition performance. Finally, a bit-width allocation algorithm must be provided to optimize the resource performance. While in [16] and [17], we proposed a general design methodology for custom arithmetic and reported preliminary results for system development, this paper approaches the problem systematically, discusses the solutions in g...

4 | An overview of floating-point support and math library on the Intel XScale architecture
- Iordache, Tang
- 2003

Citation Context: ...nit is power hungry and requires a rather large chip area when implemented. Software implementation of floating-point arithmetic takes less power and chip area, but has significantly higher latencies [10]. Additionally, speech recognizers usually do not use the precision of a floating-point representation efficiently. Fixed-point arithmetic offers only a partial solution. Operations can be much faster...

4 | Reconfigurable computing for speech recognition: Preliminary findings
- Melnikoff, James-Roxby, et al.

Citation Context: ...ries of simple table accesses. The physical device realization of the LUTs is beyond the scope of this work. The implementation of custom arithmetic units can be simplified using reconfigurable logic [19], [20], although at the expense of increased power consumption. B. Design Issues for ASR: In spite of the attractiveness of custom arithmetic, such a system becomes unrealistic if the table size gets t...

3 | Codebook design for ASR systems using custom arithmetic units
- Li, Malkin, et al.
- 2004

Citation Context: ...able should ideally be consistent with minimizing the degradation in recognition performance. Finally, a bit-width allocation algorithm must be provided to optimize the resource performance. While in [16] and [17], we proposed a general design methodology for custom arithmetic and reported preliminary results for system development, this paper approaches the problem systematically, discusses the solut...

1 | ASR in mobile phones: An industrial approach
- Varga, et al.
- 2002

Citation Context: ...y do not use the precision of a floating-point representation efficiently. Fixed-point arithmetic offers only a partial solution. Operations can be much faster using a fixed-point implementation [11]–[13], but this method often cuts the available dynamic range without having its representational precision fully utilized. Additionally, some operations can still take numerous processor cycles to complet...

1 | Buried Markov models for speech recognition - Bilmes - 1999