Signal modeling techniques in speech recognition
- Proceedings of the IEEE, 1993
Cited by 99 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
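As a rough illustration of the first trend named in this abstract, the sketch below appends a time-derivative (delta) term to each frame of absolute spectral features. The function name `add_delta_features`, the central-difference formula, and the sample frames are assumptions made for illustration; the paper itself may define dynamic features differently.

```python
def add_delta_features(frames):
    """Append a first-order time difference to every feature frame.

    frames: list of equal-length lists, one list of spectral values per frame.
    The delta is a central difference; edge frames reuse themselves as the
    missing neighbour (an assumption of this sketch).
    """
    augmented = []
    last = len(frames) - 1
    for t, frame in enumerate(frames):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, last)]
        delta = [(n - p) / 2.0 for p, n in zip(prev_frame, next_frame)]
        # Heterogeneous vector: absolute values followed by dynamic values.
        augmented.append(list(frame) + delta)
    return augmented


# Hypothetical two-coefficient frames, purely for demonstration.
frames = [[1.0, 0.0], [2.0, 0.5], [4.0, 1.0]]
features = add_delta_features(frames)
```

Each output frame is twice as long as its input: the original coefficients come first, followed by their estimated rate of change.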
Sparse solutions to linear inverse problems with multiple measurement vectors
- IEEE Trans. Signal Processing, 2005
Cited by 68 (6 self)
We address the problem of finding sparse solutions to an underdetermined system of equations when there are multiple measurement vectors having the same, but unknown, sparsity structure. The single measurement sparse solution problem has been extensively studied in the past. Although the problem is known to be NP-hard, many suboptimal single-measurement algorithms have been formulated and have found utility in many different applications. Here, we consider in depth the extension of two classes of algorithms, Matching Pursuit (MP) and FOCal Underdetermined System Solver (FOCUSS), to the multiple measurement case so that they may be used in applications such as neuromagnetic imaging, where multiple measurement vectors are available and solutions with a common sparsity structure must be computed. Cost functions appropriate to the multiple measurement problem are developed, and algorithms are derived based on their minimization. A simulation study is conducted on a test-case dictionary to show how the utilization of more than one measurement vector improves the performance of the MP and FOCUSS classes of algorithms, and their performances are compared.
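One way to picture the multiple-measurement extension of Matching Pursuit is sketched below: at each step, the dictionary column whose correlations with all residual vectors have the largest combined 2-norm is selected, so every measurement vector is forced onto the same support. This is a minimal sketch under assumptions (unit-norm columns, a plain non-orthogonal residual update, and hypothetical names such as `mmv_matching_pursuit`), not the authors' implementation.

```python
import math


def mmv_matching_pursuit(dictionary, measurements, n_iters):
    """Greedy column selection shared across several measurement vectors.

    dictionary:   list of unit-norm columns, each a list of length m.
    measurements: list of measurement vectors, each a list of length m.
    Returns the selected column indices (in order) and the final residuals.
    """
    residuals = [list(b) for b in measurements]
    selected = []
    for _ in range(n_iters):
        best_idx, best_score, best_corr = None, -1.0, None
        for j, col in enumerate(dictionary):
            # Correlation of this column with each residual vector.
            corr = [sum(c * r for c, r in zip(col, res)) for res in residuals]
            # Combine the correlations so one column serves all vectors.
            score = math.sqrt(sum(c * c for c in corr))
            if score > best_score:
                best_idx, best_score, best_corr = j, score, corr
        selected.append(best_idx)
        col = dictionary[best_idx]
        # Remove the chosen column's contribution from every residual.
        for res, c in zip(residuals, best_corr):
            for i in range(len(res)):
                res[i] -= c * col[i]
    return selected, residuals


# Hypothetical example: both vectors are supported on columns 0 and 2.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
measurements = [[2.0, 0.0, 1.0], [-1.0, 0.0, 3.0]]
selected, residuals = mmv_matching_pursuit(dictionary, measurements, 2)
```

In this toy case the shared support is recovered in two iterations and the residuals vanish; with noisy data or correlated columns, more iterations or an orthogonalized update would likely be needed.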
Lossy Source Coding
- IEEE Trans. Inform. Theory, 1998
Cited by 46 (1 self)
Lossy coding of speech, high-quality audio, still images, and video is commonplace today. However, in 1948, few lossy compression systems were in service. Shannon introduced and developed the theory of source coding with a fidelity criterion, also called rate-distortion theory. For the first 25 years of its existence, rate-distortion theory had relatively little impact on the methods and systems actually used to compress real sources. Today, however, rate-distortion theoretic concepts are an important component of many lossy compression techniques and standards. We chronicle the development of rate-distortion theory and provide an overview of its influence on the practice of lossy source coding. Index Terms: data compression, image coding, speech coding, rate distortion theory, signal coding, source coding with a fidelity criterion, video coding.
High Quality Audio Compression Using an Adaptive Wavelet Packet Decomposition and Psychoacoustic Modelling
- IEEE Trans. Sig. Proc., 1999
Cited by 4 (0 self)
This paper presents a technique to incorporate psychoacoustic models into an adaptive wavelet packet scheme to achieve perceptually transparent compression of high quality (44.1 kHz) audio signals at about 45 kbits/sec. The filter bank structure adapts according to psychoacoustic criteria and according to the computational complexity that is available at the decoder. This permits software implementations that can perform according to the computational power available in order to achieve real-time coding/decoding. The bit allocation scheme is an adapted zerotree algorithm that also takes input from the psychoacoustic model. The measure of performance is a quantity called the Subband Perceptual Rate, which the filter bank structure adapts so that it approaches the Perceptual Entropy (PE) as closely as possible. In addition, this method is also amenable to progressive transmission, that is, it can achieve the best quality of reconstruction possible considering the size of the bit stream available at the...
Speech Processing for Communications: What's New?
- MULTITEL ASBL, 1 Copernic Ave, Initialis Scientific Park, B-7000 Mons; Faculté Polytechnique de Mons, TCTS Lab, 1 Copernic Ave, Initialis Scientific Park, B-7000, 2001
Cited by 1 (0 self)
Speech is one of the most complex signals an engineer has to handle. It is thus not surprising that its automatic processing has only recently found a wide market. In this paper we analyze the latest developments in speech coding, synthesis and recognition, and show why they were necessary for commercial maturity. Synthesis based on automatic unit selection, robust recognition systems, and mixed excitation coders are among the topics discussed here.

Introduction

Speech, which is one of the most complex signals an engineer has to handle (although we would need another article to support this claim), is also the easiest way of communication between humans. This is not a paradox: as opposed to telecommunication signals, speech was not invented by engineers. It was there long before them. If engineers had been given the task of designing speech, they surely would not have made it the way it is (chances are we would speak sinusoids, possibly with the help of attached bio-electronic devices...
ANALYSIS OF BLOOD FLOW VELOCITY AND PRESSURE SIGNALS USING THE MULTIPULSE METHOD
This paper shows how the multipulse method from digital signal processing can be used to accurately synthesize signals obtained from blood pressure and blood flow velocity sensors during posture change from sitting to standing. The multipulse method can be used to analyze signals that are composed of pulses of varying amplitudes. One of the advantages of the multipulse method is that it is able to produce an accurate and efficient representation of the signals at high resolution. The signals are represented as a set of input impulses passed through an autoregressive (AR) filter. The parameters that define the AR filter can be used to distinguish different conditions. In addition, the AR coefficients can be transformed to tube radii associated with digital waveguides, as well as to a pole-zero representation. Analysis of the dynamics of the model parameters has the potential to provide better insight into and understanding of the underlying physiological control mechanisms. For example, our data indicate that the tube radii may be related to the diameter of the blood vessels.
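The signal model described in this abstract, a sparse set of impulses passed through an AR filter, can be sketched as follows. The function name `synthesize_multipulse`, the sign convention y[n] = x[n] + sum_k a_k y[n-k], and the pulse positions and coefficient are all assumptions chosen for illustration; the paper's own parameterization and pulse-placement procedure are not reproduced here.

```python
def synthesize_multipulse(pulses, ar_coeffs, length):
    """Drive an all-pole (AR) filter with a sparse impulse train.

    pulses:    dict mapping sample index -> pulse amplitude (the excitation).
    ar_coeffs: list [a1, a2, ...] of feedback coefficients.
    length:    number of output samples to generate.
    Implements y[n] = x[n] + sum_k a_k * y[n - k], with y[n] = 0 for n < 0.
    """
    y = []
    for n in range(length):
        value = pulses.get(n, 0.0)  # excitation is zero between pulses
        for k, a in enumerate(ar_coeffs, start=1):
            if n - k >= 0:
                value += a * y[n - k]
        y.append(value)
    return y


# Hypothetical excitation: two pulses of different amplitude, first-order filter.
signal = synthesize_multipulse({0: 1.0, 3: 0.5}, [0.5], 5)
```

Only the pulse positions, the pulse amplitudes, and the AR coefficients need to be stored to regenerate the waveform, which is what makes this kind of representation compact.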
Speech Coding
- 1999
Introduction

By "speech coding" we mean a method of reducing the amount of information needed to represent a speech signal. Speech coding has become an exciting and active area of research, particularly in the past decade. Due to the development of several fundamental and powerful ideas, the subject had a rebirth in the 1980's. Speech coding provides a solution to the handling of the huge and increasing volume of information that needs to be carried from one point to another, which often leads to the saturation of the capacity of existing telecommunications links, even with the enormous channel capacities of fiber optic transmission systems. Furthermore, in the emerging era of large scale wireless communication, the use of speech coding techniques is essential for the tetherless transmission of information. Not only communication but also voice storage and multimedia applications now require digital speech coding. The recent advances in programmable digital signal processing chips

