Results 1-10 of 642
A tutorial on hidden Markov models and selected applications in speech recognition
Proceedings of the IEEE, 1989
Cited by 5892 (1 self)
Abstract:
Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.
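The basic HMM computation the tutorial builds on, evaluating the likelihood of an observation sequence with the forward algorithm, can be sketched in a few lines (a minimal illustration with invented model parameters, not code from the paper):

```python
import numpy as np

# Forward algorithm for a discrete-observation HMM: computes P(O | model).
# A: state-transition matrix (N x N), B: emission matrix (N x M),
# pi: initial state distribution (N,). All parameter values are illustrative.

def forward(A, B, pi, obs):
    """Return the likelihood of the observation sequence `obs`."""
    alpha = pi * B[:, obs[0]]            # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction step
    return alpha.sum()                   # termination

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

p = forward(A, B, pi, [0, 1, 0])         # likelihood of observing 0, 1, 0
```

In practice the recursion is carried out with scaling or in the log domain to avoid underflow on long sequences.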
Wallflower: Principles and Practice of Background Maintenance
1999
Cited by 477 (1 self)
Abstract:
Background maintenance is a frequent element of video surveillance systems. We develop Wallflower, a three-component system for background maintenance: the pixel-level component performs Wiener filtering to make probabilistic predictions of the expected background; the region-level component fills in homogeneous regions of foreground objects; and the frame-level component detects sudden, global changes in the image and swaps in better approximations of the background. We compare our system with 8 other background subtraction algorithms. Wallflower is shown to outperform previous algorithms by handling a greater set of the difficult situations that can occur. Finally, we analyze the experimental results and propose …
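The pixel-level idea, predicting each pixel's next background value as a linear function of its recent history and flagging large deviations as foreground, can be sketched as follows (a least-squares one-step predictor stands in for the paper's Wiener filter; the history values and threshold are invented):

```python
import numpy as np

# Toy pixel-level background prediction: fit a linear one-step predictor
# over the pixel's history by least squares, predict the next value, and
# flag the pixel as foreground if the observed value deviates from the
# prediction by more than a threshold. This simplifies Wallflower's
# Wiener-filter component; all numbers here are illustrative.

def predict_pixel(history, p=3):
    """One-step linear prediction from the last p samples of `history`."""
    X = np.array([history[i:i + p] for i in range(len(history) - p)])
    y = np.array(history[p:])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.array(history[-p:]) @ coef)

history = [100, 102, 101, 103, 102, 104, 103]   # a stable background pixel
pred = predict_pixel(history)
threshold = 10.0
is_foreground = abs(150 - pred) > threshold     # new value 150 deviates
```

A real system would run this per pixel per frame and update the predictor coefficients online.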
Speech Analysis
1998
Cited by 359 (0 self)
Contents:
1 Introduction
1.1 What is Speech Analysis?
1.1.1 So what is an acoustic vector?
1.2 Why Speech Analysis?
1.3 The problems of speech analysis
1.4 Standard references for this course
2 Background
2.1 Sampling theory
2.1.1 Sampling frequency
2.1.2 Sampling resolution
2.2 Linear filters
2.2.1 Finite Impulse Response filters
2.2.2 Infinite Impulse Response filters
2.3 The source-filter model of speech
3 Filter bank Analysis
3.1 Spectrograms …
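The Finite Impulse Response filters in section 2.2.1 compute each output sample as a weighted sum of the current and past inputs, y[n] = sum_k b[k] x[n-k]; a moving-average filter is the simplest sketch (the signal and tap count here are illustrative):

```python
import numpy as np

# A 4-tap moving-average FIR filter applied by convolution. The filter
# coefficients b are the impulse response; truncating the convolution to
# the input length gives the causal filtered output.

b = np.ones(4) / 4.0                   # filter coefficients (taps)
x = np.array([0., 0., 4., 4., 4., 4., 0., 0.])   # a rectangular pulse
y = np.convolve(x, b)[:len(x)]         # causal FIR filtering
```

The output ramps up and down over the filter length, which is exactly the smoothing behaviour that motivates filter-bank analysis later in the notes.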
Speaker recognition: A tutorial
Cited by 269 (2 self)
Abstract:
A tutorial on the design and development of automatic speaker-recognition systems is presented. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. These systems can operate in two modes: to identify a particular person or to verify a person’s claimed identity. Speech processing and the basic components of automatic speaker-recognition systems are shown, and design tradeoffs are discussed. Then, a new automatic speaker-recognition system is given. This recognizer performs with 98.9% correct identification. Finally, the performances of various systems are compared.
An overview of text-independent speaker recognition: from features to supervectors
2009
Cited by 156 (37 self)
Abstract:
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate on advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.
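The "feature extraction plus speaker modeling" pipeline the overview describes can be sketched minimally: model each speaker's feature vectors with a simple density and pick the speaker whose model best explains a test utterance (a single diagonal Gaussian here stands in for the Gaussian mixture models used in practice; all data is synthetic):

```python
import numpy as np

# Each speaker is modeled by a diagonal Gaussian over feature vectors; a
# test utterance is assigned to the speaker whose model gives it the
# highest average per-frame log-likelihood. The feature vectors below
# are random stand-ins for real acoustic features such as MFCCs.

rng = np.random.default_rng(0)

def fit(features):
    """Per-speaker model: feature means and variances."""
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def loglik(features, model):
    """Average per-frame diagonal-Gaussian log-likelihood."""
    mu, var = model
    return float(np.mean(-0.5 * (np.log(2 * np.pi * var)
                                 + (features - mu) ** 2 / var).sum(axis=1)))

spk_a = rng.normal(0.0, 1.0, size=(200, 5))      # enrollment data, speaker A
spk_b = rng.normal(3.0, 1.0, size=(200, 5))      # enrollment data, speaker B
models = {"A": fit(spk_a), "B": fit(spk_b)}

test_utt = rng.normal(3.0, 1.0, size=(50, 5))    # drawn from speaker B
best = max(models, key=lambda s: loglik(test_utt, models[s]))
```

Stacking the per-speaker model parameters into one long vector is, loosely, the "supervector" step the paper surveys.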
A note on the stochastic realization problem
Hemisphere Publishing Corporation, 1976
Cited by 133 (28 self)
Abstract:
Given a mean-square continuous stochastic vector process y with stationary increments and a rational spectral density Φ such that Φ(∞) is finite and nonsingular, consider the problem of finding all minimal (wide-sense) Markov representations (stochastic realizations) of y. All such realizations are characterized and classified with respect to deterministic as well as probabilistic properties. It is shown that only certain realizations (internal stochastic realizations) can be determined from the given output process y. All others (external stochastic realizations) require that the probability space be extended with an exogenous random component. A complete characterization of the sets of internal and external stochastic realizations is provided. It is shown that the state process of any internal stochastic realization can be expressed in terms of two steady-state Kalman-Bucy filters, one evolving forward in time over the infinite past and one backward over the infinite future. An algorithm is presented which generates families of external realizations defined on the same probability space and totally ordered with respect to state covariances.
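A (wide-sense) Markov representation of the kind the note characterizes can be written, under its assumptions, as a linear state-space model driven by normalized white noise (a generic sketch; the matrix names are illustrative, not the paper's notation):

```latex
% x: the Markov state process, w: a normalized Wiener process,
% dy: the stationary increments of the given output process y.
\begin{aligned}
  dx &= A\,x\,dt + B\,dw \\
  dy &= C\,x\,dt + D\,dw
\end{aligned}
```

The internal/external distinction in the abstract concerns whether the driving noise w can be constructed from y itself or requires enlarging the probability space.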
Aggregate features and AdaBoost for music classification
Machine Learning, 2006
Cited by 84 (16 self)
Abstract:
We present an algorithm that predicts musical genre and artist from an audio waveform. Our method uses the ensemble learner AdaBoost to select from a set of audio features that have been extracted from segmented audio and then aggregated. Our classifier proved to be the most effective method for genre classification at the recent MIREX 2005 international contests in music information extraction, and the second-best method for recognizing artists. This paper describes our method in detail, from feature extraction to song classification, and presents an evaluation of our method on three genre databases and two artist-recognition databases. Furthermore, we present evidence collected from a variety of popular features and classifiers that the technique of classifying features aggregated over segments of audio is better than classifying either entire songs or individual short-timescale features.
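The aggregation step the abstract highlights, summarizing frame-level features over fixed-length segments before classification, can be sketched as follows (segment length and feature dimensions are illustrative; the paper then feeds such segment vectors to AdaBoost):

```python
import numpy as np

# Aggregate short-timescale frame features over fixed-length segments by
# taking per-segment means and variances, turning a (n_frames, n_feats)
# matrix into one summary vector per segment.

def aggregate(frames, seg_len=100):
    """frames: (n_frames, n_feats) -> (n_segments, 2 * n_feats)."""
    n_seg = len(frames) // seg_len
    segs = frames[:n_seg * seg_len].reshape(n_seg, seg_len, -1)
    return np.concatenate([segs.mean(axis=1), segs.var(axis=1)], axis=1)

frames = np.random.default_rng(1).normal(size=(500, 12))  # e.g. 12 MFCCs
agg = aggregate(frames)                                   # 5 segment vectors
```

Classifying these segment-level vectors, rather than single frames or whole songs, is the abstract's central empirical finding.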
Iterative Channel Estimation and Decoding of Pilot Symbol Assisted Turbo Codes Over Flat-Fading Channels
IEEE Journal on Selected Areas in Communications, 2001
Cited by 81 (3 self)
Abstract:
A method for coherently detecting and decoding turbo-coded binary phase shift keying (BPSK) signals transmitted over frequency-flat fading channels is discussed. Estimates of the complex channel gain and variance of the additive noise are derived first from known pilot symbols and an estimation filter. After each iteration of turbo decoding, the channel estimates are refined using information fed back from the decoder. Both hard-decision and soft-decision feedback are considered and compared with three baseline turbo-coded systems: 1) a BPSK system that has perfect channel estimates; 2) a system that uses differential phase shift keying and hence needs no estimates; and 3) a system that performs channel estimation using pilot symbols but has no feedback path from decoder to estimator. Performance can be further improved by borrowing channel estimates from the previously decoded frame. Simulation results show the influence of pilot symbol spacing, estimation filter size and type, and fade rate. Performance within 0.49 and 1.16 dB of turbo-coded BPSK with perfect coherent detection is observed at a bit-error rate of 10^-4 for normalized fade rates of 0.005 and 0.02, respectively.
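The pilot-symbol baseline (system 3 above) can be sketched without the turbo code: estimate the complex gain at known pilot positions, interpolate across the data positions, and derotate before BPSK detection (pilot spacing, fade model, and noise level are all illustrative; the paper refines such estimates with decoder feedback):

```python
import numpy as np

# Pilot-symbol-assisted channel estimation for BPSK over a flat-fading
# channel: the gain is estimated at each pilot as rx / pilot, linearly
# interpolated in between, and used for coherent detection.

rng = np.random.default_rng(2)
n, spacing = 60, 6
bits = rng.integers(0, 2, n)
tx = 1.0 - 2.0 * bits                        # BPSK mapping: 0 -> +1, 1 -> -1
tx[::spacing] = 1.0                          # overwrite with known +1 pilots
h = np.exp(1j * np.linspace(0.0, 0.5, n))    # slowly rotating flat fade
rx = h * tx + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n))

t = np.arange(n)
pilots = t[::spacing]
h_hat = (np.interp(t, pilots, rx[pilots].real)          # interpolate real
         + 1j * np.interp(t, pilots, rx[pilots].imag))  # and imaginary parts
detected = (np.real(rx * np.conj(h_hat)) < 0).astype(int)

data = np.setdiff1d(t, pilots)               # pilot positions carry no data
errors = int(np.sum(detected[data] != bits[data]))
```

At this high SNR and slow fade rate, the interpolated estimates track the channel closely enough for error-free detection; the iterative scheme in the paper matters at lower SNR and faster fading.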
Graphical models and automatic speech recognition
Mathematical Foundations of Speech and Language Processing, 2003
Cited by 78 (15 self)
Abstract:
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic, pronunciation, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language-model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
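The delta (derivative) features whose benefit the paper analyzes are computed by a local linear-regression estimate of each feature's time derivative; a common form can be sketched as follows (the window width N=2 is the usual choice; the feature values are synthetic):

```python
import numpy as np

# Delta features: d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum n^2),
# with edge frames repeated at the boundaries. The static features are
# then augmented with their deltas frame by frame.

def deltas(feats, N=2):
    """feats: (T, d) -> (T, d) regression-based delta features."""
    T = len(feats)
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    num = sum(n * (padded[N + n:T + N + n] - padded[N - n:T + N - n])
              for n in range(1, N + 1))
    return num / (2 * sum(n * n for n in range(1, N + 1)))

feats = np.linspace(0, 9, 10).reshape(-1, 1)     # a ramp: slope 1 per frame
d = deltas(feats)                                # interior deltas equal 1.0
augmented = np.hstack([feats, d])                # static + delta features
```

In the paper's graphical analysis, it is exactly this cross-frame dependence that gives the augmented features their extra discriminative structure.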