Combined Binary Classifiers With Applications To Speech Recognition
- NEAREST-NEIGHBOR ECOC WITH APPLICATION TO ALL-PAIRS MULTICLASS SVM
, 2002
Cited by 3 (2 self)
Many applications require classification of examples into one of several classes. A common way of designing such classifiers is to determine the class based on the outputs of several binary classifiers. We consider some of the most popular methods for combining the decisions of the binary classifiers, and improve existing bounds on the error rates of the combined classifier over the training set. We also describe a new method for combining binary classifiers. The method is based on stacking a neural network and, when used with support vector machines as the binary learners, substantially decreased the error rate in two vowel classification tasks.
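As an illustration of deciding a class from several binary classifiers, the following is a minimal sketch of all-pairs (one-vs-one) majority voting. The abstract does not say which combination schemes the paper analyzes, so the voting rule, the function name `all_pairs_vote`, and the toy nearest-center pair classifiers are assumptions made for this example only. It does not reproduce the paper's stacked neural network.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

PairClassifier = Callable[[float], str]


def all_pairs_vote(
    x: float,
    classes: Sequence[str],
    pair_classifiers: Dict[Tuple[str, str], PairClassifier],
) -> str:
    """Combine one binary classifier per class pair by majority vote.

    Each pair classifier returns the label it prefers for x.
    Ties are broken by the order of `classes` (an arbitrary choice).
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        winner = pair_classifiers[(a, b)](x)
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])


# Hypothetical toy problem: three classes with 1-D centers.
centers = {"a": 0.0, "b": 1.0, "c": 2.0}
classes = list(centers)


def make_pair_classifier(a: str, b: str) -> PairClassifier:
    # Stand-in for a trained binary learner: pick the closer center.
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b


pair_classifiers = {
    (a, b): make_pair_classifier(a, b) for a, b in combinations(classes, 2)
}

print(all_pairs_vote(0.9, classes, pair_classifiers))
```

In practice each pair classifier would be a trained binary learner such as a support vector machine; the nearest-center rule here only keeps the sketch self-contained.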
STOP CONSONANT CLASSIFICATION USING WAVELET PACKET TRANSFORMS AND A NEURAL NETWORK
A wavelet packet transform is described to compute N spectral/temporal features for the 6 English stop consonants /b,p,d,t,g,k/. These features were used by a Binary Pair Partitioned neural network for speaker-independent classification of the stop consonants. The wavelet packet transform is generated by a pair of quadrature mirror filters, which decompose the signal into a series of subbands ("frequency slots") by repeated convolution and decimation. Choosing a complete set of subbands and dividing each subband into a number of "time slots" defines a decomposition of the time-frequency plane into N phase cells. The N mean square values (energy values) in these phase cells provide the N features for the neural network. The number of features N was varied between 18 and 200. One advantage of this type of wavelet analysis is that it is much faster than conventional FFT-based methods, which is of particular interest for real-time applications. In addition, there is the potential to exploit non-uniform time/frequency resolution. Experimental results obtained with the stops extracted from the TIMIT database will be presented in the paper.
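The feature computation described above can be sketched roughly as follows. The abstract does not name the filters, the tree depth, or the slot layout, so this sketch assumes the two-tap Haar pair as the quadrature mirror filters, a uniform-depth tree as the "complete set of subbands", and equal-length time slots. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_band(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One filter-bank stage: Haar low/high-pass filtering, then decimation by 2."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet_subbands(x: Sequence[float], depth: int) -> List[List[float]]:
    """Split every band at every level, giving 2**depth subbands.

    Subbands are returned in filter-tree order, which is not strictly
    ordered by increasing frequency.
    """
    bands = [list(x)]
    for _ in range(depth):
        next_bands: List[List[float]] = []
        for band in bands:
            low, high = split_band(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands


def phase_cell_energies(
    x: Sequence[float], depth: int, time_slots: int
) -> List[float]:
    """Mean square value in each (subband, time slot) phase cell.

    Returns N = 2**depth * time_slots features. Assumes each subband
    holds at least `time_slots` coefficients.
    """
    features: List[float] = []
    for band in wavelet_packet_subbands(x, depth):
        slot_len = len(band) // time_slots
        for t in range(time_slots):
            cell = band[t * slot_len:(t + 1) * slot_len]
            features.append(sum(v * v for v in cell) / len(cell))
    return features


# A constant test signal: all energy should land in the lowest subband.
signal = [1.0] * 64
features = phase_cell_energies(signal, depth=3, time_slots=2)
print(len(features))
```

With a 64-sample input, depth 3, and 2 time slots, the sketch yields 8 subbands of 8 coefficients each and N = 16 features. A non-uniform tree, which the abstract mentions as a possibility, would split only selected bands instead of all of them.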
THE APPLICATION OF BINARY-PAIR PARTITIONED NEURAL NETWORKS TO THE SPEAKER VERIFICATION TASK
A method is presented for the application of binary-pair partitioned neural networks in the task of speaker verification. The binary-pair partitioned neural network is a previously developed technique used for speaker identification [1]. The training and evaluation procedures are discussed, as well as the selection of the verification thresholds. For a verification task of 30 users and 41 impostors, an accuracy of 96.3 percent was achieved using 13.5 seconds of input speech extracted from the DARPA/TIMIT database [2]. For input speech lengths as low as 2.7 seconds, the system maintains an 86.9 percent accuracy.
SIGNAL MODELING WITH NON-UNIFORM TIME SAMPLING OF FEATURES FOR AUTOMATIC SPEECH RECOGNITION
, 2000
This dissertation presents an investigation of nonuniform time sampling methods for spectral/temporal feature extraction in speech. Frame-based features were computed based on an encoding of the global spectral shape using a Discrete Cosine Transform. In most current “standard” methods, trajectory (dynamic) features are determined from frame-based parameters using a fixed time sampling, i.e., fixed block length and fixed block spacing. In this research, new methods are proposed and investigated in which block length and/or block spacing are variable. The idea was initially tested with HMM-based isolated word recognition, and a significant performance improvement resulted when a variable block length and variable block method were applied. An accuracy of 97.9% was obtained with an alphabet recognition task using the ISOLET database. This result is
PHASE DIFFERENCE OF FILTER-STABLE PART-TONES AS ACOUSTIC FEATURE
- 2012 IEEE Statistical Signal Processing Workshop (SSP)
A part-tone decomposition of voiced sections of speech is introduced, which is adapted with high accuracy to the frequency of the glottal oscillator of the speaker. The iterative replacement of the center filter frequency contours (chosen locally as linear chirps) of the non-stationary bandpass filters converges extremely fast and leads to the extraction of filter-stable part-tones with uncorrupted phases. In contrast to phases of frequency decompositions with a priori defined, constant filter frequencies, the phase differences of filter-stable part-tones promise to become a useful supplement to the amplitude-based acoustic features used for conventional automatic speech recognition. The derived phase features are tested in vowel classification experiments based on the phonetically rich TIMIT database. Index Terms: time-frequency decomposition, filter-stable part-tones, voiced speech, acoustic feature, relative phase
Non-Stationary Signal Processing and its Application in Speech Recognition
The most widely used acoustic feature extraction methods of current automatic speech recognition (ASR) systems are based on the assumption of stationarity. In this paper we extensively evaluate a recently introduced filter-stable, non-stationary signal processing method, which relies on an adaptive part-tone decomposition of voiced speech to obtain alternative feature vectors for ASR. The non-stationary filterbank allows for more noise-robust amplitude-based features by suppressing the between-harmonics regions. Furthermore, by adapting the center filter frequencies to the underlying acoustic modes, it is possible to obtain useful phase features which can be interpreted in terms of the non-stationary dynamics within the vocal tract. The features are evaluated on different tasks ranging from vowel classification up to large vocabulary continuous speech recognition. Index Terms: non-stationary, adaptive filter, noise robust, phase features, ASR

