Results 1 - 10 of 15
Correlation-based feature selection for machine learning
, 1998
Cited by 86 (3 self)
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
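The abstract does not reproduce the evaluation formula. As a minimal sketch, assuming the commonly cited form of the CFS merit, Merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation, the hypothesis can be expressed as follows. The function name and the use of absolute correlation values are illustrative assumptions, not taken from the thesis.

```python
import math

def cfs_merit(class_corrs, pair_corrs):
    """Heuristic merit of a feature subset (assumed CFS form).

    class_corrs: correlation of each feature in the subset with the class.
    pair_corrs:  correlation of each distinct feature pair in the subset.
    """
    k = len(class_corrs)
    if k == 0:
        return 0.0
    # Mean feature-class correlation (absolute values: an assumption).
    r_cf = sum(abs(c) for c in class_corrs) / k
    # Mean feature-feature correlation; zero for a single feature.
    r_ff = sum(abs(c) for c in pair_corrs) / len(pair_corrs) if pair_corrs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Under this form, two features that each correlate 0.6 with the class score higher when they are uncorrelated with each other than when they are fully redundant, which matches the stated hypothesis.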
Discriminative Training of Hidden Markov Models
, 1998
Cited by 14 (0 self)
Contents:
Abbreviations
Notation
1 Introduction
2 Hidden Markov Models
  2.1 Definition
  2.2 HMM Modelling Assumptions
  2.3 HMM Topology
  2.4 Finding the Best Transcription
  2.5 Setting the Parameters
  2.6 Summary
3 Objective Functions
  3.1 Properties of Maximum Likelihood Estimators
  3.2 Maximum Likelihood
  3.3 Maximum Mutual Information
  3.4 Frame Discrimination
  ...
Face Identification and Feature Extraction Using Hidden Markov Models
- Image Processing: Theory and Applications
Cited by 14 (1 self)
This paper details work done on automatic face identification. A new approach to the problem is proposed involving the use of Hidden Markov Models. We illustrate how these models allow the automatic extraction of facial features and the classification of face images. Some experiments are presented to support the plausibility of this approach. Successful results were obtained under the constraints of homogeneous lighting and constant background.
Face Segmentation For Identification Using Hidden Markov Models
, 1993
Cited by 13 (0 self)
This paper details work done on face processing using a novel approach involving Hidden Markov Models. Experimental results from earlier work [14] indicated that left-to-right models with use of structural information yield better feature extraction than ergodic models. This paper illustrates how these hybrid models can be used to extract facial bands and automatically segment a face image into meaningful regions, showing the benefits of simultaneous use of statistical and structural information. It is shown how the segmented data can be used to identify different subjects. Successful segmentation and identification of face images was obtained, even when facial details (with/without glasses, smiling/non-smiling, open/closed eyes) were varied. Some experiments with a simple left-to-right model are presented to support the plausibility of this approach. Finally, present and future directions of research work using these models are indicated.
Reconstruction Of Incomplete Spectrograms For Robust Speech Recognition
, 2000
Cited by 8 (0 self)
The performance of automatic speech recognition (ASR) systems degrades greatly when speech is corrupted by noise. Missing feature methods attempt to reduce this degradation by deleting components of a time-frequency representation of speech (such as a spectrogram) that exhibit low signal-to-noise ratio (SNR). Recognition is then performed using only the remaining components of the incomplete spectrogram. These methods have been shown to result in recognition accuracies that are very robust to the effects of additive noise. However, conventional missing feature methods, which modify the classifier used to perform the recognition, suffer from the drawback that they are constrained to use the log-spectral vectors of the spectrogram as features for recognition. It is well known that recognition systems that use log-spectral features perform poorly compared to systems that use cepstral features. In this
Speech Spectrum Modelling from Multiple Sources
, 2000
Cited by 3 (2 self)
This project presents a new model for a discrete Fourier transform (DFT) magnitude spectrum for the case when the spectrum shows combined input sound from multiple harmonic sources. The model is a Gaussian Mixture Model (GMM) whose means are constrained to lie at integer multiples of the component sources' fundamental frequencies. This model is then used to perform signal separation, recovering the independent component sounds from the original combined signal. For this model to be successfully applied, reliable estimates of the fundamental frequencies (F0s) in the combined input signal must be obtained - a successful new multiple F0 detection scheme is presented for doing this. The F0 values are used to initialise a new implementation of the Expectation-Maximisation (EM) algorithm that fits the GMM to the spectrum, a prerequisite for successful signal separation. A testing methodology is developed that allows for the comparison of the synthesised separate sound signals with the original signals. The average signal to noise ratio (SNR) for resynthesised sounds extracted from two TIMIT speech database vowel combinations (given reliable F0 data) is shown to be 10.14, near the threshold of a non-perceptible difference from the originals.
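To illustrate the constraint described in this abstract, the following is a minimal sketch of a mixture whose component means sit at integer multiples of each source's fundamental frequency. Equal component weights, a single shared standard deviation, and the names `harmonic_means` and `harmonic_gmm_density` are assumptions made for the illustration; the EM fitting and the F0 detection scheme from the project are not shown.

```python
import math

def harmonic_means(f0s_hz, n_harmonics):
    """Component means: integer multiples of each source's F0."""
    return [h * f0 for f0 in f0s_hz for h in range(1, n_harmonics + 1)]

def harmonic_gmm_density(freq_hz, f0s_hz, n_harmonics=5, sigma_hz=10.0):
    """Evaluate the constrained mixture at one frequency.

    Assumes equal weights and a shared sigma (illustrative only).
    """
    means = harmonic_means(f0s_hz, n_harmonics)
    weight = 1.0 / len(means)
    norm = 1.0 / (sigma_hz * math.sqrt(2.0 * math.pi))
    return sum(
        weight * norm * math.exp(-0.5 * ((freq_hz - m) / sigma_hz) ** 2)
        for m in means
    )
```

With two sources at 100 Hz and 130 Hz, the density is high at 200 Hz (second harmonic of the first source) and low at 165 Hz, which lies between harmonics of both sources.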
Direct Speech Feature Estimation Using an Iterative EM Algorithm for Vocal Fold Pathology Detection
- IEEE Transactions on Biomedical Engineering
, 1996
Cited by 2 (0 self)
The focus of this study is to formulate a speech parameter estimation algorithm for analysis/detection of vocal fold pathology. The speech processing algorithm proposed estimates features necessary to formulate a stochastic model to characterize healthy and pathology conditions, from speech recordings. The general idea is to separate speech components under healthy and assumed pathology conditions. This problem is addressed using an iterative maximum likelihood (ML) estimation procedure, based on the Estimation-Maximization (EM) algorithm. A new feature for characterizing pathology, termed Enhanced Spectral Pathology Component (ESPC) is estimated and shown to vary consistently between healthy and pathology conditions. It is also shown that the Mean Area Peak Value (MAPV) and the weighted slope (WSLOPE) indexes, which are obtained from the ESPC estimate, are meaningful measures of speech pathology conditions. For classification purposes, a 5-state Hidden Markov Model (HMM) recognizer w...
COMPARISON OF NEURAL NETWORKS FOR SPEAKER RECOGNITION
Cited by 1 (0 self)
In a world where authentication and privacy are taking a lot of our daily efforts, it is becoming more important for us to prove our identity to different systems every day so that we can access required and useful services. The problem addressed in this research is speaker verification as it involves knowing the identity of a given speaker using a predefined set of samples. The steps of this process start with processing the voice signal using the Fast Fourier Transform (FFT), the Hanning window, and a histogram representation to make it suitable for the next part. The identification part is based on a neural network where the identification can be done in one or two classification parts. Finally, several different algorithms were tested and the results compared.
An Algorithm for V/UV/S Segmentation of Speech by
Let f(n) be a sampled voice signal. Our goal is to identify the voiced (V) portions of f, as opposed to the silence (S) and unvoiced (UV) portions. In the following discussion, the sampling rate is 22050 Hz, quantized at 8 bits. A window length of 880 samples is twice the maximum period, which corresponds to the minimum frequency of 50 Hz that we will track in the time domain. We use the algorithm described below to provide a robust estimate of the fundamental period to start our glottal pulse (GP) or pitch tracker described in [1]. We compute R(f), the discrete Fourier transform (DFT) of the samples in the window. Let T(f) = |R(f)|, having eliminated the last half of R because of symmetry. If the window contains more than one occurrence of the fundamental period of a voiced utterance, there will be “aliasing” in T, where we define “aliasing” to be any inaccuracies in the frequency analysis of a periodic signal resulting from a poor choice of window size. Figure 1 plots T(f) for voiced speech. [Figure 1: (a) DFT with aliasing; (b) DFT without aliasing.] Notice that Figure 1(b) is the envelope of Figure 1(a). The aliasing is seen in Figure 2 by observing the large area under the DC term and under the first peak. The real cepstrum c(f) is the logarithm of the power spectrum, c(f) = 2 ln |R(f)|. From the properties of the cepstrum, we know c(f) consists of two components: a slowly varying component which corresponds to the spectral envelope and a rapidly varying component which corresponds to the pitch harmonic peaks [2]. Since the logarithm is monotonically increasing, R(f) also has two components: one which corresponds to the spectral envelope and another which corresponds to the pitch harmonic peaks. These components can be separated by filtering, which is the traditional way to proceed, or by a second Fourier transform, which we have found to be more resistant to noise. Y(f) = T(R(f)). A graph of Y(f) in the case of a voiced utterance is shown as follows: [graph not included]
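The window-length arithmetic and the spectra T(f) and c(f) defined in this abstract can be sketched in code. At 22050 Hz, one period of a 50 Hz tone spans 22050 / 50 = 441 samples, so two periods span 882 samples, which the abstract gives as 880. The sketch below uses a naive DFT for self-containment; the function names and the small floor that avoids ln(0) are assumptions, not part of the described algorithm.

```python
import cmath
import math

SAMPLE_RATE_HZ = 22050  # from the abstract
MIN_F0_HZ = 50          # lowest fundamental tracked, from the abstract

def max_period_samples(sample_rate_hz=SAMPLE_RATE_HZ, min_f0_hz=MIN_F0_HZ):
    """Samples in one period of the lowest tracked frequency."""
    return sample_rate_hz // min_f0_hz

def magnitude_spectrum(window):
    """T(f) = |R(f)| for the first half of the DFT bins (naive O(n^2) DFT)."""
    n = len(window)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        mags.append(abs(acc))
    return mags

def log_power_spectrum(mags, floor=1e-12):
    """c(f) = 2 ln |R(f)|; the floor is an assumption to avoid ln(0)."""
    return [2.0 * math.log(max(m, floor)) for m in mags]
```

For a real window the second half of the DFT mirrors the first, which is why only n // 2 bins are kept.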
Maximization Of The Subjective Loudness Of Speech With Constrained Amplitude
, 1999
We introduce an adaptive algorithm for constraining the amplitude of speech signals while at the same time trying to maintain the subjective loudness and trying not to produce disturbing artifacts. The algorithm can be applied to compensate for the clipping distortion of amplifiers in speech reproduction devices. The algorithm analyzes the speech signal on multiple frequency bands and applies an internal audibility law in order to make inaudible changes to the signal. An example of the audibility law, presented in the form of a matrix, is described, associated with a specific speech reproduction device. Multiple band-pass signals are processed with a waveshaper to accomplish soft-clipping and to constrain the amplitude of the processed signal. When processed with the proposed algorithm, the computational loudness value of speech signals was found to diminish only slightly (approximately 6 sones) during processing, while at the same time the signal amplitude could be reduced by even ...

