Results 1–10 of 31
Speech Recognition using SVMs
 Advances in Neural Information Processing Systems 14
, 2002
Abstract

Cited by 70 (17 self)
An important issue in applying SVMs to speech recognition is the ability to classify variable-length sequences. This paper presents extensions to a standard scheme for handling this variable-length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood.
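As a rough illustration of the score-space idea, the sketch below uses toy 1-D Gaussians as the class-conditional generative models (the paper uses HMMs; all names here are hypothetical): a variable-length sequence is mapped to a fixed-length vector built from the log-likelihood ratio and its gradients with respect to both models' parameters, which a standard SVM can then consume.

```python
import numpy as np

def gaussian_loglik(x, mu, var):
    # Log-likelihood of a sequence x under a 1-D Gaussian (toy generative model).
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var))

def score_space_features(x, m1, v1, m2, v2):
    # Likelihood-ratio score space: the log-likelihood ratio itself plus its
    # gradient w.r.t. the parameters of both class-conditional models.
    llr = gaussian_loglik(x, m1, v1) - gaussian_loglik(x, m2, v2)
    # d/dmu log p(x) = sum((x - mu)/var); d/dvar = sum((x-mu)^2/(2 var^2) - 1/(2 var)).
    g1 = np.array([np.sum((x - m1) / v1),
                   np.sum((x - m1) ** 2 / (2 * v1 ** 2) - 0.5 / v1)])
    g2 = np.array([np.sum((x - m2) / v2),
                   np.sum((x - m2) ** 2 / (2 * v2 ** 2) - 0.5 / v2)])
    return np.concatenate([[llr], g1, -g2])   # fixed-length vector for an SVM

x = np.array([0.9, 1.1, 1.3])                 # a variable-length sequence
phi = score_space_features(x, 1.0, 0.5, -1.0, 0.5)
print(phi.shape)  # (5,): fixed dimension regardless of sequence length
```

The point of the mapping is visible in the last line: sequences of any length land in the same fixed-dimensional space.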
Data clustering using a model granular magnet
 Neural Computation
, 1997
Abstract

Cited by 57 (2 self)
We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures, it is completely ordered; all spins are aligned. At very high temperatures, the system does not exhibit any ordering, and in an intermediate regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins and the corresponding data points into clusters. We demonstrate on three synthetic and three real data sets how the method works. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method.
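The distance-decaying interaction can be sketched directly; the sketch below uses a dense Gaussian decay and omits both the restriction to neighboring points and the Monte Carlo estimation of the spin-spin correlations, so it shows only how the couplings are laid down (parameter names assumed):

```python
import numpy as np

def neighbor_couplings(X, a):
    # Couplings J_ij that decay with distance, in the spirit of the
    # superparamagnetic scheme: a Gaussian decay exp(-d^2 / 2a^2).
    # The paper restricts interactions to neighboring points; omitted here.
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    J = np.exp(-d2 / (2 * a ** 2))
    np.fill_diagonal(J, 0.0)          # no self-coupling
    return J

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
J = neighbor_couplings(X, a=1.0)
print(J[0, 1] > J[0, 2])  # nearby points couple far more strongly -> True
```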
Large-scale parallel data clustering
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Abstract

Cited by 41 (4 self)
Abstract—Algorithmic enhancements are described that enable a large computational reduction in mean square-error data clustering. These improvements are incorporated into a parallel data-clustering tool, P-CLUSTER, designed to execute on a network of workstations. Experiments involving the unsupervised segmentation of standard texture images were performed. For some data sets, a 96 percent reduction in computation was achieved. Index Terms—Data clustering, mean square error, data mining, image segmentation, parallel algorithm, network of workstations.
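The mean-square-error criterion being reduced is the familiar k-means objective; a minimal serial sketch of that criterion (the parallel execution and the pruning enhancements themselves are not shown, and all names are hypothetical):

```python
import numpy as np

def kmeans_mse(X, k, iters=20, seed=0):
    # Plain Lloyd iterations for the mean-square-error clustering criterion.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    # Final assignments and the mean square error they achieve.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    mse = d[np.arange(len(X)), labels].mean()
    return labels, centers, mse

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers, mse = kmeans_mse(X, 2)
print(labels, mse)  # the two tight blobs end up in different clusters
```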
Probabilistic Segmentation for Segment-Based Speech Recognition
, 1998
Abstract

Cited by 24 (5 self)
Segment-based speech recognition systems must explicitly hypothesize segment start and end times. The purpose of a segmentation algorithm is to hypothesize those times and to compose a graph of segments from them. During recognition, this graph is an input to a search that finds the optimal sequence of sound units through the graph. The goal of this thesis is to create a high-quality, real-time phonetic segmentation algorithm for segment-based speech recognition. A high-quality segmentation algorithm produces a sparse network of segments that contains most of the actual segments in the speech utterance. A real-time algorithm implies that it is fast, and that it is able to produce an output in a pipelined manner. The approach taken in this thesis is to adopt the framework of a state-of-the-art algorithm that does not operate in real-time, and to make the modifications necessary to enable it to run in real-time. The algorithm adopted as the starting point for this work makes use of a for...
Connected Letter Recognition with a Multi-State Time Delay Neural Network
 In 3rd European Conference on Speech Communication and Technology (EUROSPEECH) 93
, 1993
Abstract

Cited by 23 (13 self)
The Multi-State Time Delay Neural Network (MS-TDNN) integrates a nonlinear time alignment procedure (DTW) and the high-accuracy phoneme spotting capabilities of a TDNN into a connectionist speech recognition system with word-level classification and error backpropagation. We present an MS-TDNN for recognizing continuously spelled letters, a task characterized by a small but highly confusable vocabulary. Our MS-TDNN achieves 98.5/92.0% word accuracy on speaker dependent/independent tasks, outperforming previously reported results on the same databases. We propose training techniques aimed at improving sentence-level performance, including free alignment across word boundaries, word duration modeling and error backpropagation on the sentence rather than the word level. Architectures integrating submodules specialized on a subset of speakers achieved further improvements.

1 INTRODUCTION

The recognition of spelled strings of letters is essential for all applications involving proper names,...
Efficient Kernel Discriminant Analysis via Spectral Regression
Abstract

Cited by 21 (2 self)
Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. LDA can be performed either in the original input space or in the reproducing kernel Hilbert space (RKHS) into which data points are mapped, which leads to Kernel Discriminant Analysis (KDA). When the data are highly nonlinearly distributed, KDA can achieve better performance than LDA. However, computing the projective functions in KDA involves eigen-decomposition of the kernel matrix, which is very expensive when a large number of training samples exist. In this paper, we present a new algorithm for kernel discriminant analysis, called Spectral Regression Kernel Discriminant Analysis (SRKDA). By using spectral graph analysis, SRKDA casts discriminant analysis into a regression framework which facilitates both efficient computation and the use of regularization techniques. Specifically, SRKDA only needs to solve a set of regularized regression problems and no eigenvector computation is involved, which greatly reduces the computational cost. Moreover, the new formulation makes it very easy to develop an incremental version of the algorithm which can fully utilize the computational results of the existing training samples. Extensive experiments on spoken letter, handwritten digit image and face image data demonstrate the effectiveness and efficiency of the proposed algorithm.
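A toy sketch of the computational shortcut, assuming an RBF kernel and a regularizer δ that are illustrative choices rather than the paper's: for a supervised class graph the spectral responses reduce to centered class indicators, so each projective function costs one regularized linear solve instead of a kernel eigen-decomposition.

```python
import numpy as np

def srkda_fit(K, y, delta=0.01):
    # SRKDA sketch: for c classes the graph-spectral responses are (up to
    # rotation) centered class indicators, leaving c-1 regularized
    # kernel-regression solves and no eigenvector computation.
    n = len(y)
    classes = np.unique(y)
    alphas = []
    for c in classes[:-1]:
        t = (y == c).astype(float)
        t -= t.mean()                            # centered indicator response
        alphas.append(np.linalg.solve(K + delta * np.eye(n), t))
    return np.column_stack(alphas)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(3.0, 0.3, (5, 2))])
y = np.array([0] * 5 + [1] * 5)
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))   # RBF kernel (assumed)
A = srkda_fit(K, y)
proj = K @ A              # discriminant projection of the training points
print(proj.shape)         # (10, 1): one projective function for two classes
```

The `np.linalg.solve` call is the entire per-function cost, which is what makes the regression formulation cheap relative to an eigen-decomposition of K.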
Deterministically Annealed Design of Hidden Markov Model Speech Recognizers
, 2001
Abstract

Cited by 18 (4 self)
Many conventional speech recognition systems are based on the use of hidden Markov models (HMM) within the context of discriminant-based pattern classification. While the speech recognition objective is a low rate of misclassification, HMM design has been traditionally approached via maximum likelihood (ML) modeling which is, in general, mismatched with the minimum error objective and hence suboptimal. Direct minimization of the error rate is difficult because of the complex nature of the cost surface, and has only been addressed recently by discriminative design methods such as generalized probabilistic descent (GPD). While existing discriminative methods offer significant benefits, they commonly rely on local optimization via gradient descent whose performance suffers from the prevalence of shallow local minima. As an alternative, we propose the deterministic annealing (DA) design method that directly minimizes the error rate while avoiding many poor local minima of the cost. DA is derived from fundamental principles of statistical physics and information theory. In DA, the HMM classifier's decision is randomized and its expected error rate is minimized subject to a constraint on the level of randomness which is measured by the Shannon entropy. The entropy constraint is gradually relaxed, leading in the limit of zero entropy to the design of regular nonrandom HMM classifiers. An efficient forward-backward algorithm is proposed for the DA method. Experiments on synthetic data and on a simplified recognizer for isolated English letters demonstrate that the DA design method can improve recognition error rates over both ML and GPD methods.
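The core mechanism — a randomized decision whose entropy constraint is gradually relaxed — can be illustrated with a Gibbs distribution over per-class costs; a toy sketch only, not the paper's HMM design procedure:

```python
import numpy as np

def annealed_decision(costs, T):
    # At temperature T the randomized classifier picks class i with
    # probability proportional to exp(-cost_i / T) (a Gibbs distribution);
    # high T means high entropy, and T -> 0 recovers the hard
    # minimum-cost (nonrandom) decision.
    z = np.exp(-np.asarray(costs, dtype=float) / T)
    return z / z.sum()

costs = [1.0, 1.2, 3.0]
for T in (5.0, 1.0, 0.05):
    p = annealed_decision(costs, T)
    print(T, np.round(p, 3))
# As T is lowered the distribution hardens onto the minimum-cost class.
```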
Class Visualization of High-Dimensional Data with Applications
, 2003
Abstract

Cited by 17 (0 self)
Consider the problem of visualizing high-dimensional data that has been categorized into various classes. Our goal in visualizing is to quickly absorb inter-class and intra-class relationships. Towards this end, class-preserving projections of the multidimensional data onto two-dimensional planes, which can be displayed on a computer screen, are introduced. These class-preserving projections maintain the high-dimensional class structure, and are closely related to Fisher's linear discriminants. By displaying sequences of such two-dimensional projections and by moving continuously from one projection to the next, an illusion of smooth motion through a multidimensional display can be created. We call such sequences class tours. Furthermore, we overlay class-similarity graphs on our two-dimensional projections to capture the distance relationships in the original high-dimensional space. We illustrate
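For three classes, one such class-preserving projection is onto the plane through the three class centroids; a minimal sketch under that special case (names hypothetical):

```python
import numpy as np

def class_preserving_plane(X, y):
    # For three classes, the plane spanned by the three class centroids
    # preserves the class structure; project onto an orthonormal basis of it.
    means = np.array([X[y == c].mean(0) for c in np.unique(y)])
    d1, d2 = means[1] - means[0], means[2] - means[0]
    u1 = d1 / np.linalg.norm(d1)
    u2 = d2 - (d2 @ u1) * u1          # Gram-Schmidt against u1
    u2 /= np.linalg.norm(u2)
    return X @ np.column_stack([u1, u2])   # n x 2 screen coordinates

rng = np.random.default_rng(0)
mus = np.zeros((3, 5)); mus[1, 0] = 4.0; mus[2, 1] = 4.0
X = np.vstack([rng.normal(mu, 1.0, (20, 5)) for mu in mus])
y = np.repeat([0, 1, 2], 20)
P = class_preserving_plane(X, y)
print(P.shape)  # (60, 2): 2-D coordinates ready to plot
```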
English Alphabet Recognition with Telephone Speech
 Advances in Neural Information Processing Systems 4
, 1992
Abstract

Cited by 16 (7 self)
A recognition system is reported which recognizes names spelled over the telephone with brief pauses between letters. The system uses separate neural networks to locate segment boundaries and classify letters. The letter scores are then used to search a database of names to find the best scoring name. The speaker-independent classification rate for spoken letters is 89%. The system retrieves the correct name, spelled with pauses between letters, 91% of the time from a database of 50,000 names.

1 INTRODUCTION

The English alphabet is difficult to recognize automatically because many letters sound alike; e.g., B/D, P/T, V/Z and F/S. When spoken over the telephone, the information needed to discriminate among several of these pairs, such as F/S, P/T, B/D and V/Z, is further reduced due to the limited bandwidth of the channel. Speaker-independent recognition of spelled names over the telephone is difficult due to variability caused by channel distortions, different handsets, and a variety o...
Unsupervised feature selection for multi-cluster data
 KDD
, 2010
Abstract

Cited by 13 (1 self)
In many data analysis tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top-ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by the recent developments on manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithm.
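A compressed sketch of the pipeline on toy data — a dense RBF affinity, the unnormalized graph Laplacian, and a plain ISTA solver for the L1-regularized regressions are all illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def mcfs_scores(X, embedding, lam=0.1, iters=500):
    # Regress each spectral-embedding dimension on the features with an
    # L1 penalty (ISTA: gradient step + soft-threshold), then score
    # feature j by the largest absolute coefficient it receives.
    d = X.shape[1]
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1/L step size for ISTA
    W = np.zeros((d, embedding.shape[1]))
    for k in range(embedding.shape[1]):
        w = np.zeros(d)
        for _ in range(iters):
            w = w - step * (X.T @ (X @ w - embedding[:, k]))
            w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
        W[:, k] = w
    return np.abs(W).max(axis=1)                 # MCFS score per feature

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.2, (10, 3)), rng.normal(2, 0.2, (10, 3))])
X[:, 2] = rng.normal(0, 0.2, 20)        # feature 2 carries no cluster signal
A = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))   # RBF affinity
L = np.diag(A.sum(1)) - A               # unnormalized graph Laplacian
emb = np.linalg.eigh(L)[1][:, 1:2]      # smallest nontrivial eigenvector
scores = mcfs_scores(X, emb)
print(scores)  # the cluster-bearing features outscore the noise feature
```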