Results 1-10 of 15
An Application of Recurrent Nets to Phone Probability Estimation
IEEE Transactions on Neural Networks, 1994
Cited by 193 (8 self)
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
Shortlist: a connectionist model of continuous speech recognition
Cognition, 1994
Cited by 171 (7 self)
Previous work has shown how a backpropagation network with recurrent connections can successfully model many aspects of human spoken word recognition (Norris, 1988, 1990, 1992, 1993). However, such networks are unable to revise their decisions in the light of subsequent context. TRACE (McClelland & Elman, 1986), on the other hand, manages to deal appropriately with following context, but only by using a highly implausible architecture that fails to account for some important experimental results. A new model is presented which displays the more desirable properties of each of these models. In contrast to TRACE, the new model is entirely bottom-up and can readily perform simulations with vocabularies of tens of thousands of words.
Signal modeling techniques in speech recognition
Proceedings of the IEEE, 1993
Cited by 126 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal's spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
Complete and Partial Fault Tolerance of Feedforward Neural Nets
IEEE Trans. Neural Networks, 1995
Cited by 38 (4 self)
A method is proposed to estimate the fault tolerance of feedforward Artificial Neural Nets (ANNs) and synthesize robust nets. The fault model abstracts a variety of failure modes of hardware implementations to permanent stuck-at type faults of single components. A procedure is developed to build fault-tolerant ANNs by replicating the hidden units. It exploits the intrinsic weighted summation operation performed by the processing units in order to overcome faults. It is simple, robust and is applicable to any feedforward net. Based on this procedure, metrics are devised to quantify the fault tolerance as a function of redundancy. Furthermore, a lower bound on the redundancy required to tolerate all possible single faults is analytically derived. This bound demonstrates that less than Triple Modular Redundancy (TMR) cannot provide complete fault tolerance for all possible single faults. This general result establishes a necessary condition that holds for all feedforward nets, irrespec...
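The replication idea in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the toy network, the tanh activation, the weight values, and the helper names `forward` and `replicate` are all assumptions made for illustration, and scaling each output weight by 1/r is one plausible way to keep the fault-free output unchanged. The sketch only shows that a single stuck-at-0 hidden unit perturbs the weighted sum by 1/r of what it would without replication, which is partial rather than complete fault tolerance.

```python
import math

def forward(x, w_in, w_out, stuck=None):
    """One-hidden-layer net with a linear output unit (hypothetical toy model).

    `stuck` maps a hidden-unit index to the value its output is frozen at,
    modelling a permanent stuck-at fault on that unit.
    """
    stuck = stuck or {}
    total = 0.0
    for j, (w_row, v) in enumerate(zip(w_in, w_out)):
        h = math.tanh(sum(w * xi for w, xi in zip(w_row, x)))
        h = stuck.get(j, h)  # override the activation if unit j is faulty
        total += v * h
    return total

def replicate(w_in, w_out, r):
    """Copy every hidden unit r times and scale its output weight by 1/r
    (assumed scaling), so the fault-free weighted sum is unchanged."""
    new_in = [row for row in w_in for _ in range(r)]
    new_out = [v / r for v in w_out for _ in range(r)]
    return new_in, new_out

# Illustrative weights and input, not taken from the paper.
w_in = [[0.5, -1.0], [1.5, 0.25]]
w_out = [2.0, -1.0]
x = [1.0, 2.0]

clean = forward(x, w_in, w_out)
r_in, r_out = replicate(w_in, w_out, 3)
clean_rep = forward(x, r_in, r_out)

# Output error caused by one hidden unit stuck at 0, without and with replication.
err_single = abs(forward(x, w_in, w_out, stuck={0: 0.0}) - clean)
err_rep = abs(forward(x, r_in, r_out, stuck={0: 0.0}) - clean)
```

With three replicas per hidden unit, `err_rep` is one third of `err_single`, while `clean_rep` matches `clean` up to rounding.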
Bayesian Neural Networks for Classification: How Useful is the Evidence Framework?
1998
Cited by 19 (2 self)
This paper presents an empirical assessment of the Bayesian evidence framework for neural networks using four synthetic and four real-world classification problems. We focus on three issues: model selection, automatic relevance determination (ARD) and the use of committees. Model selection using the evidence criterion is only tenable if the number of training examples exceeds the number of network weights by a factor of five or ten. With this number of available examples, however, cross-validation is a viable alternative. The ARD feature selection scheme is only useful in networks with many hidden units and for data sets containing many irrelevant variables. ARD is also useful as a hard feature selection method. Results on applying the evidence framework to the real-world data sets showed that committees of Bayesian networks achieved classification accuracies similar to the best alternative methods. Importantly, this was achievable with a minimum of human intervention.
Connectivity and performance tradeoffs in the cascade correlation learning architecture
1992
Cited by 12 (1 self)
The Cascade Correlation [1] is a very flexible, efficient and fast algorithm for supervised learning. It incrementally builds the network by adding hidden units one at a time, until the desired input/output mapping is achieved. It connects all the previously installed units to the new unit being added. Consequently, each new unit in effect adds a new layer, and the fan-in of the hidden and output units keeps on increasing as more units get added. The resulting structure could be hard to implement in VLSI, because the connections are irregular and the fan-in is unbounded. Moreover, the depth or the propagation delay through the resulting network is directly proportional to the number of units and can be excessive. We have modified the algorithm to generate networks with restricted fan-in and small depth (propagation delay) by controlling the connectivity. Our results reveal that there is a tradeoff between connectivity and other performance attributes like depth, total number of independent parameters, learning time, etc. When the number of inputs or outputs is small relative to the size of the training set, a higher connectivity usually leads to faster learning and fewer independent parameters, but it also results in unbounded fan-in and depth. Strictly layered architectures with restricted connectivity, on the other hand, need more epochs to learn and use more parameters, but generate more regular structures, with smaller, limited fan-in and significantly smaller depth (propagation delay), and may be better suited for VLSI implementations. When the number of inputs or outputs is not very small compared to the size of the training set, however, a strictly layered topology is seen to yield an overall better performance.
A General Feed-Forward Algorithm for Gradient Descent in Connectionist Networks
1990
Cited by 6 (4 self)
An extended feed-forward algorithm for recurrent connectionist networks is presented. This algorithm, which works locally in time, is derived both for discrete-in-time networks and for continuous networks. Several standard gradient descent algorithms for connectionist networks (e.g. [48], [30], [28], [15], [34]), especially the backpropagation algorithm [36], are mathematically derived as a special case of this general algorithm. The learning algorithm presented in this paper is a superset of gradient descent learning algorithms for multilayer networks, recurrent networks and time-delay networks that allows any combinations of their components. In addition, the paper presents feed-forward approximation procedures for initial activations and external input values. The former is used for optimizing starting values of the so-called context nodes; the latter turned out to be very useful for finding spurious input attractors of a trained connectionist network. Finally, we compare tim...
Fault Tolerant Artificial Neural Networks
Proceedings of the 5th IEEE Dual Use Technologies and Applications Conference (Utica/Rome, 1995
Cited by 4 (0 self)
This paper investigates improved training procedures to enhance the fault tolerance (FT) of feedforward artificial neural networks (ANNs). We have considered (i) on-line training with permuted samples, (ii) cross-validation training, and (iii) fault-tolerant gradient descent: modifying the objective function for the gradient descent to enhance the initial partial fault tolerance (PFT) of the resulting net. Our data indicate that using permuted samples and/or cross-validation during training does not yield a discernible PFT enhancement. Providing initial redundancy and modifying the objective function to utilize the extra parameters does improve the PFT to some extent. However, a brute-force method of replications proposed in [1, 2] seems to achieve a higher PFT for the same level of redundancy (as compared with the fault-tolerant gradient descent training), which is a counterintuitive result. 1. Introduction Neural or connectionist computation and modeling is an emerging technology with a ...
Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer
1996
Cited by 3 (1 self)
The benefits of a speech recognition machine would be many, resulting in the improvement of the quality of life for people. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and back-end. The front-end deals with the conversion of the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. The objectives for an optimal feature-set are improved recognition performance, noise robustness, talker insensitivity and efficiency. Three problems that make it difficult to find optimal features are: 1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, 2) the size of the feature space, and 3) the dependence of features upon some words in t...
AN SVM FRONT-END LANDMARK SPEECH RECOGNITION SYSTEM
2008
Cited by 3 (1 self)
Support vector machines (SVMs) can be trained to detect manner transitions between phones and to identify the manner and place of articulation of any given phone. The SVMs can perform these tasks with high accuracy using a variety of acoustic representations. The SVMs generalize well to unseen test data if these data were created under identical conditions to the training corpus. Unseen acoustic data from different corpora present a problem for the SVM, even if these acoustic data were generated under similar conditions. The discriminant outputs of these SVMs are used to create both a hybrid SVM/HMM (hidden Markov model) phone recognition system and a hybrid SVM/HMM word recognition system. There is a significant improvement in both phone and word recognition accuracy when these SVM discriminant features are used instead of mel frequency cepstral coefficients (MFCCs).