Results 1-10 of 18
An Application of Recurrent Nets to Phone Probability Estimation
IEEE Transactions on Neural Networks, 1994
Abstract

Cited by 198 (8 self)
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed.
Shortlist: a connectionist model of continuous speech recognition
Cognition, 1994
Abstract

Cited by 176 (7 self)
Previous work has shown how a backpropagation network with recurrent connections can successfully model many aspects of human spoken word recognition (Norris, 1988, 1990, 1992, 1993). However, such networks are unable to revise their decisions in the light of subsequent context. TRACE (McClelland & Elman, 1986), on the other hand, manages to deal appropriately with following context, but only by using a highly implausible architecture that fails to account for some important experimental results. A new model is presented which displays the more desirable properties of each of these models. In contrast to TRACE the new model is entirely bottom-up and can readily perform simulations with vocabularies of tens of thousands of words.
Signal modeling techniques in speech recognition
PROCEEDINGS OF THE IEEE, 1993
Abstract

Cited by 132 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
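The "dynamic, or time-derivative" features mentioned in this abstract are commonly computed by linear regression over a short window of neighboring frames and appended to the static coefficients. A minimal numpy sketch of that standard delta formula (an illustration of the general idea, not necessarily any specific variant the paper surveys):

```python
import numpy as np

def delta_features(static, window=2):
    """Append time-derivative (delta) features to a static feature matrix.

    static: (num_frames, num_coeffs) array, e.g. cepstral coefficients.
    window: number of frames on each side used for the regression.
    """
    static = np.asarray(static, dtype=float)
    num_frames = static.shape[0]
    denom = 2 * sum(n * n for n in range(1, window + 1))
    # Replicate the first/last frame so edge frames have full context.
    padded = np.pad(static, ((window, window), (0, 0)), mode="edge")
    deltas = np.zeros_like(static)
    for n in range(1, window + 1):
        deltas += n * (padded[window + n : window + n + num_frames]
                       - padded[window - n : window - n + num_frames])
    deltas /= denom
    # Heterogeneous parameter set: absolute plus dynamic information.
    return np.hstack([static, deltas])
```

For a feature trajectory that rises linearly by one unit per frame, the computed delta is 1.0 away from the edges, as expected of a slope estimate.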
Multidimensional Detective
in Proc. of IEEE Information Visualization ’97, 1997
Abstract

Cited by 80 (1 self)
Automation has arrived in Parallel Coordinates. A geometrically motivated classifier is presented and applied, with both training and testing stages, to 3 real datasets. Compared with those from 23 other classifiers, our results have the least error. The algorithm is based on parallel coordinates and:
- has very low computational complexity in the number of variables and the size of the dataset; contrasted with the very high or unknown (often unstated) complexity of other classifiers, this low complexity enables the rule derivation to be done in near real-time, hence making the classification adaptive to changing conditions;
- provides comprehensible and explicit rules, in contrast to neural networks, which are “black boxes”;
- does dimensionality selection, where the minimal set of original variables (not transformed new variables as in Principal Component Analysis) required to state the rule is found;
- orders these variables so as to optimize the clarity of separation between the designated set and its complement, which solves the pesky “ordering problem” in parallel coordinates.
The algorithm is display independent, hence it can be applied to datasets very large in size and number of variables. Though it is instructive to present the results visually, the input size is no longer display-limited as for visual data mining.
Motivation and the Algorithm
The display of multivariate datasets in parallel coordinates (abbr. ∥-coords) transforms the search for relations into a 2D pattern recognition problem. Until now the discovery involved a skillful interaction between the “detective” and the data display, a process which was illustrated in the “Multidimensional Detective” [3]. It is not surprising that
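The underlying transform is simple to state: a d-dimensional point becomes a polyline whose i-th vertex sits on the i-th vertical axis at the point's i-th coordinate, which is what turns multivariate relations into 2D line patterns. A tiny sketch of that mapping (illustrative only; the function name is my own, and this is not the paper's classifier):

```python
def to_parallel_coords(point):
    """Map a d-dimensional point to its parallel-coordinates polyline.

    Vertex i lies on the i-th (vertical) axis, placed at x = i,
    at height point[i]; connecting consecutive vertices gives the
    polyline that represents the point.
    """
    return [(i, float(v)) for i, v in enumerate(point)]
```

Plotting these polylines for every row of a dataset reproduces the display the "detective" interacts with.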
Complete and Partial Fault Tolerance of Feedforward Neural Nets
IEEE Trans. Neural Networks, 1995
Abstract

Cited by 38 (4 self)
A method is proposed to estimate the fault tolerance of feedforward Artificial Neural Nets (ANNs) and synthesize robust nets. The fault model abstracts a variety of failure modes of hardware implementations to permanent stuck-at type faults of single components. A procedure is developed to build fault tolerant ANNs by replicating the hidden units. It exploits the intrinsic weighted summation operation performed by the processing units in order to overcome faults. It is simple, robust, and applicable to any feedforward net. Based on this procedure, metrics are devised to quantify the fault tolerance as a function of redundancy. Furthermore, a lower bound on the redundancy required to tolerate all possible single faults is analytically derived. This bound demonstrates that less than Triple Modular Redundancy (TMR) cannot provide complete fault tolerance for all possible single faults. This general result establishes a necessary condition that holds for all feedforward nets, irrespec...
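The replication idea exploits exactly the weighted summation the abstract mentions: copying each hidden unit k times and dividing its outgoing weights by k leaves the network function unchanged, while any single faulty replica now contributes only 1/k of the original unit's weight to each output sum. A small numpy illustration of that general construction (a sketch under my own naming, not the paper's exact procedure or metrics):

```python
import numpy as np

def replicate_hidden(W_in, W_out, k):
    """Replicate each hidden unit k times, sharing its output weight.

    W_in:  (hidden, inputs)  weights into the hidden layer.
    W_out: (outputs, hidden) weights out of the hidden layer.
    Each replica sees the same inputs, so it computes the same
    activation; dividing the outgoing weights by k makes the
    replicated net compute the identical function, while a single
    stuck-at fault in one replica perturbs each output sum by only
    1/k of that unit's original contribution.
    """
    W_in_r = np.repeat(W_in, k, axis=0)        # copy each hidden row k times
    W_out_r = np.repeat(W_out, k, axis=1) / k  # spread the output weight
    return W_in_r, W_out_r
```

A forward pass through the replicated net (e.g. with tanh hidden units) matches the original net's output exactly when no fault is present.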
Bayesian Neural Networks for Classification: How Useful is the Evidence Framework?
1998
Abstract

Cited by 19 (2 self)
This paper presents an empirical assessment of the Bayesian evidence framework for neural networks using four synthetic and four real-world classification problems. We focus on three issues: model selection, automatic relevance determination (ARD) and the use of committees. Model selection using the evidence criterion is only tenable if the number of training examples exceeds the number of network weights by a factor of five or ten. With this number of available examples, however, cross-validation is a viable alternative. The ARD feature selection scheme is only useful in networks with many hidden units and for data sets containing many irrelevant variables. ARD is also useful as a hard feature selection method. Results on applying the evidence framework to the real-world data sets showed that committees of Bayesian networks achieved classification accuracies similar to the best alternative methods. Importantly, this was achievable with a minimum of human intervention.
Connectivity and performance tradeoffs in the cascade correlation learning architecture
1992
Abstract

Cited by 12 (1 self)
The Cascade Correlation [1] is a very flexible, efficient and fast algorithm for supervised learning. It incrementally builds the network by adding hidden units one at a time, until the desired input/output mapping is achieved. It connects all the previously installed units to the new unit being added. Consequently, each new unit in effect adds a new layer and the fan-in of the hidden and output units keeps on increasing as more units get added. The resulting structure could be hard to implement in VLSI, because the connections are irregular and the fan-in is unbounded. Moreover, the depth or the propagation delay through the resulting network is directly proportional to the number of units and can be excessive. We have modified the algorithm to generate networks with restricted fan-in and small depth (propagation delay) by controlling the connectivity. Our results reveal that there is a tradeoff between connectivity and other performance attributes like depth, total number of independent parameters, learning time, etc. When the number of inputs or outputs is small relative to the size of the training set, a higher connectivity usually leads to faster learning, and fewer independent parameters, but it also results in unbounded fan-in and depth. Strictly layered architectures with restricted connectivity, on the other hand, need more epochs to learn and use more parameters, but generate more regular structures, with smaller, limited fan-in and significantly smaller depth (propagation delay), and may be better suited for VLSI implementations. When the number of inputs or outputs is not very small compared to the size of the training set, however, a strictly layered topology is seen to yield an overall better performance.
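The connectivity growth the abstract describes is easy to quantify: in the unrestricted algorithm, hidden unit i receives the network inputs plus all i previously installed units, so fan-in grows without bound and each unit forms its own layer. A small helper making that explicit (an illustrative sketch with names of my own choosing, ignoring bias connections):

```python
def cascade_shape(num_inputs, num_hidden):
    """Fan-ins and depth of an unrestricted Cascade-Correlation net.

    Hidden unit i (0-based) connects to all num_inputs inputs and to
    the i previously installed hidden units, so its fan-in is
    num_inputs + i. Because every unit feeds every later unit, each
    addition creates a new layer: the hidden-chain depth equals the
    number of hidden units, which is the VLSI concern that motivates
    restricting connectivity.
    """
    fan_ins = [num_inputs + i for i in range(num_hidden)]
    depth = num_hidden  # propagation delay through the hidden chain
    return fan_ins, depth
```

For example, 5 inputs and 4 hidden units yield fan-ins [5, 6, 7, 8] and depth 4, whereas a strictly layered topology would bound both at the cost of more parameters and epochs.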
A General FeedForward Algorithm for Gradient Descent in Connectionist Networks
1990
Abstract

Cited by 6 (4 self)
An extended feedforward algorithm for recurrent connectionist networks is presented. This algorithm, which works locally in time, is derived both for discrete-in-time networks and for continuous networks. Several standard gradient descent algorithms for connectionist networks (e.g. [48], [30], [28], [15], [34]), especially the backpropagation algorithm [36], are mathematically derived as special cases of this general algorithm. The learning algorithm presented in this paper is a superset of gradient descent learning algorithms for multilayer networks, recurrent networks and time-delay networks that allows any combination of their components. In addition, the paper presents feedforward approximation procedures for initial activations and external input values. The former is used for optimizing starting values of the so-called context nodes; the latter turned out to be very useful for finding spurious input attractors of a trained connectionist network. Finally, we compare tim...
Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer
1996
Abstract

Cited by 4 (1 self)
The benefits of a speech recognition machine would be many, improving the quality of life for people. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and back-end. The front-end deals with the conversion of the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. The objectives for an optimal feature-set are improved recognition performance, noise robustness, talker insensitivity and efficiency. Three problems that make it difficult to find optimal features are: 1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, 2) the size of the feature space, and 3) the dependence of features upon some words in t...
Fault Tolerant Artificial Neural Networks
Proceedings of the 5th IEEE Dual Use Technologies and Applications Conference (Utica/Rome), 1995
Abstract

Cited by 4 (0 self)
This paper investigates improved training procedures to enhance the fault tolerance (FT) of feedforward artificial neural networks (ANNs). We have considered (i) on-line training with permuted samples, (ii) cross-validation training, and (iii) fault-tolerant gradient descent, which modifies the objective function for the gradient descent to enhance the initial partial fault tolerance (PFT) of the resulting net. Our data indicate that using permuted samples and/or cross-validation during training does not yield a discernible PFT enhancement. Providing initial redundancy and modifying the objective function to utilize the extra parameters does improve the PFT to some extent. However, a brute-force method of replications proposed in [1, 2] seems to achieve a higher PFT for the same level of redundancy (as compared with the fault-tolerant gradient descent training), which is a counterintuitive result.
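One natural way to read "modify the objective function" is to average the error over the nominal network and every simulated single-hidden-unit stuck-at-0 fault, so gradient descent is pushed toward weights that degrade gracefully. The sketch below is a hypothetical illustration of that idea, with my own names and a one-layer tanh net; it is not the paper's actual formulation:

```python
import numpy as np

def fault_tolerant_mse(W_in, W_out, x, target):
    """Mean squared error averaged over the fault-free net and all
    single-hidden-unit stuck-at-0 faults (hypothetical objective).

    W_in:  (hidden, inputs) weights, W_out: (outputs, hidden) weights,
    x: input vector, target: desired output vector.
    """
    def forward(mask):
        # A zero in `mask` simulates a hidden unit stuck at 0.
        return W_out @ (np.tanh(W_in @ x) * mask)

    hidden = W_in.shape[0]
    masks = [np.ones(hidden)]          # the fault-free network
    for j in range(hidden):
        m = np.ones(hidden)
        m[j] = 0.0                     # stuck-at-0 fault on unit j
        masks.append(m)
    errs = [np.mean((forward(m) - target) ** 2) for m in masks]
    return float(np.mean(errs))
```

Minimizing this average by gradient descent (numerically or with an autodiff library) would penalize solutions whose output depends heavily on any single hidden unit.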