Results 11–20 of 3,174
Context-Dependent Pre-Trained Deep Neural Networks for Large Vocabulary Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing, 2012
Cited by 254 (50 self)
"We propose a novel context-dependent (CD) model for large vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to pr ..."
Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights
International Joint Conference on Neural Networks, 1990
Cited by 195 (1 self)
"A two-layer neural network can be used to approximate any nonlinear function. The behavior of the hidden nodes that allows the network to do this is described. Networks with one input are analyzed first, and the analysis is then extended to networks with multiple inputs. The result of this analysis is used to formulate a method for initialization of the weights of neural networks to reduce training time. Training examples are given, and the learning curves for these examples are shown to illustrate the decrease in necessary training time."
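The initialization idea above can be sketched in code. The following is a minimal sketch of the commonly cited Nguyen-Widrow recipe (random hidden weights rescaled so each unit's active region covers a distinct slice of the input range); the exact constants and bias placement follow the usual textbook form and may differ in detail from the paper.

```python
import numpy as np

def nguyen_widrow_init(n_inputs, n_hidden, rng=None):
    """Sketch of a Nguyen-Widrow style initialization for the hidden
    layer of a two-layer network (assumed recipe, not verbatim from
    the paper)."""
    rng = np.random.default_rng(rng)
    # Scale factor: grows with the number of hidden units relative
    # to the input dimension (commonly quoted constant 0.7).
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    # Draw random weights, then normalize each hidden unit's weight
    # vector to magnitude beta.
    W = rng.uniform(-0.5, 0.5, size=(n_hidden, n_inputs))
    W = beta * W / np.linalg.norm(W, axis=1, keepdims=True)
    # Biases spread the units' active regions across the input interval.
    b = rng.uniform(-beta, beta, size=n_hidden)
    return W, b
```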
Bayesian Regularisation and Pruning using a Laplace Prior
Neural Computation, 1994
Cited by 25 (0 self)
"Standard techniques for improved generalisation from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy indicates a Laplace rather than a G ..."
"... of setting weights to exact zeros becomes a consequence of regularisation alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regulariser."
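The pruning mechanism described here, where a Laplace (L1) prior drives weights to exactly zero, can be illustrated with the soft-thresholding (proximal) step associated with an L1 penalty. This is a sketch of the mechanism only, not Williams' full Bayesian procedure.

```python
import numpy as np

def laplace_prox(w, lam):
    """Soft-thresholding: the proximal operator of lam * ||w||_1.
    Weights whose magnitude falls below lam are set to exactly zero,
    which is how a Laplace/L1 regulariser prunes parameters."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
```

For example, applying the step with lam = 0.1 to weights [0.05, -0.3, 1.2] zeroes the small weight and shrinks the others toward zero.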
Multi-Class Protein Fold Recognition Using Support Vector Machines and Neural Networks
Bioinformatics, 2001
Cited by 207 (8 self)
"Motivation: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examined many issues important for a practical recognition system. Results: Most current discriminative ..."
"... SCOP folds. We used the Support Vector Machine and the Neural Network learning methods as base classifiers. SVM converges fast and leads to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues ..."
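The majority-voting combination step mentioned in this abstract can be sketched as follows; this is a generic illustration of combining base-classifier labels by vote, not the paper's full recognition system (the array shapes and class encoding are assumptions).

```python
import numpy as np

def majority_vote(predictions):
    """Combine integer class labels from several base classifiers by
    majority vote. `predictions` has shape (n_classifiers, n_samples);
    returns one winning label per sample."""
    n_classes = predictions.max() + 1
    # Count votes per class for each sample (column).
    votes = np.apply_along_axis(np.bincount, 0, predictions,
                                minlength=n_classes)
    # votes has shape (n_classes, n_samples); pick the modal class.
    return votes.argmax(axis=0)
```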
Optimal Linear Combinations of Neural Networks
Neural Networks, 1994
Cited by 155 (2 self)
"Neural network (NN)-based modeling often involves trying multiple networks with different architectures and training parameters in order to achieve acceptable model accuracy. Typically, one of the trained NNs is chosen as best, while the rest are discarded. Hashem and Schmeiser [25] proposed using optimal linear combinations of a number of trained neural networks instead of using a single best network. Combining the trained networks may help integrate the knowledge acquired by the component networks and thus improve model accuracy. In this paper, we discuss and extend the idea of optimal linear ..."
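The core idea, a linear combination of trained networks chosen to minimize squared error, can be sketched with a least-squares fit of the combination weights on held-out predictions. This illustrates the flavor of the approach under that assumption; it is not Hashem and Schmeiser's exact estimator.

```python
import numpy as np

def optimal_combination_weights(preds, targets):
    """Least-squares combination weights for an ensemble: find alpha
    minimizing ||preds @ alpha - targets||^2, where `preds` has shape
    (n_samples, n_models) with one column of predictions per trained
    network."""
    alpha, *_ = np.linalg.lstsq(preds, targets, rcond=None)
    return alpha
```

For instance, if one network is biased high by 1 and another low by 1, the fitted weights are 0.5 each, and the combination cancels the biases exactly.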
Bayesian Methods for Adaptive Models
1992
Cited by 177 (2 self)
"The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and nonlinear models. This framework quantitatively embodies `Occam's razor'. Over-complex and under-regularised models are automatically inferred to be less probable, even though their flexibility allows them to fit the data better. When applied to `neural networks', the Bayesian framework makes possible (1) objective comparison of solutions using alternative network architectures; (2) objective stopping rules for network ..."
On the difficulty of training recurrent neural networks
Cited by 42 (6 self)
"There are two widely known issues with properly training recurrent neural networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geo ..."
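A standard remedy for the exploding-gradient problem discussed in this line of work is gradient norm clipping: rescale the gradient whenever its norm exceeds a threshold. A minimal sketch of that step (the threshold value and flat-vector gradient are assumptions for illustration):

```python
import numpy as np

def clip_gradient_norm(grad, threshold):
    """Rescale `grad` so its Euclidean norm never exceeds `threshold`;
    gradients already below the threshold pass through unchanged."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad
```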
Training with Noise is Equivalent to Tikhonov Regularization
Neural Computation, 1994
Cited by 158 (0 self)
"It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an ex ..."
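The equivalence can be checked numerically in the simplest case. For a linear model y = w·x with input noise ε ~ N(0, σ²I), the expected noisy squared error equals the clean squared error plus the Tikhonov-style penalty σ²‖w‖² exactly; a Monte-Carlo sketch under those assumptions (the nonlinear-network case in the paper holds only to leading order in σ):

```python
import numpy as np

def expected_noisy_loss(w, x, t, sigma, rng, n_draws=200_000):
    """Monte-Carlo estimate of E[(w.(x + eps) - t)^2],
    eps ~ N(0, sigma^2 I)."""
    eps = rng.normal(0.0, sigma, size=(n_draws, x.size))
    preds = (x + eps) @ w
    return np.mean((preds - t) ** 2)

def regularised_loss(w, x, t, sigma):
    """Clean squared error plus the Tikhonov term sigma^2 ||w||^2;
    exactly equal to the expected noisy loss for a linear model."""
    return (w @ x - t) ** 2 + sigma ** 2 * (w @ w)
```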
Novelty Detection and Neural Network Validation
1994
Cited by 121 (3 self)
"One of the key factors limiting the use of neural networks in many industrial applications has been the difficulty of demonstrating that a trained network will continue to generate reliable outputs once it is in routine use. An important potential source of errors arises from novel input data, that ..."
Improving Performance in Neural Networks Using a Boosting Algorithm
Advances in Neural Information Processing Systems 5, 1993
Patrice Simard, AT&T Bell Laboratories, Holmdel, NJ 07733
Cited by 104 (1 self)
"A boosting algorithm converts a learning machine with error rate less than 50% to one with an arbitrarily low error rate. However, the algorithm discussed here depends on having a large supply of independent training samples. We show how to circumvent this problem and generate an ensemble of learning machines whose performance in optical character recognition problems is dramatically improved over that of a single network. We report the effect of boosting on four databases (all handwritten) consisting of 12,000 digits from ..."