Results 1–8 of 8
A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition
Proc. ICASSP, 2012
Abstract
Cited by 9 (8 self)
We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (TDSN), where multiple blocks are stacked one on top of another and where a bilinear mapping from hidden representations to the output in each block is used to incorporate higher-order statistics of the input features. A learning algorithm for the TDSN is presented, in which the main parameter estimation burden is shifted to a convex subproblem with a closed-form solution. Using an efficient and scalable parallel implementation, we train a TDSN to discriminate standard three-state monophones in the TIMIT database. The TDSN outperforms an alternative pretrained Deep Neural Network (DNN) architecture in frame-level classification (both state and phone) and in the cross-entropy measure. For continuous phonetic recognition, the TDSN performs equivalently to a DNN but without the need for a hard-to-scale, sequential fine-tuning step. Index Terms — deep learning, higher-order statistics, tensors, stacking model, phonetic classification and recognition
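The bilinear mapping and the convex, closed-form estimation of the output weights described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration only: all dimensions, the ridge term, and the random data are assumptions, not values or details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy dimensions (illustrative assumptions, not from the paper).
d, h1_dim, h2_dim, c, n = 20, 8, 6, 3, 200
X = rng.standard_normal((d, n))            # input features, one column per sample
T = rng.standard_normal((c, n))            # target codes

# Two parallel hidden branches within one TDSN block.
W1 = rng.standard_normal((d, h1_dim))
W2 = rng.standard_normal((d, h2_dim))
H1 = sigmoid(W1.T @ X)                     # (h1_dim, n)
H2 = sigmoid(W2.T @ X)                     # (h2_dim, n)

# Bilinear mapping: per-sample outer product of the two hidden vectors,
# equivalent to unfolding the weight tensor into a large matrix.
B = np.einsum('in,jn->ijn', H1, H2).reshape(h1_dim * h2_dim, n)

# With the hidden weights fixed, estimating the output weights U is a
# convex least-squares subproblem with a closed-form (ridge) solution.
lam = 1e-3
U = np.linalg.solve(B @ B.T + lam * np.eye(h1_dim * h2_dim), B @ T.T)
Y = U.T @ B                                # block output, (c, n)
```

Stacking would then feed `Y` (concatenated with the raw input) into the next block; only the hidden weights require iterative optimization.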
Accelerated parallelizable neural network learning algorithm for speech recognition
in Proc. Interspeech, 2011
Abstract
Cited by 6 (5 self)
We describe a set of novel, batch-mode algorithms we developed recently as one key component in scalable, deep neural network based speech recognition. The essence of these algorithms is to structure the single-hidden-layer neural network so that the upper layer’s weights can be written as a deterministic function of the lower layer’s weights. This structure is effectively exploited during training by plugging the deterministic function into the least-squares error objective function while calculating the gradients. Accelerating techniques are further exploited to make the weight updates move along the most promising directions. The experiments on TIMIT frame-level phone and phone-state classification show strong results. In particular, the error rate drops strictly monotonically as the minibatch size increases. This demonstrates the potential of the proposed batch-mode algorithms for large-scale speech recognition, since they are easily parallelizable across computers. Index Terms: neural network, scalability, structure, constraints, FISTA acceleration, optimization, pseudoinverse, weighted LSE, phone state classification, speech recognition, deep learning
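The structural constraint described above, writing the upper layer’s weights as a deterministic (pseudoinverse) function of the lower layer’s weights, might be sketched as follows. Toy sizes and random data are assumptions; the FISTA acceleration and weighted LSE mentioned in the index terms are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy sizes (illustrative assumptions).
d, h, c, n = 15, 10, 4, 100
X = rng.standard_normal((d, n))            # input features
T = rng.standard_normal((c, n))            # targets

def upper_weights(W):
    """Upper-layer weights as a deterministic function of the lower-layer
    weights W: the least-squares solution given the hidden activations."""
    H = sigmoid(W.T @ X)                   # (h, n)
    U = np.linalg.pinv(H.T) @ T.T          # (h, c), closed form via pseudoinverse
    return U, H

W = rng.standard_normal((d, h))            # only W is a free parameter
U, H = upper_weights(W)
E = U.T @ H - T                            # residual of the LSE objective
```

Because `U` is a function of `W`, training only needs gradients with respect to `W`, with `U(W)` substituted into the squared-error objective before differentiation.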
Using Deep Stacking Network to Improve Structured Compressed Sensing with Multiple Measurement Vectors
Abstract
Cited by 2 (2 self)
We study the MMV (Multiple Measurement Vectors) compressive sensing setting with a specific sparse structured support. The locations of the nonzero rows in the sparse matrix are not known; all that is known is that the probabilities of these locations vary from one group of rows to another. We propose two novel greedy algorithms for the exact recovery of the sparse matrix in this structured MMV compressive sensing problem. The first algorithm models the sparse structure of the matrix using a shallow nonlinear neural network, whose input is the residual matrix after the prediction and whose output is the sparse matrix to be recovered. The second algorithm improves the shallow neural network prediction by using the stacking operation to form a deep stacking network. Experimental evaluation demonstrates the superior performance of both new algorithms over existing MMV methods. Of the methods compared, the algorithm using the deep stacking network to model the structure in MMV compressive sensing performs best.
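The stacking operation described above, each module seeing the raw input concatenated with the lower module’s prediction, can be sketched minimally as follows. Random toy matrices stand in for the residual and sparse matrices, and the module internals (random hidden weights, ridge-solved linear output) are illustrative assumptions, not the paper’s exact architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dsn_module(Z, T, h, lam=1e-3):
    """One stacking module: random hidden weights, closed-form linear output."""
    W = rng.standard_normal((Z.shape[0], h))
    H = sigmoid(W.T @ Z)                                  # (h, n)
    U = np.linalg.solve(H @ H.T + lam * np.eye(h), H @ T.T)
    return U.T @ H                                        # module prediction

# Toy stand-ins: R plays the residual matrix after a greedy prediction
# step, S the sparse matrix to be recovered (sizes are assumptions).
d, c, n = 12, 5, 80
R = rng.standard_normal((d, n))
S = rng.standard_normal((c, n))

# Stacking: each higher module takes the raw input concatenated with
# the previous module's prediction, forming a deep stacking network.
Y = dsn_module(R, S, h=16)
for _ in range(2):
    Y = dsn_module(np.vstack([R, Y]), S, h=16)
```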
Learning Input and Recurrent Weight Matrices in Echo State Networks
Abstract
Cited by 2 (2 self)
The traditional echo state network (ESN) is a special type of temporally deep model, a recurrent neural network (RNN) that carefully designs the recurrent matrix and fixes both the recurrent and input matrices. The ESN also adopts linear output (or readout) units to simplify the learning of the RNN’s only trainable matrix, the output matrix. In this paper, we devise a special technique that takes advantage of the linearity of the output units in the ESN to learn the input and recurrent matrices, which was not done in earlier ESNs due to the well-known difficulty of their learning. Compared with the technique of Backpropagation Through Time (BPTT) for learning general RNNs, our proposed technique makes use of the linearity of the output units to provide constraints among the various matrices in the RNN, enabling the gradients used as the learning signal to be computed in analytical form rather than by recursion as in BPTT. Experimental results on phone state classification show that learning either or both of the input and recurrent matrices in the ESN is superior to the traditional ESN that does not learn them, especially when longer time steps are used in analytically computing the gradients.
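The traditional ESN baseline that this paper starts from, a fixed recurrent matrix rescaled for the echo state property plus linear readout units learned in closed form, can be sketched as below. The paper’s contribution (learning the input and recurrent matrices analytically) is not shown; all sizes, the spectral-radius target, and the ridge term are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy sizes (assumed): input dim, reservoir size, output dim, sequence length.
d, r, c, T_len = 4, 50, 2, 300
u = rng.standard_normal((T_len, d))            # input sequence
y = rng.standard_normal((T_len, c))            # target sequence

# Carefully designed, then fixed, recurrent matrix: rescale so its
# spectral radius is below 1 (the echo state property).
W_rec = rng.standard_normal((r, r))
W_rec *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_rec)))
W_in = rng.standard_normal((r, d))             # fixed input matrix

# Run the reservoir and collect its states.
x = np.zeros(r)
states = np.empty((T_len, r))
for t in range(T_len):
    x = np.tanh(W_in @ u[t] + W_rec @ x)
    states[t] = x

# Linear output units make learning the readout a ridge-regression problem.
lam = 1e-2
W_out = np.linalg.solve(states.T @ states + lam * np.eye(r), states.T @ y)
pred = states @ W_out
```

The linearity of `pred` in `states` is what the paper exploits: it ties `W_out` to `W_in` and `W_rec` through an explicit constraint, so their gradients can be written analytically instead of unrolled as in BPTT.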
Recurrent deep-stacking networks for sequence classification
in Proc. IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP), 2014
Abstract
Cited by 1 (1 self)
Deep Stacking Networks (DSNs) are constructed by stacking shallow feed-forward neural networks on top of each other, using concatenated features derived from the lower modules of the DSN and the raw input data. DSNs do not have recurrent connections, making them less effective for modeling and classifying input data with temporal dependencies. In this paper, we embed recurrent connections into the DSN, giving rise to Recurrent Deep Stacking Networks (RDSNs). Each module of the RDSN consists of a special form of recurrent neural network. Generalizing from the earlier DSN, the use of linearity in the output units of the RDSN enables us to derive a closed form for computing the gradient of the cost function with respect to all network matrices without backpropagating errors. Each module in the RDSN is initialized with an echo state network, where the input and recurrent weights are fixed to have the echo state property. All connection weights within the module are then fine-tuned using batch-mode gradient descent, where the gradient takes an analytical form. Experiments are performed on the TIMIT dataset for frame-level phone state classification with 183 classes. The results show that the RDSN gives higher classification accuracy than a single recurrent neural network without stacking.
Convolutional deep stacking networks for distributed compressive sensing
Signal Processing, 2016
Abstract
Cited by 1 (1 self)
This paper addresses the reconstruction of sparse vectors in the Multiple Measurement Vectors (MMV) problem in compressive sensing, where the sparse vectors are correlated. This problem has so far been studied using model-based and Bayesian methods. In this paper, we propose a deep learning approach that relies on a Convolutional Deep Stacking Network (CDSN) to capture the dependency among the different channels. To reconstruct the sparse vectors, we propose a greedy method that exploits the information captured by the CDSN. The proposed method encodes the sparse vectors using random measurements (as is usual in compressive sensing). Experiments using a real-world image dataset show that the proposed method outperforms the traditional MMV solver, Simultaneous Orthogonal Matching Pursuit (SOMP), as well as three of the Bayesian methods proposed for solving the MMV compressive sensing problem. We also show that the proposed method is almost as fast as greedy methods. The good performance of the proposed method depends on the availability of training data, as is the case for all deep learning methods; such training data, e.g., different images of the same class or signals with similar sparsity patterns, are usually available for many applications.
Fig. 1. Example TDSN architecture with two complete blocks.
Fig. 2. Equivalent architecture to the bottom (blue) block of TDSN in Fig. 1 where the tensor is unfolded into a large matrix.
Fast, Simple and Accurate Handwritten Digit Classification by Training Shallow Neural Network Classifiers with the ‘Extreme Learning Machine’ Algorithm
Abstract
Recent advances in training deep (multilayer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks using the ‘Extreme Learning Machine’ (ELM) approach, which also enables a very rapid training time (~10 minutes). Adding distortions, as is common practice for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving error rates below 5.5% on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure each hidden unit operates only on a randomly sized and positioned patch of each image. This form of random ‘receptive field’ sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with …
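The core ELM recipe described above, random input weights made sparse by restricting each hidden unit to a random image patch, with only the output weights solved in closed form, might look like this sketch. All dimensions, the patch-size range, and the random toy data are illustrative assumptions, not the paper’s settings.

```python
import numpy as np

rng = np.random.default_rng(4)

def random_patch_mask(side, rng):
    """Boolean mask over a random rectangular patch of a side x side image,
    a stand-in for the paper's random 'receptive field' sampling."""
    mask = np.zeros((side, side), dtype=bool)
    h, w = rng.integers(3, side + 1, size=2)       # assumed patch-size range
    r0 = rng.integers(0, side - h + 1)
    c0 = rng.integers(0, side - w + 1)
    mask[r0:r0 + h, c0:c0 + w] = True
    return mask.ravel()

side, hidden, classes, n = 8, 100, 10, 500
X = rng.standard_normal((n, side * side))          # toy "images", one per row
labels = rng.integers(0, classes, size=n)
T = np.eye(classes)[labels]                        # one-hot targets

# Each hidden unit gets random weights restricted to one random patch,
# so the input weight matrix is sparse.
W = np.zeros((side * side, hidden))
for j in range(hidden):
    m = random_patch_mask(side, rng)
    W[m, j] = rng.standard_normal(m.sum())

H = 1.0 / (1.0 + np.exp(-(X @ W)))                 # hidden activations
# ELM: only the output weights are trained, in closed form.
beta = np.linalg.pinv(H) @ T
pred = np.argmax(H @ beta, axis=1)
```

Since the input weights are never trained, the entire procedure reduces to one reservoir-style forward pass plus one least-squares solve, which is what makes the quoted training times possible.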