Results 1 - 8 of 8
A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition
- Proc. ICASSP, 2012
"... We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), where multiple blocks are stacked one on top of another and where a bilinear mapping from hidden representations to the output in each block is used to incorporate higherorder statistics of the input feature ..."
Abstract - Cited by 9 (8 self)
We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), in which multiple blocks are stacked one on top of another and a bilinear mapping from hidden representations to the output in each block is used to incorporate higher-order statistics of the input features. A learning algorithm for the T-DSN is presented, in which the main parameter estimation burden is shifted to a convex sub-problem with a closed-form solution. Using an efficient and scalable parallel implementation, we train a T-DSN to discriminate standard three-state monophones in the TIMIT database. The T-DSN outperforms an alternative pretrained Deep Neural Network (DNN) architecture in frame-level classification (both state and phone) and in the cross-entropy measure. For continuous phonetic recognition, the T-DSN performs equivalently to a DNN but without the need for a hard-to-scale, sequential fine-tuning step. Index Terms — deep learning, higher-order statistics, tensors, stacking model, phonetic classification and recognition
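A minimal sketch of one bilinear block may help make the "closed-form convex sub-problem" concrete. The sketch below is our own toy reading, not the authors' implementation: the two lower-layer weight matrices W1 and W2 are assumed given (in the paper they are tuned separately), the outer-product tensor is unfolded into a Kronecker-style feature matrix, and the upper-layer weights are obtained by ridge-regularized least squares. The regularizer, function names, and variable names are our choices.

```python
# Sketch of one T-DSN block, assuming sigmoid hidden units and a ridge-
# regularized closed-form solve for the upper-layer weights.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tdsn_block(X, T, W1, W2, reg=1e-3):
    """X: (N, d) inputs, T: (N, c) targets, W1/W2: lower-layer weight matrices."""
    H1 = sigmoid(X @ W1)                       # (N, k1) first hidden branch
    H2 = sigmoid(X @ W2)                       # (N, k2) second hidden branch
    # Bilinear (outer-product) features: unfold the tensor into a matrix.
    K = np.einsum('ni,nj->nij', H1, H2).reshape(X.shape[0], -1)   # (N, k1*k2)
    # Convex sub-problem: ridge least squares gives the upper layer in closed form.
    U = np.linalg.solve(K.T @ K + reg * np.eye(K.shape[1]), K.T @ T)
    Y = K @ U                                  # block prediction
    return Y, U

# Stacking: the next block would take np.hstack([X, Y]) as its input.
```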
Accelerated parallelizable neural network learning algorithm for speech recognition
- in Proc. Interspeech, 2011
"... We describe a set of novel, batch-mode algorithms we developed recently as one key component in scalable, deep neural network based speech recognition. The essence of these algorithms is to structure the singlehidden-layer neural network so that the upper-layer’s weights can be written as a determin ..."
Abstract - Cited by 6 (5 self)
We describe a set of novel, batch-mode algorithms we developed recently as one key component in scalable, deep neural network based speech recognition. The essence of these algorithms is to structure the single-hidden-layer neural network so that the upper layer's weights can be written as a deterministic function of the lower layer's weights. This structure is exploited during training by plugging the deterministic function into the least-squares error objective function when calculating the gradients. Acceleration techniques are further exploited to make the weight updates move along the most promising directions. Experiments on TIMIT frame-level phone and phone-state classification show strong results. In particular, the error rate drops strictly monotonically as the minibatch size increases. This demonstrates the potential of the proposed batch-mode algorithms for large-scale speech recognition, since they are easily parallelizable across computers. Index Terms: neural network, scalability, structure, constraints, FISTA acceleration, optimization, pseudoinverse, weighted LSE, phone state classification, speech recognition, deep learning
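To illustrate the constraint described above, here is a hedged sketch (our reading, not the authors' code): the upper-layer weights U are tied to the lower-layer weights W through the pseudoinverse, U = pinv(H) T, that expression is substituted into the squared-error objective before differentiating with respect to W, and the updates are accelerated with a FISTA-style momentum step. The learning rate, step count, sigmoid nonlinearity, and function names are assumptions.

```python
# Constrained single-hidden-layer training: U is a deterministic function of W.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def objective_and_grad(W, X, T):
    """Batch objective E(W) = ||H U(W) - T||^2 with U(W) = pinv(H) T, H = sigmoid(XW)."""
    H = sigmoid(X @ W)                    # (N, k) hidden activations
    U = np.linalg.pinv(H) @ T             # closed-form upper layer, (k, c)
    R = H @ U - T                         # residual, (N, c)
    E = np.sum(R * R)
    # Because U is optimal for the current H, the total gradient w.r.t. W reduces
    # to the "U held fixed" term (the envelope argument used for such models).
    G = 2.0 * X.T @ ((R @ U.T) * H * (1.0 - H))
    return E, G

def train(X, T, k, steps=200, lr=1e-3, seed=0):
    """Batch-mode gradient descent with FISTA-style momentum extrapolation."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], k))
    W_prev, t = W.copy(), 1.0
    for _ in range(steps):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        V = W + ((t - 1.0) / t_next) * (W - W_prev)   # momentum extrapolation point
        _, G = objective_and_grad(V, X, T)
        W_prev, W, t = W, V - lr * G, t_next
    return W, np.linalg.pinv(sigmoid(X @ W)) @ T
```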
USING DEEP STACKING NETWORK TO IMPROVE STRUCTURED COMPRESSED SENSING WITH MULTIPLE MEASUREMENT VECTORS
"... ABSTRACT We study the MMV (Multiple Measurement Vectors) compressive sensing setting with a specific sparse structured support. The locations of the non-zero rows in the sparse matrix are not known. All that is known is that the locations of the non-zero rows have probabilities that vary from one g ..."
Abstract - Cited by 2 (2 self)
We study the MMV (Multiple Measurement Vectors) compressive sensing setting with a specific sparse structured support. The locations of the non-zero rows in the sparse matrix are not known; all that is known is that the probability of a row being non-zero varies from one group of rows to another. We propose two novel greedy algorithms for the exact recovery of the sparse matrix in this structured MMV compressive sensing problem. The first algorithm models the matrix's sparse structure using a shallow non-linear neural network: the input of this network is the residual matrix after the prediction and the output is the sparse matrix to be recovered. The second algorithm improves on the shallow neural network prediction by using the stacking operation to form a deep stacking network. Experimental evaluation demonstrates the superior performance of both new algorithms over existing MMV methods; of the two, the algorithm using the deep stacking network to model the structure performs best.
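The abstract describes the first greedy algorithm only at a high level, so the following is a speculative sketch of how a pre-trained shallow network (here an opaque callable `predict_sparse`, a placeholder name) might be used inside a greedy support-recovery loop; the row-energy selection rule and the least-squares refit are our guesses, not the authors' stated procedure.

```python
# Hypothetical greedy MMV recovery guided by a network's residual-to-sparse-matrix map.
import numpy as np

def greedy_mmv_with_network(Y, A, predict_sparse, k):
    """Y: (m, L) measurements, A: (m, n) sensing matrix, k: target row sparsity.
    predict_sparse: callable mapping a residual matrix (m, L) to an (n, L)
    estimate of the sparse matrix (the shallow network of the abstract)."""
    support, R = [], Y.copy()
    for _ in range(k):
        X_hat = np.asarray(predict_sparse(R))      # network's estimate of the sparse matrix
        scores = np.linalg.norm(X_hat, axis=1)     # row energies as selection scores
        scores[support] = -np.inf                  # do not reselect rows
        support.append(int(np.argmax(scores)))
        A_s = A[:, support]                        # sensing columns of chosen rows
        X_s, *_ = np.linalg.lstsq(A_s, Y, rcond=None)
        R = Y - A_s @ X_s                          # residual for the next round
    X = np.zeros((A.shape[1], Y.shape[1]))
    X[support] = X_s
    return X
```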
Learning Input and Recurrent Weight Matrices in Echo State Networks
"... Abstract The traditional echo state network (ESN) is a special type of a temporally deep model, the recurrent network (RNN), which carefully designs the recurrent matrix and fixes both the recurrent and input matrices in the RNN. The ESN also adopts the linear output (or readout) units to simplify ..."
Abstract - Cited by 2 (2 self)
The traditional echo state network (ESN) is a special type of temporally deep model, a recurrent neural network (RNN) in which the recurrent matrix is carefully designed and both the recurrent and input matrices are fixed. The ESN also adopts linear output (readout) units to simplify the learning of the RNN's only trainable matrix, the output matrix. In this paper, we devise a technique that takes advantage of the linearity of the output units in the ESN to learn the input and recurrent matrices, which was not done in earlier ESNs owing to the well-known difficulty of learning them. Compared with Backpropagation Through Time (BPTT) for learning general RNNs, our proposed technique makes use of the linearity of the output units to provide constraints among the various matrices in the RNN, enabling the gradients used as the learning signal to be computed in analytical form rather than by recursion as in BPTT. Experimental results on phone-state classification show that learning either or both of the input and recurrent matrices in the ESN is superior to the traditional ESN without learning them, especially when longer time steps are used in analytically computing the gradients.
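For context, a minimal echo state network with a linear readout looks roughly like the sketch below. The spectral-radius scaling, leaky tanh units, and ridge readout are standard ESN choices rather than details taken from this paper; the paper's actual contribution, analytic gradients for the input and recurrent matrices that exploit the linear readout, is not reproduced here.

```python
# Toy ESN: fixed, spectrally scaled recurrent matrix and a closed-form linear readout.
import numpy as np

def make_esn(d, k, rho=0.9, seed=0):
    """Random input matrix and recurrent matrix rescaled to spectral radius rho."""
    rng = np.random.default_rng(seed)
    W_in = 0.1 * rng.standard_normal((k, d))
    W_rec = rng.standard_normal((k, k))
    W_rec *= rho / np.max(np.abs(np.linalg.eigvals(W_rec)))   # echo state scaling
    return W_in, W_rec

def esn_states(X_seq, W_in, W_rec, leak=1.0):
    """X_seq: (T, d) input sequence; returns hidden states H: (T, k)."""
    H = np.zeros((X_seq.shape[0], W_rec.shape[0]))
    h = np.zeros(W_rec.shape[0])
    for t in range(X_seq.shape[0]):
        h = (1.0 - leak) * h + leak * np.tanh(W_in @ X_seq[t] + W_rec @ h)
        H[t] = h
    return H

def fit_readout(H, T_targets, reg=1e-2):
    """Linear readout solved in closed form (ridge regression)."""
    return np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ T_targets)
```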
Recurrent deep-stacking networks for sequence classification
- in Proc. IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP), 2014
"... ABSTRACT Deep Stacking Networks (DSNs) are constructed by stacking shallow feed-forward neural networks on top of each other using concatenated features derived from the lower modules of the DSN and the raw input data. DSNs do not have recurrent connections, making them less effective to model and ..."
Abstract - Cited by 1 (1 self)
Deep Stacking Networks (DSNs) are constructed by stacking shallow feed-forward neural networks on top of each other, using features formed by concatenating the outputs of the lower modules of the DSN with the raw input data. DSNs have no recurrent connections, which makes them less effective at modeling and classifying input data with temporal dependencies. In this paper, we embed recurrent connections into the DSN, giving rise to Recurrent Deep Stacking Networks (R-DSNs). Each module of the R-DSN consists of a special form of recurrent neural network. Generalizing from the earlier DSN, the use of linearity in the output units of the R-DSN enables us to derive a closed form for computing the gradient of the cost function with respect to all network matrices without backpropagating errors. Each module in the R-DSN is initialized with an echo state network, in which the input and recurrent weights are fixed to satisfy the echo state property. All connection weights within the module are then fine-tuned using batch-mode gradient descent, where the gradient takes an analytical form. Experiments are performed on the TIMIT dataset for frame-level phone-state classification with 183 classes. The results show that the R-DSN gives higher classification accuracy than a single recurrent neural network without stacking.
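The stacking operation itself can be sketched independently of the module internals. In the sketch below, `fit_module` is a placeholder for training one module (in the R-DSN, an ESN-initialized recurrent network that is then fine-tuned); the helper names are ours, while the choice to concatenate each module's predictions with the raw input follows the DSN description above.

```python
# Stacking loop: each module sees the raw input plus the lower module's predictions.
import numpy as np

def fit_rdsn(X, T, fit_module, n_modules=3):
    """X: (N, d) raw inputs, T: (N, c) targets.
    fit_module(features, T) must return a predictor callable for those features."""
    modules, features = [], X
    for _ in range(n_modules):
        predict = fit_module(features, T)               # train one module
        modules.append(predict)
        features = np.hstack([X, predict(features)])    # stacked input for the next module
    return modules

def rdsn_predict(X, modules):
    """Replay the stack on new data, module by module."""
    features, Y = X, None
    for predict in modules:
        Y = predict(features)
        features = np.hstack([X, Y])
    return Y
```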
Convolutional deep stacking networks for distributed compressive sensing
- Signal Processing, 2016
"... a b s t r a c t This paper addresses the reconstruction of sparse vectors in the Multiple Measurement Vectors (MMV) problem in compressive sensing, where the sparse vectors are correlated. This problem has so far been studied using model based and Bayesian methods. In this paper, we propose a deep ..."
Abstract - Cited by 1 (1 self)
This paper addresses the reconstruction of sparse vectors in the Multiple Measurement Vectors (MMV) problem in compressive sensing, where the sparse vectors are correlated. This problem has so far been studied using model-based and Bayesian methods. In this paper, we propose a deep learning approach that relies on a Convolutional Deep Stacking Network (CDSN) to capture the dependency among the different channels. To reconstruct the sparse vectors, we propose a greedy method that exploits the information captured by the CDSN. The proposed method encodes the sparse vectors using random measurements (as is usual in compressive sensing). Experiments on a real-world image dataset show that the proposed method outperforms the traditional MMV solver, Simultaneous Orthogonal Matching Pursuit (SOMP), as well as three of the Bayesian methods proposed for solving the MMV compressive sensing problem. We also show that the proposed method is almost as fast as greedy methods. The good performance of the proposed method depends on the availability of training data (as is the case for all deep learning methods); such training data, e.g., different images of the same class or signals with similar sparsity patterns, are usually available in many applications.
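The abstract does not detail the CDSN-based greedy reconstruction, so rather than guess at it, here is a compact version of the named baseline, Simultaneous Orthogonal Matching Pursuit (SOMP), against which the proposed method is compared; the variable names and the least-squares refit via numpy.linalg.lstsq are our choices.

```python
# SOMP baseline for the MMV model Y = A X with jointly (row-)sparse X.
import numpy as np

def somp(Y, A, k):
    """Y: (m, L) measurement vectors, A: (m, n) sensing matrix, k: row sparsity."""
    support, R = [], Y.copy()
    for _ in range(k):
        corr = np.linalg.norm(A.T @ R, axis=1)   # joint correlation of each row with the residual
        corr[support] = 0.0                      # exclude already-selected rows
        support.append(int(np.argmax(corr)))
        A_s = A[:, support]
        X_s, *_ = np.linalg.lstsq(A_s, Y, rcond=None)
        R = Y - A_s @ X_s                        # update the residual
    X = np.zeros((A.shape[1], Y.shape[1]))
    X[support] = X_s
    return X
```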
Fig. 1. Example T-DSN architecture with two complete blocks.
Fig. 2. Equivalent architecture to the bottom (blue) block of the T-DSN in Fig. 1, where the tensor is unfolded into a large matrix.
Fast, Simple and Accurate Handwritten Digit Classification by Training Shallow Neural Network Classifiers with the ‘Extreme Learning Machine’ Algorithm
"... Recent advances in training deep (multi-layer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates ..."
Abstract
Recent advances in training deep (multi-layer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks with the ‘Extreme Learning Machine’ (ELM) approach, which also enables a very rapid training time (∼10 minutes). Adding distortions, as is common practice for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving error rates below 5.5% on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure that each hidden unit operates only on a randomly sized and positioned patch of each image. This form of random ‘receptive field’ sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with
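A hedged sketch of the random ‘receptive field’ idea: each hidden unit receives non-zero input weights only on a randomly sized and positioned image patch, so the input weight matrix is sparse, and only the linear readout is learned in closed form. The patch-size range, tanh nonlinearity, ridge regularizer, and function names below are our assumptions, not the paper's exact recipe.

```python
# ELM with sparse, patch-restricted random input weights and a ridge readout.
import numpy as np

def random_receptive_field_weights(img_shape, n_hidden, seed=0):
    """Each column of W is non-zero only on one random patch of the image."""
    h, w = img_shape
    rng = np.random.default_rng(seed)
    W = np.zeros((h * w, n_hidden))
    for j in range(n_hidden):
        ph, pw = rng.integers(4, h + 1), rng.integers(4, w + 1)      # random patch size
        r0, c0 = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)  # patch location
        mask = np.zeros((h, w), dtype=bool)
        mask[r0:r0 + ph, c0:c0 + pw] = True
        W[mask.ravel(), j] = rng.standard_normal(ph * pw)            # sparse random column
    return W

def elm_train(X, T, W, reg=1e-2):
    """X: (N, h*w) flattened images, T: (N, classes) one-hot targets."""
    H = np.tanh(X @ W)                                               # fixed random hidden layer
    beta = np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ T)
    return beta                                                      # only the readout is learned

def elm_predict(X, W, beta):
    return np.argmax(np.tanh(X @ W) @ beta, axis=1)
```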