
## Convolutional deep stacking networks for distributed compressive sensing (2016)

Venue: | Signal Processing 131 (2017) 181–189 |

Citations: | 1 (1 self) |

### Citations

3609 | Compressed sensing - Donoho - 2006 |

1389 | Stable signal recovery from incomplete and inaccurate measurements - Candès, Romberg, et al. |

963 | Sparse bayesian learning and the relevance vector machine
- Tipping
- 2001
Citation Context ...ave been studied extensively in the compressive sensing literature [13–15,17,22–27]. In [22], it has been theoretically shown that using signal models that exploit these structures results in a decrease in the number of measurements. In [23], a thorough review of CS methods that exploit the structure present in sparse signals is presented. In [17], a Bayesian framework for CS is presented. This framework uses prior information about the sparsity of the vector s to provide a posterior density function for the entries of s (assuming y is observed). It then uses a Relevance Vector Machine (RVM) [28] to estimate the entries of the sparse vector. This method is called Bayesian Compressive Sensing (BCS). In [14], a Bayesian framework is presented for the MMV problem. It assumes that the L channels or “tasks” in the MMV problem (4) are not statistically independent. By imposing a shared prior on the L channels, an empirical method that estimates the hyperparameters is presented, and extensions of RVM are used for the inference step. This method is known as Multitask Compressive Sensing (MT-BCS). In [14], it is experimentally shown that MT-BCS outperforms three methods, the method that applie... |

928 | K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation
- Aharon, Elad, et al.
- 2006
Citation Context ...the extension of [26] to the MMV problem is expected to give similar performance to that of [25]. In [27], a different MMV problem is solved where the sparsity patterns of the different sparse vectors in S are NOT similar. A method based on Long Short-Term Memory (LSTM) was proposed to address this problem. In this paper, we assume that the sparse vectors in S have similar sparsity patterns. For example, the sparse vectors are the DCT or Wavelet transforms of images. In the sparse representation literature, the dictionary learning method [33] uses the available training data to learn the sparsifying basis (Ψ in (2)) that can represent the signal as compactly as possible. The main difference between dictionary learning and our work here is that we assume the sparsifying basis is given, and there is no need to learn it. In other words, the sparse vectors in S are not necessarily very sparse. Although we expect the performance to improve by combining dictionary learning with our proposed method, in this paper we focus on the performance improvement obtained by using the proposed approach only. 3. Proposed method. To give a high level pi... |
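As a hedged illustration of the excerpt's assumption that the sparsifying basis Ψ is given (a fixed DCT basis) rather than learned, the following numpy sketch builds an orthonormal DCT-II matrix and shows a smooth signal is compressible in it; the sizes and the test signal are invented for illustration.

```python
import numpy as np

n = 64
i = np.arange(n)
# Orthonormal DCT-II matrix: row j holds the j-th cosine atom.
C = np.cos(np.pi * np.outer(i, i + 0.5) / n)
C[0] *= np.sqrt(1.0 / n)
C[1:] *= np.sqrt(2.0 / n)

x = np.cos(2.0 * np.pi * 3.0 * (i + 0.5) / n)   # a smooth test signal
s = C @ x                                        # its coefficients in the DCT basis
top5 = np.sort(np.abs(s))[::-1][:5]
concentration = top5.sum() / np.abs(s).sum()     # close to 1 -> compressible in Psi
```

Because the test signal is itself (proportional to) one DCT atom, almost all coefficient energy lands in a single entry of `s`, which is the "sparse in a given basis" situation the excerpt assumes.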

850 | Training products of experts by minimizing contrastive divergence
- Hinton
Citation Context ...lect the appropriate column of A at each iteration of OMP. The main modification to OMP is that of replacing the correlation step with a neural network. This method was experimentally shown to outperform OMP and ℓ1 optimization. This method is called Neural Network OMP (NNOMP). An extension of [24] with a hierarchical Deep Stacking Network (DSN) [16] is proposed in [25] for the MMV problem. DSN architecture has different applications, for example see [1–5]. The joint sparsity of S is an important assumption in the proposed method. To train the DSN model, the Restricted Boltzmann Machine (RBM) [30] is used to pre-train DSN and then fine tuning is performed. It has been experimentally shown that this method outperforms SOMP and ℓ1,2 in the MMV problem. The proposed methods are called Nonlinear Weighted SOMP (NWSOMP) for the one layer model and DSN-WSOMP for the multilayer model. In [26], a feedforward neural network was used to solve the SMV problem as a regression task. A pre-training phase followed by fine tuning was used. For pre-training, the Stacked Denoising Auto-encoder (SDA) proposed in [31] had been used. Note that an RBM with Gaussian visible units and binary hidden units (i.e.... |
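The NNOMP idea described in this excerpt (replace OMP's correlation step with a neural network) can be sketched by making the column-selection step a pluggable function; the `omp` helper, sizes, and seed below are invented for illustration, with the classic correlation rule as the default selector.

```python
import numpy as np

def omp(A, y, k, select=None):
    # Greedy pursuit with a pluggable column-selection step; NNOMP's idea is to
    # swap the default correlation rule for a learned (neural) selector.
    if select is None:
        select = lambda A, r: int(np.argmax(np.abs(A.T @ r)))  # classic OMP step
    support, r = [], y.copy()
    for _ in range(k):
        support.append(select(A, r))
        As = A[:, support]
        x_s, *_ = np.linalg.lstsq(As, y, rcond=None)  # refit on the current support
        r = y - As @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x, sorted(support)

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
A /= np.linalg.norm(A, axis=0)        # unit-norm columns
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat, supp = omp(A, y, 3)            # default correlation-based selection
```

Passing a different `select` callable (e.g. one backed by a trained network) reproduces the structure of NNOMP without changing the rest of the pursuit.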

796 | Reducing the dimensionality of data with neural networks
- Hinton, Salakhutdinov
Citation Context ...

$$E\left( z^{(1)}, h^{(1)} \right) = \frac{1}{2} \left( z^{(1)} - b_1 \right)^T \left( z^{(1)} - b_1 \right) - b_2^T h^{(1)} - \left( z^{(1)} \right)^T W_{1,\mathrm{init}}^{(1)}\, h^{(1)} \quad (17)$$

where $b_1$ and $b_2$ are vectors of bias values for the visible and hidden units respectively. The goal is to find $W_{1,\mathrm{init}}^{(1)}$ from the training input data, i.e., the residual vectors $z^{(1)}$ in the training data, generated as explained earlier. Then we use $W_{1,\mathrm{init}}^{(1)}$ to initialize $W_1^{(1)}$ as shown in Fig. 1(b). This approach has also been shown to be helpful in training neural networks, and specifically autoencoders [37]. The parameters of the RBM can be found by maximizing the log probability that the RBM assigns to the input data, which is a function of the energy function in (17), using the contrastive divergence method [30]. The details of the general RBM training method used in this work can be found in [38]. As shown in the block diagram of CDSN in Fig. 1(a), to initialize the parameters of the upper layers of CDSN, $W_1^{(l+1)}$, we use the learned parameters of the lower layer, $W_1^{(l)}$, as initialization. This approach has been shown to be helpful in training DSNs [16] and it was helpful in our task a... |
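A minimal sketch of the Gaussian-visible/binary-hidden RBM energy the excerpt refers to as Eq. (17), under the common unit-variance convention; the helper name, sizes, and seed are invented for illustration.

```python
import numpy as np

def grbm_energy(z, h, W, b1, b2):
    # E(z, h) = 0.5*(z - b1)^T (z - b1) - b2^T h - z^T W h : Gaussian visible
    # units (unit variance) and binary hidden units, as in the excerpt's Eq. (17).
    return 0.5 * (z - b1) @ (z - b1) - b2 @ h - z @ (W @ h)

rng = np.random.default_rng(1)
nv, nh = 8, 4
W = 0.1 * rng.standard_normal((nv, nh))
b1, b2 = np.zeros(nv), np.zeros(nh)
z = rng.standard_normal(nv)                   # stand-in for a residual vector z^(1)
h = (rng.random(nh) < 0.5).astype(float)      # one binary hidden configuration
E = grbm_energy(z, h, W, b1, b2)
E0 = grbm_energy(z, h, np.zeros_like(W), b1, b2)  # W = 0, zero biases: just 0.5*||z||^2
```

Contrastive divergence then adjusts `W`, `b1`, `b2` to lower this energy on data vectors relative to model samples; the sketch only evaluates the energy itself.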

697 | An Introduction to Compressive Sensing
- Baraniuk, Davenport, et al.
- 2010
Citation Context ...y CDSN at the decoder to reconstruct S. Note that the great success of deep learning [18–21] motivated the new applications reported in this paper. The rest of the paper is organized as follows: in the next section, related work is discussed. In Section 3, the proposed method is presented. Experimental evaluation and discussion are presented in Section 4. In Section 5, conclusions and future work are presented. 2. Related work. Model-based methods that exploit the information in the structure of the sparse vector(s) have been studied extensively in the compressive sensing literature [13–15,17,22–27]. In [22], it has been theoretically shown that using signal models that exploit these structures results in a decrease in the number of measurements. In [23], a thorough review of CS methods that exploit the structure present in sparse signals is presented. In [17], a Bayesian framework for CS is presented. This framework uses prior information about the sparsity of the vector s to provide a posterior density function for the entries of s (assuming y is observed). It then uses a Relevance Vector Machine (RVM) [28] to estimate the entries of the sparse vector. This method is called Bayesian Compressiv... |

408 | Information processing in dynamical systems: Foundations of harmony theory
- Smolensky
- 1986
Citation Context ...

$$W_1^{(l),k} = \hat{W}_1^{(l),k} - \rho\, \frac{\partial \left\| T - V^{(l)} \right\|_2^2}{\partial \hat{W}_1^{(l),k}}, \qquad m_{k+1} = \frac{1 + \sqrt{1 + 4 m_k^2}}{2}, \qquad \hat{W}_1^{(l),k+1} = W_1^{(l),k} + \frac{m_k - 1}{m_{k+1}} \left( W_1^{(l),k} - W_1^{(l),k-1} \right) \quad (16)$$

The curve of the FISTA coefficient $(m_k - 1)/m_{k+1}$ with respect to the epoch number is shown in Fig. 2. After computing $W_1^{(l)}$ from (16), we use the closed-form formulation in (14) to find $W_2^{(l)}$. Another important consideration for training the CDSN is that the cost function in (11) is not necessarily convex; therefore the initialization of $W_1^{(l)}$ before fine tuning plays an important role. For initialization of the first layer of CDSN, we train a Restricted Boltzmann Machine (RBM) [36,30] with Gaussian visible units and binary hidden units. This results in the following energy function between the visible units, i.e., the entries of $z^{(1)}$, and the hidden units, i.e., the entries of $h^{(1)}$:

$$E\left( z^{(1)}, h^{(1)} \right) = \frac{1}{2} \left( z^{(1)} - b_1 \right)^T \left( z^{(1)} - b_1 \right) - b_2^T h^{(1)} - \left( z^{(1)} \right)^T W_{1,\mathrm{init}}^{(1)}\, h^{(1)} \quad (17)$$

where $b_1$ and $b_2$ are vectors of bias values for the visible and hidden units respectively. The goal is to find $W_{1,\mathrm{init}}^{(1)}$ from the training input data, i.e., the residual vectors $z^{(1)}$ in the training data, generated as explained earlier. Then we use $W_{1,\mathrm{init}}^{(1)}$ to initializ... |
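The FISTA momentum sequence the excerpt describes ($m_1 = 1$, $m_{k+1} = (1 + \sqrt{1 + 4 m_k^2})/2$, coefficient $(m_k - 1)/m_{k+1}$) can be tabulated directly; this sketch only reproduces the qualitative shape the excerpt attributes to Fig. 2, growth from 0 toward 1.

```python
import numpy as np

# FISTA momentum sequence: m_1 = 1 and m_{k+1} = (1 + sqrt(1 + 4*m_k^2)) / 2.
# The update mixes in (m_k - 1)/m_{k+1} of the previous step.
m = [1.0]
for _ in range(50):
    m.append((1.0 + np.sqrt(1.0 + 4.0 * m[-1] ** 2)) / 2.0)
coeff = [(m[k] - 1.0) / m[k + 1] for k in range(50)]
# coeff starts at exactly 0 and climbs monotonically toward (but never reaches) 1.
```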

360 | Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit
- Tropp
- 2006
Citation Context ...ondition to obtain a unique S given Y is:

$$\left| \mathrm{supp}(S) \right| < \frac{\mathrm{spark}(A) - 1 + \mathrm{rank}(S)}{2} \quad (5)$$

where $|\mathrm{supp}(S)|$ is the number of rows in S with non-zero energy, and the spark of a given matrix is the smallest possible number of linearly dependent columns of that matrix [10]. The spark gives a measure of linear dependency in the system modelled by a given matrix. In the SMV problem, no rank information exists. In the MMV problem, rank information exists and affects the uniqueness bounds. In the current literature, S in (4) is reconstructed using one of the following types of methods: (1) greedy methods [11], like Simultaneous Orthogonal Matching Pursuit (SOMP), which perform non-optimal subset selection; (2) relaxed mixed-norm minimization methods [12], where the ℓ0 norm is relaxed to the ℓ1 norm to formulate a convex optimization problem; or (3) Bayesian methods like [13–15], where a posterior density function for the values of S is created, assuming a prior belief, e.g., that Y is observed and S should be sparse in a basis Ψ. As shown in [13–15], in the MMV problem, model-based methods like the Bayesian methods usually perform better than the first two groups of methods because they exploit the statis... |
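The uniqueness condition quoted above can be checked numerically on a toy instance; the brute-force `spark` helper below is purely illustrative (it enumerates all column subsets, so it is only usable for tiny matrices).

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    # Brute-force spark: size of the smallest linearly dependent column subset.
    # Exponential in the number of columns -- toy use only.
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k
    return n + 1

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
sp = spark(A)                        # generically m + 1 = 5 for a 4 x 6 Gaussian A

S = np.zeros((6, 3))
S[2] = [1.0, -1.0, 2.0]              # jointly sparse: one non-zero row, |supp(S)| = 1
bound = (sp - 1 + np.linalg.matrix_rank(S)) / 2   # right-hand side of Eq. (5)
unique = 1 < bound                   # uniqueness condition |supp(S)| < bound
```

Here a single shared non-zero row easily satisfies the bound, matching the excerpt's point that rank information tightens the MMV uniqueness guarantee.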

329 | Bayesian compressive sensing
- Ji, Xue, et al.
- 2008
Citation Context ...ich we refer to as CDSN. To capture the dependencies among different channels, we propose the use of a sliding convolution window over the columns of the matrix S (where each convolution window contains w consecutive columns of S, w being the size of the convolution window). To address the second part of the above question, we propose a two-step greedy algorithm to exploit this information at the decoder during the reconstruction. By performing experiments on an image dataset, we show that the proposed method outperforms the well-known MMV solver SOMP and the model-based Bayesian methods proposed in [17,14,15]. This is shown using the popular Wavelet and DCT transforms as the sparsifying basis. We emphasize that the proposed method does not add any complexity to the encoder, i.e., the encoder is a random matrix. The complexity is added at the decoder. The contributions of this paper are as follows: a convolutional version of the Deep Stacking Networks (DSNs), which we refer to as CDSN, is proposed. We then propose the use of CDSN, which is a data-driven method, to capture the dependencies among different channels in the MMV problem. We then use a two-step greedy reconstruction algorithm to exploit t... |
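The sliding convolution window over the columns of S described above amounts to taking overlapping blocks of w consecutive columns; a minimal sketch, where the `column_windows` helper and the toy sizes are invented for illustration:

```python
import numpy as np

def column_windows(S, w):
    # One window per position: w consecutive columns of S, slid one column at a time.
    L = S.shape[1]
    return [S[:, j:j + w] for j in range(L - w + 1)]

S = np.arange(20.0).reshape(4, 5)    # toy 4 x 5 matrix: L = 5 "channels"
wins = column_windows(S, w=3)        # 5 - 3 + 1 = 3 overlapping windows, each 4 x 3
```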

272 | Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, Signal Processing Magazine - Hinton, Deng, et al. - 2012 |

254 | Context-dependent pre-trained deep neural networks for large vocabulary speech recognition - Dahl, Yu, et al. - 2012 |

251 | Extracting and composing robust features with denoising autoencoders.
- Vincent, Larochelle, et al.
- 2008
Citation Context ...on in the proposed method. To train the DSN model, the Restricted Boltzmann Machine (RBM) [30] is used to pre-train the DSN, and then fine tuning is performed. It has been experimentally shown that this method outperforms SOMP and ℓ1,2 in the MMV problem. The proposed methods are called Nonlinear Weighted SOMP (NWSOMP) for the one-layer model and DSN-WSOMP for the multilayer model. In [26], a feedforward neural network was used to solve the SMV problem as a regression task. A pre-training phase followed by fine tuning was used. For pre-training, the Stacked Denoising Auto-encoder (SDA) proposed in [31] had been used. Note that an RBM with Gaussian visible units and binary hidden units (i.e., the one used in [25]) has the same energy function as an auto-encoder with sigmoid hidden units and real-valued observations [32]. Therefore the extension of [26] to the MMV problem is expected to give similar performance to that of [25]. In [27], a different MMV problem is solved where the sparsity patterns of the different sparse vectors in S are NOT similar. A method based on Long Short-Term Memory (LSTM) was proposed to address this problem. In... |

185 | A practical guide to training restricted Boltzmann machines, UTML TR
- Hinton
- 2010
Citation Context ...ining data, generated as explained earlier. Then we use $W_{1,\mathrm{init}}^{(1)}$ to initialize $W_1^{(1)}$ as shown in Fig. 1(b). This approach has also been shown to be helpful in training neural networks, and specifically autoencoders [37]. The parameters of the RBM can be found by maximizing the log probability that the RBM assigns to the input data, which is a function of the energy function in (17), using the contrastive divergence method [30]. The details of the general RBM training method used in this work can be found in [38]. As shown in the block diagram of CDSN in Fig. 1(a), to initialize the parameters of the upper layers of CDSN, $W_1^{(l+1)}$, we use the learned parameters of the lower layer, $W_1^{(l)}$, as initialization. This approach has been shown to be helpful in training DSNs [16] and it was helpful in our task as well. This completes the description of the training method for CDSN and the answer to question (ii). Given the trained CDSN, a summary of the proposed reconstruction algorithm that finds the sparsest solution S given Y and A in (4) is presented in Algorithm 1. We refer to this algorithm as CDS... |

147 | Curriculum learning. - Bengio, Louradour, et al. - 2009 |

101 | Structured compressed sensing: from theory to applications
- Duarte, Eldar
- 2011
Citation Context ...rest of the paper is organized as follows: in the next section, related work is discussed. In Section 3, the proposed method is presented. Experimental evaluation and discussion are presented in Section 4. In Section 5, conclusions and future work are presented. 2. Related work. Model-based methods that exploit the information in the structure of the sparse vector(s) have been studied extensively in the compressive sensing literature [13–15,17,22–27]. In [22], it has been theoretically shown that using signal models that exploit these structures results in a decrease in the number of measurements. In [23], a thorough review of CS methods that exploit the structure present in sparse signals is presented. In [17], a Bayesian framework for CS is presented. This framework uses prior information about the sparsity of the vector s to provide a posterior density function for the entries of s (assuming y is observed). It then uses a Relevance Vector Machine (RVM) [28] to estimate the entries of the sparse vector. This method is called Bayesian Compressive Sensing (BCS). In [14], a Bayesian framework is presented for the MMV problem. It assumes that the L channels or “tasks” in the MMV problem (4) ar... |

98 | Average case analysis for multichannel sparse recovery using convex relaxation
- Eldar, Rauhut
- 2009
Citation Context ...Assuming that S is a matrix whose columns are the sparse vectors and Y is a matrix whose columns are the corresponding measurement vectors, the MMV problem can be stated as:

$$Y = AS \quad (4)$$

To reconstruct S using Y and A in (4), one can either reconstruct each of the sparse vectors in S independently or reconstruct S jointly. Using an average-case analysis, it has been shown that solving the MMV problem jointly can lead to better uniqueness guarantees than those obtained by solving the SMV problem for each sparse vector independently [9]. More formally, assume that S is jointly sparse, i.e., the non-zero entries of each sparse vector occur at the same indices as those of the other vectors. In other words, the sparse vectors have the same support. Then, the necessary and sufficient condition to obtain a unique S given Y is:

$$\left| \mathrm{supp}(S) \right| < \frac{\mathrm{spark}(A) - 1 + \mathrm{rank}(S)}{2} \quad (5)$$

where $|\mathrm{supp}(S)|$ is the number of rows in S with non-zero energy, and the spark of a given matrix is the smallest possible number of linearly dependent columns of that matrix [10]. The spark gives a measure of linear dependency in the system modelled by a given matrix. In... |
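A toy instance of the jointly sparse MMV model Y = AS described above, with all columns of S sharing one support; the sizes and seed are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, L = 8, 20, 5
A = rng.standard_normal((m, n))              # random encoding matrix
S = np.zeros((n, L))
S[[3, 9], :] = rng.standard_normal((2, L))   # all L columns share support {3, 9}
Y = A @ S                                    # Eq. (4): one measurement vector per column
support = np.flatnonzero(np.abs(S).sum(axis=1) > 0)   # rows with non-zero energy
```

Because every column of `S` is non-zero only on rows 3 and 9, `|supp(S)| = 2` while the matrix carries `L = 5` measurement vectors, which is exactly the shared-support structure joint MMV solvers exploit.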

91 | An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,”
- Wipf, Rao
- 2007
Citation Context ... statistically independent. By imposing a shared prior on the L channels, an empirical method that estimates the hyperparameters is presented, and extensions of RVM are used for the inference step. This method is known as Multitask Compressive Sensing (MT-BCS). In [14], it is experimentally shown that MT-BCS outperforms three methods: the method that applies Orthogonal Matching Pursuit (OMP) on each channel, the Simultaneous Orthogonal Matching Pursuit (SOMP) method, which is a straightforward extension of OMP to the MMV problem, and the method that applies BCS on each channel independently. In [13], Sparse Bayesian Learning (SBL) [28,29] is used to solve the MMV problem. It is shown that the global minimum of the proposed method is always the sparsest one. The authors in [15] address the MMV problem when the entries in each row of S are correlated. An algorithm based on SBL is proposed, and it is shown that the proposed algorithm outperforms the mixed-norm (ℓ1,2) optimization as well as the method proposed in [13]. The proposed method is called T-SBL. In [24], a greedy algorithm aided by a neural network is proposed to address the SMV problem in (3). The neural network parameters ar... |
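SOMP, the baseline MMV solver mentioned throughout these excerpts, extends OMP by selecting at each step the column of A with the largest summed correlation against the residuals of all channels; the sketch below is illustrative (sizes, seed, and the `somp` helper are invented), not the authors' implementation.

```python
import numpy as np

def somp(A, Y, k):
    # Simultaneous OMP: pick the column of A whose correlations, summed over all
    # channels' residuals, are largest; then refit every channel on the shared support.
    support, R = [], Y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ R).sum(axis=1)))   # joint selection step
        support.append(j)
        As = A[:, support]
        X, *_ = np.linalg.lstsq(As, Y, rcond=None)
        R = Y - As @ X
    S_hat = np.zeros((A.shape[1], Y.shape[1]))
    S_hat[support] = X
    return S_hat, sorted(support)

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 30))
A /= np.linalg.norm(A, axis=0)
S = np.zeros((30, 4))
S[[2, 11], :] = rng.standard_normal((2, 4))   # jointly sparse ground truth
Y = A @ S
S_hat, supp = somp(A, Y, 2)
```

The only change relative to single-vector OMP is the `sum(axis=1)` in the selection step, which is what makes the support decision joint across channels.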

80 | Compressed sensing [Lecture Notes] - Baraniuk - 2007 |

59 | Sparse Signal Recovery with Temporally Correlated Source Vectors using Sparse Bayesian Learning, Selected Topics in Signal Processing
- Zhang, Rao
- 2011
Citation Context ...ich we refer to as CDSN. To capture the dependencies among different channels, we propose the use of a sliding convolution window over the columns of the matrix S (where each convolution window contains w consecutive columns of S, w being the size of the convolution window). To address the second part of the above question, we propose a two-step greedy algorithm to exploit this information at the decoder during the reconstruction. By performing experiments on an image dataset, we show that the proposed method outperforms the well-known MMV solver SOMP and the model-based Bayesian methods proposed in [17,14,15]. This is shown using the popular Wavelet and DCT transforms as the sparsifying basis. We emphasize that the proposed method does not add any complexity to the encoder, i.e., the encoder is a random matrix. The complexity is added at the decoder. The contributions of this paper are as follows: a convolutional version of the Deep Stacking Networks (DSNs), which we refer to as CDSN, is proposed. We then propose the use of CDSN, which is a data-driven method, to capture the dependencies among different channels in the MMV problem. We then use a two-step greedy reconstruction algorithm to exploit t... |

58 | Analysis of sparse bayesian learning
- Faul, Tipping
Citation Context ... shared prior on the L channels, an empirical method that estimates the hyperparameters is presented, and extensions of RVM are used for the inference step. This method is known as Multitask Compressive Sensing (MT-BCS). In [14], it is experimentally shown that MT-BCS outperforms three methods: the method that applies Orthogonal Matching Pursuit (OMP) on each channel, the Simultaneous Orthogonal Matching Pursuit (SOMP) method, which is a straightforward extension of OMP to the MMV problem, and the method that applies BCS on each channel independently. In [13], Sparse Bayesian Learning (SBL) [28,29] is used to solve the MMV problem. It is shown that the global minimum of the proposed method is always the sparsest one. The authors in [15] address the MMV problem when the entries in each row of S are correlated. An algorithm based on SBL is proposed, and it is shown that the proposed algorithm outperforms the mixed-norm (ℓ1,2) optimization as well as the method proposed in [13]. The proposed method is called T-SBL. In [24], a greedy algorithm aided by a neural network is proposed to address the SMV problem in (3). The neural network parameters are calculated by solving a regression problem... |

45 | Gradient-based algorithms with applications to signal recovery problems
- Beck, Teboulle
- 2009
Citation Context ... that the gradient of the cost function in (11) with respect to $W_1^{(l)}$ is:

$$\frac{\partial \left\| T - V^{(l)} \right\|_2^2}{\partial W_1^{(l)}} = 2\, Z^{(l)} \left[ \left( H^{(l)} \right)^T \circ \left( 1 - H^{(l)} \right)^T \circ \left[ \left( H^{(l)} \right)^{\dagger} \left( H^{(l)} T^T \right) \left( T \left( H^{(l)} \right)^{\dagger} \right) - T^T \left( T \left( H^{(l)} \right)^{\dagger} \right) \right] \right] \quad (15)$$

where $Z^{(l)}$ is a matrix whose columns are the vectors $z^{(l)}$ in (10) corresponding to the different training samples in the training set, and $\circ$ is the Hadamard product operator. Using the gradient information from past iterations can help to improve the convergence speed in convex optimization problems [35]. Although the problem in (11) is not necessarily convex because of the stack of non-linear hidden layers, we found experimentally that the gradient information from past iterations is helpful here as well. As in [34], we use the FISTA algorithm to accelerate the fine tuning. Therefore, the update equations for $W_1^{(l)}$ at the $k$-th iteration are as follows:

$$W_1^{(l),k} = \hat{W}_1^{(l),k} - \rho\, \frac{\partial \left\| T - V^{(l)} \right\|_2^2}{\partial \hat{W}_1^{(l),k}}, \qquad m_{k+1} = \frac{1 + \sqrt{1 + 4 m_k^2}}{2}, \qquad \hat{W}_1^{(l),k+1} = W_1^{(l),k} + \frac{m_k - 1}{m_{k+1}} \left( W_1^{(l),k} - W_1^{(l),k-1} \right) \quad (16)$$

The curve of the FISTA coe... |

33 | Rank awareness in joint sparse recovery
- Eldar
- 2012
Citation Context ...ntees than those obtained by solving the SMV problem for each sparse vector independently [9]. More formally, assume that S is jointly sparse, i.e., the non-zero entries of each sparse vector occur at the same indices as those of the other vectors. In other words, the sparse vectors have the same support. Then, the necessary and sufficient condition to obtain a unique S given Y is:

$$\left| \mathrm{supp}(S) \right| < \frac{\mathrm{spark}(A) - 1 + \mathrm{rank}(S)}{2} \quad (5)$$

where $|\mathrm{supp}(S)|$ is the number of rows in S with non-zero energy, and the spark of a given matrix is the smallest possible number of linearly dependent columns of that matrix [10]. The spark gives a measure of linear dependency in the system modelled by a given matrix. In the SMV problem, no rank information exists. In the MMV problem, rank information exists and affects the uniqueness bounds. In the current literature, S in (4) is reconstructed using one of the following types of methods: (1) greedy methods [11], like Simultaneous Orthogonal Matching Pursuit (SOMP), which perform non-optimal subset selection; (2) relaxed mixed-norm minimization methods [12], where the ℓ0 norm is relaxed to the ℓ1 norm to formulate a convex optimization problem; or (3) Bayesian methods like [... |

29 | Scalable stacking and learning for building deep architectures
- Deng, Yu, et al.
- 2012
Citation Context ...s of electroencephalogram (EEG) of the same patient over time, different patches of images of the same class, e.g., buildings or flowers, etc. In these applications, the sparse vectors in S usually have some form of correlation (dependency) with each other. The question is: (i) how can we use the available data to capture the dependencies among the channels, and (ii) how can we exploit the captured information to improve the performance of the reconstruction algorithm in the MMV problem? To address the first part of this question, we propose a Convolutional version of Deep Stacking Networks (DSNs) [16], which we refer to as CDSN. To capture the dependencies among different channels, we propose the use of a sliding convolution window over the columns of the matrix S (where each convolution window contains w consecutive columns of S, w being the size of the convolution window). To address the second part of the above question, we propose a two-step greedy algorithm to exploit this information at the decoder during the reconstruction. By performing experiments on an image dataset, we show that the proposed method outperforms the well-known MMV solver SOMP and the model-based Bayesian methods proposed ... |

29 | A connection between score matching and denoising autoencoders.
- Vincent
- 2011
Citation Context ... and ℓ1,2 in the MMV problem. The proposed methods are called Nonlinear Weighted SOMP (NWSOMP) for the one-layer model and DSN-WSOMP for the multilayer model. In [26], a feedforward neural network was used to solve the SMV problem as a regression task. A pre-training phase followed by fine tuning was used. For pre-training, the Stacked Denoising Auto-encoder (SDA) proposed in [31] had been used. Note that an RBM with Gaussian visible units and binary hidden units (i.e., the one used in [25]) has the same energy function as an auto-encoder with sigmoid hidden units and real-valued observations [32]. Therefore the extension of [26] to the MMV problem is expected to give similar performance to that of [25]. In [27], a different MMV problem is solved where the sparsity patterns of the different sparse vectors in S are NOT similar. A method based on Long Short-Term Memory (LSTM) was proposed to address this problem. In this paper, we assume that the sparse vectors in S have similar sparsity patterns. For example, the sparse vectors are the DCT or Wavelet transforms of images. In the sparse representation literature, the dictionary lear... |

17 | Deep Convex Network: A Scalable Architecture for Speech Pattern Classification - Deng, Yu - 2011 |

11 | Tensor Deep Stacking Networks - Hutchinson, Yu, et al. - 2013 |

8 | Efficient and effective algorithms for training single-hidden-layer neural networks
- Yu, Deng
- 2012
Citation Context ...(Fig. 2: the curve of the FISTA coefficient $(m_k - 1)/m_{k+1}$ in (16) with respect to the epoch number.)

$$W_2^{(l)} = \underset{W_2^{(l)}}{\arg\min}\; \frac{1}{2} \left\| \left( W_2^{(l)} \right)^T H^{(l)} - T \right\|_2^2 + \frac{\mu}{2} \left\| W_2^{(l)} \right\|_2^2 \quad (13)$$

which results in:

$$W_2^{(l)} = \left[ \mu I + H^{(l)} \left( H^{(l)} \right)^T \right]^{-1} H^{(l)} T^T \quad (14)$$

where $I$ is the identity matrix. To find $W_1^{(l)}$, for each layer of CDSN we use the stochastic gradient descent method. To calculate the gradient of the cost function with respect to $W_1^{(l)}$, given the fact that $W_2^{(l)}$ and $H^{(l)}$ depend on $W_1^{(l)}$, it can be shown [34] that the gradient of the cost function in (11) with respect to $W_1^{(l)}$ is:

$$\frac{\partial \left\| T - V^{(l)} \right\|_2^2}{\partial W_1^{(l)}} = 2\, Z^{(l)} \left[ \left( H^{(l)} \right)^T \circ \left( 1 - H^{(l)} \right)^T \circ \left[ \left( H^{(l)} \right)^{\dagger} \left( H^{(l)} T^T \right) \left( T \left( H^{(l)} \right)^{\dagger} \right) - T^T \left( T \left( H^{(l)} \right)^{\dagger} \right) \right] \right] \quad (15)$$

where $Z^{(l)}$ is a matrix whose columns are the vectors $z^{(l)}$ in (10) corresponding to the different training samples in the training set, and $\circ$ is the Hadamard product operator. Using the gradient information from past iterations can help to improve the convergence speed in convex optimization problems ... |
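The closed-form layer update the excerpt quotes as Eq. (14) is an ordinary ridge-regression solution; this sketch (sizes and seed invented, with H stored as units × samples) computes it and checks it against the normal equations of the cost in Eq. (13).

```python
import numpy as np

rng = np.random.default_rng(6)
nh, no, N, mu = 6, 3, 50, 0.1
H = rng.standard_normal((nh, N))   # hidden activations, one column per sample
T = rng.standard_normal((no, N))   # targets, one column per sample

# Eq. (14): W2 = (mu*I + H H^T)^{-1} H T^T, the minimizer of the ridge cost below.
W2 = np.linalg.solve(mu * np.eye(nh) + H @ H.T, H @ T.T)

def ridge_cost(W):
    # 0.5*||W^T H - T||_F^2 + 0.5*mu*||W||_F^2, matching the form of Eq. (13)
    return 0.5 * np.linalg.norm(W.T @ H - T) ** 2 + 0.5 * mu * np.linalg.norm(W) ** 2

base_cost = ridge_cost(W2)
```

Because the cost is strictly convex, any perturbation of `W2` increases `ridge_cost`, which is a quick numerical confirmation that the closed form really is the minimizer.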

6 | A tutorial survey of architectures, algorithms, and applications for deep learning. - Deng - 2014 |

2 | Embedding prior knowledge within compressed sensing by neural networks
- Merhej, Diab, et al.
- 2011
Citation Context ...h is a straightforward extension of OMP to the MMV problem, and the method that applies BCS on each channel independently. In [13], the Sparse Bayesian Learning (SBL) [28,29] is used to solve the MMV problem. It is shown that the global minimum of the proposed method is always the sparsest one. The authors in [15] address the MMV problem when the entries in each row of S are correlated. An algorithm based on SBL is proposed and it is shown that the proposed algorithm outperforms the mixed norm ( ℓ1,2) optimization as well as the method proposed in [13]. The proposed method is called T-SBL. In [24], a greedy algorithm aided by a neural network is proposed to address the SMV problem in (3). The neural network parameters are calculated by solving a regression problem and are used to select the appropriate column of A at each iteration of OMP. The main modification to OMP is that of replacing the correlation step with a neural network. This method was experimentally shown to outperform OMP and ℓ1 optimization. This method is called Neural Network OMP (NNOMP). An extension of [24] with a hierarchical Deep Stacking Network (DSN) [16] is proposed in [25] for the MMV problem. DSN architecture ... |

2 | Using deep stacking network to improve structured compressed sensing with multiple measurement vectors
- Palangi, Ward, et al.
- 2013
Citation Context ...he proposed method is called T-SBL. In [24], a greedy algorithm aided by a neural network is proposed to address the SMV problem in (3). The neural network parameters are calculated by solving a regression problem and are used to select the appropriate column of A at each iteration of OMP. The main modification to OMP is that of replacing the correlation step with a neural network. This method was experimentally shown to outperform OMP and ℓ1 optimization. This method is called Neural Network OMP (NNOMP). An extension of [24] with a hierarchical Deep Stacking Network (DSN) [16] is proposed in [25] for the MMV problem. DSN architecture has different applications, for example see [1–5]. The joint sparsity of S is an important assumption in the proposed method. To train the DSN model, the Restricted Boltzmann Machine (RBM) [30] is used to pre-train DSN and then fine tuning is performed. It has been experimentally shown that this method outperforms SOMP and ℓ1,2 in the MMV problem. The proposed methods are called Nonlinear Weighted SOMP (NWSOMP) for the one layer model and DSN-WSOMP for the multilayer model. In [26], a feedforward neural network was used to solve the SMV problem as a regre... |

1 | Use of Kernel Deep Convex Networks and End-To-End Learning for Spoken Language Understanding - Deng, Tur, He, Hakkani-Tür - 2012 |

1 | Deep Stacking Networks for Information Retrieval - Deng, He, Gao - 2013 |