Results 1 - 10 of 52
Large scale distributed deep networks - Proceedings of NIPS, 2012
"... Abstract Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have d ..."
Cited by 107 (12 self)
supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have
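The two schemes named above, Downpour SGD (asynchronous model replicas) and Sandblaster L-BFGS (distributed batch optimization), both rely on a central parameter store that replicas pull parameters from and push gradients to. A minimal single-machine sketch of the asynchronous pattern is given below; the threads standing in for replicas, the dict standing in for the parameter server, and the toy quadratic objective are all assumptions for illustration, not the paper's system.

```python
# Toy Downpour-style asynchronous SGD: several "replica" threads share one
# parameter vector (standing in for the parameter server) and apply gradient
# steps without coordinating with each other. All names are illustrative.
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))              # toy regression data
true_w = rng.normal(size=20)
y = X @ true_w + 0.01 * rng.normal(size=1000)

params = {"w": np.zeros(20)}                 # shared "parameter server" state
lock = threading.Lock()                      # guards reads/writes, not step ordering

def replica(seed, n_steps=200, batch=32, lr=0.01):
    local_rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        idx = local_rng.integers(0, len(X), size=batch)
        with lock:
            w = params["w"].copy()           # pull (possibly stale) parameters
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        with lock:
            params["w"] -= lr * grad         # push the asynchronous update

threads = [threading.Thread(target=replica, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("parameter error:", np.linalg.norm(params["w"] - true_w))
```

Staleness (a replica computing its gradient on parameters that other replicas have already updated) is the price of the asynchrony; the paper's result is that this still scales well for deep networks.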
Fast optimization of non-convex Machine Learning objectives
"... In this project we examined the problem of non-convex optimization in the context of Machine Learning, drawing inspiration from the increasing popularity of methods such as Deep Belief Networks, which involve non-convex objectives. We focused on the task of training the Neural Autoregressive Distrib ..."
Deep convex net: A scalable architecture for speech pattern classification - In Twelfth Annual Conference of the International Speech Communication Association, 2011
"... We recently developed context-dependent DNN-HMM (Deep-Neural-Net/Hidden-Markov-Model) for large-vocabulary speech recognition. While achieving impressive recognition error rate reduction, we face the insurmountable problem of scalability in dealing with virtually unlimited amount of training data av ..."
Cited by 9 (1 self)
The superiority is reflected not only in training scalability and CPU-only computation, but more importantly in classification accuracy in both tasks. Index Terms: deep learning, scalability, convex optimization, neural network, deep belief network, phone state
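The "convex optimization" in the index terms refers to the fact that, with a module's lower-layer weights held fixed, the upper-layer weights of a deep convex network module reduce to a regularized least-squares problem with a closed-form solution, and modules are stacked by concatenating each module's output with the raw input. A rough sketch under assumed layer sizes, and with random lower-layer weights where the paper initializes and tunes these more carefully:

```python
# Sketch of one deep-convex-net-style module: fix the lower-layer weights,
# compute hidden activations, then solve the upper-layer weights as a
# ridge-regularized least-squares problem (a convex subproblem with a
# closed-form solution). Layer sizes, names and random W are assumptions.
import numpy as np

def dcn_module(X, T, n_hidden=128, lam=1e-3, seed=0):
    """X: (n_samples, n_in) inputs, T: (n_samples, n_out) targets."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))  # lower-layer weights (random here)
    H = 1.0 / (1.0 + np.exp(-X @ W))                         # sigmoid hidden activations
    # Closed-form upper-layer weights: U = (H^T H + lam*I)^{-1} H^T T
    U = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
    return W, U, H @ U                                       # module prediction

# Stacking: concatenate the module's prediction with the raw input and
# feed the result to the next module.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 39))            # toy acoustic feature vectors
T = rng.normal(size=(500, 10))            # toy targets
_, _, pred1 = dcn_module(X, T)
X2 = np.hstack([X, pred1])                # input to the next module
_, _, pred2 = dcn_module(X2, T)
```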
Revisit Long Short-Term Memory: An Optimization Perspective
"... Long Short-Term Memory (LSTM) is a deep recurrent neural network archi-tecture with high computational complexity. Contrary to the standard practice to train LSTM online with stochastic gradient descent (SGD) methods, we pro-pose a matrix-based batch learning method for LSTM with full Backpropagatio ..."
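As a point of reference for the matrix-based view mentioned in the abstract, the sketch below writes a batched LSTM forward pass so that all four gates at a time step come from one weight matrix applied to the concatenated input and previous hidden state. The sizes and random initialization are assumptions, and the backward pass (the full BPTT that the paper's batch method actually concerns) is not shown.

```python
# Minimal batched LSTM forward pass in matrix form: all four gates are
# produced by a single weight matrix applied to [x_t, h_{t-1}].
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(X, n_hidden, seed=0):
    """X: (T, batch, n_in) input sequence. Returns hidden states (T, batch, n_hidden)."""
    T_steps, batch, n_in = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_in + n_hidden, 4 * n_hidden))  # one matrix, four gates
    b = np.zeros(4 * n_hidden)
    h = np.zeros((batch, n_hidden))
    c = np.zeros((batch, n_hidden))
    H = np.empty((T_steps, batch, n_hidden))
    for t in range(T_steps):
        z = np.hstack([X[t], h]) @ W + b           # all gate pre-activations at once
        i, f, o, g = np.split(z, 4, axis=1)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                          # cell state update
        h = o * np.tanh(c)                         # hidden state
        H[t] = h
    return H

H = lstm_forward(np.random.default_rng(1).normal(size=(50, 16, 8)), n_hidden=32)
```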
Accelerated parallelizable neural network learning algorithm for speech recognition - In Proc. Interspeech, 2011
"... We describe a set of novel, batch-mode algorithms we developed recently as one key component in scalable, deep neural network based speech recognition. The essence of these algorithms is to structure the singlehidden-layer neural network so that the upper-layer’s weights can be written as a determin ..."
Cited by 6 (5 self)
scale speech recognition since they are easily parallelizable across computers. Index Terms: neural network, scalability, structure, constraints, FISTA acceleration, optimization, pseudoinverse, weighted LSE, phone state classification, speech recognition, deep learning
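Among the index terms, FISTA is the accelerated proximal-gradient method of Beck and Teboulle. The sketch below shows the generic FISTA iteration (gradient step at an extrapolated point, proximal step, momentum update) on an L1-regularized least-squares toy problem; the paper applies the acceleration to its own batch-mode network training objective, which is not reproduced here.

```python
# Generic FISTA iteration applied to an L1-regularized least-squares toy
# problem; the objective is only a stand-in to show the momentum + proximal
# structure of the algorithm.
import numpy as np

def fista_lasso(A, b, lam, n_iter=200):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient of 0.5||Ax-b||^2
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        z = y - grad / L                    # gradient step at the extrapolated point
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold prox
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)               # momentum extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
x_true = np.zeros(50)
x_true[:5] = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=200)
x_hat = fista_lasso(A, b, lam=0.1)
```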
Exploring the power of GPU’s for training Polyglot language models
"... Abstract. One of the major research trends currently is the evolution of heterogeneous parallel computing. GP-GPU computing is being widely used and several applications have been designed to exploit the mas-sive parallelism that GP-GPU’s have to offer. While GPU’s have always been widely used in ar ..."
models. More specifically, we investigate the performance of training Polyglot language models [1] using deep belief neural networks. We evaluate the performance of training the model on the GPU and present optimizations that boost the performance on the GPU. One of the key optimizations, we propose
Improving the speed of neural networks on CPUs - In Deep Learning and Unsupervised Feature Learning Workshop, NIPS, 2011
"... Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For ..."
Cited by 17 (6 self)
Marginalized Stacked Denoising Autoencoders
"... Stacked Denoising Autoencoders (SDAs) [4] have been used successfully in many learning scenarios and application domains. In short, denoising autoencoders (DAs) train one-layer neural networks to reconstruct input data from partial random corruption. The denoisers are then stacked into deep learning ..."
Cited by 3 (1 self)
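The snippet above describes the basic denoising autoencoder recipe: a one-layer network is trained to reconstruct the clean input from a randomly corrupted copy, and trained denoisers are stacked. The sketch below implements that generic recipe with masking noise, tied weights, and plain gradient descent; the paper's marginalized variant instead integrates the corruption out in closed form, which is not shown.

```python
# Minimal one-layer denoising autoencoder: corrupt the input by randomly
# zeroing features ("masking noise"), then train tied weights to reconstruct
# the clean input under squared error with full-batch gradient descent.
import numpy as np

def train_da(X, n_hidden=64, corruption=0.3, lr=0.05, n_epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, n_hidden))   # tied encoder/decoder weights
    for _ in range(n_epochs):
        mask = rng.random(X.shape) > corruption     # keep each feature w.p. 1 - corruption
        Xc = X * mask                               # corrupted input
        H = np.tanh(Xc @ W)                         # hidden representation
        E = H @ W.T - X                             # reconstruction error vs. CLEAN input
        dH = E @ W                                  # backprop through the decoder
        dW = (H.T @ E).T / n + Xc.T @ (dH * (1 - H**2)) / n   # tied-weight gradient
        W -= lr * dW
    return W

# Stacking: the hidden codes of one trained denoiser become the next one's input.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))
W1 = train_da(X)
H1 = np.tanh(X @ W1)
W2 = train_da(H1, n_hidden=32)
```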
Effective Multi-Modal Retrieval based on Stacked Auto-Encoders
"... Multi-modal retrieval is emerging as a new search paradigm that enables seamless information retrieval from various types of me-dia. For example, users can simply snap a movie poster to search relevant reviews and trailers. To solve the problem, a set of map-ping functions are learned to project hig ..."
-modal data and ranking examples, our method requires little prior knowledge. Given a large training dataset, we split it into mini-batches and continually adjust the mapping functions for each batch of input. Hence, our method is memory efficient with respect to the data volume. Experiments on three real
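The memory-efficient training pattern described above, splitting a large dataset into mini-batches and continually adjusting the mapping functions on one batch at a time, can be sketched with two toy linear mappings into a shared latent space. The linear maps and the paired squared-distance loss are stand-ins for illustration; the paper's actual mapping functions are stacked auto-encoders with additional reconstruction and ranking terms, which keep the latent space from collapsing.

```python
# Sketch of the mini-batch pattern: two linear mapping functions (one per
# modality) are adjusted one batch at a time so that paired image/text
# features land close together in a shared latent space. Only one batch is
# ever held for the update, so memory use is independent of the data volume.
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, d_latent = 10_000, 512, 300, 64
IMG = rng.normal(size=(n, d_img))           # toy image features
TXT = rng.normal(size=(n, d_txt))           # toy text features (paired row-wise)

W_img = rng.normal(scale=0.01, size=(d_img, d_latent))
W_txt = rng.normal(scale=0.01, size=(d_txt, d_latent))

lr, batch = 0.001, 256
for epoch in range(5):
    order = rng.permutation(n)
    for start in range(0, n, batch):        # continually adjust on each mini-batch
        idx = order[start:start + batch]
        zi = IMG[idx] @ W_img               # project both modalities
        zt = TXT[idx] @ W_txt
        diff = zi - zt                      # paired items should coincide in latent space
        W_img -= lr * IMG[idx].T @ diff / len(idx)
        W_txt += lr * TXT[idx].T @ diff / len(idx)
```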