Results 1 - 10 of 52

Large scale distributed deep networks

by Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng - Proceedings of NIPS, 2012
"... Abstract Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have d ..."
Abstract - Cited by 107 (12 self) - Add to MetaCart
supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have
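The snippet above names Downpour SGD: many model replicas compute gradients on their own data shards and push them asynchronously to a shared parameter server. Purely as an illustration of that pattern (not the paper's implementation), the toy sketch below runs the replicas sequentially in one process against a least-squares model; the ParameterServer and replica_step names are invented for this example.

```python
import numpy as np

# Toy sketch of the Downpour-SGD pattern described in the snippet above:
# replicas train on separate data shards and exchange updates with a shared
# parameter server. Here the "replicas" run sequentially in one process and
# the model is plain least squares; all names are illustrative.

class ParameterServer:
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)       # shared model parameters
        self.lr = lr

    def pull(self):
        return self.w.copy()         # a replica fetches (possibly stale) parameters

    def push(self, grad):
        self.w -= self.lr * grad     # apply a replica's gradient as soon as it arrives


def replica_step(server, X_shard, y_shard):
    """One worker step: pull parameters, compute a local gradient, push it back."""
    w = server.pull()
    residual = X_shard @ w - y_shard
    grad = X_shard.T @ residual / len(y_shard)   # least-squares gradient
    server.push(grad)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true

server = ParameterServer(dim=5)
shards = np.array_split(np.arange(1000), 10)      # one data shard per replica
for epoch in range(50):
    for idx in shards:                            # real replicas would run in parallel
        replica_step(server, X[idx], y[idx])

print("parameter error:", np.linalg.norm(server.w - w_true))
```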

Fast optimization of non-convex Machine Learning objectives

by Nikolaos Nikolaou
"... In this project we examined the problem of non-convex optimization in the context of Machine Learning, drawing inspiration from the increasing popularity of methods such as Deep Belief Networks, which involve non-convex objectives. We focused on the task of training the Neural Autoregressive Distrib ..."
Abstract - Add to MetaCart
In this project we examined the problem of non-convex optimization in the context of Machine Learning, drawing inspiration from the increasing popularity of methods such as Deep Belief Networks, which involve non-convex objectives. We focused on the task of training the Neural Autoregressive

Deep convex net: A scalable architecture for speech pattern classification

by Li Deng, Dong Yu - In Twelfth Annual Conference of the International Speech Communication Association, 2011
"... We recently developed context-dependent DNN-HMM (Deep-Neural-Net/Hidden-Markov-Model) for large-vocabulary speech recognition. While achieving impressive recognition error rate reduction, we face the insurmountable problem of scalability in dealing with virtually unlimited amount of training data av ..."
Abstract - Cited by 9 (1 self) - Add to MetaCart
. The superiority is reflected not only in training scalability and CPU-only computation, but more importantly in classification accuracy in both tasks. Index Terms: deep learning, scalability, convex optimization, neural network, deep belief network, phone state

Revisit Long Short-Term Memory: An Optimization Perspective

by Qi Lyu, Jun Zhu
"... Long Short-Term Memory (LSTM) is a deep recurrent neural network archi-tecture with high computational complexity. Contrary to the standard practice to train LSTM online with stochastic gradient descent (SGD) methods, we pro-pose a matrix-based batch learning method for LSTM with full Backpropagatio ..."
Abstract - Add to MetaCart
Long Short-Term Memory (LSTM) is a deep recurrent neural network architecture with high computational complexity. Contrary to the standard practice to train LSTM online with stochastic gradient descent (SGD) methods, we propose a matrix-based batch learning method for LSTM with full
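The contrast the snippet draws is between online SGD and a matrix-based batch formulation. As a rough, forward-pass-only illustration of what "matrix-based batch" means for LSTM (the paper's full BPTT training is not reproduced here, and the gate ordering and names are assumptions), one time step for an entire mini-batch reduces to a single pair of matrix products:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step for a whole mini-batch in matrix form.

    x_t: (batch, n_in); h_prev, c_prev: (batch, n_hid)
    W: (n_in, 4*n_hid); U: (n_hid, 4*n_hid); b: (4*n_hid,)
    Gates are stacked in the order: input, forget, output, candidate.
    """
    z = x_t @ W + h_prev @ U + b              # all four gates in two matrix products
    n_hid = h_prev.shape[1]
    i = sigmoid(z[:, 0 * n_hid:1 * n_hid])    # input gate
    f = sigmoid(z[:, 1 * n_hid:2 * n_hid])    # forget gate
    o = sigmoid(z[:, 2 * n_hid:3 * n_hid])    # output gate
    g = np.tanh(z[:, 3 * n_hid:4 * n_hid])    # candidate cell update
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
batch, n_in, n_hid, T = 32, 8, 16, 10
W = rng.normal(scale=0.1, size=(n_in, 4 * n_hid))
U = rng.normal(scale=0.1, size=(n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros((batch, n_hid))
c = np.zeros((batch, n_hid))
for t in range(T):                             # forward pass over a short sequence
    x_t = rng.normal(size=(batch, n_in))
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)                                 # (32, 16)
```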

Accelerated parallelizable neural network learning algorithm for speech recognition

by Dong Yu, Li Deng - in Proc. Interspeech, 2011
"... We describe a set of novel, batch-mode algorithms we developed recently as one key component in scalable, deep neural network based speech recognition. The essence of these algorithms is to structure the single-hidden-layer neural network so that the upper-layer’s weights can be written as a determin ..."
Abstract - Cited by 6 (5 self) - Add to MetaCart
scale speech recognition since they are easily parallelizable across computers. Index Terms: neural network, scalability, structure, constraints, FISTA acceleration, optimization, pseudoinverse, weighted LSE, phone state classification, speech recognition, deep learning 1.
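The index terms above (pseudoinverse, weighted LSE) hint at the key structural trick: once the lower-layer weights are fixed, the hidden activations are known, so the upper-layer weights have a closed-form least-squares solution instead of requiring gradient descent. The sketch below shows that solve with a plain, unweighted pseudoinverse; it is an illustrative reconstruction under those assumptions, not the authors' exact algorithm.

```python
import numpy as np

# Closed-form solve for the upper layer of a single-hidden-layer network:
# with the lower-layer weights W fixed, the hidden activations H are known,
# so the upper-layer weights U minimizing ||H U - T||^2 come straight from
# the pseudoinverse of H. Illustrative only; not the authors' exact method.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))           # inputs
T = rng.normal(size=(200, 5))            # training targets (e.g. class codes)

W = rng.normal(size=(20, 50))            # lower-layer weights, held fixed here
H = 1.0 / (1.0 + np.exp(-(X @ W)))       # sigmoid hidden activations

U = np.linalg.pinv(H) @ T                # least-squares upper-layer weights
Y = H @ U                                # network outputs
print("training MSE:", np.mean((Y - T) ** 2))
```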

Exploring the power of GPU’s for training Polyglot language models

by Vivek Kulkarni, Rami Al-rfou, Bryan Perozzi, Steven Skiena
"... Abstract. One of the major research trends currently is the evolution of heterogeneous parallel computing. GP-GPU computing is being widely used and several applications have been designed to exploit the mas-sive parallelism that GP-GPU’s have to offer. While GPU’s have always been widely used in ar ..."
Abstract - Add to MetaCart
models. More specifically, we investigate the performance of training Polyglot language models [1] using deep belief neural networks. We evaluate the performance of training the model on the GPU and present optimizations that boost the performance on the GPU. One of the key optimizations, we propose

Improving the speed of neural networks on CPUs

by Vincent Vanhoucke, Andrew Senior, Mark Z. Mao - in Deep Learning and Unsupervised Feature Learning Workshop, NIPS, 2011
"... Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For ..."
Abstract - Cited by 17 (6 self) - Add to MetaCart
Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs

Object/relational query optimization with chase

by Lucian Popa
"... and backchase ..."
Abstract - Add to MetaCart
and backchase

Marginalized Stacked Denoising Autoencoders

by Minmin Chen, Zhixiang (Eddie) Xu, Kilian Q. Weinberger, Fei Sha
"... Stacked Denoising Autoencoders (SDAs) [4] have been used successfully in many learning scenarios and application domains. In short, denoising autoencoders (DAs) train one-layer neural networks to reconstruct input data from partial random corruption. The denoisers are then stacked into deep learning ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
Stacked Denoising Autoencoders (SDAs) [4] have been used successfully in many learning scenarios and application domains. In short, denoising autoencoders (DAs) train one-layer neural networks to reconstruct input data from partial random corruption. The denoisers are then stacked into deep
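As the snippet describes, a denoising autoencoder is a one-layer network trained to reconstruct the clean input from a randomly corrupted copy, and the trained denoisers are then stacked. The sketch below implements that building block with masking noise and plain gradient descent; it is not the marginalized closed-form variant the paper itself proposes, and all names and hyperparameters are illustrative.

```python
import numpy as np

# One-layer denoising autoencoder: corrupt the input by randomly zeroing
# features, then train an encoder/decoder pair to reconstruct the clean
# input under a squared-error loss. Illustrative building block only; this
# is not the paper's marginalized (closed-form) variant.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))                  # clean inputs
p_corrupt, n_hid, lr = 0.3, 20, 0.01

W1 = rng.normal(scale=0.1, size=(30, n_hid))    # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hid, 30))    # decoder weights

for epoch in range(200):
    mask = rng.random(X.shape) > p_corrupt      # drop roughly 30% of the features
    X_tilde = X * mask                          # corrupted input
    H = sigmoid(X_tilde @ W1)                   # hidden representation
    X_hat = H @ W2                              # linear reconstruction
    err = X_hat - X                             # compare against the *clean* input
    grad_W2 = H.T @ err / len(X)                # backprop through the decoder
    grad_W1 = X_tilde.T @ ((err @ W2.T) * H * (1 - H)) / len(X)  # ...and the encoder
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print("reconstruction MSE:", np.mean((sigmoid(X @ W1) @ W2 - X) ** 2))
```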

Effective Multi-Modal Retrieval based on Stacked Auto-Encoders

by unknown authors
"... Multi-modal retrieval is emerging as a new search paradigm that enables seamless information retrieval from various types of me-dia. For example, users can simply snap a movie poster to search relevant reviews and trailers. To solve the problem, a set of map-ping functions are learned to project hig ..."
Abstract - Add to MetaCart
-modal data and ranking examples, our method requires little prior knowledge. Given a large training dataset, we split it into mini-batches and continually adjust the mapping functions for each batch of input. Hence, our method is memory efficient with respect to the data volume. Experiments on three real