Results 11 - 20 of 73
A neuro-evolution approach to general Atari game playing
, 2013
"... This article addresses the challenge of learning to play many dierent video games with little domain-speci c knowledge. Specically, it introduces a neuro-evolution approach to general Atari 2600 game playing. Four neuro-evolution algorithms were paired with three dierent state representations and ev ..."
Abstract
-
Cited by 14 (2 self)
This article addresses the challenge of learning to play many different video games with little domain-specific knowledge. Specifically, it introduces a neuro-evolution approach to general Atari 2600 game playing. Four neuro-evolution algorithms were paired with three different state representations and evaluated on a set of 61 Atari games. The neuro-evolution agents represent different points along the spectrum of algorithmic sophistication, including weight evolution on topologically fixed neural networks (Conventional Neuro-evolution), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), evolution of network topology and weights (NEAT), and indirect network encoding (HyperNEAT). State representations include an object representation of the game screen, the raw pixels of the game screen, and seeded noise (a comparative baseline). Results indicate that direct-encoding methods work best on compact state representations while indirect-encoding methods (i.e. HyperNEAT) allow scaling to higher-dimensional representations (i.e. the raw game screen). Previous approaches based on temporal-difference learning had trouble dealing with the large state spaces and sparse reward gradients often found in Atari games. Neuro-evolution ameliorates these problems and evolved policies achieve state-of-the-art results, even surpassing human high scores on three games. These results suggest that neuro-evolution is a promising approach to general video game playing.
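To make the weight-evolution baseline concrete, here is a minimal sketch of conventional neuro-evolution on a fixed-topology network. Layer sizes, mutation settings and the fitness function are placeholder assumptions; in the paper, fitness would be the score obtained from playing Atari episodes.

```python
# Minimal sketch of conventional neuro-evolution: evolving the weights of a
# fixed-topology policy network by mutation and truncation selection.
# The evaluate() fitness is a toy placeholder, NOT the authors' setup.
import numpy as np

STATE_DIM, HIDDEN, N_ACTIONS = 128, 32, 18       # assumed sizes, for illustration
POP_SIZE, ELITE, GENERATIONS, SIGMA = 50, 10, 20, 0.05

def init_genome(rng):
    return {"W1": rng.normal(0, 0.1, (STATE_DIM, HIDDEN)),
            "W2": rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))}

def policy(genome, state):
    h = np.tanh(state @ genome["W1"])
    return int(np.argmax(h @ genome["W2"]))      # greedy action

def evaluate(genome, rng):
    # Placeholder fitness: reward for matching an arbitrary reference mapping.
    # In the paper this would be the score of one or more game episodes.
    states = rng.normal(size=(100, STATE_DIM))
    return sum(policy(genome, s) == int(abs(s[0] * 7)) % N_ACTIONS for s in states)

def mutate(genome, rng):
    return {k: v + rng.normal(0, SIGMA, v.shape) for k, v in genome.items()}

rng = np.random.default_rng(0)
population = [init_genome(rng) for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    scored = sorted(population, key=lambda g: evaluate(g, rng), reverse=True)
    elites = scored[:ELITE]                      # keep the fittest genomes
    population = elites + [mutate(elites[rng.integers(ELITE)], rng)
                           for _ in range(POP_SIZE - ELITE)]
```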
Recurrent Convolutional Neural Networks for Discourse Compositionality
"... The compositionality of meaning extends beyond the single sentence. Just as words combine to form the meaning of sentences, so do sentences combine to form the meaning of paragraphs, dialogues and general discourse. We introduce both a sentence model and a discourse model corresponding to the two le ..."
Abstract
-
Cited by 11 (5 self)
The compositionality of meaning extends beyond the single sentence. Just as words combine to form the meaning of sentences, so do sentences combine to form the meaning of paragraphs, dialogues and general discourse. We introduce both a sentence model and a discourse model corresponding to the two levels of compositionality. The sentence model adopts convolution as the central operation for composing semantic vectors and is based on a novel hierarchical convolutional neural network. The discourse model extends the sentence model and is based on a recurrent neural network that is conditioned in a novel way both on the current sentence and on the current speaker. The discourse model is able to capture both the sequentiality of sentences and the interaction between different speakers. Without feature engineering or pretraining and with simple greedy decoding, the discourse model coupled to the sentence model obtains state-of-the-art performance on a dialogue act classification experiment.
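A minimal numpy sketch of the two-level idea follows: a convolutional sentence model that pools trigram features into a sentence vector, and a recurrent discourse state conditioned on that vector and on a speaker embedding. Dimensions, the pooling scheme and the exact form of speaker conditioning are illustrative assumptions, not the paper's model.

```python
# Two-level compositionality sketch: convolutional sentence vectors feeding a
# recurrent, speaker-conditioned discourse state.
import numpy as np

D, H, N_SPEAKERS = 50, 80, 2
rng = np.random.default_rng(0)
Wc = rng.normal(0, 0.1, (3 * D, D))        # convolution weights over word trigrams
Wh = rng.normal(0, 0.1, (H, H))            # recurrent discourse weights
Ws = rng.normal(0, 0.1, (D, H))            # sentence-vector input weights
Sp = rng.normal(0, 0.1, (N_SPEAKERS, H))   # one embedding per speaker

def sentence_vector(word_vectors):
    """Convolve trigrams of word vectors and max-pool into one sentence vector."""
    windows = [np.concatenate(word_vectors[i:i + 3])
               for i in range(len(word_vectors) - 2)]
    feats = np.tanh(np.stack(windows) @ Wc)
    return feats.max(axis=0)

def discourse_step(h_prev, sent_vec, speaker_id):
    """Update the discourse state from the current sentence and current speaker."""
    return np.tanh(h_prev @ Wh + sent_vec @ Ws + Sp[speaker_id])

h = np.zeros(H)
dialogue = [(rng.normal(size=(7, D)), 0), (rng.normal(size=(5, D)), 1)]
for words, speaker in dialogue:
    h = discourse_step(h, sentence_vector(list(words)), speaker)
```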
REVISITING RECURRENT NEURAL NETWORKS FOR ROBUST ASR
"... In this paper, we show how new training principles and optimization techniques for neural networks can be used for different network structures. In particular, we revisit the Recurrent Neural Network (RNN), which explicitly models the Markovian dynamics of a set of observations through a non-linear ..."
Abstract
-
Cited by 11 (2 self)
In this paper, we show how new training principles and optimization techniques for neural networks can be used for different network structures. In particular, we revisit the Recurrent Neural Network (RNN), which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM. We apply pretraining principles used for Deep Neural Networks (DNNs) and second-order optimization techniques to train an RNN. Moreover, we explore its application in the Aurora2 speech recognition task under mismatched noise conditions using a Tandem approach. We observe top performance on clean speech, and under high noise conditions, compared to multi-layer perceptrons (MLPs) and DNNs, with the added benefit of being a “deeper” model than an MLP but more compact than a DNN.
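The recurrence being revisited can be sketched as a plain Elman-style forward pass over acoustic feature frames, as below. The pretraining and second-order optimization discussed in the abstract are omitted, and all sizes are arbitrary assumptions.

```python
# Elman-style RNN forward pass over a sequence of acoustic feature frames,
# illustrating the hidden state that carries the observation history.
import numpy as np

F, H, K = 39, 512, 55                   # assumed feature, hidden and output sizes
rng = np.random.default_rng(0)
Wxh = rng.normal(0, 0.01, (F, H))
Whh = rng.normal(0, 0.01, (H, H))
Who = rng.normal(0, 0.01, (H, K))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(frames):
    """Return per-frame posteriors conditioned on the frames seen so far."""
    h = np.zeros(H)
    posteriors = []
    for x in frames:
        h = np.tanh(x @ Wxh + h @ Whh)  # hidden state summarizes the history
        posteriors.append(softmax(h @ Who))
    return np.stack(posteriors)

utterance = rng.normal(size=(120, F))   # 120 frames of MFCC-like features
print(rnn_forward(utterance).shape)     # (120, 55)
```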
The deep tensor neural network with applications to large vocabulary speech recognition
- IEEE Trans. Audio, Speech, Lang. Process.
, 2013
"... Abstract—The recently proposed context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been proved highly promising for large vocabulary speech recognition. In this paper, we develop a more advanced type of DNN, which we call the deep tensor neural network (DTNN). The DTNN exte ..."
Abstract
-
Cited by 10 (8 self)
Abstract—The recently proposed context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been proved highly promising for large vocabulary speech recognition. In this paper, we develop a more advanced type of DNN, which we call the deep tensor neural network (DTNN). The DTNN extends the conventional DNN by replacing one or more of its layers with a double-projection (DP) layer, in which each input vector is projected into two nonlinear subspaces, and a tensor layer, in which two subspace projections interact with each other and jointly predict the next layer in the deep architecture. In addition, we describe an approach to map the tensor layers to the conventional sigmoid layers so that the former can be treated and trained in a similar way to the latter. With this mapping we can consider a DTNN as a DNN augmented with DP layers, so that not only can the BP learning algorithm of DTNNs be cleanly derived but also new types of DTNNs can be more easily developed. Evaluation on Switchboard tasks indicates that DTNNs can outperform the already high-performing DNNs with 4–5% and 3% relative word error reduction, respectively, using 30-hr and 309-hr training sets. Index Terms—Automatic speech recognition, CD-DNN-HMM, large vocabulary, tensor deep neural networks.
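A rough sketch of a single double-projection layer plus tensor layer, including the vectorisation that lets the tensor layer be treated like an ordinary sigmoid layer, might look as follows. Sizes are illustrative assumptions, not the configuration evaluated on Switchboard.

```python
# One double-projection (DP) layer followed by a tensor layer, with the outer
# product of the two subspace projections flattened so the tensor layer can be
# handled like a conventional sigmoid layer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, P1, P2, OUT = 100, 20, 20, 50
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (D, P1))          # projection into subspace 1
W2 = rng.normal(0, 0.1, (D, P2))          # projection into subspace 2
U = rng.normal(0, 0.1, (P1 * P2, OUT))    # tensor weights, flattened

def dtnn_layer(x):
    h1 = sigmoid(x @ W1)                  # first nonlinear subspace
    h2 = sigmoid(x @ W2)                  # second nonlinear subspace
    interaction = np.outer(h1, h2).ravel()  # bilinear interaction of the two
    return sigmoid(interaction @ U)       # behaves like an ordinary sigmoid layer

x = rng.normal(size=D)
print(dtnn_layer(x).shape)                # (50,)
```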
Revisiting natural gradient for deep networks
- In International Conference on Learning Representations
, 2014
"... We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed meth-ods: Hessian-Free (Martens, 2010), Krylov Subspace Descent ..."
Abstract
-
Cited by 10 (5 self)
We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed methods: Hessian-Free (Martens, 2010), Krylov Subspace Descent (Vinyals and Povey, 2012) and TONGA (Le Roux et al., 2008). We empirically evaluate the robustness of natural gradient to the ordering of the training set compared to stochastic gradient descent and show how unlabeled data can be used to improve generalization error. Another contribution is to extend natural gradient to incorporate second-order error information alongside the manifold information. Lastly, we benchmark this new algorithm as well as natural gradient, where both are implemented using a truncated Newton approach for inverting the metric matrix instead of using a diagonal approximation of it.
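A hedged sketch of the truncated-Newton variant follows: the metric (here an empirical Fisher built from toy per-example gradients) is inverted by a fixed budget of conjugate-gradient iterations rather than a diagonal approximation. The Fisher product and all sizes are stand-ins, not the paper's implementation.

```python
# Natural-gradient step with the metric inverted by truncated conjugate gradient.
import numpy as np

rng = np.random.default_rng(0)
n_params, n_samples, damping = 10, 200, 1e-3
per_example_grads = rng.normal(size=(n_samples, n_params))   # toy Jacobian rows
gradient = per_example_grads.mean(axis=0)

def fisher_vec(v):
    """Empirical Fisher-vector product F v = (1/N) J^T (J v), plus damping."""
    return per_example_grads.T @ (per_example_grads @ v) / n_samples + damping * v

def conjugate_gradient(matvec, b, iters=20, tol=1e-8):
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):                # truncation: fixed iteration budget
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

natural_grad = conjugate_gradient(fisher_vec, gradient)  # approximately F^{-1} g
# A parameter update would then be: theta <- theta - learning_rate * natural_grad
```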
Learning a recurrent visual representation for image caption generation
, 2014
"... In this paper we explore the bi-directional mapping be-tween images and their sentence-based descriptions. We propose learning this mapping using a recurrent neural net-work. Unlike previous approaches that map both sentences and images to a common embedding, we enable the generation of novel senten ..."
Abstract
-
Cited by 9 (0 self)
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. We propose learning this mapping using a recurrent neural network. Unlike previous approaches that map both sentences and images to a common embedding, we enable the generation of novel sentences given an image. Using the same model, we can also reconstruct the visual features associated with an image given its visual description. We use a novel recurrent visual memory that automatically learns to remember long-term visual concepts to aid in both sentence generation and visual feature reconstruction. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions are preferred by humans over 19.8% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
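As an illustration of the bi-directional idea, the sketch below runs one recurrent step that both predicts the next word and reconstructs visual features from a visual memory carried through time. All dimensions, names and update rules are assumptions for illustration, not the paper's architecture.

```python
# One recurrent step with two heads: next-word prediction (generation) and
# visual-feature reconstruction, both driven by a shared hidden state.
import numpy as np

V_DIM, W_DIM, H_DIM, VOCAB = 4096, 100, 256, 5000
rng = np.random.default_rng(0)
Wwh = rng.normal(0, 0.01, (W_DIM, H_DIM))
Wvh = rng.normal(0, 0.01, (V_DIM, H_DIM))
Whh = rng.normal(0, 0.01, (H_DIM, H_DIM))
Wout = rng.normal(0, 0.01, (H_DIM, VOCAB))   # next-word prediction head
Wrec = rng.normal(0, 0.01, (H_DIM, V_DIM))   # visual-feature reconstruction head

def step(h, word_embedding, visual_memory):
    h = np.tanh(word_embedding @ Wwh + visual_memory @ Wvh + h @ Whh)
    next_word_logits = h @ Wout              # generation direction
    reconstructed_visual = h @ Wrec          # reconstruction direction
    return h, next_word_logits, reconstructed_visual

h = np.zeros(H_DIM)
visual_memory = rng.normal(size=V_DIM)       # e.g. CNN features of the image
word = rng.normal(size=W_DIM)                # embedding of the previous word
h, logits, recon = step(h, word, visual_memory)
```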
A Multiplicative Model for Learning Distributed Text-Based Attribute Representations
"... In this paper we propose a general framework for learning distributed represen-tations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to a wide variety of concepts, such as document indicators (to learn sentence vec ..."
Abstract
-
Cited by 8 (1 self)
In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to a wide variety of concepts, such as document indicators (to learn sentence vectors), language indicators (to learn distributed language representations), meta-data and side information (such as the age, gender and industry of a blogger) or representations of authors. We describe a third-order model where word context and attribute vectors interact multiplicatively to predict the next word in a sequence. This leads to the notion of conditional word similarity: how meanings of words change when conditioned on different attributes. We perform several experimental tasks including sentiment classification, cross-lingual document classification, and blog authorship attribution. We also qualitatively evaluate conditional word neighbours and attribute-conditioned text generation.
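The multiplicative interaction can be sketched as a factored three-way model in which context and attribute vectors are projected into a shared factor space, multiplied component-wise, and mapped to vocabulary scores. The factorisation sizes and scoring form below are illustrative assumptions.

```python
# Factored third-order (multiplicative) model: context and attribute interact
# in a factor space to score the next word.
import numpy as np

D_WORD, D_ATTR, N_FACTORS, VOCAB = 100, 20, 64, 5000
rng = np.random.default_rng(0)
Wcf = rng.normal(0, 0.05, (D_WORD, N_FACTORS))   # context -> factors
Waf = rng.normal(0, 0.05, (D_ATTR, N_FACTORS))   # attribute -> factors
Wfv = rng.normal(0, 0.05, (N_FACTORS, VOCAB))    # factors -> vocabulary scores

def next_word_scores(context_vec, attribute_vec):
    factors = (context_vec @ Wcf) * (attribute_vec @ Waf)  # multiplicative interaction
    return factors @ Wfv

context = rng.normal(size=D_WORD)                # e.g. sum of context word vectors
attribute = rng.normal(size=D_ATTR)              # e.g. a language or author vector
print(next_word_scores(context, attribute).shape)   # (5000,)
```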
Modeling deep temporal dependencies with recurrent grammar cells
- In Advances in Neural Information Processing Systems 27
, 2014
"... We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t+1. To this end we show how a bi-linear model of transformations, such as a gated autoencoder, can be turned into a recurrent net-work, by training it to predict future frames from the ..."
Abstract
-
Cited by 7 (0 self)
We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t+1. To this end we show how a bi-linear model of transformations, such as a gated autoencoder, can be turned into a recurrent network, by training it to predict future frames from the current one and the inferred transformation using backprop-through-time. We also show how stacking multiple layers of gating units in a recurrent pyramid makes it possible to represent the “syntax” of complicated time series, and that it can outperform standard recurrent neural networks in terms of prediction accuracy on a variety of tasks.
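A sketch of the core mechanism, assuming a factored gated autoencoder: a transformation (mapping) is inferred from two consecutive frames and re-applied to the newest frame to predict the next one, rolled forward recurrently. Weight shapes and the tying scheme are assumptions, not the paper's exact model.

```python
# Factored gated autoencoder used recurrently: infer the frame-to-frame
# transformation, then apply it to the current frame to predict the next.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, F, M = 64, 128, 32                  # frame, factor and mapping-unit sizes
rng = np.random.default_rng(0)
U = rng.normal(0, 0.05, (D, F))        # factors for the older frame
V = rng.normal(0, 0.05, (D, F))        # factors for the newer frame
W = rng.normal(0, 0.05, (F, M))        # factors -> mapping units

def infer_transformation(x_prev, x_curr):
    return sigmoid(((x_prev @ U) * (x_curr @ V)) @ W)

def predict_next(x_curr, mapping):
    # Re-apply the inferred transformation to the current frame.
    return ((x_curr @ U) * (mapping @ W.T)) @ V.T

frames = [rng.normal(size=D) for _ in range(10)]
x_prev, x_curr = frames[0], frames[1]
predictions = []
for x_next in frames[2:]:
    m = infer_transformation(x_prev, x_curr)
    predictions.append(predict_next(x_curr, m))
    x_prev, x_curr = x_curr, x_next    # recurrent roll-forward
```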
CONTEXTUAL DOMAIN CLASSIFICATION IN SPOKEN LANGUAGE UNDERSTANDING SYSTEMS USING RECURRENT NEURAL NETWORK
"... In a multi-domain, multi-turn spoken language understanding session, information from the history often greatly reduces the ambiguity of the current turn. In this paper, we apply the recurrent neural network (RNN) to exploit contextual in-formation for query domain classification. The Jordan-type RN ..."
Abstract
-
Cited by 7 (3 self)
In a multi-domain, multi-turn spoken language understanding session, information from the history often greatly reduces the ambiguity of the current turn. In this paper, we apply the recurrent neural network (RNN) to exploit contextual information for query domain classification. The Jordan-type RNN directly sends the vector of output distribution to the next query turn as additional input features to the convolutional neural network (CNN). We evaluate our approach against SVM with and without contextual features. On our contextually labeled dataset, we observe a 1.4% absolute (8.3% relative) improvement in classification error rate over the non-contextual SVM, and 0.9% absolute (5.5% relative) improvement over the contextual SVM. Index Terms—Recurrent neural network, contextual domain classification.
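The Jordan-type feedback can be sketched by appending the previous turn's predicted domain distribution to the current turn's features before classification. The query encoder below is a plain linear layer standing in for the paper's CNN, and all sizes are illustrative.

```python
# Jordan-type recurrence over dialogue turns: the previous turn's output
# distribution is fed back as extra input features for the current turn.
import numpy as np

N_DOMAINS, QUERY_DIM = 5, 300
rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, (QUERY_DIM + N_DOMAINS, N_DOMAINS))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_turn(query_features, prev_distribution):
    x = np.concatenate([query_features, prev_distribution])  # context feedback
    return softmax(x @ W)

session = [rng.normal(size=QUERY_DIM) for _ in range(4)]  # 4 query turns
prev = np.full(N_DOMAINS, 1.0 / N_DOMAINS)                # uniform prior at turn 1
for query in session:
    prev = classify_turn(query, prev)                     # feed prediction forward
```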
A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling
"... The usability of Brain Computer Interfaces (BCI) based on the P300 speller is severely hindered by the need for long training times and many repetitions of the same stimulus. In this contribution we introduce a set of unsupervised hierarchical probabilistic models that tackle both problems simultane ..."
Abstract
-
Cited by 6 (0 self)
The usability of Brain Computer Interfaces (BCI) based on the P300 speller is severely hindered by the need for long training times and many repetitions of the same stimulus. In this contribution we introduce a set of unsupervised hierarchical probabilistic models that tackle both problems simultaneously by incorporating prior knowledge from two sources: information from other training subjects (through transfer learning) and information about the words being spelled (through language models). We show that, due to this prior knowledge, the performance of the unsupervised models parallels and in some cases even surpasses that of supervised models, while eliminating the tedious training session.
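To illustrate one of the two sources of prior knowledge, the toy sketch below fuses placeholder per-character EEG evidence with a character language-model prior via Bayes' rule; the paper's hierarchical models and transfer-learning priors themselves are not reproduced.

```python
# Toy fusion of classifier evidence and a language-model prior for spelling.
# Both probability sources are invented stand-ins, not the paper's models.
import numpy as np

alphabet = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")
rng = np.random.default_rng(0)

def language_model_prior(prefix):
    """Placeholder prior: mildly favours 'E' and space; a real system would use
    an n-gram or word-level model conditioned on the spelled prefix."""
    prior = np.ones(len(alphabet))
    prior[alphabet.index("E")] = 3.0
    prior[alphabet.index("_")] = 2.0
    return prior / prior.sum()

def eeg_evidence():
    """Placeholder per-character likelihood from the P300 classifier."""
    scores = rng.random(len(alphabet))
    return scores / scores.sum()

spelled = ""
for _ in range(5):
    posterior = eeg_evidence() * language_model_prior(spelled)
    posterior /= posterior.sum()                 # Bayes' rule, normalized
    spelled += alphabet[int(np.argmax(posterior))]
print(spelled)
```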