Results 1  10
of
25
A Neural Probabilistic Language Model
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
"... A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen ..."
Abstract

Cited by 396 (19 self)
 Add to MetaCart
(Show Context)
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on ngrams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on stateoftheart ngram models, and that the proposed approach allows to take advantage of longer contexts.
Hierarchical probabilistic neural network language model
 In AISTATS
, 2005
"... In recent years, variants of a neural network architecture for statistical language modeling have been proposed and successfully applied, e.g. in the language modeling component of speech recognizers. The main advantage of these architectures is that they learn an embedding for words (or other symbo ..."
Abstract

Cited by 93 (4 self)
 Add to MetaCart
(Show Context)
In recent years, variants of a neural network architecture for statistical language modeling have been proposed and successfully applied, e.g. in the language modeling component of speech recognizers. The main advantage of these architectures is that they learn an embedding for words (or other symbols) in a continuous space that helps to smooth the language model and provide good generalization even when the number of training examples is insufficient. However, these models are extremely slow in comparison to the more commonly used ngram models, both for training and recognition. As an alternative to an importance sampling method proposed to speedup training, we introduce a hierarchical decomposition of the conditional probabilities that yields a speedup of about 200 both during training and recognition. The hierarchical decomposition is a binary hierarchical clustering constrained by the prior knowledge extracted from the WordNet semantic hierarchy. 1
Formal Theory of Creativity, Fun, and Intrinsic Motivation (19902010)
"... The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditio ..."
Abstract

Cited by 70 (15 self)
 Add to MetaCart
The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical but nonoptimal implementations (1991, 1995, 19972002) are reviewed, as well as several recent variants by others (2005). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.
Developmental Robotics, Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts
, 2006
"... Even in absence of external reward, babies and scientists and others explore their world. Using some sort of adaptive predictive world model, they improve their ability to answer questions such as: what happens if I do this or that? They lose interest in both the predictable things and those predict ..."
Abstract

Cited by 64 (18 self)
 Add to MetaCart
Even in absence of external reward, babies and scientists and others explore their world. Using some sort of adaptive predictive world model, they improve their ability to answer questions such as: what happens if I do this or that? They lose interest in both the predictable things and those predicted to remain unpredictable despite some effort. One can design curious robots that do the same. The author’s basic idea for doing so (1990, 1991): a reinforcement learning (RL) controller is rewarded for action sequences that improve the predictor. Here this idea is revisited in the context of recent results on optimal predictors and optimal RL machines. Several new variants of the basic principle are proposed. Finally it is pointed out how the fine arts can be formally understood as a consequence of the principle: given some subjective observer, great works of art and music yield observation histories exhibiting more novel, previously unknown compressibility / regularity / predictability (with respect to the observer’s particular learning algorithm) than lesser works, thus deepening the observer’s understanding of the world and what is possible in it.
Image compression with neural networks  A survey
 Signal Processing: Image Communication 14
, 1999
"... Apart from the existing technology on image compression represented by series of JPEG, MPEG and H.26x standards, new technology such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization ..."
Abstract

Cited by 33 (2 self)
 Add to MetaCart
(Show Context)
Apart from the existing technology on image compression represented by series of JPEG, MPEG and H.26x standards, new technology such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this area are stepping up to play signi"cant roles in assisting with those traditional technologies. This paper presents an extensive survey on the development of neural networks for image compression which covers three categories: direct image compression by neural networks; neural network implementation of existing techniques, and neural network based technology which provide improvement over traditional algorithms. # 1999 Elsevier Science B.V. All rights reserved.
The Use of a Bayesian Neural Network Model for Classification Tasks
, 1997
"... This thesis deals with a Bayesian neural network model. The focus is on how to use the model for automatic classification, i.e. on how to train the neural network to classify objects from some domain, given a database of labeled examples from the domain. The original Bayesian neural network is a one ..."
Abstract

Cited by 22 (1 self)
 Add to MetaCart
This thesis deals with a Bayesian neural network model. The focus is on how to use the model for automatic classification, i.e. on how to train the neural network to classify objects from some domain, given a database of labeled examples from the domain. The original Bayesian neural network is a onelayer network implementing a naive Bayesian classifier. It is based on the assumption that different attributes of the objects appear independently of each other. This work has been aimed at extending the original Bayesian neural network model, mainly focusing on three different aspects. First the model is extended to a multilayer network, to relax the independence requirement. This is done by introducing a hidden layer of complex columns, groups of units which take input from the same set of input attributes. Two different types of complex column structures in the hidden layer are studied and compared. An information theoretic measure is used to decide which input attributes to consider toget...
Neural Predictors For Detecting And Removing Redundant Information
 IN ADAPTIVE BEHAVIOR AND LEARNING
, 1998
"... The components of most realworld patterns contain redundant information. However, most pattern classifiers (e.g., statistical classifiers and neural nets) work better if pattern components are nonredundant. I present various unsupervised nonlinear predictorbased "neural" learning algorit ..."
Abstract

Cited by 7 (5 self)
 Add to MetaCart
(Show Context)
The components of most realworld patterns contain redundant information. However, most pattern classifiers (e.g., statistical classifiers and neural nets) work better if pattern components are nonredundant. I present various unsupervised nonlinear predictorbased "neural" learning algorithms that transform patterns and pattern sequences into less redundant patterns without loss of information. The first part of the paper shows how a neural predictor can be used to remove redundant information from input sequences. Experiments with artificial sequences demonstrate that certain supervised classification techniques can greatly benefit from this kind of unsupervised preprocessing. In the second part of the paper, a neural predictor is used to remove redundant information from natural text. With certain short newspaper articles, the neural method can achieve better compression ratios than the widely used asymptotically optimal LempelZiv string compression algorithm. The third part of the ...
Extracting Finite State Representations from Recurrent Neural Networks trained on Chaotic Symbolic Sequences
 IEEE Transactions on Neural Networks
, 1999
"... While much work has been done in neural based modeling of real valued chaotic time series, little effort has been devoted to address similar problems in the symbolic domain. We investigate the knowledge induction process associated with training recurrent neural networks (RNNs) on single long chaoti ..."
Abstract

Cited by 4 (4 self)
 Add to MetaCart
(Show Context)
While much work has been done in neural based modeling of real valued chaotic time series, little effort has been devoted to address similar problems in the symbolic domain. We investigate the knowledge induction process associated with training recurrent neural networks (RNNs) on single long chaotic symbolic sequences. Even though training RNNs to predict the next symbol leaves the standard performance measures such as the mean square error on the network output virtually unchanged, the networks nevertheless do extract a lot of knowledge. We monitor the knowledge extraction process by considering the networks stochastic sources and letting them generate sequences which are then confronted with the training sequence via information theoretic entropy and crossentropy measures. We also study the possibility of reformulating the knowledge gained by RNNs in a compact and easytoanalyze form of finite state stochastic machines. The experiments are performed on two sequences with different...
Artificial Scientists & Artists Based on the Formal Theory of Creativity
"... I have argued that a simple but general formal theory of creativity explains many essential aspects of intelligence including science, art, music, humor. It is based on the concept of maximizing reward for the creation or discovery of novel patterns allowing for improved data compression or predicti ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
I have argued that a simple but general formal theory of creativity explains many essential aspects of intelligence including science, art, music, humor. It is based on the concept of maximizing reward for the creation or discovery of novel patterns allowing for improved data compression or prediction. Here I discuss what kind of general bias towards algorithmic regularities we insert into our robots by implementing the principle, why that bias is good, and how the approach greatly generalizes the field of active learning. I emphasize the importance of limited computational resources for online prediction and compression, and provide discrete and continuous time formulations for ongoing work on building an Artificial General Intelligence (AGI) based on variants of the artificial creativity framework.
Online SymbolicSequence Prediction with DiscreteTime Recurrent Neural Networks
 Proceedings of the International Conference on Artificial Neural Networks (ICANN’01
, 2001
"... This paper studies the use of discretetime recurrent neural networks for predicting the next symbol in a sequence. The focus is on online prediction, a task much harder than the classical offine grammatical inference with neural networks. The results obtained show that the performance of recurrent ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
This paper studies the use of discretetime recurrent neural networks for predicting the next symbol in a sequence. The focus is on online prediction, a task much harder than the classical offine grammatical inference with neural networks. The results obtained show that the performance of recurrent networks working online is acceptable when sequences come from finitestate machines or even from some chaotic sources. When predicting texts in human language, however, dynamics seem to be too complex to be correctly learned in realtime by the net.