A Neural Probabilistic Language Model
Journal of Machine Learning Research, 2003
Cited by 81 (8 self)
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it makes it possible to take advantage of longer contexts.
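To make the two learned components in this abstract concrete, here is a minimal sketch of a forward pass: a table of word representations and a probability function computed from them. The specific architecture (one tanh hidden layer, a softmax output, no bias terms, random untrained weights) and all names such as `make_model` and `next_word_probs` are assumptions for illustration; the abstract only says that a neural network computes the probability function.

```python
import math
import random

def make_model(vocab_size, embed_dim, context_size, hidden_dim, seed=0):
    """Build random (untrained) parameters; all shapes are illustrative assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "C": mat(vocab_size, embed_dim),                 # (1) one representation per word
        "H": mat(hidden_dim, context_size * embed_dim),  # hidden-layer weights
        "U": mat(vocab_size, hidden_dim),                # output-layer weights
    }

def next_word_probs(model, context_ids):
    """(2) Probability of each vocabulary word given the previous words."""
    # Look up and concatenate the representations of the context words.
    x = [v for w in context_ids for v in model["C"][w]]
    # Assumed single tanh hidden layer.
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in model["H"]]
    scores = [sum(wi * hi for wi, hi in zip(row, h)) for row in model["U"]]
    # Numerically stable softmax over the whole vocabulary.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

model = make_model(vocab_size=5, embed_dim=3, context_size=2, hidden_dim=4)
probs = next_word_probs(model, [1, 3])
```

Because the context enters only through the rows of `C`, two words with nearby representations yield nearby distributions, which is the generalization mechanism the abstract describes. Training (adjusting `C`, `H` and `U` jointly) is not shown.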
The Use of a Bayesian Neural Network Model for Classification Tasks
1997
Cited by 18 (1 self)
This thesis deals with a Bayesian neural network model. The focus is on how to use the model for automatic classification, i.e. on how to train the neural network to classify objects from some domain, given a database of labeled examples from the domain. The original Bayesian neural network is a one-layer network implementing a naive Bayesian classifier. It is based on the assumption that different attributes of the objects appear independently of each other. This work has been aimed at extending the original Bayesian neural network model, mainly focusing on three different aspects. First the model is extended to a multi-layer network, to relax the independence requirement. This is done by introducing a hidden layer of complex columns, groups of units which take input from the same set of input attributes. Two different types of complex column structures in the hidden layer are studied and compared. An information theoretic measure is used to decide which input attributes to consider toget...
Image compression with neural networks - A survey
Signal Processing: Image Communication 14, 1999
Cited by 15 (2 self)
Apart from the existing technology on image compression represented by series of JPEG, MPEG and H.26x standards, new technology such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this area are stepping up to play significant roles in assisting with those traditional technologies. This paper presents an extensive survey on the development of neural networks for image compression which covers three categories: direct image compression by neural networks; neural network implementation of existing techniques, and neural network based technology which provides improvement over traditional algorithms. © 1999 Elsevier Science B.V. All rights reserved.
Hierarchical probabilistic neural network language model
AISTATS’05, 2005
Cited by 11 (1 self)
In recent years, variants of a neural network architecture for statistical language modeling have been proposed and successfully applied, e.g. in the language modeling component of speech recognizers. The main advantage of these architectures is that they learn an embedding for words (or other symbols) in a continuous space that helps to smooth the language model and provide good generalization even when the number of training examples is insufficient. However, these models are extremely slow in comparison to the more commonly used n-gram models, both for training and recognition. As an alternative to an importance sampling method proposed to speed-up training, we introduce a hierarchical decomposition of the conditional probabilities that yields a speed-up of about 200 both during training and recognition. The hierarchical decomposition is a binary hierarchical clustering constrained by the prior knowledge extracted from the WordNet semantic hierarchy.
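The hierarchical decomposition mentioned in this abstract can be sketched as follows: each word is a leaf of a binary tree, and its probability is the product of the binary decisions on the path from the root, so scoring one word costs about log2(V) decisions instead of a normalization over all V words. The balanced tree built from word indices, the toy `toy_node_score` function, and the omission of the context are all assumptions made for illustration; the paper's tree is constrained by WordNet and its node scores depend on the preceding words.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def word_codes(vocab_size):
    """Assign each word a fixed-length bit path (assumed balanced tree, not WordNet)."""
    bits = max(1, math.ceil(math.log2(vocab_size)))
    return {
        w: [(w >> (bits - 1 - i)) & 1 for i in range(bits)]
        for w in range(vocab_size)
    }

def word_prob(code, node_score):
    """Multiply the binary decision probabilities along the path to the word."""
    prob = 1.0
    prefix = ()  # identifies the current internal node
    for bit in code:
        p_one = sigmoid(node_score(prefix))
        prob *= p_one if bit == 1 else 1.0 - p_one
        prefix += (bit,)
    return prob

def toy_node_score(prefix):
    # Hypothetical stand-in for a network output at this tree node.
    return 0.3 * len(prefix) - 0.5 * sum(prefix)

codes = word_codes(8)
probs = {w: word_prob(c, toy_node_score) for w, c in codes.items()}
total = sum(probs.values())
```

With a complete tree, as in this 8-word example, the leaf probabilities sum to 1 by construction, so no explicit normalization over the vocabulary is needed.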
Extracting Finite State Representations from Recurrent Neural Networks trained on Chaotic Symbolic Sequences
IEEE Transactions on Neural Networks, 1999
Cited by 5 (5 self)
While much work has been done in neural based modeling of real valued chaotic time series, little effort has been devoted to address similar problems in the symbolic domain. We investigate the knowledge induction process associated with training recurrent neural networks (RNNs) on single long chaotic symbolic sequences. Even though training RNNs to predict the next symbol leaves the standard performance measures such as the mean square error on the network output virtually unchanged, the networks nevertheless do extract a lot of knowledge. We monitor the knowledge extraction process by considering the networks stochastic sources and letting them generate sequences which are then confronted with the training sequence via information theoretic entropy and cross-entropy measures. We also study the possibility of reformulating the knowledge gained by RNNs in a compact and easy-to-analyze form of finite state stochastic machines. The experiments are performed on two sequences with different...
Neural Predictors For Detecting And Removing Redundant Information
In Adaptive Behavior and Learning, 1998
Cited by 5 (3 self)
The components of most real-world patterns contain redundant information. However, most pattern classifiers (e.g., statistical classifiers and neural nets) work better if pattern components are nonredundant. I present various unsupervised nonlinear predictor-based "neural" learning algorithms that transform patterns and pattern sequences into less redundant patterns without loss of information. The first part of the paper shows how a neural predictor can be used to remove redundant information from input sequences. Experiments with artificial sequences demonstrate that certain supervised classification techniques can greatly benefit from this kind of unsupervised preprocessing. In the second part of the paper, a neural predictor is used to remove redundant information from natural text. With certain short newspaper articles, the neural method can achieve better compression ratios than the widely used asymptotically optimal Lempel-Ziv string compression algorithm. The third part of the ...
Predictive Coding With Neural Nets: Application To Text Compression
Advances in Neural Information Processing Systems 7, 1995
Cited by 2 (1 self)
To compress text files, a neural predictor network P is used to approximate the conditional probability distribution of possible "next characters", given n previous characters. P's outputs are fed into standard coding algorithms that generate short codes for characters with high predicted probability and long codes for highly unpredictable characters. Tested on short German newspaper articles, our method outperforms widely used Lempel-Ziv algorithms (used in UNIX functions such as "compress" and "gzip").
1 INTRODUCTION
The method presented in this paper is an instance of a strategy known as "predictive coding" or "model-based coding". To compress text files, a neural predictor network P approximates the conditional probability distribution of possible "next characters", given n previous characters. P's outputs are fed into algorithms that generate short codes for characters with low information content (characters with high predicted probability) and long codes for characters conv...
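The link between predicted probability and code length in this entry can be illustrated with a short sketch. A character predicted with probability p ideally costs -log2(p) bits, so a better predictor gives a shorter encoding. The fixed-distribution predictors below are hypothetical stand-ins for the neural predictor P, and the function computes only the ideal total length rather than producing an actual bit stream.

```python
import math

def ideal_code_length_bits(text, predict, n):
    """Sum of -log2 p(char | previous n chars) over the text."""
    total = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - n):i]   # up to n previous characters
        p = predict(context)[ch]          # predicted probability of the true character
        total += -math.log2(p)            # likely characters cost few bits
    return total

def uniform_predictor(alphabet):
    """Stand-in predictor that ignores the context entirely."""
    dist = {c: 1.0 / len(alphabet) for c in alphabet}
    return lambda context: dist

def skewed_predictor(context):
    # Hypothetical fixed distribution; a trained network would depend on the context.
    return {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

bits_uniform = ideal_code_length_bits("abca", uniform_predictor("abcd"), n=2)
bits_skewed = ideal_code_length_bits("aaab", skewed_predictor, n=2)
```

Here `bits_uniform` is 8.0 (four characters at 2 bits each) and `bits_skewed` is 5.0 (three characters at 1 bit plus one at 2 bits). A standard coder driven by the same probabilities, such as an arithmetic coder, approaches these totals.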
Text Compression via Alphabet Re-Representation (Extended Abstract)
Cited by 2 (0 self)
We consider re-representing the alphabet so that a representation of a character reflects its properties as a predictor of future text. This enables us to use an estimator from a restricted class to map contexts to predictions of upcoming characters. We describe an algorithm that uses this idea in conjunction with neural networks. The performance of this implementation is compared to other compression methods, such as UNIX compress, gzip, PPMC, and an alternative neural network approach.
Online Symbolic-Sequence Prediction with Recurrent Neural Networks
2001
This paper studies the use of recurrent neural networks for predicting the next symbol in a sequence.
Artificial Scientists & Artists Based on the Formal Theory of Creativity
I have argued that a simple but general formal theory of creativity explains many essential aspects of intelligence including science, art, music, humor. It is based on the concept of maximizing reward for the creation or discovery of novel patterns allowing for improved data compression or prediction. Here I discuss what kind of general bias towards algorithmic regularities we insert into our robots by implementing the principle, why that bias is good, and how the approach greatly generalizes the field of active learning. I emphasize the importance of limited computational resources for online prediction and compression, and provide discrete and continuous time formulations for ongoing work on building an Artificial General Intelligence (AGI) based on variants of the artificial creativity framework.

