Results 1–10 of 13
A Neural Probabilistic Language Model
 JOURNAL OF MACHINE LEARNING RESEARCH, 2003
Abstract

Cited by 407 (20 self)
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it allows taking advantage of longer contexts.
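The architecture this abstract describes — a shared word-feature matrix feeding a network that outputs next-word probabilities — can be sketched in a few lines. All sizes, names, and the random (untrained) weights below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, n, h = 10, 4, 3, 8   # toy vocab size, embedding dim, context length, hidden units

C = rng.normal(0, 0.1, (V, d))      # shared word-feature matrix: one d-vector per word
H = rng.normal(0, 0.1, (h, n * d))  # hidden-layer weights over the concatenated context
U = rng.normal(0, 0.1, (V, h))      # output weights, one row per vocabulary word

def next_word_probs(context_ids):
    """P(w | n previous words) for every word w in the vocabulary."""
    x = C[context_ids].reshape(-1)   # look up and concatenate the n context embeddings
    a = np.tanh(H @ x)               # hidden activation
    logits = U @ a
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

p = next_word_probs([1, 5, 2])       # a distribution over all V words
```

Because the word-feature matrix `C` is shared across positions, gradient updates from one training sentence move the representations of its words, which in turn changes the probability of every sentence built from nearby representations.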
Data Mining in Soft Computing Framework: A Survey
 IEEE Transactions on Neural Networks, 2001
Abstract

Cited by 105 (3 self)
The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information, and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.
Quick Training of Probabilistic Neural Nets by Importance Sampling, 2003
Abstract

Cited by 19 (7 self)
Our previous work on statistical language modeling introduced the use of probabilistic feedforward neural networks to help deal with the curse of dimensionality. Training this model by maximum likelihood, however, requires performing as many network passes per example as there are words in the vocabulary. Inspired by the contrastive divergence model, we propose and evaluate sampling-based methods which require network passes only for the observed "positive example" and a few sampled negative example words. A very significant speedup is obtained with an adaptive importance sampling scheme.
Lexical Approaches to Backoff in Statistical Parsing, 2005
Abstract

Cited by 1 (0 self)
This thesis is an investigation of methods for improving the accuracy of a statistical parser. A statistical parser uses a probabilistic grammar derived from a training corpus of hand-parsed sentences. The grammar is represented as a set of constructions — in a simple case these might be context-free rules. The probability of each construction in the grammar is then estimated by counting its relative frequency in the corpus. A crucial problem when building a probabilistic grammar is to select an appropriate level of granularity for describing the constructions being learned. The more constructions we include in our grammar, the more sophisticated a model of the language we produce. However, if too many different constructions are included, then our corpus is unlikely to contain reliable information about the relative frequency of many constructions. In existing statistical parsers two main approaches have been taken to choosing
Probabilistic Neural Network Models for Sequential Data
Abstract

Cited by 1 (0 self)
It has already been shown how Artificial Neural Networks (ANNs) can be incorporated into probabilistic models. In this paper we review some of the approaches which have been proposed to incorporate them into probabilistic models of sequential data, such as Hidden Markov Models (HMMs). We also discuss new developments and new ideas in this area, in particular how ANNs can be used to model high-dimensional discrete and continuous data to deal with the curse of dimensionality, and how the ideas proposed in these models could be applied to statistical language modeling to represent longer-term context than allowed by trigram models, while keeping word-order information.
A Hybrid Intelligent Learning Algorithm to Identify the ECNS Based on FBP Optimized by GA
Abstract
Abstract—With the development of computer networks, electronic commerce has gradually become the new pattern for commercial activity, but its security problems are also becoming more and more prominent. How to identify the E-commerce network security (ECNS) rating and establish a secure, convenient application environment for electronic commerce has become a major concern that urgently needs to be settled. To identify the ECNS rating scientifically and accurately, this paper proposes a hybrid intelligent learning algorithm which uses the genetic algorithm (GA) to optimize the fuzzy back-propagation (FBP) neural network. The algorithm can not only exert the unique advantages of the BP neural network (BPNN), but also overcome its tendency to produce local minimum points in the network modeling process, greatly enhancing the accuracy of network security identification. The ECNS identification results for 14 E-commerce systems show that the method is reliable and efficient. Index Terms—hybrid intelligent algorithm, FBP, GA, ECNS, security rating identification
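The core idea — a GA searching a neural network's weight space globally instead of relying on gradient descent, which can stall in local minima — can be sketched roughly as follows. The target task, network shape, and all GA parameters below are illustrative assumptions; the paper's exact FBP/GA formulation is not given here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (64, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like toy classification target

def forward(w, X):
    """Tiny 2-4-1 network whose 17 weights are packed into one vector."""
    W1 = w[:8].reshape(2, 4); b1 = w[8:12]
    W2 = w[12:16]; b2 = w[16]
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

def fitness(w):
    return -np.mean((forward(w, X) - y) ** 2)   # negative MSE: higher is better

pop = rng.normal(0, 1, (40, 17))                # population of weight vectors
for gen in range(60):
    scores = np.array([fitness(w) for w in pop])
    elite = pop[np.argsort(scores)[-10:]]       # selection: keep the 10 fittest
    parents = elite[rng.integers(0, 10, (30, 2))]
    mask = rng.random((30, 17)) < 0.5           # uniform crossover gene by gene
    children = np.where(mask, parents[:, 0], parents[:, 1])
    children += rng.normal(0, 0.1, children.shape) * (rng.random((30, 17)) < 0.2)  # mutation
    pop = np.vstack([elite, children])          # elitism: best solutions survive

best = pop[np.argmax([fitness(w) for w in pop])]
acc = np.mean((forward(best, X) > 0.5) == y)
```

In a hybrid scheme like the one the abstract describes, a GA stage of this kind would typically supply starting weights that back-propagation then refines locally.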
A Neural Probabilistic Language Model
Abstract
A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.

1 Introduction. A fundamental problem that makes language modeling and other learning problems difficult is the curse of dimensionality. It is particularly obvious when one wants to model the joint distribution between many discrete random variables (such as words in a sentence, or discrete attributes in a data-mining task). For example, if one wants to model the joint distribution of 10 consecutive words in a natural language with a vocabulary V of size 100,000, there are potentially
The Restricted Boltzmann Machine (Smolensky, 1986;
"... This is a discussion of Larochelle and Murray (2011). ..."
The Application of Fuzzy Neural Network in Data Mining
Abstract
Abstract. The fuzzy neural network, which can deal with complex data and prediction processes that other algorithms cannot accomplish, has become a focus of many fields in recent years. Data mining can extract information and knowledge such as data classification, spatial evolution, and prediction, and can find, in huge cadastral data, the implied information that is helpful for urban construction.
Input Variable Selection Using Parallel Processing of RBF Neural Networks, 2007
Abstract
Abstract: In this paper we propose a new technique focused on selecting the important input variables for modelling complex systems in function approximation problems, in order to avoid the exponential increase in the complexity of the system that is usual when dealing with many input variables. The proposed parallel processing approach is composed of complete radial basis function (RBF) neural networks, each in charge of a reduced set of input variables depending on the general behaviour of the problem. For the optimization of the parameters of each RBF neural network in the system, we propose a new method to select the more important input variables, capable of deciding which of the chosen variables go alone or together to each RBF neural network when building the parallel structure, thus reducing the dimension of the input variable space for each RBF neural network. We also provide an algorithm which automatically finds the most suitable topology of the proposed parallel processing structure and selects the more important input variables for it. Our goal, therefore, is to find the most suitable of the proposed families of parallel processing architectures for approximating a system from which a set of input/output data is available, such that the proposed parallel processing structure outperforms other algorithms not only with respect to the final approximation error but also with respect to the number of computational parameters of the system.
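One building block of such a structure — a single RBF network restricted to a selected subset of the input variables — might look like the following sketch. The target function, center placement, basis width, and least-squares fit are illustrative choices, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_features(X, centers, width):
    """Gaussian basis activations, one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# Toy system with 4 inputs whose output depends only on variables 0 and 2:
# an input-selection step would ideally hand just those two to this network.
X = rng.uniform(-1, 1, (200, 4))
y = np.sin(3 * X[:, 0]) + X[:, 2] ** 2

subset = [0, 2]                      # input variables chosen for this network
Xs = X[:, subset]                    # reduced input space: 2 dims instead of 4
centers = Xs[rng.choice(len(Xs), 20, replace=False)]   # centers drawn from the data
Phi = rbf_features(Xs, centers, width=0.5)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # output weights by least squares
mse = np.mean((Phi @ w - y) ** 2)
```

Restricting each network to a low-dimensional subset is what keeps the number of parameters from growing exponentially with the total number of input variables, which is the motivation the abstract states.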