Results 11 - 20 of 22
Learning Deep Architectures for AI
Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Brain–Computer Evolutionary Multiobjective Optimization (BC-EM
, IEEE Transactions on Evolutionary Computation (accepted for inclusion in a future issue)
The centrality of the decision maker (DM) is widely recognized in the multiple criteria decision-making community. This translates into an emphasis on seamless human–computer interaction and on adapting the solution technique to the knowledge progressively acquired from the DM. This paper adopts the methodology of reactive search optimization (RSO) for evolutionary interactive multiobjective optimization. RSO follows the paradigm of “learning while optimizing,” through the use of online machine learning techniques as an integral part of a self-tuning optimization scheme. User judgments of pairs of solutions are used to build robust incremental models of the user utility function, with the objective of reducing the cognitive burden required from the DM to identify a satisficing solution. The technique of support vector ranking is used together with a k-fold cross-validation procedure to select the best kernel for the problem at hand during the utility function training procedure. Experimental results are presented for a series of benchmark problems. Index Terms: Interactive decision making, machine learning, reactive search optimization, support vector ranking.
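To make the idea of learning a utility function from pairwise user judgments concrete, here is a minimal sketch. It is not the paper's method: it swaps support vector ranking for a plain perceptron on difference vectors and assumes a linear utility where higher is better. The names learn_utility and utility, and the parameters epochs and learning_rate, are hypothetical.

```python
def utility(weights, solution):
    """Linear utility of one objective vector (assumed form, higher is better)."""
    return sum(w * x for w, x in zip(weights, solution))


def learn_utility(preferences, dims, epochs=100, learning_rate=0.1):
    """Fit linear utility weights from (better, worse) pairs.

    Perceptron stand-in for support vector ranking: whenever the current
    weights fail to rank `better` strictly above `worse`, move the weights
    along the difference vector better - worse.
    """
    weights = [0.0] * dims
    for _ in range(epochs):
        mistakes = 0
        for better, worse in preferences:
            diff = [b - c for b, c in zip(better, worse)]
            if utility(weights, diff) <= 0.0:
                weights = [w + learning_rate * d for w, d in zip(weights, diff)]
                mistakes += 1
        if mistakes == 0:  # every judgment is respected, stop early
            break
    return weights
```

Each judgment only says which of two solutions the user prefers, so the user never has to state numeric utilities. Candidate solutions can then be ordered with sorted(candidates, key=lambda s: utility(weights, s), reverse=True).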
Self-Paced Learning for Latent Variable Models
Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state-of-the-art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition.
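The alternation described above (select easy samples, refit, relax the selection until all data is used) can be illustrated on a toy problem. This sketch is not the paper's latent structural SVM formulation: it uses 1-D k-means, where the latent variable is the cluster assignment, and treats a sample as easy when its squared distance to the nearest center is below a threshold. The function name and the threshold and growth parameters are hypothetical.

```python
def self_paced_kmeans_1d(xs, centers, threshold=1.0, growth=2.0, max_rounds=100):
    """Toy self-paced learning: 1-D k-means that starts with easy points.

    Returns the final centers and the number of samples selected per round.
    """
    centers = list(centers)
    history = []
    for _ in range(max_rounds):
        # Impute the latent variable: assign each point to its nearest center.
        assigned = [
            min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            for x in xs
        ]
        losses = [(x - centers[k]) ** 2 for x, k in zip(xs, assigned)]
        # Select the easy samples under the current threshold.
        easy = [i for i, loss in enumerate(losses) if loss < threshold]
        history.append(len(easy))
        # Refit each center from its easy members only.
        for k in range(len(centers)):
            members = [xs[i] for i in easy if assigned[i] == k]
            if members:
                centers[k] = sum(members) / len(members)
        if len(easy) == len(xs):  # the whole training set has been considered
            break
        threshold *= growth  # anneal: admit harder samples next round
    return centers, history
```

In this toy, points far from every center (such as an ambiguous point between two clusters) are admitted only in later rounds, after the centers have settled on the clear cases.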
A Practical and Conceptual Framework for Learning in Control
, 2010
We propose a fully Bayesian approach for efficient reinforcement learning (RL) in Markov decision processes with continuous-valued state and action spaces when no expert knowledge is available. Our framework is based on well-established ideas from statistics and machine learning and learns fast since it carefully models, quantifies, and incorporates available knowledge when making decisions. The key ingredient of our framework is a probabilistic model, which is implemented using a Gaussian process (GP), a distribution over functions. In the context of dynamic systems, the GP models the transition function. By considering all plausible transition functions simultaneously, we reduce model bias, a problem that frequently occurs when deterministic models are used. Due to its generality and efficiency, our RL framework can be considered a conceptual and practical approach to learning models and controllers when
Language Models as Representations for Weakly-Supervised NLP Tasks
Finding the right representation for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This paper investigates language model representations, in which language models trained on unlabeled corpora are used to generate real-valued feature vectors for words. We investigate n-gram models and probabilistic graphical models, including a novel lattice-structured Markov Random Field. Experiments indicate that language model representations outperform traditional representations, and that graphical model representations outperform n-gram models, especially on sparse and polysemous words.
On the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars
, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
We examine the utility of a curriculum (a means of presenting training samples in a meaningful order) in unsupervised learning of probabilistic grammars. We introduce the incremental construction hypothesis that explains the benefits of a curriculum in learning grammars and offers some useful insights into the design of curricula as well as learning algorithms. We present results of experiments with (a) carefully crafted synthetic data that provide support for our hypothesis and (b) a natural language corpus that demonstrates the utility of curricula in unsupervised learning of probabilistic grammars.
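As a small illustration of what presenting training samples in a meaningful order can look like, the sketch below builds a staged curriculum over sentences. Using sentence length as the measure of difficulty is an assumption made for this example, not something stated in the abstract, and length_curriculum and num_stages are hypothetical names.

```python
def length_curriculum(sentences, num_stages):
    """Yield growing training sets, shortest sentences first.

    Difficulty is approximated by the whitespace token count (an assumption);
    each stage contains all earlier stages plus the next-longest sentences.
    """
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    for stage in range(1, num_stages + 1):
        # Ceiling division, so the final stage always covers the full corpus.
        cutoff = -(-len(ordered) * stage // num_stages)
        yield ordered[:cutoff]
```

A learner would be trained on each stage in turn, starting every stage from the parameters reached on the previous one.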
Strategies for Training Large Scale Neural Network Language Models
We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around 10% relative reduction of word error rate on an English Broadcast News speech recognition task, against a large 4-gram model trained on 400M tokens.
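A hash-based maximum entropy model can be sketched as follows: every n-gram feature is hashed into a fixed-size weight table, so memory stays bounded regardless of how many distinct n-grams occur. This is a standalone illustration under stated assumptions, not the paper's implementation: the class name, the table size, the use of CRC32 as the hash, and plain SGD updates are all choices made for the example, and the joint training with a neural network is omitted.

```python
import math
import zlib


class HashedMaxEnt:
    """Toy maximum entropy language model with hashed n-gram features."""

    def __init__(self, vocab, table_size=1 << 16, order=3, learning_rate=0.1):
        self.vocab = list(vocab)
        self.table_size = table_size
        self.order = order  # longest n-gram used as a feature
        self.learning_rate = learning_rate
        self.weights = [0.0] * table_size  # shared by all features

    def _slots(self, history, word):
        """Weight-table indices of the n-gram features ending in `word`."""
        max_context = min(self.order - 1, len(history))
        slots = []
        for n in range(max_context + 1):
            context = list(history[len(history) - n:])
            key = " ".join(context + [word]).encode("utf-8")
            # Colliding n-grams simply share a weight.
            slots.append(zlib.crc32(key) % self.table_size)
        return slots

    def probabilities(self, history):
        """Softmax over the vocabulary given the preceding words."""
        scores = [
            sum(self.weights[s] for s in self._slots(history, w))
            for w in self.vocab
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def update(self, history, target):
        """One SGD step on the log-likelihood of `target` after `history`."""
        probs = self.probabilities(history)
        for word, p in zip(self.vocab, probs):
            gradient = (1.0 if word == target else 0.0) - p
            for s in self._slots(history, word):
                self.weights[s] += self.learning_rate * gradient
```

Training amounts to calling update(history, next_word) for each position in the corpus. The table size trades memory against the rate of hash collisions.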
Leveraging Noisy Online Databases for Use in Chord Recognition
The most significant problem faced by machine-learning-based chord recognition systems is arguably the lack of high-quality training examples. In this paper, we address this problem by leveraging the availability of chord annotations from guitarist websites. We show that such annotations can be used as partial supervision for a semi-supervised chord recognition method (partial because accurate timing information is lacking). A particular challenge in exploiting these data is their low quality, which can even degrade performance if they are used directly. We demonstrate, however, that a curriculum learning strategy can be used to automatically rank annotations according to their potential for improving performance. Using this strategy, our experiments show a modest improvement for a simple major/minor chord alphabet, but a highly significant improvement for a much larger chord alphabet.
Minimal Supervision . . . Bootstrapping Global Patterns from Local Knowledge
, 2011
A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children begin in solving this problem when learning their first languages? To experiment with different representations that children may use to begin understanding language, we have built a computational model for this early point in language acquisition. This system, BabySRL, learns from transcriptions of natural child-directed speech and makes use of psycholinguistically plausible background knowledge and realistically noisy semantic feedback to begin to classify sentences at the level of “who does what to whom.” Starting with simple, psycholinguistically motivated representations of sentence structure, the BabySRL is able to learn from full semantic feedback, as well as from a supervision signal derived from partial semantic background knowledge. In addition, we combine the BabySRL with an unsupervised Hidden Markov Model part-of-speech tagger, linking clusters with syntactic categories using background noun knowledge so that they can be used to parse input for the SRL system. The results show that the proposed shallow representations of sentence structure are robust to reductions in parsing

