Gradient-based learning applied to document recognition
Proceedings of the IEEE, 1998
Cited by 487 (38 self)
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA
1996
Cited by 122 (13 self)
We present a statistical model of genes in DNA. A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence (Stormo & Haussler 1994). Probabilities are assigned to transitions between states in the GHMM and to the generation of each nucleotide base given a particular state. Machine learning techniques are applied to optimize these probabilities using a standardized training set. Given a new candidate sequence, the best parse is deduced from the model using a dynamic programming algorithm to identify the path through the model with maximum probability. The GHMM is flexible and modular, so new sensors and additional states can be inserted easily. In addition, it provides simple solutions for integrating cardinality constraints, reading frame constraints, "indels", and homology searching. The description and results of an implementation of such a gene-finding model, called Genie, are presented. The exon sensor is a codon fre...
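The dynamic programming step mentioned in this abstract (finding the maximum-probability path through the model) can be illustrated with the Viterbi algorithm on an ordinary hidden Markov model. This is only a simplified sketch, not the Genie implementation: a plain HMM stands in for the generalized model, and the two states, the probabilities, and the test sequence are all made-up assumptions.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for an observation sequence."""
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (log_start[s] + log_emit[s][obs[0]], [s]) for s in states}
    for symbol in obs[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor state that maximizes the score into s.
            prev = max(states, key=lambda p: best[p][0] + log_trans[p][s])
            score = best[prev][0] + log_trans[prev][s] + log_emit[s][symbol]
            new_best[s] = (score, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical two-state model; all numbers are illustrative assumptions.
states = ["exon", "intron"]
log_start = logs({"exon": 0.5, "intron": 0.5})
log_trans = {
    "exon": logs({"exon": 0.9, "intron": 0.1}),
    "intron": logs({"exon": 0.1, "intron": 0.9}),
}
log_emit = {
    "exon": logs({"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1}),
    "intron": logs({"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4}),
}

log_prob, path = viterbi("GCGCATAT", states, log_start, log_trans, log_emit)
print(path)
```

Working in log space avoids numerical underflow on long sequences, which matters for DNA-length inputs.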
Learning Comprehensible Descriptions of Multivariate Time Series
In Ivan Bratko and Saso Dzeroski, editors, Proceedings of the 16th International Conference on Machine Learning (ICML-99), 1999
Cited by 45 (0 self)
Supervised classification is one of the most active areas of machine learning research. Most work has focused on classification in static domains, where an instantaneous snapshot of attributes is meaningful. In many domains, attributes are not static; in fact, it is the way they vary temporally that can make classification possible. Examples of such domains include speech recognition, gesture recognition and electrocardiograph classification. While it is possible to use ad hoc, domain-specific techniques for "flattening" the time series to a learner-friendly representation, this fails to take into account both the special problems and special heuristics applicable to temporal data and often results in unreadable concept descriptions. Though traditional time series techniques can sometimes produce accurate classifiers, few can provide comprehensible descriptions. We propose a general architecture for classification and description of multivariate time series. It employs event primitive...
Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference
Machine Learning, 2001
Cited by 40 (0 self)
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for th...
A tutorial on energy-based learning
Predicting Structured Data, 2006
Cited by 27 (6 self)
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
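The inference step described in this abstract (clamp the observed variables, then search the remaining variables for the lowest energy) can be sketched in a few lines. The quadratic energy function and the candidate set below are hypothetical choices made only for illustration; they are not taken from the tutorial.

```python
def energy(x, y):
    # Hypothetical energy function: low when y is close to 2 * x.
    return (y - 2 * x) ** 2

def infer(x, candidates):
    # Clamp the observed variable x and return the candidate y
    # whose configuration (x, y) has the lowest energy.
    return min(candidates, key=lambda y: energy(x, y))

candidates = range(-10, 11)   # assumed discrete search space for y
best_y = infer(3, candidates)
print(best_y)
```

Note that the energies are never normalized into probabilities; only their relative order matters for picking the answer.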
Vapnik-Chervonenkis Dimension of Recurrent Neural Networks
1997
Cited by 23 (5 self)
Most of the work on the Vapnik-Chervonenkis dimension of neural networks has been focused on feedforward networks. However, recurrent networks are also widely used in learning applications, in particular when time is a relevant parameter. This paper provides lower and upper bounds for the VC dimension of such networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. In contrast, for feedforward networks, VC dimension bounds can be expressed as a function of w only. An important difference between recurrent and feedforward nets is that a fixed recurrent net can receive inputs of arbitrary length. Therefore we are particularly interested in the case $k \gg w$. Ignoring multiplicative constants, the main results say roughly the following: • For architectures with activation $\sigma$ = a...
Sample Complexity for Learning Recurrent Perceptron Mappings
IEEE Trans. Inform. Theory, 1996
Cited by 22 (10 self)
Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data.
Keywords: perceptrons, recurrent models, neural networks, learning, Vapnik-Chervonenkis dimension
1 Introduction
One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference [9]. In this context, one is interested in classifying k-dimensional input patterns $v = (v_1, \ldots, v_k)$ into two disjoint classes $A^+$ and $A^-$. A perceptron P which classifies vectors into $A^+$ and $A^-$ is characterized by a vector (of "weights") $\vec{c} \in \mathbb{R}^k$, and operates as follows. One forms the inner product $\vec{c} \cdot v = c_1 v_1 + \ldots + c_k v_k$. I...
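The inner product that the introduction describes is easy to show concretely. The excerpt is cut off before it states the decision rule, so the sketch below assumes the usual convention (a positive inner product means class A+, anything else means A-); that rule, the function names, and the example weights are assumptions rather than the paper's definitions.

```python
def inner_product(c, v):
    # c . v = c_1 v_1 + ... + c_k v_k
    return sum(ci * vi for ci, vi in zip(c, v))

def classify(c, v):
    # Assumed decision rule: threshold the inner product at zero.
    return "A+" if inner_product(c, v) > 0 else "A-"

weights = [1.0, -2.0, 0.5]                       # hypothetical weight vector
label_pos = classify(weights, [3.0, 1.0, 0.0])   # inner product = 1.0
label_neg = classify(weights, [0.0, 1.0, 2.0])   # inner product = -1.0
print(label_pos, label_neg)
```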
Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series
2002
Cited by 22 (0 self)
Machine learning research has, to a great extent, ignored an important aspect of many real-world applications: time. Existing concept learners predominantly operate on a static set of attributes; for example, classifying flowers described by leaf size, petal colour and petal count. The values of these attributes are assumed to be unchanging: the flower never grows or loses leaves.
Automatic Genre Classification of MIDI Recordings
2004
Cited by 20 (12 self)
A software system that automatically classifies MIDI files into hierarchically organized taxonomies of musical genres is presented. This extensible software includes an easy-to-use and flexible GUI. An extensive library of high-level musical features is compiled, including many original features. A novel hybrid classification system is used that combines hierarchical, flat and round-robin classification. Both k-nearest neighbour and neural network-based classifiers are used, and feature selection and weighting are performed using genetic algorithms. A thorough review of previous research in automatic genre classification is presented, along with an overview of automatic feature selection and classification techniques. Also included is a discussion of the theoretical issues relating to musical genre, including but not limited to what mechanisms humans use to classify music by genre and how realistic genre taxonomies can be constructed.

