Results 1–10 of 36
Exploiting the Past and the Future in Protein Secondary Structure Prediction
, 1999
Abstract

Cited by 116 (22 self)
Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it does not permit capturing variable long-range information. Results: We introduce a family of novel architectures which can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction, at least comparable to the best existing systems, the main emphasis here is on the development of new algorithmic ideas. Availability: The executable program for predicting protein secondary structure is available from the authors free of charge. Contact: pfbaldi@ics.uci.edu, gpollast@ics.uci.edu, brunak@cbs.dtu.dk, paolo@dsi.unifi.it
Learning to Forget: Continual Prediction with LSTM
 Neural Computation
, 1999
Abstract

Cited by 51 (25 self)
Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them in an elegant way.
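The role of the forget gate described above can be illustrated with a minimal sketch. This is a scalar toy cell with hypothetical fixed weights (the class and parameter names are mine, not the paper's), intended only to show how the multiplicative gate keeps the internal state bounded on a continual input stream:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCellWithForget:
    """Toy scalar LSTM cell with an adaptive forget gate.
    Weights are fixed here for illustration; in practice they
    are learned by gradient descent."""

    def __init__(self, w_in, w_out, w_forget):
        self.w_in, self.w_out, self.w_forget = w_in, w_out, w_forget
        self.state = 0.0  # internal cell state

    def step(self, x):
        f = sigmoid(self.w_forget * x)  # forget gate: 0 = reset, 1 = keep
        i = sigmoid(self.w_in * x)      # input gate
        o = sigmoid(self.w_out * x)     # output gate
        # Without the factor f (i.e., f = 1), the state grows without
        # bound on a continual stream; f < 1 lets the cell decay or
        # reset its state instead of breaking down.
        self.state = f * self.state + i * math.tanh(x)
        return o * math.tanh(self.state)

# Feed a continual, unsegmented stream of constant inputs.
cell = LSTMCellWithForget(w_in=1.0, w_out=1.0, w_forget=-2.0)
outputs = [cell.step(1.0) for _ in range(100)]
```

With `w_forget = -2.0` the gate stays well below 1, so the state converges to a fixed point instead of accumulating step after step.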
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
, 2001
Abstract

Cited by 47 (24 self)
Recurrent networks (cross-reference Chapter 12) can, in principle, use their feedback connections to store representations of recent input events in the form of activations. The most widely used algorithms for learning what to put in short-term memory, however, take too much time to be feasible or do not work well at all, especially when minimal time lags between inputs and corresponding teacher signals are long. Although theoretically fascinating, they do not provide clear practical advantages over, say, backprop in feedforward networks with limited time windows (see cross-reference Chapters 11 and 12). With conventional "algorithms based on the computation of the complete gradient", such as "Back-Propagation Through Time" (BPTT, e.g., [22, 27, 26]) or "Real-Time Recurrent Learning" (RTRL, e.g., [21]), error signals "flowing backwards in time" tend to either (1) blow up or (2) vanish: the temporal evolution of the backpropagated error ex...
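The blow-up/vanish dichotomy can be seen numerically in a toy scalar recurrence h_t = tanh(w * h_{t-1} + x_t): the error flowing back through k steps under BPTT is scaled by a product of k local derivatives, which shrinks or grows geometrically. A small sketch under these assumptions (the function name and constants are mine):

```python
import math

def backprop_factor(w, pre_activations):
    """Product of local derivatives along the unfolded chain, as in
    BPTT for the scalar recurrence h_t = tanh(w * h_{t-1} + x_t).
    Each step contributes a factor w * tanh'(a_t)."""
    factor = 1.0
    for a in pre_activations:
        factor *= w * (1.0 - math.tanh(a) ** 2)  # d tanh(a) / da
    return factor

# A 50-step lag between input and teacher signal.
acts = [0.5] * 50
small = backprop_factor(0.9, acts)  # per-step factor < 1: vanishes
large = backprop_factor(3.0, acts)  # per-step factor > 1: blows up
```

With these constants the per-step factor is roughly 0.71 in the first case and 2.36 in the second, so after 50 steps the backpropagated error is scaled by about 3e-8 or 4e18 respectively.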
A Taxonomy for Spatiotemporal Connectionist Networks Revisited: The Unsupervised Case
 Neural Computation
, 2003
Abstract

Cited by 21 (1 self)
Spatiotemporal connectionist networks (STCN's) comprise an important class of neural models that can deal with patterns distributed both in time and space. In this paper, we widen the application domain of the taxonomy for supervised STCN's recently proposed by Kremer (2001) to the unsupervised case. This is possible through a reinterpretation of the state vector as a vector of latent (hidden) variables, as proposed by Meinicke (2000). The goal of this generalized taxonomy is then to provide a nonlinear generative framework for describing unsupervised spatiotemporal networks, making it easier to compare and contrast their representational and operational characteristics. Computational properties, representational issues and learning are also discussed, and a number of references to the relevant source publications are provided. It is argued that the proposed approach is simpler and more powerful than previous attempts, from a descriptive and predictive viewpoint. We also discuss the relation of this taxonomy with automata theory and state space modeling, and suggest directions for further work.
Bidirectional dynamics for protein secondary structure prediction
 Sequence Learning: Paradigms, Algorithms, and Applications
, 2000
Abstract

Cited by 21 (5 self)
For certain categories of sequences, information from both the past and the future can be used for analysis and predictions at time t. This is the case for biological sequences, where the nature and function of a region in a sequence may strongly depend on events located both upstream and downstream. We develop a new family of adaptive graphical model architectures for learning non-causal sequence translations. These architectures employ two chains of hidden variables that propagate information from the past and from the future, respectively. This general idea can be instantiated either as a stochastic model (generalizing input-output hidden Markov models), or as a neural network (generalizing recurrent neural networks). We illustrate the methodology by applying bidirectional models to the problem of protein secondary structure prediction.
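The two-chain idea above can be sketched as follows. This is a minimal illustration with a simple tanh update and hypothetical weights, not the paper's stochastic or neural instantiation: one chain propagates information left-to-right (past), the other right-to-left (future), and the prediction at each position can use both:

```python
import math

def bidirectional_states(xs, w_f=0.5, w_b=0.5):
    """Two chains of hidden variables: h_fwd carries information
    from upstream (the past), h_bwd from downstream (the future).
    A classifier at position t would consume (h_fwd[t], h_bwd[t])."""
    n = len(xs)
    h_fwd = [0.0] * n
    h_bwd = [0.0] * n
    for t in range(n):                    # forward (past) chain
        prev = h_fwd[t - 1] if t > 0 else 0.0
        h_fwd[t] = math.tanh(w_f * prev + xs[t])
    for t in reversed(range(n)):          # backward (future) chain
        nxt = h_bwd[t + 1] if t < n - 1 else 0.0
        h_bwd[t] = math.tanh(w_b * nxt + xs[t])
    # Combined per-position features for a downstream predictor.
    return list(zip(h_fwd, h_bwd))

feats = bidirectional_states([0.1, 0.4, -0.2, 0.9])
```

Note that the first component of `feats[0]` depends only on the first input, while its second component has already absorbed the whole downstream context, which is exactly the non-causal behavior a fixed causal RNN cannot provide.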
Efficient Evolution of Asymmetric Recurrent Neural Networks Using PDGP-Inspired . . .
, 1998
Abstract

Cited by 19 (4 self)
Recurrent neural networks are particularly useful for processing time sequences and simulating dynamical systems. However, methods for building recurrent architectures have been hindered by the fact that available training algorithms are considerably more complex than those for feedforward networks. In this paper ...
Applying LSTM to time series predictable through time-window approaches
 Lecture Notes in Computer Science
, 2001
Abstract

Cited by 13 (1 self)
Long Short-Term Memory (LSTM) is able to solve many time series tasks unsolvable by feedforward networks using fixed-size time windows. Here we find that LSTM's superiority does not carry over to certain simpler time series prediction tasks solvable by time-window approaches: the Mackey-Glass series and the Santa Fe FIR laser emission series (Set A). This suggests using LSTM only when simpler traditional approaches fail.
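The time-window baseline referred to above reduces series prediction to a stateless regression problem: each training pair is a fixed-size window of past values plus the next value as target. A minimal sketch (the function name and window size are mine, for illustration):

```python
def time_windows(series, window=4):
    """Turn a series into (inputs, target) pairs for a feedforward
    predictor: inputs are the `window` most recent values, the
    target is the value that immediately follows them."""
    pairs = []
    for t in range(len(series) - window):
        pairs.append((series[t:t + window], series[t + window]))
    return pairs

pairs = time_windows([1, 2, 3, 4, 5, 6], window=4)
# pairs[0] is ([1, 2, 3, 4], 5); pairs[1] is ([2, 3, 4, 5], 6)
```

When the series' relevant history fits inside such a window, as for Mackey-Glass, a feedforward network trained on these pairs needs no recurrent state at all, which is why LSTM has no advantage there.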
How Embedded Memory in Recurrent Neural Network Architectures Helps Learning Long-term Temporal Dependencies
, 1996
Abstract

Cited by 11 (1 self)
Learning long-term temporal dependencies with recurrent neural networks can be a difficult problem. It has recently been shown that a class of recurrent neural networks called NARX networks perform much better than conventional recurrent neural networks for learning certain simple long-term dependency problems. The intuitive explanation for this behavior is that the output memories of a NARX network can be manifested as jump-ahead connections in the time-unfolded network. These jump-ahead connections can propagate gradient information more efficiently, thus reducing the sensitivity of the network to long-term dependencies. This work gives empirical justification to our hypothesis that similar improvements in learning long-term dependencies can be achieved with other classes of recurrent neural network architectures simply by increasing the order of the embedded memory. In particular we explore the impact of learning simple long-term dependency problems on three classes of recurrent neu...
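A NARX-style recurrence with embedded output memory can be sketched as below. This is a scalar toy with hypothetical weights (names and constants are mine): each output taps the last `order` outputs, so in the unfolded graph the tap on y_{t-order} acts as a jump-ahead connection that shortens the gradient path between distant time steps:

```python
import math

def narx_run(xs, order=3, w_x=1.0, w_y=0.2):
    """NARX-style recurrence: y_t depends on the current input and
    on the last `order` outputs. The tap y_{t-order} is the
    jump-ahead connection in the time-unfolded network."""
    ys = []
    for t, x in enumerate(xs):
        # Output memory of the given order (fewer taps early on).
        taps = [ys[t - d] for d in range(1, order + 1) if t - d >= 0]
        ys.append(math.tanh(w_x * x + w_y * sum(taps)))
    return ys

ys = narx_run([1.0, 0.0, 0.0, 0.0, 0.0], order=3)
```

Increasing `order` adds longer jumps, which is the knob the abstract argues also helps other recurrent architectures.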
The Vanishing Gradient Problem during Learning . . .
 International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Abstract

Cited by 10 (0 self)
... In this article the decaying error flow is theoretically analyzed. Then methods trying to overcome vanishing gradients are briefly discussed. Finally, experiments comparing conventional algorithms and alternative methods are presented. With advanced methods, long time lag problems can be solved in reasonable time.