@MISC{Ahr_multilayerneural, author = {M Ahr and M Biehl and E Schlösser}, title = {multilayer neural networks}, year = {} }


Abstract

We investigate layered neural networks with a differentiable activation function and student vectors without a normalization constraint by means of equilibrium statistical physics. We consider the learning of perfectly realizable rules and find that the length of the student vectors diverges unless a proper weight-decay term is added to the energy. The system then undergoes a first-order phase transition between states with very long student vectors and states whose lengths are comparable to those of the teacher vectors. In both configurations there is, additionally, a phase transition between a specialized and an unspecialized phase. An anti-specialized phase with long student vectors exists in networks with a small number of hidden units.

Statistical physics has been applied successfully to the investigation of equilibrium states of neural networks [1, 2]. The by now standard analysis of off-line training from a fixed training set is based on the interpretation of training as a stochastic process that leads to a well-defined thermal equilibrium. Investigations of perceptrons [3, 4, 5] and committee machines [6, 7, 8, 9, 10] have greatly improved the understanding of learning in neural networks. Meanwhile these studies are being extended to the more application-relevant scenario of networks with continuous activation function and output [11, 12, 13]. The soft-committee machine is a two-layered neural network consisting of a layer of K hidden units, each of which is connected to the entire N-dimensional input ξ. The total output σ is proportional to the sum of the outputs of all hidden units: σ(ξ) = 1 ∑
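The soft-committee architecture described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the erf-based activation g(x) = erf(x/√2) and the 1/√N scaling of the local fields are conventional choices in this literature and are assumed here; the student vectors J_i carry no normalization constraint, as in the text.

```python
import numpy as np
from math import erf

def soft_committee_output(J, xi):
    """Output of a soft-committee machine with K hidden units.

    J  : (K, N) array of student weight vectors (unnormalized).
    xi : (N,) input pattern.
    Returns the sum of the hidden-unit activations g(J_i . xi / sqrt(N)),
    with g(x) = erf(x / sqrt(2)) assumed as the differentiable activation.
    """
    N = xi.shape[0]
    # local fields of the K hidden units, scaled by 1/sqrt(N)
    fields = J @ xi / np.sqrt(N)
    # total output: sum over all hidden units
    return sum(erf(h / np.sqrt(2.0)) for h in fields)

# Small usage example with random student vectors and a random input.
rng = np.random.default_rng(0)
K, N = 3, 100
J = rng.normal(size=(K, N))   # student vectors, no normalization constraint
xi = rng.normal(size=N)       # random N-dimensional input pattern
sigma = soft_committee_output(J, xi)
print(sigma)
```

Since each hidden unit contributes a value in (-1, 1), the total output is bounded by K in magnitude; any overall prefactor (such as 1/√K) only rescales it.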