Results 1  10
of
32
SpatioTemporal Coding for Wireless Communication
 IEEE Trans. Commun
, 1998
"... Multipath signal propagation has long been viewed as an impairment to reliable communication in wireless channels. This paper shows that the presence of multipath greatly improves achievable data rate if the appropriate communication structure is employed. A compact model is developed for the multip ..."
Abstract

Cited by 276 (14 self)
 Add to MetaCart
Multipath signal propagation has long been viewed as an impairment to reliable communication in wireless channels. This paper shows that the presence of multipath greatly improves achievable data rate if the appropriate communication structure is employed. A compact model is developed for the multipleinput multipleoutput (MIMO) dispersive spatially selective wireless communication channel. The multivariate information capacity is analyzed. For high signaltonoise ratio (SNR) conditions, the MIMO channel can exhibit a capacity slope in bits per decibel of power increase that is proportional to the minimum of the number multipath components, the number of input antennas, or the number of output antennas. This desirable result is contrasted with the lower capacity slope of the wellstudied case with multiple antennas at only one side of the radio link. A spatiotemporal vectorcoding (STVC) communication structure is suggested as a means for achieving MIMO channel capacity. The complexity of STVC motivates a more practical reducedcomplexity discrete matrix multitone (DMMT) spacefrequency coding approach. Both of these structures are shown to be asymptotically optimum. An adaptivelattice trelliscoding technique is suggested as a method for coding across the space and frequency dimensions that exist in the DMMT channel. Experimental examples that support the theoretical results are presented. Index TermsAdaptive arrays, adaptive coding, adaptive modulation, antenna arrays, broadband communication, channel coding, digital modulation, information rates, MIMO systems, multipath channels. I.
SecretKey Reconciliation by Public Discussion
, 1994
"... . Assuming that Alice and Bob use a secret noisy channel (modelled by a binary symmetric channel) to send a key, reconciliation is the process of correcting errors between Alice's and Bob's version of the key. This is done by public discussion, which leaks some information about the secret key to an ..."
Abstract

Cited by 93 (3 self)
 Add to MetaCart
. Assuming that Alice and Bob use a secret noisy channel (modelled by a binary symmetric channel) to send a key, reconciliation is the process of correcting errors between Alice's and Bob's version of the key. This is done by public discussion, which leaks some information about the secret key to an eavesdropper. We show how to construct protocols that leak a minimum amount of information. However this construction cannot be implemented efficiently. If Alice and Bob are willing to reveal an arbitrarily small amount of additional information (beyond the minimum) then they can implement polynomialtime protocols. We also present a more efficient protocol, which leaks an amount of information acceptably close to the minimum possible for sufficiently reliable secret channels (those with probability of any symbol being transmitted incorrectly as large as 15%). This work improves on earlier reconciliation approaches [R, BBR, BBBSS]. 1 Introduction Unlike public key cryptosystems, the securi...
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
 Neural Networks
, 1997
"... Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universali ..."
Abstract

Cited by 50 (31 self)
 Add to MetaCart
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the SolomonoffLevin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a timebounded generalization of Kolmogorov comple...
Hierarchies Of Generalized Kolmogorov Complexities And Nonenumerable Universal Measures Computable In The Limit
 INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE
, 2000
"... The traditional theory of Kolmogorov complexity and algorithmic probability focuses on monotone Turing machines with oneway writeonly output tape. This naturally leads to the universal enumerable SolomonoLevin measure. Here we introduce more general, nonenumerable but cumulatively enumerable m ..."
Abstract

Cited by 38 (20 self)
 Add to MetaCart
The traditional theory of Kolmogorov complexity and algorithmic probability focuses on monotone Turing machines with oneway writeonly output tape. This naturally leads to the universal enumerable SolomonoLevin measure. Here we introduce more general, nonenumerable but cumulatively enumerable measures (CEMs) derived from Turing machines with lexicographically nondecreasing output and random input, and even more general approximable measures and distributions computable in the limit. We obtain a natural hierarchy of generalizations of algorithmic probability and Kolmogorov complexity, suggesting that the "true" information content of some (possibly in nite) bitstring x is the size of the shortest nonhalting program that converges to x and nothing but x on a Turing machine that can edit its previous outputs. Among other things we show that there are objects computable in the limit yet more random than Chaitin's "number of wisdom" Omega, that any approximable measure of x is small for any x lacking a short description, that there is no universal approximable distribution, that there is a universal CEM, and that any nonenumerable CEM of x is small for any x lacking a short enumerating program. We briey mention consequences for universes sampled from such priors.
A computer scientist’s view of life, the universe, and everything
 Foundations of Computer Science: Potential  Theory  Cognition
, 1997
"... Is the universe computable? If so, it may be much cheaper in terms of information requirements to compute all computable universes instead of just ours. I apply basic concepts of Kolmogorov complexity theory to the set of possible universes, and chat about perceived and true randomness, life, genera ..."
Abstract

Cited by 38 (15 self)
 Add to MetaCart
Is the universe computable? If so, it may be much cheaper in terms of information requirements to compute all computable universes instead of just ours. I apply basic concepts of Kolmogorov complexity theory to the set of possible universes, and chat about perceived and true randomness, life, generalization, and learning in a given universe. Preliminaries Assumptions. A long time ago, the Great Programmer wrote a program that runs all possible universes on His Big Computer. “Possible ” means “computable”: (1) Each universe evolves on a discrete time scale. (2) Any universe’s state at a given time is describable by a finite number of bits. One of the many universes is ours, despite some who evolved in it and claim it is incomputable. Computable universes. Let TM denote an arbitrary universal Turing machine with unidirectional output tape. TM’s input and output symbols are “0”, “1”, and “, ” (comma). TM’s possible input programs can be ordered
On Learning How to Learn Learning Strategies
, 1995
"... This paper introduces the "incremental selfimprovement paradigm". Unlike previous methods, incremental selfimprovement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations  ..."
Abstract

Cited by 36 (15 self)
 Add to MetaCart
This paper introduces the "incremental selfimprovement paradigm". Unlike previous methods, incremental selfimprovement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations  the system is able to "shift its inductive bias" in a universal way. Its major features are: (1) There is no explicit difference between "learning", "metalearning", and other kinds of information processing. Using a Turing machine equivalent programming language, the system itself occasionally executes selfdelimiting, initially highly random "selfmodification programs" which modify the contextdependent probabilities of future action sequences (including future selfmodification programs). (2) The system keeps only those probability modifications computed by "useful" selfmodification programs: those which bring about more payoff (reward, reinforcement) per time than all previous selfmodi...
Formal Theory of Creativity, Fun, and Intrinsic Motivation (19902010)
"... The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional fiel ..."
Abstract

Cited by 34 (14 self)
 Add to MetaCart
The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical but nonoptimal implementations (1991, 1995, 19972002) are reviewed, as well as several recent variants by others (2005). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.
Flat Minima
, 1997
"... this paper (available on the WorldWide Web; see our home pages) contains pseudocode of an efficient implementation. It is based on fast multiplication of the Hessian and a vector due to Pearlmutter (1994) and Mller (1993). Acknowledgments ..."
Abstract

Cited by 32 (14 self)
 Add to MetaCart
this paper (available on the WorldWide Web; see our home pages) contains pseudocode of an efficient implementation. It is based on fast multiplication of the Hessian and a vector due to Pearlmutter (1994) and Mller (1993). Acknowledgments
Algorithmic Theories Of Everything
, 2000
"... The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lac ..."
Abstract

Cited by 31 (15 self)
 Add to MetaCart
The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lacking a short description, and study the spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive descriptions, the other the fastest way of computing everything. The former derives from generalizations of traditional computability, Solomonoff’s algorithmic probability, Kolmogorov complexity, and objects more random than Chaitin’s Omega, the latter from Levin’s universal search and a natural resourceoriented postulate: the cumulative prior probability of all x incomputable within time t by this optimal algorithm should be 1/t. Between both Ps we find a universal cumulatively enumerable measure that dominates traditional enumerable measures; any such CEM must assign low probability to any universe lacking a short enumerating program. We derive Pspecific consequences for evolving observers, inductive reasoning, quantum physics, philosophy, and the expected duration of our universe.
Learning Unambiguous Reduced Sequence Descriptions
 Advances in Neural Information Processing Systems 4
, 1992
"... You want your neural net algorithm to learn sequences? Do not just use conventional gradient descent (or approximations thereof) in recurrent nets, timedelay nets etc. Instead, use your sequence learning algorithm to implement the following method: No matter what your final goals are, train a netwo ..."
Abstract

Cited by 27 (7 self)
 Add to MetaCart
You want your neural net algorithm to learn sequences? Do not just use conventional gradient descent (or approximations thereof) in recurrent nets, timedelay nets etc. Instead, use your sequence learning algorithm to implement the following method: No matter what your final goals are, train a network to predict its next input from the previous ones. Since only unpredictable inputs convey new information, ignore all predictable inputs but let all unexpected inputs (plus information about the time step at which they occurred) become inputs to a higherlevel network of the same kind (working on a slower, selfadjusting time scale). Go on building a hierarchy of such networks. This principle reduces the descriptions of event sequences without loss of information, thus easing supervised or reinforcement learning tasks. Experiments show that systems based on this principle can require less computation per time step and many fewer training sequences than conventional training algorithms for ...