Results 1 – 10 of 72
The Google similarity distance
, 2005
"... Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of ‘society ’ is ‘database, ’ and the equivalent of ‘use ’ is ‘way to search the database. ’ We present a new theory of similarity between ..."
Abstract

Cited by 314 (9 self)
 Add to MetaCart
Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of ‘society’ is ‘database’, and the equivalent of ‘use’ is ‘way to search the database’. We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide web using Google page counts. The world-wide web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality.
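The distance this abstract describes, the normalized Google distance (NGD), has a simple closed form in terms of page counts. The sketch below uses invented counts for illustration; a real implementation would query a search engine for the counts f(x), f(y), f(x, y) and the index size N:

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw page counts.

    NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y))
                / (log N - min{log f(x), log f(y)})

    fx, fy -- pages containing each term alone
    fxy    -- pages containing both terms
    n      -- total pages indexed (normalizing constant)
    """
    lfx, lfy, lfxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lfx, lfy) - lfxy) / (math.log(n) - min(lfx, lfy))

# Invented counts: terms that co-occur often get a small distance,
# terms that rarely co-occur get a distance near 1.
print(ngd(fx=8_000_000, fy=5_000_000, fxy=4_000_000, n=8_000_000_000))  # small
print(ngd(fx=8_000_000, fy=5_000_000, fxy=10_000, n=8_000_000_000))    # larger
```

Note that a term compared with itself (fx = fy = fxy) yields distance 0, as a similarity measure should.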
An efficient, probabilistically sound algorithm for segmentation and word discovery
 Machine Learning
, 1999
"... This paper presents a modelbased, unsupervised algorithm for recovering word boundaries in a naturallanguage text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstract ..."
Abstract

Cited by 197 (2 self)
 Add to MetaCart
This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word order, and word frequency can be replaced in a modular fashion. The model yields a language-independent prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
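The dynamic-programming core of such boundary recovery can be sketched in miniature. This is a deliberately simplified stand-in for the paper's model: it assumes a known lexicon with fixed word probabilities (the words and probabilities below are invented) and picks the split maximizing total log-probability:

```python
import math

def segment(text, word_logprob):
    """Recover word boundaries by dynamic programming: choose the split
    of `text` that maximizes the summed log-probability of its words.
    `word_logprob` maps known words to log-probabilities (a toy stand-in
    for a learned lexicon model)."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: score of best split of text[:i]
    best[0] = 0.0
    back = [0] * (n + 1)              # back[i]: start of the last word in that split
    for i in range(1, n + 1):
        for j in range(i):
            w = text[j:i]
            if w in word_logprob and best[j] + word_logprob[w] > best[i]:
                best[i] = best[j] + word_logprob[w]
                back[i] = j
    # Follow back-pointers to read off the words (assumes a full split exists).
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

lex = {w: math.log(p) for w, p in
       {"the": 0.3, "dog": 0.2, "do": 0.1, "gate": 0.1, "ate": 0.2, "g": 0.05}.items()}
print(segment("thedogate", lex))  # the|dog|ate beats the|do|gate on probability
```

Here "thedogate" splits as the|dog|ate rather than the|do|gate because the former's words are jointly more probable under the toy lexicon.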
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions
 Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence
, 2002
"... Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from an recursive prior distribution p(x). Instead of using the unknown p() he predicts using the celebrated universal enumerable prior M() which for all exceeds any recursiv ..."
Abstract

Cited by 63 (21 self)
 Add to MetaCart
(Show Context)
Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from a recursive prior distribution p(x). Instead of using the unknown p(x) he predicts using the celebrated universal enumerable prior M(x), which for all x exceeds any recursive p(x), save for a constant factor independent of x. The simplicity measure M(x) naturally implements "Occam's razor" and is closely related to the Kolmogorov complexity of x. However, M assigns high probability to certain data that are extremely hard to compute. This does not match our intuitive notion of simplicity. Here we suggest a more plausible measure derived from the fastest way of computing data. In the absence of contrarian evidence, we assume that the physical world is generated by a computational process, and that any possibly infinite sequence of observations is therefore computable in the limit (this assumption is more radical and stronger than Solomonoff's).
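For reference, the universal enumerable prior mentioned in this abstract is standardly written (in common algorithmic-information-theory notation, not necessarily the paper's exact formulation) as

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},
\qquad
M(x) \;\ge\; c_p \, p(x) \quad \text{for every recursive } p,
```

where U is a universal monotone machine, U(p) = x* means program p outputs a string beginning with x, ℓ(p) is the length of p in bits, and c_p > 0 is a constant independent of x; the second relation is exactly the "save for a constant factor" dominance the abstract refers to.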
Fifty Years of Shannon Theory
, 1998
"... A brief chronicle is given of the historical development of the central problems in the theory of fundamental limits of data compression and reliable communication. ..."
Abstract

Cited by 49 (1 self)
 Add to MetaCart
A brief chronicle is given of the historical development of the central problems in the theory of fundamental limits of data compression and reliable communication.
The Fastest and Shortest Algorithm for All Well-Defined Problems
, 2002
"... An algorithm M is described that solves any welldefined problem p as quickly as the fastest algorithm computing a solution to p, save for a factor of 5 and loworder additive terms. M optimally distributes resources between the execution of provably correct psolving programs and an enumeration of ..."
Abstract

Cited by 43 (7 self)
 Add to MetaCart
(Show Context)
An algorithm M is described that solves any well-defined problem p as quickly as the fastest algorithm computing a solution to p, save for a factor of 5 and low-order additive terms. M optimally distributes resources between the execution of provably correct p-solving programs and an enumeration of all proofs, including relevant proofs of program correctness and of time bounds on program runtimes. M avoids Blum's speedup theorem by ignoring programs without correctness proofs. M has broader applicability than, and can be faster than, Levin's universal search, the fastest method for inverting functions save for a large multiplicative constant. An extension of Kolmogorov complexity and two novel natural measures of function complexity are used to show that the most efficient program computing some function f is also among the shortest programs provably computing f.
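Levin's universal search, mentioned here as the baseline, can be illustrated with a toy simulation. Everything below (candidate names, bit-lengths, step counts) is invented for the example; real universal search enumerates actual programs. In phase k, each candidate of length l gets roughly 2^k · 2^(−l) steps, so a program of length l that halts with a verified solution in t steps is found after about 2^l · t total work:

```python
def levin_search(candidates, is_solution):
    """Toy Levin-style universal search over a fixed candidate pool.
    Each candidate is (name, length_bits, run), where run(steps) returns
    the candidate's output if it halts within `steps` steps, else None.
    Phase k splits a budget of 2**k steps among candidates in proportion
    to 2**(-length_bits); the first verified output wins."""
    for k in range(64):  # safety cap for the toy version
        for name, length, run in candidates:
            steps = 2 ** k // 2 ** length  # this candidate's slice of phase k
            if steps >= 1:
                out = run(steps)
                if out is not None and is_solution(out):
                    return name, out
    return None

# Invented candidates for inverting f(x) = x**2 at 49: a short program
# that needs many steps, and a long program that answers immediately.
candidates = [
    ("slow_but_short", 2, lambda steps: 7 if steps >= 50 else None),
    ("fast_but_long", 10, lambda steps: 7 if steps >= 1 else None),
]
found = levin_search(candidates, lambda out: out * out == 49)
print(found)
```

In this toy run the short program wins: its phase budget reaches 50 steps (at k = 8) before the long program's budget reaches even 1 step (at k = 10), mirroring the 2^l multiplicative penalty on program length.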
Automatic meaning discovery using Google
 Centrum Wiskunde & Informatica (CWI)
, 2004
"... We have found a method to automatically extract the meaning of words and phrases from the worldwideweb using Google page counts. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. The worldwideweb is the largest datab ..."
Abstract

Cited by 40 (3 self)
 Add to MetaCart
(Show Context)
We have found a method to automatically extract the meaning of words and phrases from the world-wide web using Google page counts. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. The world-wide web is the largest database on earth, and the latent semantic context information entered by millions of independent users averages out to provide automatic meaning of useful quality. We demonstrate positive correlations, evidencing an underlying semantic structure, in both numerical symbol notations and number-name words in a variety of natural languages and contexts. Next, we demonstrate the ability to distinguish between colors and numbers, and to distinguish between 17th-century Dutch painters; the ability to understand electrical terms, religious terms, and emergency incidents; a massive experiment in understanding WordNet categories; and the ability to do a simple automatic English–Spanish translation.
Algorithmic Theories Of Everything
, 2000
"... The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lac ..."
Abstract

Cited by 36 (15 self)
 Add to MetaCart
(Show Context)
The probability distribution P from which the history of our universe is sampled represents a theory of everything, or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lacking a short description, and study the spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive descriptions, the other the fastest way of computing everything. The former derives from generalizations of traditional computability, Solomonoff's algorithmic probability, Kolmogorov complexity, and objects more random than Chaitin's Omega; the latter from Levin's universal search and a natural resource-oriented postulate: the cumulative prior probability of all x incomputable within time t by this optimal algorithm should be 1/t. Between both Ps we find a universal cumulatively enumerable measure (CEM) that dominates traditional enumerable measures; any such CEM must assign low probability to any universe lacking a short enumerating program. We derive P-specific consequences for evolving observers, inductive reasoning, quantum physics, philosophy, and the expected duration of our universe.
The miraculous universal distribution
 Mathematical Intelligencer
, 1997
"... scientific hypothesis formulated? How does one choose one hypothesis over another? It may be surprising that questions such as these are still discussed. Even more surprising, perhaps, is the fact that the discussion is still moving forward, that new ideas are still being added to the debate. Certai ..."
Abstract

Cited by 26 (3 self)
 Add to MetaCart
(Show Context)
How is a scientific hypothesis formulated? How does one choose one hypothesis over another? It may be surprising that questions such as these are still discussed. Even more surprising, perhaps, is the fact that the discussion is still moving forward, that new ideas are still being added to the debate. Certainly most surprising of all is the fact that, over the last thirty years or so, the normally concrete field of computer science has provided fundamental new insights. Scientists engage in what is usually called inductive reasoning. Inductive reasoning entails making predictions about future behavior based on past observations. But defining the proper method of formulating such predictions has occupied philosophers throughout the ages. In fact, the British philosopher David Hume (1711–1776) argued convincingly that in some sense proper induction is impossible [3]. It is impossible because we can only reach conclusions by using known data and methods. Therefore, the conclusion is logically already contained in the start configuration. Consequently, the only form of induction possible is deduction. Philosophers have tried to find
A Highly Random Number
, 2001
"... In his celebrated 1936 paper Turing defined a machine to be circular iff it performs an infinite computation outputting only finitely many symbols. We define ( as the probability that an arbitrary machine be circular and we prove that is a random number that goes beyond $2, the probability that ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
In his celebrated 1936 paper Turing defined a machine to be circular iff it performs an infinite computation outputting only finitely many symbols. We define β as the probability that an arbitrary machine be circular and we prove that β is a random number that goes beyond Ω, the probability that a universal self-delimiting machine halts. The algorithmic complexity of β is strictly greater than that of Ω, but similar to the algorithmic complexity of Ω′, the halting probability of an oracle machine. What makes β interesting is that it is an example of a highly random number definable without considering oracles.
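For reference, the halting probability this abstract compares against is standardly defined (standard notation, not taken from this paper) as

```latex
\Omega \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|},
```

where U is a universal self-delimiting (prefix-free) machine and |p| is the length of program p in bits; prefix-freeness makes the sum converge to a real number in (0, 1), by the Kraft inequality.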