Results 1-10 of 110
A Theory of Program Size Formally Identical to Information Theory
1975
Cited by 333 (16 self)
A new definition of program-size complexity is made. $H(A,B/C,D)$ is defined to be the size in bits of the shortest self-delimiting program for calculating strings A and B if one is given a minimal-size self-delimiting program for calculating strings C and D. This differs from previous definitions: (1) programs are required to be self-delimiting, i.e. no program is a prefix of another, and (2) instead of being given C and D directly, one is given a program for calculating them that is minimal in size. Unlike previous definitions, this one has precisely the formal properties of the entropy concept of information theory. For example, $H(A,B) = H(A) + H(B/A) + O(1)$. Also, if a program of length $k$ is assigned measure $2^{-k}$, then $H(A) = -\log_2(\text{the probability that the standard universal computer will calculate } A) + O(1)$. Key Words and Phrases: computational complexity, entropy, information theory, instantaneous code, Kraft inequality, minimal program, probab...
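The self-delimiting requirement and the measure $2^{-k}$ mentioned in this abstract can be illustrated with a small sketch. The set of bit strings below is a hypothetical toy example, not taken from the paper; the code only checks that no string is a prefix of another and that the total measure obeys the Kraft inequality (sum at most 1).

```python
from fractions import Fraction

def is_prefix_free(programs):
    """Return True if no bit string in `programs` is a proper prefix of another."""
    ordered = sorted(set(programs))
    # After lexicographic sorting, any prefix relation appears between neighbours.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

def kraft_sum(programs):
    """Exact sum of 2^-len(p) over all programs (the measure assigned to the set)."""
    return sum(Fraction(1, 2 ** len(p)) for p in set(programs))

# Hypothetical self-delimiting "programs" (an assumption for illustration only).
toy_programs = ["0", "10", "110", "111"]

if __name__ == "__main__":
    print(is_prefix_free(toy_programs))  # prefix-free check
    print(kraft_sum(toy_programs))       # total measure, never above 1 for a prefix-free set
```

Because the set is prefix-free, the measures can be read as probabilities of a random bit stream starting with one of the programs, which is the link to the $-\log_2$ probability statement in the abstract.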
A Decomposition of Multidimensional Point Sets with Applications to k-Nearest-Neighbors and n-Body Potential Fields
J. ACM, 1992
Cited by 244 (4 self)
We define the notion of a well-separated pair decomposition of points in d-dimensional space. We then develop efficient sequential and parallel algorithms for computing such a decomposition. We apply the resulting decomposition to the efficient computation of k-nearest neighbors and n-body potential fields.
A Maximum Entropy Approach to Adaptive Statistical Language Modeling
Computer Speech and Language, 1996
Cited by 242 (11 self)
An adaptive statistical language model is described, which successfully integrates long distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...
Arithmetic coding
IBM J. Res. Develop, 1979
Cited by 195 (0 self)
Arithmetic coding is a data compression technique that encodes data (the data string) by creating a code string which represents a fractional value on the number line between 0 and 1. The coding algorithm is symbolwise recursive; i.e., it operates upon and encodes (decodes) one data symbol per iteration or recursion. On each recursion, the algorithm successively partitions an interval
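The interval-partitioning idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: a fixed, known symbol model (`probs` is hypothetical), exact rational arithmetic via `fractions.Fraction`, and a known message length for decoding. It is not the finite-precision scheme that a practical coder would use.

```python
from fractions import Fraction

def build_intervals(probs):
    """Map each symbol to its [low, high) slice of the unit interval."""
    intervals, low = {}, Fraction(0)
    for symbol, p in probs.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals

def encode(message, probs):
    """Narrow [0, 1) once per symbol and return the final [low, high) interval."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = intervals[symbol]
        # Keep only the sub-interval of the current interval that belongs to `symbol`.
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

def decode(value, length, probs):
    """Recover `length` symbols from any value inside the encoded interval."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(symbol)
                # Rescale so the chosen sub-interval becomes [0, 1) again.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Hypothetical static model, chosen only for illustration.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

if __name__ == "__main__":
    low, high = encode("abc", probs)
    print(low, high)
    print(decode(low, 3, probs))
```

Any fraction inside the final interval identifies the message, so more probable messages end in wider intervals and need fewer bits to pin down.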
On the Marginal Utility of Network Topology Measurements
2001
Cited by 100 (11 self)
The cost and complexity of deploying measurement infrastructure in the Internet for the purpose of analyzing its structure and behavior is considerable. Basic questions about the utility of increasing the number of measurements and measurement sites have not yet been addressed, which has led to a "more is better" approach to wide-area measurement studies. In this paper, we step toward a more quantifiable understanding of the marginal utility of performing wide-area measurements in the context of Internet topology discovery. We characterize the observable topology in terms of nodes, links, node degree distribution, and distribution of end-to-end flows using statistical and information-theoretic techniques. We classify nodes discovered on the routes between a set of 8 sources and 1277 destinations to differentiate nodes which make up the so-called "backbone" from those which border the backbone and those on links between the border nodes and destination nodes. This process includes reducing nodes that advertise multiple interfaces to single IP addresses. We show that the utility of adding sources beyond the second source quickly diminishes from the perspective of interface, node, link and node degree discovery. We also show that the utility of adding destinations is constant for interfaces, nodes, links and node degree, indicating that it is more important to add destinations than sources.
Data Compression
ACM Computing Surveys, 1987
Cited by 87 (3 self)
This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano and Huffman in the late 40's to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported and possibilities for future research are suggested. INTRODUCTION Data compression is often referred to as coding, where coding is a very general term encompassing any special representation of data which satisfies a given need. Information theory is defined to be the study of eff...
Using Web Structure for Classifying and Describing Web Pages
2002
Cited by 80 (3 self)
The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.
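The abstract does not spell out how expected entropy loss is computed. A minimal sketch, assuming it matches the standard information-gain definition for a binary feature (word present or absent) over positive and negative page sets, might look like this. The function and parameter names are hypothetical.

```python
from math import log2

def entropy(p):
    """Binary entropy in bits of a class with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def expected_entropy_loss(pos_with, pos_total, neg_with, neg_total):
    """Prior class entropy minus expected entropy after observing the feature.

    pos_with / neg_with: positive / negative pages containing the feature.
    pos_total / neg_total: sizes of the positive / negative sets.
    """
    total = pos_total + neg_total
    with_f = pos_with + neg_with
    without_f = total - with_f
    prior = entropy(pos_total / total)
    h_with = entropy(pos_with / with_f) if with_f else 0.0
    h_without = entropy((pos_total - pos_with) / without_f) if without_f else 0.0
    posterior = (with_f / total) * h_with + (without_f / total) * h_without
    return prior - posterior

if __name__ == "__main__":
    # A feature present in every positive page and no negative page.
    print(expected_entropy_loss(10, 10, 0, 10))
    # A feature spread evenly across both sets.
    print(expected_entropy_loss(5, 10, 5, 10))
```

Ranking candidate words by this score puts features that separate the positive cluster from the rest at the top, which is the sense in which they can serve as cluster names.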
The Context Tree Weighting Method: Basic Properties
IEEE Transactions on Information Theory, 1995
Cited by 79 (1 self)
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture". Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context tree weighting procedure is optimal in the sense that i...
Improving Category Specific Web Search by Learning Query Modifications
In Symposium on Applications and the Internet, 2001
Cited by 70 (8 self)
A user searching for documents within a specific category using a general purpose search engine might have a difficult time finding valuable documents. To improve category specific search, we show that a trained classifier can recognize pages of a specified category with high precision by using textual content, text location, and HTML structure. We show that query modifications to web search engines increase the probability that the documents returned are of the specific category.
A New Method of N-gram Statistics for Large Number of n and Automatic Extraction of Words and Phrases from Large Text Data of Japanese
In COLING-94, 1994
Cited by 48 (0 self)
In the process of establishing the information theory, C. E. Shannon proposed the Markov process as a good model to characterize a natural language. The core of this idea is to calculate the frequencies of strings composed of n characters (n-grams), but this statistical analysis of large text data and for a large n has never been carried out because of the memory limitation of computer and the shortage of text data. Taking advantage of the recent powerful computers we developed a new algorithm of n-grams of large text data for arbitrary large n and calculated successfully, within relatively short time, n-grams of some Japanese text data containing between two and thirty million characters. From this experiment it became clear that the automatic extraction or determination of words, compound words and collocations is possible by mutually comparing n-gram statistics for different values of n.
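The idea of comparing character n-gram statistics across values of n can be sketched naively as below. This is not the paper's algorithm, which targets very large texts and arbitrary n; it is an in-memory illustration using `collections.Counter`, and the fragment heuristic (an n-gram whose count is matched by one of its one-character extensions) is an assumption made for the example.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count every substring of length n in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def likely_fragments(text, n, min_count=2):
    """Return n-grams that always occur inside one particular (n+1)-gram.

    If an n-gram and one of its right extensions have the same count, the
    shorter string never appears without the extra character, which hints
    that it is a fragment of a longer unit rather than a unit itself.
    """
    shorter = char_ngrams(text, n)
    longer = char_ngrams(text, n + 1)
    fragments = set()
    for gram, count in shorter.items():
        if count < min_count:
            continue
        if any(ext.startswith(gram) and c == count for ext, c in longer.items()):
            fragments.add(gram)
    return fragments

if __name__ == "__main__":
    sample = "abcabcabd"  # toy text, not data from the paper
    print(char_ngrams(sample, 2))
    print(likely_fragments(sample, 2))
```

On real text, repeating the comparison for increasing n narrows down strings whose frequency stops being inherited from a longer string, which is roughly where candidate words and collocations sit.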