Results 1-10 of 12
Algorithmic Statistics
IEEE Transactions on Information Theory, 2001
Abstract

Cited by 52 (14 self)
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) from which the data sample typically came. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory, which deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of the algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes, in the explicit mode under some constraints. We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely nonstochastic objects": those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones: (i) in both cases there is an "information nonincrease" law; (ii) it is shown that a function is a...
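The two-part code and the sufficiency notion summarized above can be stated compactly for finite-set models. In the sketch below, K is prefix Kolmogorov complexity, S is a finite set containing x, and every relation holds only up to an additive constant; it restates standard textbook definitions and is not a transcription of the paper's own theorems.

```latex
\begin{align*}
  K(x) &\le K(S) + \log_2 |S| + O(1)
    && \text{two-part code: describe } S \text{, then the index of } x \text{ in } S\\
  S \text{ sufficient for } x &\iff K(S) + \log_2 |S| = K(x) + O(1)\\
  S \text{ minimal sufficient} &\iff S \text{ sufficient and } K(S) \text{ minimal among sufficient } S
\end{align*}
```

The first line holds because a shortest self-delimiting program for S can be followed by a fixed-length index of x in S, whose length is determined once S is known.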
Kolmogorov’s structure functions and model selection
IEEE Transactions on Information Theory
Abstract

Cited by 32 (14 self)
approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function” of the given data expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the “true” model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as in the classical case. We precisely quantify the goodness-of-fit of an individual model with respect to individual data. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the “algorithmic minimal sufficient statistic.” Index Terms: constrained minimum description length (MDL), constrained maximum likelihood (ML), constrained best-fit model selection, computability, lossy compression, minimal sufficient statistic, nonprobabilistic statistics, Kolmogorov complexity, Kolmogorov structure function, prediction, sufficient statistic
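Because K(S) is uncomputable, the structure function described above cannot be evaluated exactly. The sketch below only illustrates its shape on a hypothetical toy model class: "prefix" models that fix the first m bits of x, plus one "count" model containing all strings with as many ones as x. The bit costs (`cost_bits`) are rough assumptions standing in for K(S), and all names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_bits: float  # assumed stand-in for K(S); the true K(S) is uncomputable
    log_card: float   # log2 |S|


def candidate_models(x: str) -> list[Model]:
    """Hypothetical toy model class for a binary string x; every model contains x."""
    n = len(x)
    # Rough cost of writing down two integers in 0..n (n and m, or n and k).
    header = 2 * math.ceil(math.log2(n + 1))
    # "Prefix" models: fix the first m bits of x, leave the remaining n - m bits free.
    models = [Model(f"prefix[{m}]", header + m, float(n - m)) for m in range(n + 1)]
    # "Count" model: all length-n strings with the same number of ones as x.
    k = x.count("1")
    models.append(Model(f"count[{k}]", header, math.log2(math.comb(n, k))))
    return models


def structure_function(x: str, alpha: float) -> float:
    """Toy h_x(alpha): least log2-cardinality among candidate models with cost <= alpha."""
    feasible = [m.log_card for m in candidate_models(x) if m.cost_bits <= alpha]
    return min(feasible, default=math.inf)


x = "0" * 15 + "1"
for alpha in (9, 10, 25, 26):
    print(alpha, structure_function(x, alpha))
```

The toy curve is non-increasing in alpha, as the true structure function is: the count model buys a 4-bit index at the smallest feasible budget, while reaching log-cardinality 0 requires paying for every bit of x.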
The Generalized Universal Law of Generalization
Journal of Mathematical Psychology, 2001
Abstract

Cited by 19 (4 self)
Shepard has argued that there is a robust psychological law relating the distance between a pair of items in psychological space to the probability that they will be confused with each other. Specifically, the probability of confusion is a negative exponential function of the distance between the pair of items. In experimental contexts, distance is typically defined in terms of a multidimensional Euclidean space, but this assumption seems unlikely to hold for complex stimuli. We show that, nonetheless, the Universal Law of Generalization can be derived in the more complex setting of arbitrary stimuli, using a much more universal measure of distance. This universal distance is defined as the length of the shortest program that transforms the representations of the two items of interest into one another: the algorithmic information distance. It is universal in the sense that, up to an additive constant, it minorizes every computable distance. We show ...
Predicting and Controlling Resource Usage in a Heterogeneous Active Network
In Proceedings of the Third International Workshop on Active Middleware Services, 2001
Abstract

Cited by 15 (2 self)
Active network technology envisions deployment of virtual execution environments within network elements, such as switches and routers. As a result, inhomogeneous processing can be applied to network traffic. To use such technology safely and efficiently, individual nodes must provide mechanisms to enforce resource limits. This implies that each node must understand the varying resource requirements of specific network traffic. This paper presents an approach to modeling the CPU time requirements of active applications in a form that can be interpreted across heterogeneous nodes. Further, the paper demonstrates how this approach can be used successfully to control the resources consumed at an active-network node and to predict load among nodes in an active network, when integrated within the Active Virtual Network Management Prediction system.
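One simple way to make CPU-time requirements portable across heterogeneous nodes is to express them in node-independent reference units and rescale them with a per-node speed factor. The sketch below assumes exactly that linear model; `Node`, `speed_factor`, and the admission rule are hypothetical illustrations, not the model used in the paper or in the Active Virtual Network Management Prediction system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A network node; `speed_factor` is a hypothetical calibration constant giving
    reference-CPU seconds of work completed per local CPU second."""
    name: str
    speed_factor: float


def to_reference_units(local_cpu_s: float, node: Node) -> float:
    """Convert CPU time measured on `node` into node-independent reference seconds."""
    return local_cpu_s * node.speed_factor


def predict_local_cpu_s(reference_s: float, node: Node) -> float:
    """Predict the local CPU time `node` needs for work of size `reference_s`."""
    return reference_s / node.speed_factor


def admit(reference_s: float, node: Node, budget_s: float) -> bool:
    """Enforce a resource limit: accept the work only if its predicted cost fits the budget."""
    return predict_local_cpu_s(reference_s, node) <= budget_s


fast, slow = Node("fast", 2.0), Node("slow", 0.5)
work = to_reference_units(0.01, fast)   # 0.01 s measured on the fast node
print(predict_local_cpu_s(work, slow))  # predicted time for the same work on the slow node
```

A single linear factor is the crudest possible model; it ignores, for example, differences in memory hierarchy between nodes.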
Minimum description length principle: Generators are preferable to closed patterns
In Proceedings of the AAAI Conference on Artificial Intelligence, 2006
Abstract

Cited by 14 (4 self)
The generators and the unique closed pattern of an equivalence class of itemsets share a common set of transactions. The generators are the minimal itemsets in the class, while the closed pattern is the maximum one. As a generator is usually smaller than the closed pattern in cardinality, by the Minimum Description Length Principle the generator is preferable to the closed pattern in inductive inference and classification. To efficiently discover frequent generators from a large dataset, we develop a depth-first algorithm called Gr-growth. The idea is novel in contrast to traditional breadth-first, bottom-up generator-mining algorithms. Our extensive performance study shows that Gr-growth is significantly faster (by one or even two orders of magnitude when the support thresholds are low) than existing generator-mining algorithms. It can also be faster than state-of-the-art frequent closed itemset mining algorithms such as FPclose and CLOSET+.
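The relationship between generators and closed patterns can be checked by brute force on a tiny transaction database. The sketch below computes the closure of an itemset and enumerates the generators of its equivalence class; it is an exhaustive illustration of the definitions only, not the depth-first Gr-growth algorithm, and the function names are hypothetical.

```python
from itertools import combinations


def support_set(itemset: frozenset, db: list[frozenset]) -> frozenset:
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(db) if itemset <= t)


def closure(itemset: frozenset, db: list[frozenset]) -> frozenset:
    """Closed pattern of the class: the items shared by all supporting transactions."""
    tids = support_set(itemset, db)
    if not tids:
        raise ValueError("itemset has zero support")
    return frozenset.intersection(*(db[i] for i in tids))


def is_generator(itemset: frozenset, db: list[frozenset]) -> bool:
    """True if no immediate proper subset has the same supporting transactions.

    Checking immediate subsets suffices: support sets shrink as itemsets grow, so if
    any proper subset had the same support set, some immediate subset would too.
    """
    tids = support_set(itemset, db)
    return all(support_set(itemset - {item}, db) != tids for item in itemset)


def generators_of_class(itemset: frozenset, db: list[frozenset]) -> list[frozenset]:
    """All generators of the equivalence class containing `itemset` (exhaustive search)."""
    closed = closure(itemset, db)
    tids = support_set(closed, db)
    gens = []
    # Every member of the class is a subset of its closed pattern.
    for r in range(len(closed) + 1):
        for combo in combinations(sorted(closed), r):
            cand = frozenset(combo)
            if support_set(cand, db) == tids and is_generator(cand, db):
                gens.append(cand)
    return gens


db = [frozenset("abc"), frozenset("abc"), frozenset("ad"), frozenset("bd")]
print(sorted(closure(frozenset("ab"), db)))                          # ['a', 'b', 'c']
print([sorted(g) for g in generators_of_class(frozenset("ab"), db)])  # [['c'], ['a', 'b']]
```

In this example database, the class supported by transactions 0 and 1 has closed pattern {a, b, c} and two generators, {c} and {a, b}; both are shorter than the closed pattern, which is the MDL argument made in the abstract.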
Kolmogorov’s structure functions with an application to the foundations of model selection
In Proc. 43rd Symposium on Foundations of Computer Science, 2002
Abstract

Cited by 10 (0 self)
We vindicate, for the first time, the rightness of the original “structure function”, proposed by Kolmogorov in 1974, by showing that minimizing a two-part code consisting of a model subject to (Kolmogorov) complexity constraints, together with a data-to-model code, produces a model of best fit (for which the data is maximally “typical”). The method thus separates all possible model information from the remaining accidental information. This result gives a foundation for MDL, and related methods, in model selection. Settlement of this long-standing question is all the more remarkable since the minimal randomness deficiency function (measuring maximal “typicality”) itself cannot be monotonically approximated, but the shortest two-part code can. We furthermore show that both the structure function and the minimum randomness deficiency function can assume all shapes over their full domain (improving an independent unpublished result of Levin on the former function from the early 1970s, and extending a partial result of V’yugin on the latter function from the late 1980s, as well as recent results on prediction loss measured by “snooping curves”). We give an explicit realization of optimal two-part codes at all levels of model complexity. We determine the (un)computability properties of the various functions and of the “algorithmic sufficient statistic” considered. In our setting the models are finite sets, but the analysis is valid, up to logarithmic additive terms, for the model class of computable probability density functions and for the model class of total recursive functions.
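The three functions compared in this abstract are usually defined as follows for finite-set models, where K is prefix Kolmogorov complexity and each minimum ranges over finite sets S. The symbols h_x, λ_x, and β_x follow common usage in the later literature and do not appear in this listing. Since any S containing x yields a two-part description of x, λ_x(α) ≥ K(x) - O(1) wherever it is defined.

```latex
\begin{align*}
  h_x(\alpha)       &= \min_{S}\,\{\log_2 |S| : x \in S,\ K(S) \le \alpha\}
      && \text{structure function}\\
  \lambda_x(\alpha) &= \min_{S}\,\{K(S) + \log_2 |S| : x \in S,\ K(S) \le \alpha\}
      && \text{shortest two-part code}\\
  \beta_x(\alpha)   &= \min_{S}\,\{\log_2 |S| - K(x \mid S) : x \in S,\ K(S) \le \alpha\}
      && \text{minimal randomness deficiency}
\end{align*}
```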
Temporal Scale of Processes in Dynamic Networks
Abstract

Cited by 1 (0 self)
Temporal streams of interactions are commonly aggregated into dynamic networks for temporal analysis. Results of this analysis are greatly affected by the resolution at which the original data are aggregated. A mismatch between the inherent temporal scale of the underlying process and the scale at which the analysis is performed can obscure important insights and lead to wrong conclusions. To date, there is no established framework for choosing the appropriate scale for temporal analysis of streams of interactions. Our paper offers the first step towards the formalization of this problem. We show that for a general class of interaction streams it is possible to identify, in a principled way, the inherent temporal scale of the underlying dynamic processes. Moreover, we state important properties of these processes that can be used to develop an algorithm to identify this scale. Additionally, these properties can be used to distinguish interaction streams for which no level of aggregation is meaningful from those that have a natural level of aggregation.
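The aggregation step discussed above can be made concrete with a minimal sketch that bins a timestamped interaction stream into consecutive windows of width `window`. The function and its `(source, target, timestamp)` input format are hypothetical; the sketch does not implement the paper's method for identifying the inherent temporal scale, it only shows how the choice of resolution changes what an analysis sees.

```python
from collections import Counter

# Hypothetical input format: (source, target, timestamp).
Interaction = tuple[str, str, float]


def aggregate(stream: list[Interaction], window: float, t0: float = 0.0) -> dict[int, Counter]:
    """Bin interactions into windows [t0 + k*window, t0 + (k+1)*window).

    Returns a mapping from window index k to a Counter of (source, target) edge weights.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    snapshots: dict[int, Counter] = {}
    for source, target, t in stream:
        k = int((t - t0) // window)
        snapshots.setdefault(k, Counter())[(source, target)] += 1
    return snapshots


stream = [("a", "b", 0.5), ("b", "c", 1.5), ("a", "b", 2.2), ("a", "b", 2.9)]
fine = aggregate(stream, window=1.0)    # three snapshots; a->b precedes b->c
coarse = aggregate(stream, window=3.0)  # one snapshot; that ordering is lost
```

At window 1.0, the interaction a->b at t=0.5 and b->c at t=1.5 fall into different snapshots, so the time-respecting path a->b->c is visible; at window 3.0 both collapse into one static snapshot and the ordering information disappears.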
Abstract Hierarchy in Cognitive Maps and the Simplicity Principle
2002
Abstract
The aim of this paper is to relate research on cognitive maps (mental representations of the external world, which are thought to be organized hierarchically) to the simplicity principle, a formal version of the Occam’s razor inductive bias, which states that short hypotheses are to be preferred over longer ones. First, I will describe some important properties of hierarchical structures and define what is meant by simplicity. Next, I will review some results supporting the view that cognitive maps are structured hierarchically rather than following Euclidean metrics, and that some cognitive functions seem to prefer simpler representations over complex ones. This hints at the possibility that the hierarchical organization of cognitive maps may also follow the simplicity principle. Finally, I will suggest an experiment in which this hypothesis could be tested and discuss the implications and limitations of integrating both approaches.
1 What are hierarchies and why are they important?
Hierarchies are ordered structures of objects arranged over several distinct levels. Individuals at the same level share common properties and are related to each other through a partial ordering relation. ...
Towards an Algorithmic Statistics (Extended Abstract)
Abstract
Peter Gács, John Tromp, and Paul Vitányi. Abstract. While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set from which the data sample typically came. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to ordinary statistical theory, which deals with relations between probabilistic ensembles. We develop a new algorithmic theory of typical statistic, sufficient statistic, and minimal sufficient statistic.
1 Introduction
We take statistical theory to ideally consider the following problem: given a data sample and a family of models (hypotheses), one wants to select the model that produced the data. But a priori it is possible that the data is atypical for the...
An Active Model-Based Prototype for Predictive Network Management
2005
Abstract
If current trends continue, the next generation of enterprise networks is likely to become a more complex mixture of hardware, communication media, architectures, protocols, and standards. One approach toward reducing the management burden caused by this growing complexity is to integrate management support into the inherent function of network operation. In this paper, management support is provided in the form of network components that, simultaneously with their network function, collaboratively project future state and adjust those projections based upon actual network state. It is well known that more accurate predictions over a longer time horizon enable better control decisions. This paper focuses on improving prediction; the many potential uses of predictive capabilities for network control will be addressed in future work.