Results 1 - 7 of 7
The Similarity Metric
- IEEE TRANSACTIONS ON INFORMATION THEORY
, 2003
Cited by 137 (15 self)
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To evidence generality and robustness we give two distinctive applications in widely divergent areas using standard compression programs like gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatic computed whole mitochondrial phylogeny tree. Secondly, we fully automatically compute the language tree of 52 different languages.
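The compressor-based idea in this abstract can be illustrated with the normalized compression distance (NCD), a widely used computable stand-in for the normalized information distance. The following is a minimal sketch, assuming zlib as the compressor (the abstract names gzip and GenCompress); the function names and sample inputs are illustrative and not taken from the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a proxy for complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs: a repetitive text and pseudo-random bytes of equal length.
text = b"the quick brown fox jumps over the lazy dog " * 20
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(text)))

print("ncd(text, text)  =", round(ncd(text, text), 3))
print("ncd(text, noise) =", round(ncd(text, noise), 3))
```

With a real compressor the value for identical inputs is small but not exactly zero, and values slightly above 1 can occur, so NCD only approximates a metric.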
Dualities Between Entropy Functions and Network Codes
, 2008
Cited by 7 (3 self)
Characterization of the set of entropy functions Γ* is an important open problem in information theory. The region Γ* is central to the theory of information inequalities, and as such could be regarded as a key to the basic laws of information theory. Characterization of Γ* has several important consequences. In probability theory, it would provide a solution for the implication problem of conditional independence. In communications networks, the capacity region of multi-source network coding is given in terms of Γ*. More broadly, determination of Γ* would have an impact on converse theorems for multi-terminal problems in information theory. This paper provides several new dualities between entropy functions and network codes. Given a function g ≥ 0 defined on all proper subsets of N random variables, we provide a construction for a network multicast problem which is "solvable" if and only if g is the entropy function of a set of quasi-uniform random variables. The underlying network topology is fixed and the multicast problem depends on g only through link capacities and source rates. A corresponding duality is developed for linear network codes, where the constructed multicast problem is linearly solvable if and only if g is linear group characterizable. Relaxing the requirement that the domain of g be subsets of random variables, we obtain a similar duality between polymatroids and the linear programming bound. These duality results provide an alternative proof of the insufficiency of linear (and abelian) network codes, and demonstrate the utility of non-Shannon inequalities to tighten outer bounds on network coding capacity regions.
A new class of non-Shannon-type inequalities for entropies
- Communications in Information and Systems
, 2002
Cited by 6 (0 self)
Abstract. In this paper we prove a countable set of non-Shannon-type linear information inequalities for entropies of discrete random variables, i.e., information inequalities which cannot be reduced to the “basic” inequality I(X : Y | Z) ≥ 0. Our results generalize the inequalities of Z. Zhang and R. Yeung (1998), who found the first examples of non-Shannon-type information inequalities. 1. Introduction. A central notion of information theory is Shannon’s entropy. Given a set of jointly distributed random variables x1, ..., xn, we can consider entropies of all random variables H(xi), entropies of all pairs H(xi, xj), etc. (2^n − 1 entropy values for all nonempty subsets of {x1, ..., xn}). For every n-tuple of random variables we get a point in R^(2^n − 1), representing entropies of the given distribution.
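The entropy point described in this abstract can be computed directly for a small distribution. The sketch below is illustrative only: the function names and the XOR example are assumptions, not taken from the paper. It builds the 2^n − 1 joint entropies and evaluates the basic quantity I(X : Y | Z) from them.

```python
from itertools import combinations
from math import log2


def entropy_vector(joint, n):
    """Map each nonempty subset of variable indices to its joint entropy in bits.

    `joint` maps outcome tuples of length n to probabilities.
    """
    vector = {}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            marginal = {}
            for outcome, p in joint.items():
                key = tuple(outcome[i] for i in subset)
                marginal[key] = marginal.get(key, 0.0) + p
            vector[subset] = -sum(p * log2(p) for p in marginal.values() if p > 0)
    return vector


def conditional_mutual_information(h, x, y, z):
    """I(X:Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) for single indices x, y, z."""
    def key(*indices):
        return tuple(sorted(indices))
    return h[key(x, z)] + h[key(y, z)] - h[key(x, y, z)] - h[key(z)]


# Illustrative distribution: two independent fair bits and their XOR.
xor_joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
h = entropy_vector(xor_joint, 3)

print(len(h))                                   # number of coordinates, 2^3 - 1
print(conditional_mutual_information(h, 0, 1, 2))
```

Every Shannon-type inequality is a nonnegative combination of such basic quantities; the non-Shannon-type inequalities discussed in the abstract are constraints on the same vector that cannot be obtained this way.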
On the combinatorial representation of information
- The Twelfth Annual International Computing and Combinatorics Conference (COCOON’06), volume LNCS 4112
, 2006
Cited by 2 (2 self)
Abstract. Kolmogorov introduced a combinatorial measure of the information I(x: y) about the unknown value of a variable y conveyed by an input variable x taking a given value x. The paper extends this definition of information to a more general setting where ‘x = x’ may provide a vaguer description of the possible value of y. As an application, the space P({0, 1}^n) of classes of binary functions f: [n] → {0, 1}, [n] = {1, ..., n}, is considered where y represents an unknown function t ∈ {0, 1}^[n] and as input, two extreme cases are considered: x = x_{M_d} and x = x_{M′_d}, which indicate that t is an element of a set G ⊆ {0, 1}^n that satisfies a property M_d or M′_d respectively. Property M_d (or M′_d) means that there exists an E ⊆ [n], |E| = d, such that |tr_E(G)| = 1 (or 2^d), where tr_E(G) denotes the trace of G on E. Estimates of the information value I(x_{M_d}: t) and I(x_{M′_d}: t) are obtained. When d is fixed, it is shown that I(x_{M_d}: t) ≈ d and I(x_{M′_d}: t) ≈ 1 as n → ∞. Key words: Information theory, combinatorial complexity, VC-dimension
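For orientation, Kolmogorov's classical combinatorial information can be sketched in a few lines. This assumes the standard definition, which the abstract does not spell out: if the admissible (x, y) pairs form a finite set, then I(x: y) = log2 |Y| − log2 |Y_x|, where Y is the set of all possible y values and Y_x the set of y values compatible with the given x. The function name and the sample relation are hypothetical.

```python
from math import log2


def combinatorial_information(pairs, x_value):
    """Kolmogorov's combinatorial information I(x: y) in bits.

    `pairs` is a finite set of admissible (x, y) pairs; the result is
    log2 |Y| - log2 |Y_x| for the given value of x.
    """
    ys = {y for _, y in pairs}
    ys_given_x = {y for x, y in pairs if x == x_value}
    if not ys_given_x:
        raise ValueError("x_value does not occur in the relation")
    return log2(len(ys)) - log2(len(ys_given_x))


# Hypothetical relation: knowing x narrows down the possible y values.
pairs = {(1, 1), (1, 2), (2, 2), (2, 3), (2, 4), (3, 4)}

for x in (1, 2, 3):
    print(x, combinatorial_information(pairs, x))
```

Here x = 3 pins y down completely and so conveys the full log2 4 = 2 bits, while x = 2 leaves three candidates and conveys less than half a bit.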
Similarity Distance and Phylogeny
, 2002
Cited by 1 (1 self)
A new class of similarity measures appropriate for measuring relations between sequences is studied.
Partitioning multi-dimensional sets in a small number of “uniform” parts
Cited by 1 (0 self)
Our main result implies the following easily formulated statement. The set of edges E of every finite bipartite graph can be split into poly(log |E|) subsets so that all the resulting bipartite graphs are almost regular. The latter means that the ratio between the maximal and minimal non-zero degree of the left nodes is bounded by a constant and the same condition holds for the right nodes. Stated differently, every finite 2-dimensional set S ⊂ N^2 can be partitioned into poly(log |S|) parts so that in every part the ratio between the maximal size and the minimal size of non-empty horizontal section is bounded by a constant and the same condition holds for vertical sections. We prove a similar statement for n-dimensional sets for any n and show how it can be used to relate information inequalities for Shannon entropy of random variables to inequalities between sizes of sections and their projections of multi-dimensional finite sets. Let S be a finite n-dimensional set, that is, a subset of X1 × X2 × ··· × Xn for some X1, X2, ..., Xn. For every set of indices A ⊂ {1, 2, ..., n} = [n]
Partitioning Multi-Dimensional Sets in a Small
"... Our main result implies the following easily formulated statement. ..."

