Results 1–10 of 149
Supervised Learning of Quantizer Codebooks by Information Loss Minimization
2007
Abstract

Cited by 33 (0 self)
This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic for its class label Y. We derive an alternating minimization procedure for simultaneously learning codebooks in the Euclidean feature space and in the simplex of posterior class distributions. The resulting quantizer can be used to encode unlabeled points outside the training set and to predict their posterior class distributions, and has an elegant interpretation in terms of lossless source coding. The proposed method is extensively validated on synthetic and real datasets, and is applied to two diverse problems: learning discriminative visual vocabularies for bag-of-features image classification, and image segmentation.
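The alternating minimization described in the abstract can be sketched as a Lloyd-style loop: assign each point to the region minimizing a combined Euclidean-plus-KL distortion, then re-estimate both codebooks. The weighting `lam`, the random initialization, and all names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Row-wise KL divergence KL(p_i || q) against one distribution q."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)

def joint_quantize(X, P, K, lam=1.0, iters=50, seed=0):
    """Alternate between (1) assigning each point to the region with the
    smallest Euclidean + lam * KL distortion and (2) re-estimating the
    feature-space codebook mu and the simplex codebook q."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=K, replace=False)
    mu, q = X[idx].copy(), P[idx].copy()
    z = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        cost = np.stack(
            [np.sum((X - mu[k]) ** 2, axis=1) + lam * kl(P, q[k])
             for k in range(K)], axis=1)
        z = cost.argmin(axis=1)
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
                q[k] = P[z == k].mean(axis=0)  # mean minimizes avg KL(p||q)
    return mu, q, z
```

The posterior-codebook update uses the fact that the average of the assigned posteriors minimizes the mean KL divergence KL(p||q) over q.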
Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation
Abstract

Cited by 17 (2 self)
Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with an abundance of good quality data, users interested in the application of species models may not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model ...
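A hinge feature is a one-sided linear ramp anchored at a knot; a weighted sum of such ramps yields the piecewise-linear response curves the abstract alludes to. The helper below is a minimal sketch of the idea, not Maxent's implementation, and the function name and knot placement are ours.

```python
import numpy as np

def hinge_features(x, thresholds):
    """Expand one environmental variable into hinge features
    h_t(x) = max(0, x - t), one column per knot t; a weighted sum of
    these columns is a piecewise-linear response curve."""
    x = np.asarray(x, dtype=float)[:, None]
    t = np.asarray(thresholds, dtype=float)[None, :]
    return np.maximum(0.0, x - t)
```

For example, `hinge_features([0.0, 1.0, 2.0], [0.5, 1.5])` produces a 3 × 2 design matrix whose columns ramp up past 0.5 and 1.5 respectively.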
An Improved Outer Bound for the Multiterminal Source-Coding Problem
 Proc. of the Intl. Symp. on Info. Theory, 2005
Abstract

Cited by 16 (2 self)
Abstract—We prove a new outer bound on the rate–distortion region for the multiterminal source-coding problem. This bound subsumes the best outer bound in the literature and improves upon it strictly in some cases. The improved bound enables us to obtain a new, conclusive result for the binary erasure version of the "CEO problem." The bound recovers many of the converse results that have been established for special cases of the problem, including the recent one for the Gaussian two-encoder problem. Index Terms—CEO problem, erasure distortion, multiterminal source coding, outer bound, rate region, rate–distortion.
Fig. 1. Separate encoding of correlated sources.
Nonlinear Extraction of Independent Components of Natural Images Using Radial Gaussianization
2009
Abstract

Cited by 14 (4 self)
We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, known as independent component analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent non-Gaussian sources. Here, we examine a complementary case, in which the source is non-Gaussian and elliptically symmetric. In this case, no invertible linear transform suffices to decompose the signal into independent components, but we show that a simple nonlinear transformation, which we call radial gaussianization (RG), is able to remove all dependencies. We then examine this methodology in the context of natural image statistics. We first show that distributions of spatially proximal bandpass filter responses are better described as elliptical than as linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either nearby pairs or blocks of bandpass filter responses is significantly greater than that achieved by ICA. Finally, we show that the RG transformation may be closely approximated by divisive normalization, which has been used to model the nonlinear response properties of visual neurons.
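The RG transformation can be sketched as a radial histogram match: map each sample's radius through the empirical radial CDF and then through the inverse CDF of the chi distribution with d degrees of freedom, which is the radial law of a standard d-dimensional Gaussian. This nonparametric version (names and the use of SciPy are our assumptions) presumes the data is already whitened so that elliptical contours are spherical.

```python
import numpy as np
from scipy.stats import chi, rankdata

def radial_gaussianize(X):
    """Radial gaussianization: replace each radius by the chi(d) quantile
    at its empirical-CDF position, leaving directions unchanged.
    For elliptically symmetric (whitened) X this yields approximately
    i.i.d. standard normal components."""
    n, d = X.shape
    r = np.linalg.norm(X, axis=1)
    u = (rankdata(r) - 0.5) / n          # Hazen positions avoid u = 0 or 1
    r_new = chi.ppf(u, df=d)             # target radial law of a Gaussian
    return X * (r_new / np.maximum(r, 1e-300))[:, None]
```

Applied to a Gaussian scale mixture (elliptical but heavy-tailed), the output radii follow the chi law and the marginals are close to unit-variance normal.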
Capacity of cognitive interference channels with and without secrecy
 submitted for publication
Abstract

Cited by 13 (3 self)
Abstract—Like the conventional two-user interference channel, the cognitive interference channel consists of two transmitters whose signals interfere at two receivers. It is assumed that there is a common message (message 1) known to both transmitters, and an additional independent message (message 2) known only to the cognitive transmitter (transmitter 2). The cognitive receiver (receiver 2) needs to decode messages 1 and 2, while the non-cognitive receiver (receiver 1) should decode only message 1. Furthermore, message 2 is assumed to be a confidential message which needs to be kept as secret as possible from receiver 1, which is viewed as an eavesdropper with regard to message 2. The level of secrecy is measured by the equivocation rate. In this paper, a single-letter expression for the capacity-equivocation region of the discrete memoryless cognitive interference channel is obtained. The capacity-equivocation region for the Gaussian cognitive interference channel is also obtained explicitly. Moreover, by particularizing the capacity-equivocation region to the case without a secrecy constraint and providing a converse theorem, the capacity region for the two-user cognitive interference channel is obtained. Index Terms—Capacity-equivocation region, cognitive communication, confidential messages, interference channel, rate splitting, secrecy capacity region.
On secret sharing schemes, matroids and polymatroids
 Journal of Mathematical Cryptology
Abstract

Cited by 12 (4 self)
The complexity of a secret sharing scheme is defined as the ratio between the maximum length of the shares and the length of the secret. The optimization of this parameter for general access structures is an important and very difficult open problem in secret sharing. In this paper we explore the connections of this open problem with matroids and polymatroids. Matroid ports were introduced by Lehman in 1964, and a forbidden-minor characterization of matroid ports was given by Seymour in 1976. These results predate the invention of secret sharing by Shamir in 1979. Important connections between ideal secret sharing schemes and matroids were discovered by Brickell and Davenport in 1991. Their results can be restated as follows: every ideal secret sharing scheme defines a matroid, and its access structure is a port of that matroid. Nevertheless, the results by Lehman and Seymour, and other subsequent results on matroid ports, had until now gone unnoticed by researchers interested in secret sharing. Lower bounds on the optimal complexity of access structures can be found by taking into account that the joint Shannon entropies of a set of random variables define a polymatroid.
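Shamir's 1979 scheme, mentioned above, is the canonical ideal scheme: each share is a single field element, so the ratio of maximum share length to secret length equals 1, and its threshold access structure is a port of a uniform matroid. A minimal sketch over a prime field follows; the prime, seed, and function names are illustrative choices.

```python
import random

P = 2**61 - 1  # a Mersenne prime; secret and shares are elements of GF(P)

def share(secret, t, n, seed=0):
    """Shamir (t, n) threshold scheme: evaluate a random degree-(t-1)
    polynomial with constant term `secret` at points x = 1..n.
    Each share is one field element, hence complexity ratio 1 (ideal)."""
    rng = random.Random(seed)
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P); needs any t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

Any t shares recover the secret exactly, while fewer than t reveal nothing in the information-theoretic sense.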
CodeOR: Opportunistic Routing in Wireless Mesh Networks with Segmented Network Coding
Abstract

Cited by 10 (0 self)
Abstract—Opportunistic routing significantly increases unicast throughput in wireless mesh networks by effectively utilizing the wireless broadcast medium. With network coding, opportunistic routing can be implemented in a simple and practical way without resorting to a complicated scheduling protocol. Due to constraints of computational complexity, a protocol utilizing network coding needs to perform segmented network coding, which partitions the data into multiple segments and encodes only packets within the same segment. However, existing designs transmit only one segment at any given time while waiting for its acknowledgment, which degrades performance as the size of the network scales up. In this paper, we propose CodeOR, a new protocol that uses network coding in opportunistic routing to improve throughput. By transmitting a window of multiple segments concurrently, it improves the performance of existing work by a factor of two on average (and a factor of four in some cases). CodeOR is especially appropriate for real-time multimedia applications through the use of a small segment size to decrease decoding delay, and is able to further increase network throughput with a smaller packet size and a larger window size.
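Segmented network coding as described can be sketched over GF(2): a sender transmits random XOR combinations of the k packets in one segment, and any receiver decodes once it has collected k linearly independent combinations. Packet representation and names below are illustrative; practical systems typically code over a larger field such as GF(2^8).

```python
import random

def encode(segment, rng):
    """One coded transmission: a random nonzero GF(2) combination
    (i.e., XOR of a random subset) of the packets in a segment."""
    coeffs = [rng.randint(0, 1) for _ in segment]
    while not any(coeffs):                       # skip the useless all-zero combo
        coeffs = [rng.randint(0, 1) for _ in segment]
    payload = 0
    for c, pkt in zip(coeffs, segment):
        if c:
            payload ^= pkt                       # packets modeled as ints
    return coeffs, payload

def decode(coded, k):
    """Gaussian elimination over GF(2); returns the k original packets
    once the collected coefficient vectors reach full rank, else None."""
    if len(coded) < k:
        return None
    rows = [(list(c), p) for c, p in coded]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None                          # rank deficient so far
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           rows[r][1] ^ rows[col][1])
    return [rows[i][1] for i in range(k)]
```

Because any k independent combinations suffice, the sender never needs to know which specific transmissions were received, which is what makes the approach a good fit for opportunistic routing.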
Nonlinear sparse-graph codes for lossy compression of discrete non-redundant sources
 in Proc. Information Theory Workshop, Lake Tahoe, CA, 2007
Abstract

Cited by 9 (2 self)
Abstract—We propose a scheme for lossy compression of discrete memoryless sources: the compressor is the decoder of a nonlinear channel code, constructed from a sparse graph. We prove asymptotic optimality of the scheme for any separable (letter-by-letter) bounded distortion criterion. We also present a suboptimal compression algorithm, which exhibits near-optimal performance for moderate block lengths. Index Terms—Discrete memoryless sources, lossy data compression, rate–distortion theory, source–channel coding duality, sparse-graph codes.
Finding good itemsets by packing data
 In ICDM, 2008
Abstract

Cited by 9 (5 self)
The problem of selecting small groups of itemsets that represent the data well has recently gained a lot of attention. We approach the problem by searching for the itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item that may only depend on the previous items. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first is a simple greedy approach that builds a family of itemsets directly from data. The second, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact and high-quality descriptions of the data.
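The MDL comparison behind such a greedy search can be illustrated with a single decision stump: code an item's binary column either directly or conditioned on an earlier item, and keep the split only if it saves bits. The two-part cost below (empirical-entropy data bits plus a (1/2) log n parameter penalty) is a standard textbook approximation, not the paper's exact refined-MDL encoding, and all names are ours.

```python
import math

def bernoulli_cost(ones, n):
    """Approximate two-part MDL code length (bits) for a binary sequence:
    n * H(p_hat) for the data plus (1/2) log2(n) for the parameter."""
    if n == 0:
        return 0.0
    cost = 0.5 * math.log2(n)
    p = ones / n
    if 0 < p < 1:
        cost += -n * (p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return cost

def split_gain(item, parent):
    """Bits saved by predicting `item` with a stump on `parent`
    instead of coding it unconditionally; positive gain = keep the split."""
    base = bernoulli_cost(sum(item), len(item))
    left = [b for a, b in zip(parent, item) if a]
    right = [b for a, b in zip(parent, item) if not a]
    return base - bernoulli_cost(sum(left), len(left)) \
                - bernoulli_cost(sum(right), len(right))
```

A perfectly correlated pair of columns yields a large positive gain, while an unrelated parent yields a small negative gain (the penalty for the extra parameters), so independent attributes are correctly rejected.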