Information Theory, Inference and Learning Algorithms (2003)

by D. J. C. MacKay
Results 1 - 10 of 1,937

Bayesian Data Analysis

by Andrew Gelman, Christian Robert, Nicolas Chopin, Judith Rousseau, 1995
"... I actually own a copy of Harold Jeffreys’s Theory of Probability but have only read small bits of it, most recently over a decade ago to confirm that, indeed, Jeffreys was not too proud to use a classical chi-squared p-value when he wanted to check the misfit of a model to data (Gelman, Meng and Ste ..."
Abstract - Cited by 2194 (63 self) - Add to MetaCart
I actually own a copy of Harold Jeffreys’s Theory of Probability but have only read small bits of it, most recently over a decade ago to confirm that, indeed, Jeffreys was not too proud to use a classical chi-squared p-value when he wanted to check the misfit of a model to data (Gelman, Meng and Stern, 2006). I do, however, feel that it is important to understand where our probability models come from, and I welcome the opportunity to use the present article by Robert, Chopin and Rousseau as a platform for further discussion of foundational issues. In this brief discussion I will argue the following: (1) in thinking about prior distributions, we should go beyond Jeffreys’s principles and move toward weakly informative priors; (2) it is natural for those of us who work in social and computational sciences to favor complex models, contra Jeffreys’s preference for simplicity; and (3) a key generalization of Jeffreys’s ideas is to explicitly include model checking in the process of data analysis.

Citation Context

...check the fit of the model. There are many arguments which make such an approach compelling. Without entering into philosophical and epistemological arguments on the nature of Science (Jeffreys 1939, MacKay 2002, Jaynes 2003), we briefly state what we view as the main practical appealing features of introducing a prior probability on θ. First such an approach allows to incorporate prior information in a natu...

Community detection in graphs

by Santo Fortunato , 2009
"... The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i. e. the organization of vertices in clusters, with many edges joining vertices of th ..."
Abstract - Cited by 821 (1 self) - Add to MetaCart
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e., the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such ...
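As a hedged, illustrative sketch of the informal definition above (not taken from the review), the snippet below counts intra- versus inter-cluster edges on a toy graph; the edge list, cluster labels, and the helper name `edge_split` are all invented for illustration.

```python
# Illustrative only: check that a proposed clustering has many edges inside
# clusters and few between them, per the informal definition of community structure.

def edge_split(edges, cluster):
    """edges: iterable of (u, v) pairs; cluster: dict mapping vertex -> cluster id."""
    intra = inter = 0
    for u, v in edges:
        if cluster[u] == cluster[v]:
            intra += 1
        else:
            inter += 1
    return intra, inter

# Toy example: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cluster = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(edge_split(edges, cluster))  # (6, 1): dense within clusters, sparse between
```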

Power-law distributions in empirical data

by Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman - ISSN 0036-1445. doi:10.1137/070710111. URL http://dx.doi.org/10.1137/070710111, 2009
"... Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the t ..."
Abstract - Cited by 607 (7 self) - Add to MetaCart
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for power-law data, based on maximum likelihood methods and the Kolmogorov-Smirnov statistic. We also show how to tell whether the data follow a power-law distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
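A minimal sketch of the maximum-likelihood approach described above, assuming continuous data and a given lower cutoff xmin (the paper additionally selects xmin by minimizing the KS distance over candidate cutoffs, which is not shown, and the function names here are mine):

```python
import math

def powerlaw_mle_alpha(data, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent:
    alpha_hat = 1 + n / sum(ln(x_i / xmin)) over the tail x_i >= xmin."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    return 1.0 + n / sum(math.log(x / xmin) for x in tail)

def ks_distance(data, xmin, alpha):
    """Rough Kolmogorov-Smirnov distance between the empirical tail CDF and the
    fitted power-law CDF P(X <= x) = 1 - (x / xmin)^(1 - alpha)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    return max(abs((i + 1) / n - (1.0 - (x / xmin) ** (1.0 - alpha)))
               for i, x in enumerate(tail))
```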

Citation Context

... as a parameter because we know its value automatically once we are given a list of the other parameters—it is just the length of that list. ... the evidence) [30, 35], i.e., the likelihood of the data given the number of model parameters, integrated over the parameters’ possible values. Unfortunately, the integral cannot usually be performed analytically, but one ...

Overview of the scalable video coding extension of the H.264/AVC standard

by Heiko Schwarz, Detlev Marpe, Thomas Wiegand - IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2007
"... With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC stand ..."
Abstract - Cited by 522 (6 self) - Add to MetaCart
With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.

Near-optimal sensor placements in gaussian processes

by Andreas Krause, Ajit Singh, Carlos Guestrin, Chris Williams - In ICML, 2005
"... When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in t ..."
Abstract - Cited by 342 (34 self) - Add to MetaCart
When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within (1 − 1/e) of the optimum by exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds, and design branch and bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions are again associated with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of our approach towards optimizing mutual information in a very extensive empirical study on two real-world data sets.
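The (1 − 1/e) guarantee comes from greedily maximizing a monotone submodular objective. The sketch below shows only that generic greedy shell; the GP-based mutual-information gain, the lazy evaluations, and the robustness extensions are abstracted into a caller-supplied `marginal_gain` function, so this illustrates the selection rule rather than the paper's full algorithm.

```python
def greedy_select(candidates, k, marginal_gain):
    """Generic greedy selection for a monotone submodular set function:
    repeatedly add the candidate with the largest marginal gain.
    marginal_gain(selected, c) must return F(selected + {c}) - F(selected)."""
    selected = []
    remaining = set(candidates)
    for _ in range(k):
        best = max(remaining, key=lambda c: marginal_gain(selected, c))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with a coverage-style (submodular) gain: how many new points
# candidate c would cover given what is already selected.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
gain = lambda selected, c: len(cover[c].difference(*(cover[s] for s in selected)))
print(greedy_select(cover, 2, gain))  # order depends on tie-breaking
```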

Citation Context

...cular locations, and thus optimize their positions. This initial GP is, of course, a rough model, and a sensor placement strategy can be viewed as an inner-loop step for an active learning algorithm (MacKay, 2003). Typical sensor placement techniques greedily add sensors where uncertainty about the phenomena is highest, i.e., the highest entropy location of the GP (Cressie, 1991). Unfortunately, this criterio...

An Empirical Bayes Approach to Inferring Large-Scale Gene Association Networks

by Juliane Schäfer, Korbinian Strimmer - BIOINFORMATICS, 2004
"... Motivation: Genetic networks are often described statistically by graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standar ..."
Abstract - Cited by 237 (6 self) - Add to MetaCart
Motivation: Genetic networks are often described statistically by graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an “ill-posed” inverse problem. Methods: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (i) improved (regularized) small-sample point estimates of partial correlation, (ii) an exact test of edge inclusion with adaptive estimation of the degree of freedom, and (iii) a heuristic network search based on false discovery rate multiple testing. Steps (ii) and (iii) correspond to an empirical Bayes estimate of the network topology. Results: Using computer simulations we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample data sets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3,883 genes. Availability: The authors have implemented the approach in the R package “GeneTS” that is freely available from
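As a hedged illustration of step (i), the sketch below computes partial correlations from a shrunken correlation matrix. The fixed `shrinkage` constant, the identity target, and the function name are stand-ins only; the paper derives a data-adaptive shrinkage intensity and pairs the point estimates with an edge-inclusion test and FDR-based network search.

```python
import numpy as np

def partial_correlations(X, shrinkage=0.2):
    """Partial correlations from a shrunken correlation matrix.
    X: (n_samples, p_genes) expression matrix. The identity target and fixed
    shrinkage constant are placeholders for the paper's regularized estimator."""
    R = np.corrcoef(X, rowvar=False)                     # sample correlation (p x p)
    R_shrunk = (1 - shrinkage) * R + shrinkage * np.eye(R.shape[0])
    P = np.linalg.inv(R_shrunk)                          # precision matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)                           # rho_ij = -P_ij / sqrt(P_ii P_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```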

Practical privacy: the sulq framework

by Avrim Blum, Cynthia Dwork, Frank McSherry, Kobbi Nissim - In PODS ’05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2005
"... We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping ..."
Abstract - Cited by 223 (35 self) - Add to MetaCart
We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is $\sum_{i \in S} f(d_i)$, and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise – much less than the sampling error – provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (Sub-Linear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large. We extend this work in two ways. First, we modify the privacy analysis to real-valued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k-means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the statistical query learning model [11].
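A minimal sketch of the query primitive as described above. Gaussian noise with a caller-chosen `noise_scale` is only a placeholder here; the paper's privacy analysis calibrates the noise distribution and magnitude to the sublinear query budget.

```python
import random

def sulq_query(rows, S, f, noise_scale):
    """Noisy sum query in the spirit of the SuLQ primitive: the true answer is
    sum_{i in S} f(d_i), and the released response adds zero-mean noise.
    noise_scale is a placeholder, not the paper's calibrated noise level."""
    true_answer = sum(f(rows[i]) for i in S)
    return true_answer + random.gauss(0.0, noise_scale)
```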

Citation Context

...e rows are drawn from independent probability distributions, we assume that the adversary holds beliefs about the rows that are similarly independent. If the reader is uncomfortable with beliefs (see [16], section 2.2 for a good introduction), they can simply assume that the data is drawn from a distribution without missing much, aside from the generality. The intent of the independence assumption is ...

A Contextual-Bandit Approach to Personalized News Article Recommendation

by Lihong Li, Wei Chu, John Langford, Robert E. Schapire
"... Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamic ..."
Abstract - Cited by 178 (16 self) - Add to MetaCart
Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.

Citation Context

... model, the predictive variance of the expected payoff $x_{t,a}^\top \theta_a^*$ is evaluated as $x_{t,a}^\top A_a^{-1} x_{t,a}$, and then $\sqrt{x_{t,a}^\top A_a^{-1} x_{t,a}}$ becomes the standard deviation. Furthermore, in information theory [19], the differential entropy of $p(\theta_a)$ is defined as $-\frac{1}{2}\ln\big((2\pi)^d \det A_a\big)$. The entropy of $p(\theta_a)$ when updated by the inclusion of the new point $x_{t,a}$ then becomes $-\frac{1}{2}\ln\big((2\pi)^d \det(A_a + x_{t,a}x_{t,a}^\top)\big)$. T...
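A minimal sketch in the spirit of the linear upper-confidence rule the excerpt describes: score each arm by its estimated payoff plus a multiple of the predictive standard deviation $\sqrt{x_{t,a}^\top A_a^{-1} x_{t,a}}$, then apply a rank-one update after observing the click. Initialization details, the hybrid (shared-feature) model, and the offline evaluation procedure are omitted; function names and the `alpha` parameter are mine.

```python
import numpy as np

def linucb_choose(arms, features, A, b, alpha=1.0):
    """Pick the arm maximizing estimated payoff plus an exploration bonus.
    A[a] is a d x d design matrix (e.g. initialized to the identity) and b[a]
    the accumulated reward-weighted features, so theta_a = A[a]^{-1} b[a]."""
    scores = {}
    for a in arms:
        x = features[a]
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]
        scores[a] = float(x @ theta + alpha * np.sqrt(x @ A_inv @ x))
    return max(scores, key=scores.get)

def linucb_update(A, b, arm, x, reward):
    """Rank-one update for the chosen arm after observing its click feedback."""
    A[arm] += np.outer(x, x)
    b[arm] += reward * x
```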

Towards highly reliable enterprise network services via inference of multi-level dependencies

by Paramvir Bahl, Ranveer Chandra, Albert Greenberg, Srikanth Kandula, David A. Maltz, Ming Zhang - IN SIGCOMM, 2007
"... Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, ..."
Abstract - Cited by 161 (10 self) - Add to MetaCart
Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is well-adapted to user-perceptible problems rooted in conditions giving rise to both partial service degradation and hard faults. Further, we introduce the Sherlock system to discover Inference Graphs in the operational enterprise, infer critical attributes, and then leverage the result to automatically detect and localize problems. To illuminate strengths and limitations of the approach, we provide results from a prototype deployment in a large enterprise network, as well as from testbed emulations and simulations. In particular, we find that taking into account multi-level structure leads to a 30% improvement in fault localization, as compared to two-level approaches.

Gaussian process dynamical models for human motion

by Jack M. Wang, David J. Fleet, Aaron Hertzmann - IEEE TRANS. PATTERN ANAL. MACHINE INTELL, 2008
"... We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a lowdimensional latent space with associated dynamics, ..."
Abstract - Cited by 158 (5 self) - Add to MetaCart
We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.

Citation Context

...unction of the columns of B. Therefore, with an isotropic Gaussian prior on the columns of B and the Gaussian noise assumption above, one can show that marginalizing over g can be done in closed form [47], [48]. In doing so, we obtain a Gaussian density over the observations $Y = [y_1, \ldots, y_N]^\top$, which can be expressed as a product of GPs (one for each of the D data dimensions), $p(Y \mid X, \bar{\beta}, W) = |W|^N \ldots$
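To make the "product of GPs" form above concrete, here is a small sketch, under simplifying assumptions (a single RBF kernel with fixed hyperparameters and no per-dimension scaling W), of the log-density of an N x D observation matrix Y given latent coordinates X. It illustrates the marginalized form only, not the paper's full GPDM; the kernel parameters and function names are mine.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Simple RBF kernel matrix on latent positions X (N x q), plus observation noise."""
    sq = np.sum(X**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2) + noise * np.eye(len(X))

def gp_log_density(Y, X):
    """Log-density of observations Y (N x D) under independent zero-mean GPs
    sharing one kernel on the latent coordinates X, i.e. the 'product of GPs'
    form in the excerpt (the per-dimension scaling W is dropped here)."""
    N, D = Y.shape
    K = rbf_kernel(X)
    _, logdet = np.linalg.slogdet(K)
    quad = np.trace(Y.T @ np.linalg.solve(K, Y))   # tr(K^{-1} Y Y^T)
    return -0.5 * (D * logdet + quad + N * D * np.log(2.0 * np.pi))
```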
