Results 1–10 of 31
Multiresolution Markov models for signal and image processing
 Proceedings of the IEEE
, 2002
Abstract

Cited by 125 (17 self)
This paper reviews a significant component of the rich field of statistical multiresolution (MR) modeling and processing. These MR methods have found application and permeated the literature of a widely scattered set of disciplines, and one of our principal objectives is to present a single, coherent picture of this framework. A second goal is to describe how this topic fits into the even larger field of MR methods and concepts, in particular making ties to topics such as wavelets and multigrid methods. A third is to provide several alternate viewpoints for this body of work, as the methods and concepts we describe intersect with a number of other fields. The principal focus of our presentation is the class of MR Markov processes defined on pyramidally organized trees. The attractiveness of these models stems from both the very efficient algorithms they admit and their expressive power and broad applicability. We show how a variety of methods and models relate to this framework, including models for self-similar and 1/f processes. We also illustrate how these methods have been used in practice. We discuss the construction of MR models on trees and show how questions that arise in this context make contact with wavelets, state-space modeling of time series, system and parameter identification, and hidden ...
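To illustrate the coarse-to-fine structure these tree models exploit, the following sketch (not from the paper; the noise model and parameter choices are assumptions for illustration) samples a simple multiscale Markov process on a dyadic tree, where each node equals its parent plus independent noise whose variance shrinks with scale:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mr_tree(levels, root_var=1.0, decay=0.5):
    """Sample a coarse-to-fine MR Markov process on a dyadic tree:
    each node's value is its parent's value plus independent noise
    whose variance shrinks by `decay` per level (illustrative choice)."""
    values = np.array([rng.normal(0.0, np.sqrt(root_var))])
    var = root_var
    for _ in range(levels):
        var *= decay
        noise = rng.normal(0.0, np.sqrt(var), size=2 * len(values))
        values = np.repeat(values, 2) + noise  # two children per node
    return values  # finest-scale signal, length 2**levels

x = sample_mr_tree(levels=8)
print(len(x))  # 256
```

Sampling (and, more importantly, estimation) on such trees runs in time linear in the number of nodes, which is the efficiency the abstract refers to.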
Injecting utility into anonymized datasets
 In SIGMOD
, 2006
Abstract

Cited by 102 (5 self)
Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as k-anonymity and ℓ-diversity are designed to thwart attacks that attempt to identify individuals in the data and to discover their sensitive information. On the other hand, the utility of such data has not been well-studied. In this paper we discuss the shortcomings of current heuristic approaches to measuring utility and introduce a formal approach to measuring utility. Armed with this utility metric, we show how to inject additional information into k-anonymous and ℓ-diverse tables. This information has an intuitive semantic meaning, it increases the utility beyond what is possible in the original k-anonymity and ℓ-diversity frameworks, and it maintains the privacy guarantees of k-anonymity and ℓ-diversity.
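To make the k-anonymity condition concrete: a table is k-anonymous when every combination of quasi-identifier values appears in at least k rows. A minimal sketch of that check (the function name and toy table are hypothetical, not from the paper):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """Check whether every quasi-identifier combination
    appears in at least k rows (the k-anonymity condition)."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return all(count >= k for count in groups.values())

# Toy generalized table: (age bracket, zip prefix, sensitive attribute).
rows = [
    {"age": "30-39", "zip": "021**", "disease": "flu"},
    {"age": "30-39", "zip": "021**", "disease": "cold"},
    {"age": "40-49", "zip": "130**", "disease": "flu"},
    {"age": "40-49", "zip": "130**", "disease": "asthma"},
]
print(is_k_anonymous(rows, ["age", "zip"], k=2))  # True
print(is_k_anonymous(rows, ["age", "zip"], k=3))  # False
```

The paper's contribution concerns how much analytical utility survives this kind of generalization, and how extra information can be published without weakening the guarantee.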
Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data
 In SIGMOD
, 2001
Abstract

Cited by 67 (12 self)
Approximating the joint data distribution of a multidimensional data set through a compact and accurate histogram synopsis is a fundamental problem arising in numerous practical scenarios, including query optimization and approximate query answering. Existing solutions either rely on simplistic independence assumptions or try to directly approximate the full joint data distribution over the complete set of attributes. Unfortunately, both approaches are doomed to fail for high-dimensional data sets with complex correlation patterns between attributes. In this paper, we propose a novel approach to histogram-based synopses that employs the solid foundation of statistical interaction models to explicitly identify and exploit the statistical characteristics of the data. Abstractly, our key idea is to break the synopsis into (1) a statistical interaction model that accurately captures significant correlation and independence patterns in data, and (2) a collection of histograms on low-dimensional marginals that, based on the model, can provide accurate approximations of the overall joint data distribution. Extensive experimental results with several real-life data sets verify the effectiveness of our approach. An important aspect of our general, model-based methodology is that it can be used to enhance the performance of other synopsis techniques that are based on data-space partitioning (e.g., wavelets) by providing an effective tool to deal with the “dimensionality curse”.
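The key idea, in miniature: when the interaction model says an attribute is independent of the others, the joint can be reconstructed from low-dimensional marginals at a fraction of the storage cost. A sketch on synthetic data (the variables and independence pattern are assumptions chosen to make the decomposition exact up to sampling noise):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: A and B correlated, C independent of both.
a = rng.integers(0, 4, 10000)
b = (a + rng.integers(0, 2, 10000)) % 4     # depends on a
c = rng.integers(0, 4, 10000)               # independent

# Full joint histogram: 4 * 4 * 4 = 64 cells.
joint, _ = np.histogramdd(np.stack([a, b, c], axis=1),
                          bins=(4, 4, 4), range=[(0, 4)] * 3)
joint /= joint.sum()

# Model-based synopsis: one 2-D marginal on (A, B) plus a 1-D marginal
# on C (4*4 + 4 = 20 cells), combined under the independence (A,B) _|_ C.
p_ab = joint.sum(axis=2)
p_c = joint.sum(axis=(0, 1))
approx = p_ab[:, :, None] * p_c[None, None, :]

print(np.abs(joint - approx).sum())  # small total L1 error
```

Here 20 numbers recover the 64-cell joint almost exactly; the paper's contribution is finding such factorizations automatically and combining them with histogram compression of each marginal.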
Statistical Synopses for Graph-Structured XML Databases
 In SIGMOD
, 2002
Abstract

Cited by 55 (5 self)
Effective support for XML query languages is becoming increasingly important with the emergence of new applications that access large volumes of XML data. All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows path navigation and branching through the XML data graph in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over graph-structured XML data. In this paper, we introduce a novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSKETCH) exploits localized graph stability to accurately approximate (in limited space) the path and branching distribution in the data graph. To estimate the selectivities of complex path expressions over concise XSKETCH synopses, we develop an estimation framework that relies on appropriate statistical (uniformity and independence) assumptions to compensate for the lack of detailed distribution information. Given our estimation framework, we demonstrate that the problem of building an accuracy-optimal XSKETCH for a given amount of space is NP-hard, and propose an efficient heuristic algorithm based on greedy forward selection. Briefly, our algorithm constructs an XSKETCH synopsis by successive refinements of the label-split graph, the coarsest summary of the XML data graph. Our refinement operations act locally and attempt to capture important statistical correlations between data paths. Extensive experimental results with synthetic as well as real-life data sets verify the effectiveness of our app...
Learning factor graphs in polynomial time and sample complexity
 JMLR
, 2006
Abstract

Cited by 48 (0 self)
We study the computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded degree can be learned in polynomial time and from a polynomial number of training examples, assuming that the data is generated by a network in this class. This result covers both parameter estimation for a known network structure and structure learning. It implies as a corollary that we can learn factor graphs for both Bayesian networks and Markov networks of bounded degree, in polynomial time and sample complexity. Importantly, unlike standard maximum likelihood estimation algorithms, our method does not require inference in the underlying network, and so applies to networks where inference is intractable. We also show that the error of our learned model degrades gracefully when the generating distribution is not a member of the target class of networks. In addition to our main result, we show that the sample complexity of parameter learning in graphical models has an O(1) dependence on the number of variables in the model when using the KL-divergence normalized by the number of variables as the performance criterion.
Maximum likelihood bounded treewidth Markov networks
 Artificial Intelligence
, 2001
Abstract

Cited by 46 (4 self)
We study the problem of projecting a distribution onto (or finding a maximum likelihood distribution among) Markov networks of bounded treewidth. By casting it as the combinatorial optimization problem of finding a maximum weight hypertree, we prove that it is NP-hard to solve exactly and provide an approximation algorithm with a provable performance guarantee.
Efficient stepwise selection in decomposable models
 In Proc. UAI
, 2001
Abstract

Cited by 29 (2 self)
In this paper, we present an efficient algorithm for performing stepwise selection in the class of decomposable models. We focus on the forward selection procedure, but we also discuss how backward selection and the combination of the two can be performed efficiently. The main contributions of this paper are (1) a simple characterization for the edges that can be added to a decomposable model while retaining its decomposability and (2) an efficient algorithm for enumerating all such edges for a given decomposable model in O(n²) time, where n is the number of variables in the model. We also analyze the complexity of the overall stepwise selection procedure (which includes the complexity of enumerating eligible edges as well as the complexity of deciding how to “progress”). We use the KL divergence of the model from the saturated model as our metric, but the results we present here extend to many other metrics as well.
The Posterior Probability of Bayes Nets with Strong Dependences
 Soft Computing
, 1999
Abstract

Cited by 14 (1 self)
Stochastic independence is an idealized relationship located at one end of a continuum of values measuring degrees of dependence. Modeling real-world systems, we are often not interested in the distinction between exact independence and any degree of dependence, but between weak, ignorable dependence and strong, substantial dependence. Good models map significant deviance from independence and neglect approximate independence or dependence weaker than a noise threshold. This intuition is applied to learning the structure of Bayes nets from data. We determine the conditional posterior probabilities of structures given that the degree of dependence at each of their nodes exceeds a critical noise level. Deviance from independence is measured by mutual information. Arc probabilities are determined by whether the amount of mutual information the neighbors contribute to a node is greater than a critical minimum deviance from independence. A χ² approximation for the probability density function of mutual info...
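The quantities in play can be illustrated directly: mutual information computed from a contingency table, and the classical fact that under independence the statistic 2·N·MI (in nats) is asymptotically χ²-distributed, which is what makes a χ²-based noise threshold natural. A minimal sketch (the table values are made up for illustration):

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (in nats) of a 2-D contingency table."""
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px * py)[nz])).sum())

counts = np.array([[40, 10],
                   [10, 40]])
n = counts.sum()
mi = mutual_information(counts)

# Under independence, 2*N*MI is asymptotically chi-squared distributed
# (1 degree of freedom for a 2x2 table), so comparing it to a
# chi-squared quantile gives a noise threshold for "substantial" dependence.
g_stat = 2 * n * mi
print(mi, g_stat)  # about 0.193 nats; G-statistic about 38.5
```

Here the G-statistic far exceeds the 95% χ²₁ quantile of 3.84, so this dependence would clear any reasonable noise threshold.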
Probabilistic transfer matrices in symbolic reliability analysis of logic circuits
 ACM Transactions on Design Automation of Electronic Systems
Abstract

Cited by 7 (2 self)
We propose the probabilistic transfer matrix (PTM) framework to capture nondeterministic behavior in logic circuits. PTMs provide a concise description of both normal and faulty behavior, and are well-suited to reliability and error susceptibility calculations. A few simple composition rules based on connectivity can be used to recursively build larger PTMs (representing entire logic circuits) from smaller gate PTMs. PTMs for gates in series are combined using matrix multiplication, and PTMs for gates in parallel are combined using the tensor product operation. PTMs can accurately calculate joint output probabilities in the presence of reconvergent fanout and inseparable joint input distributions. To improve computational efficiency, we encode PTMs as algebraic decision diagrams (ADDs). We also develop equivalent ADD algorithms for newly defined matrix operations such as eliminate variables and eliminate redundant variables, which aid in the numerical computation of circuit PTMs. We use PTMs to evaluate circuit reliability and derive polynomial approximations for circuit error probabilities in terms of gate error probabilities. PTMs can also analyze the effects of logic and electrical masking on error mitigation. We show that ignoring logic masking can overestimate errors by an order of magnitude. We incorporate electrical masking by computing error attenuation probabilities, based on analytical models, into an extended PTM ...
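The two composition rules named in the abstract (series = matrix product, parallel = tensor/Kronecker product) can be sketched on toy gates. The symmetric output-flip fault model with a single error probability p is an assumption for illustration, not the paper's specific fault model:

```python
import numpy as np

def faulty_gate_ptm(truth_table, p_err):
    """PTM of a 2-input, 1-output gate whose output flips with
    probability p_err (simple symmetric fault model, assumed here).
    Rows index input combinations 00, 01, 10, 11; columns index output 0, 1."""
    m = np.zeros((4, 2))
    for i, out in enumerate(truth_table):
        m[i, out] = 1 - p_err
        m[i, 1 - out] = p_err
    return m

p = 0.05
nand_ptm = faulty_gate_ptm([1, 1, 1, 0], p)
not_ptm = np.array([[p, 1 - p],      # input 0 -> output 1 w.p. 1-p
                    [1 - p, p]])     # input 1 -> output 0 w.p. 1-p

# Series composition: NAND followed by NOT behaves like a noisy AND.
series = nand_ptm @ not_ptm

# Parallel composition of two independent NOT gates: Kronecker product.
parallel = np.kron(not_ptm, not_ptm)  # 4x4 PTM on two wires

# Output distribution of the series circuit for uniform two-bit inputs.
inputs = np.full(4, 0.25)
print(inputs @ series)  # probabilities of output 0 and 1 (0.7025, 0.2975)
```

An ideal AND under uniform inputs outputs 1 with probability 0.25; the two cascaded 5% gate faults shift that to 0.2975, which is exactly the kind of circuit-level error probability the framework computes symbolically.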