Results 11 - 20 of 1,254
Learning the structure of dynamic probabilistic networks
, 1998
Cited by 161 (13 self)
Dynamic probabilistic networks are a compact representation of complex stochastic processes. In this paper we examine how to learn the structure of a DPN from data. We extend structure scoring rules for standard probabilistic networks to the dynamic case, and show how to search for structure when some of the variables are hidden. Finally, we examine two applications where such a technology might be useful: predicting and classifying dynamic behaviors, and learning causal orderings in biological processes. We provide empirical results that demonstrate the applicability of our methods in both domains.
Probabilistic independence networks for hidden Markov probability models
, 1997
Cited by 155 (13 self)
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
- Machine Learning
, 1997
Cited by 155 (9 self)
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate.
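As a rough illustration of the BIC/MDL approximation named in this abstract: the standard BIC form scores a model by its maximized log-likelihood minus (d/2) log N, where d is the number of free parameters and N is the sample size. The function name and the numbers below are hypothetical, and the sketch assumes the log-likelihood has already been maximized (for incomplete data, typically with EM). It is not the paper's code and does not cover the Laplace, Draper, or Cheeseman-Stutz approximations.

```python
import math


def bic_score(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Standard BIC approximation to the log marginal likelihood.

    log p(D | M) is approximated by log p(D | theta_hat, M) - (d / 2) * log N.
    Higher scores are better under this convention.
    """
    return log_likelihood - 0.5 * num_params * math.log(num_samples)


# Hypothetical values: a small model and a larger model fit to the same 1000 cases.
simple_model = bic_score(log_likelihood=-1050.0, num_params=5, num_samples=1000)
complex_model = bic_score(log_likelihood=-1040.0, num_params=20, num_samples=1000)

# The larger model fits slightly better, but the penalty term outweighs the gain.
print(simple_model > complex_model)
```

With these made-up inputs the penalty on the 20-parameter model exceeds its 10-unit gain in log-likelihood, which is the kind of preference for simple models the abstract attributes to BIC/MDL.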
Analysis Of Multiresolution Image Denoising Schemes Using Generalized-Gaussian Priors
- IEEE TRANS. INFO. THEORY
, 1998
Cited by 146 (7 self)
In this paper, we investigate various connections between wavelet shrinkage methods in image processing and Bayesian estimation using Generalized Gaussian priors. We present fundamental properties of the shrinkage rules implied by Generalized Gaussian and other heavy-tailed priors. This allows us to show a simple relationship between differentiability of the log-prior at zero and the sparsity of the estimates, as well as an equivalence between universal thresholding schemes and Bayesian estimation using a certain Generalized Gaussian prior.
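To make the link between shrinkage rules and priors concrete, here is a minimal sketch of soft thresholding with the universal threshold sigma * sqrt(2 log n). A standard example of the connection is that soft thresholding is the MAP estimate of a coefficient observed in Gaussian noise under a Laplacian prior (a Generalized Gaussian with shape parameter 1). Whether that is the specific prior the abstract refers to is not stated here, so treat this as an assumption. The function names are hypothetical.

```python
import math


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def universal_threshold(sigma: float, n: int) -> float:
    """Universal threshold sigma * sqrt(2 * log n) for n coefficients."""
    return sigma * math.sqrt(2.0 * math.log(n))


# Hypothetical noisy wavelet coefficients with assumed noise level sigma = 1.
coefficients = [5.2, -0.3, 0.8, -4.1, 0.1, 2.9, -0.6, 0.4]
t = universal_threshold(sigma=1.0, n=len(coefficients))
denoised = [soft_threshold(c, t) for c in coefficients]
print(denoised)
```

Small coefficients are set exactly to zero, which is the sparsity property the abstract relates to the behavior of the log-prior at zero.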
Bayesian Model Averaging for Linear Regression Models
- Journal of the American Statistical Association
, 1997
Cited by 133 (12 self)
We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferences about quantities of
Polynomial Splines and Their Tensor Products in Extended Linear Modeling
- Ann. Statist
, 1997
Cited by 121 (14 self)
ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, cross-validation or an independent test set. Publicly available software, written in C and interfaced to S/S-PLUS, is used to apply this methodology to...
Model Selection and the Principle of Minimum Description Length
- Journal of the American Statistical Association
, 1998
Cited by 114 (4 self)
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate th...
Learning to Probabilistically Identify Authoritative Documents
- In Proceedings of the 17th International Conference on Machine Learning
, 2000
Cited by 109 (2 self)
We describe a model of document citation that learns to identify hubs and authorities in a set of linked documents, such as pages retrieved from the world wide web, or papers retrieved from a research paper archive. Unlike the popular HITS algorithm, which relies on dubious statistical assumptions, our model provides probabilistic estimates that have clear semantics. We also find that in general, the identified authoritative documents correspond better to human intuition.
1. Introduction
Bibliometrics has been described as a "series of techniques that seek to quantify the process of written communication" (Ikpaahindi, 1985). It typically attempts to give quantified answers to questions involving the relationships among documents, or authors and documents: "Who are the most authoritative authors in this field?" "What are the seminal papers?" "How many distinct communities are studying this subject?" and many others (see White & McCain, 1989 for details). Traditionally, the s...
Learning Belief Networks in the Presence of Missing Values and Hidden Variables
- Proceedings of the Fourteenth International Conference on Machine Learning
, 1997
Cited by 107 (14 self)
In recent years there has been a flurry of work on learning probabilistic belief networks. Current state-of-the-art methods have been shown to be successful for two learning scenarios: learning both network structure and parameters from complete data, and learning parameters for a fixed network from incomplete data, that is, in the presence of missing values or hidden variables. However, no method has yet been demonstrated to effectively learn network structure from incomplete data. In this paper, we propose a new method for learning network structure from incomplete data. This method is based on an extension of the Expectation-Maximization (EM) algorithm for model selection problems that performs search for the best structure inside the EM procedure. We prove the convergence of this algorithm, and adapt it for learning belief networks. We then describe how to learn networks in two scenarios: when the data contains missing values, and in the presence of hidden variables. We provide...
Universal Prediction
- IEEE Transactions on Information Theory
, 1998
Cited by 99 (6 self)
This paper consists of an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression.

