Results 1-10 of 38
Probabilistic Inference Using Markov Chain Monte Carlo Methods
, 1993
Abstract

Cited by 567 (20 self)
Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difficulties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over high-dimensional spaces. Related problems in other fields have been tackled using Monte Carlo methods based on sampling using Markov chains, providing a rich array of techniques that can be applied to problems in artificial intelligence. The "Metropolis algorithm" has been used to solve difficult problems in statistical physics for over forty years, and, in the last few years, the related method of "Gibbs sampling" has been applied to problems of statistical inference. Concurrently, an alternative method for solving problems in statistical physics by means of dynamical simulation has been developed as well, and has recently been unified with the Metropolis algorithm to produce the "hybrid Monte Carlo" method. In computer science, Markov chain sampling is the basis of the heuristic optimization technique of "simulated annealing", and has recently been used in randomized algorithms for approximate counting of large sets. In this review, I outline the role of probabilistic inference in artificial intelligence, present the theory of Markov chains, and describe various Markov chain Monte Carlo algorithms, along with a number of supporting techniques. I try to present a comprehensive picture of the range of methods that have been developed, including techniques from the varied literature that have not yet seen wide application in artificial intelligence, but which appear relevant. As illustrative examples, I use the problems of probabilistic inference in expert systems, discovery of latent classes from data, and Bayesian learning for neural networks.
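The "Metropolis algorithm" named in the abstract can be sketched in a few lines: propose a symmetric random-walk step and accept it with probability min(1, p(x')/p(x)). This is an illustrative one-dimensional sketch targeting a standard normal, not code from the review; all names are hypothetical.

```python
import math
import random

def metropolis_sample(log_p, x0, step=0.5, n_samples=20000, seed=0):
    """One-dimensional Metropolis sampler with a symmetric Gaussian
    random-walk proposal; accepts with probability min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)
        # Symmetric proposal, so the acceptance ratio reduces to p(x')/p(x),
        # computed safely in log space.
        if rng.random() < math.exp(min(0.0, log_p(x_new) - log_p(x))):
            x = x_new
        samples.append(x)
    return samples

# Target: a standard normal via its unnormalized log-density -x^2/2.
samples = metropolis_sample(lambda x: -0.5 * x * x, x0=0.0)
mean = sum(samples) / len(samples)
```

Because only a density ratio is needed, the normalizing constant of the target never has to be computed, which is exactly what makes the method attractive for the complex high-dimensional distributions the abstract describes.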
Stacked generalization
 Neural Networks
, 1992
Abstract

Cited by 550 (7 self)
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory. Key Words: generalization and induction, combining generalizers, learning set preprocessing, cross-validation, error estimation and correction.
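The scheme described above can be sketched for two generalizers: level-0 models are trained on partitions of the learning set, their out-of-fold guesses form the inputs of the second space, and a level-1 combiner is fit there. The toy learners and grid-search combiner below are illustrative stand-ins, not the paper's experiments.

```python
def fit_constant(xs, ys):
    """Level-0 generalizer 1: always guess the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Level-0 generalizer 2: ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return lambda x, a=a, b=my - a * mx: a * x + b

def stack(xs, ys, learners, k=5):
    """Stacked generalization for exactly two learners: build level-1 data
    from out-of-fold guesses, then grid-search a convex combining weight."""
    level1_X, level1_y = [], []
    for i in range(k):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k != i]
        held = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k == i]
        models = [fit([x for x, _ in train], [y for _, y in train])
                  for fit in learners]
        for x, y in held:
            level1_X.append([m(x) for m in models])
            level1_y.append(y)
    best_w, best_err = 0.0, float("inf")
    for step in range(101):
        w = step / 100
        err = sum((w * g[0] + (1 - w) * g[1] - y) ** 2
                  for g, y in zip(level1_X, level1_y))
        if err < best_err:
            best_w, best_err = w, err
    # Retrain the level-0 learners on the full learning set for prediction.
    full = [fit(xs, ys) for fit in learners]
    return lambda x: best_w * full[0](x) + (1 - best_w) * full[1](x)

xs = list(range(20))
ys = [2 * x + 1 for x in xs]          # exactly linear toy data
model = stack(xs, ys, [fit_constant, fit_line])
```

On this exactly linear toy data the combiner learns to put all its weight on the linear generalizer, the behavior one would want from a combination strategy that is smarter than winner-takes-all on noisy real data.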
Power laws, Pareto distributions and Zipf’s law
 Contemporary Physics
, 2005
Abstract

Cited by 170 (0 self)
When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them.
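The relationship p(x) ∝ x^(-α) can be made concrete with inverse-transform sampling and the standard closed-form maximum-likelihood estimate of the exponent. This is a sketch with hypothetical names; the review itself treats estimation in far more depth.

```python
import math
import random

def sample_power_law(alpha, x_min, n, seed=0):
    """Inverse-transform sampling: if u ~ U(0,1), then
    x = x_min * (1 - u)^(-1/(alpha - 1)) follows p(x) ∝ x^(-alpha), x >= x_min."""
    rng = random.Random(seed)
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]

def mle_exponent(xs, x_min):
    """Closed-form maximum-likelihood estimate:
    alpha_hat = 1 + n / sum(ln(x_i / x_min))."""
    return 1 + len(xs) / sum(math.log(x / x_min) for x in xs)

data = sample_power_law(alpha=2.5, x_min=1.0, n=50000)
alpha_hat = mle_exponent(data, x_min=1.0)
```

The MLE is generally preferred over fitting a straight line to a log-log histogram, whose binning introduces bias in the estimated exponent.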
Constrained-Realization Monte-Carlo Method for Hypothesis Testing
 Physica D
Abstract

Cited by 42 (1 self)
We compare two theoretically distinct approaches to generating artificial (or "surrogate") data for testing hypotheses about a given data set. The first and more straightforward approach is to fit a single "best" model to the original data, and then to generate surrogate data sets that are "typical realizations" of that model. The second approach concentrates not on the model but directly on the original data; it attempts to constrain the surrogate data sets so that they exactly agree with the original data for a specified set of sample statistics. Examples of these two approaches are provided for two simple cases: a test for deviations from a Gaussian distribution, and a test for serial dependence in a time series. Additionally, we consider tests for nonlinearity in time series based on a Fourier transform (FT) method and on more conventional autoregressive moving-average (ARMA) fits to the data. The comparative performance of hypothesis testing schemes based on these two approaches...
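The constrained-realization approach can be illustrated for the serial-dependence test mentioned above: a random permutation of the series preserves every statistic of the value distribution exactly while destroying temporal order, giving a null distribution for a dependence statistic. A minimal sketch under those assumptions (names hypothetical):

```python
import random

def shuffle_surrogate(series, seed=0):
    """Constrained-realization surrogate for a serial-dependence test:
    a random permutation exactly preserves the empirical value
    distribution while destroying temporal order."""
    rng = random.Random(seed)
    s = list(series)
    rng.shuffle(s)
    return s

def lag1_autocorr(xs):
    """Lag-1 autocorrelation, used here as the test statistic."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    return num / sum((x - m) ** 2 for x in xs)

# An AR(1)-like series with strong serial dependence.
rng = random.Random(1)
x = [0.0]
for _ in range(2000):
    x.append(0.9 * x[-1] + rng.gauss(0, 1))

null = [lag1_autocorr(shuffle_surrogate(x, seed=s)) for s in range(99)]
observed = lag1_autocorr(x)
```

An observed statistic outside the range of the 99 surrogate values rejects the null hypothesis of no serial dependence at roughly the 1% level; because the surrogates share the data's exact values, the test cannot be fooled by a non-Gaussian marginal distribution.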
What is Special About Spatial Data? Alternative Perspectives on Spatial Data Analysis
, 1989
Abstract

Cited by 31 (3 self)
The analysis of spatial data has always played a central role in the quantitative scientific tradition in geography. Recently, a considerable number of publications have appeared devoted to presenting research results and to assessing the state of the art.
Prototype Selection for Composite Nearest Neighbor Classifiers
, 1997
Abstract

Cited by 26 (1 self)
Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers. Increased accuracy has been shown in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, answers to fundamental questions about classifier combination are not known, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" or "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number o...
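The basic combination idea can be sketched as a majority vote over nearest-neighbour classifiers, each built from its own prototype subset. This is a toy illustration with hypothetical data, not the paper's prototype-selection algorithm.

```python
from collections import Counter

def nn_classify(prototypes, point):
    """1-nearest-neighbour: return the label of the closest prototype
    (squared Euclidean distance)."""
    return min(prototypes,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], point)))[1]

def composite_classify(prototype_sets, point):
    """Majority vote over component nearest-neighbour classifiers,
    each defined by its own prototype subset."""
    votes = Counter(nn_classify(ps, point) for ps in prototype_sets)
    return votes.most_common(1)[0][0]

# Three hypothetical prototype subsets of (point, label) pairs.
sets = [
    [((0, 0), "a"), ((5, 5), "b")],
    [((1, 0), "a"), ((5, 4), "b")],
    [((0, 1), "a"), ((4, 5), "b")],
]
label = composite_classify(sets, (0.5, 0.5))
```

The composite gains accuracy only when the component classifiers make different errors, which is why prototype selection (choosing diverse subsets) matters for nearest-neighbour ensembles.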
How to Fit a Response Time Distribution
Abstract

Cited by 24 (0 self)
Among the most valuable tools in behavioral science is statistically fitting mathematical models of cognition to data, response time distributions in particular. However, techniques for fitting distributions vary widely and little is known about the efficacy of different techniques. In this article, we assessed several fitting techniques by simulating six widely cited models of response time and using the fitting procedures to recover model parameters. The techniques include the maximization of likelihood and least-squares fits of the theoretical distributions to different empirical estimates of the simulated distributions. A running example was used to illustrate the different estimation and fitting procedures. The simulation studies revealed that empirical density estimates are biased even for very large sample sizes. Some fitting techniques yielded more accurate and less variable parameter estimates than others. Methods that involved least-squares fits to density estimates generally yielded very poor parameter estimates. The importance of considering the entire response time (RT) distribution in testing formal models of cognition is now widely appreciated. Fitting a model to mean RT alone can mask important details of the data that examination of the entire distribution would reveal, such as the behavior of fast and slow responses across the conditions of an experiment (e.g., Heathcote, Popiel & Mewhort, 1991), the extent of facilitation between perceptual channels (Miller, 1982), and the effects of practice on RT quantiles (Logan, 1992). Techniques for testing hypotheses based on the RT distribution have been developed (Townsend, 1990). In addition, the RT distribution provides an important meeting ground between theory and da...
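The parameter-recovery procedure the article describes (simulate a model, then fit it back) can be sketched with a deliberately simple stand-in: lognormal RTs, whose maximum-likelihood fit has a closed form. This is not one of the six models the article simulates; all names and parameter values are hypothetical.

```python
import math
import random

def simulate_rt(mu, sigma, n, seed=0):
    """Toy RT model: lognormal response times, exp(N(mu, sigma))."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]

def fit_lognormal_mle(rts):
    """Closed-form maximum-likelihood estimates for a lognormal:
    the mean and SD of the log response times."""
    logs = [math.log(t) for t in rts]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma

rts = simulate_rt(mu=-0.7, sigma=0.3, n=20000)
mu_hat, sigma_hat = fit_lognormal_mle(rts)
```

Fitting the raw likelihood this way sidesteps the density-estimation step entirely, which is relevant to the article's finding that least-squares fits to empirical density estimates recover parameters poorly.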
Detecting Nonlinearity in Data with Long Coherence Times
, 1992
Abstract

Cited by 20 (2 self)
In this article, we will describe (yet) another source of difficulty that arises in the analysis of time series data. The particular problem of detecting nonlinear structure (either by comparison of the data to linear surrogate data, or by comparing linear and nonlinear predictors) is seen to be complicated when the data exhibit long coherence times. In this section we define some terms and discuss linear modeling of time series. Section 2 describes the method of surrogate data, and compares two approaches to generating surrogate data. We find that both have difficulties trying to mimic data with long coherence time. We illustrate these problems with real and computer-generated time series in Section 3, including the time series E.dat from the SFI competition. In the last section, we discuss what it is about the analysis or the data that is problematic.
Metric-Based Methods for Adaptive Model Selection and Regularization
 Machine Learning
, 2001
Abstract

Cited by 20 (0 self)
We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the distribution of unlabeled data. We show how this metric can be used to detect untrustworthy training error estimates, and devise novel model selection strategies that exhibit theoretical guarantees against overfitting (while still avoiding underfitting). We then extend the approach to derive a general training criterion for supervised learning, yielding an adaptive regularization method that uses unlabeled data to automatically set regularization parameters. This new criterion adjusts its regularization level to the specific set of training data received, and performs well on a variety of regression and conditional density estimation tasks. The only proviso for these methods is that s...
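The central idea, measuring the distance between hypotheses by the discrepancy of their predictions over unlabeled data, can be sketched as follows. The hypotheses and data here are hypothetical stand-ins; the paper builds its model selection strategies on top of such a metric.

```python
import math

def prediction_metric(h1, h2, unlabeled):
    """Discrepancy between two hypotheses: root-mean-square difference
    of their predictions across the unlabeled sample."""
    return math.sqrt(sum((h1(x) - h2(x)) ** 2 for x in unlabeled)
                     / len(unlabeled))

# Two hypothetical hypotheses of different complexity.
h_simple = lambda x: 0.5 * x
h_complex = lambda x: 0.5 * x + 0.2 * x ** 3

unlabeled = [i / 10 for i in range(-20, 21)]  # unlabeled inputs on [-2, 2]
d = prediction_metric(h_simple, h_complex, unlabeled)
# If two models look nearly identical on the training set but far apart
# under this metric, their training error estimates are suspect.
```

Because the metric needs only inputs, not labels, it can be evaluated on plentiful unlabeled data, which is what lets the method detect overfitting that training error alone cannot reveal.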
Testing For Nonlinearity Using Redundancies: Quantitative and Qualitative Aspects
 Physica D
, 1995
Abstract

Cited by 19 (7 self)
A method for testing nonlinearity in time series is described, based on information-theoretic functionals (redundancies), whose linear and nonlinear forms allow either qualitative or, after incorporating the surrogate data technique, quantitative evaluation of dynamical properties of the scrutinized data. An interplay of quantitative and qualitative testing on both the linear and nonlinear levels is analyzed, and the robustness of this combined approach against spurious nonlinearity detection is demonstrated. Evaluation of redundancies and redundancy-based statistics as functions of time lag and embedding dimension can further enhance insight into the dynamics of a system under study.

Keywords: time series, nonlinearity, mutual information, redundancy, surrogate data

1 Introduction

The problem of inferring the dynamics of a system from measured data is a perpetual challenge for time series analysts. Ideas and concepts from nonlinear dynamics and theory of deterministic chaos have led to a num...
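In the pairwise case, a redundancy estimate is just the mutual information between a series and a lagged copy of itself. The sketch below (hypothetical names, a simple equal-width-bin plug-in estimator) shows such a statistic separating an autocorrelated series from white noise.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys, bins=8):
    """Plug-in estimate of I(X;Y) in nats from equal-width binned counts."""
    def bin_of(v, lo, hi):
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    lx, hx, ly, hy = min(xs), max(xs), min(ys), max(ys)
    bx = [bin_of(v, lx, hx) for v in xs]
    by = [bin_of(v, ly, hy) for v in ys]
    n = len(xs)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # I(X;Y) = sum p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def lagged_redundancy(series, lag):
    """Mutual information between the series and its lag-shifted copy."""
    return mutual_information(series[:-lag], series[lag:])

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(5000)]   # white noise
ar = [0.0]                                       # strongly autocorrelated AR(1)
for _ in range(5000):
    ar.append(0.95 * ar[-1] + rng.gauss(0, 1))
```

Unlike the autocorrelation function, mutual information captures nonlinear as well as linear dependence, which is why the paper's redundancy functionals can be compared in linear and general forms to isolate genuinely nonlinear structure.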