Results 1–10 of 46
Toward a method of selecting among computational models of cognition
Psychological Review, 2002
Cited by 74 (4 self)
The question of how one should decide among competing explanations of data is at the heart of the scientific enterprise. Computational models of cognition are increasingly being advanced as explanations of behavior. The success of this line of inquiry depends on the development of robust methods to guide the evaluation and selection of these models. This article introduces a method of selecting among mathematical models of cognition known as minimum description length, which provides an intuitive and theoretically well-grounded understanding of why one model should be chosen. A central but elusive concept in model selection, complexity, can also be derived with the method. The adequacy of the method is demonstrated in three areas of cognitive modeling: psychophysics, information integration, and categorization. How should one choose among competing theoretical explanations of data? This question is at the heart of the scientific enterprise, regardless of whether verbal models are being tested in an experimental setting or computational models are being evaluated in simulations. A number of criteria have been proposed to assist in this endeavor, summarized nicely by Jacobs and Grainger …
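The MDL idea in this abstract can be illustrated with a crude two-part code length, score = (n/2)·log(RSS/n) + (k/2)·log(n), which is only a BIC-style approximation of the refined criteria the article develops. The data and the two candidate models below are invented for illustration:

```python
import math

def mdl_score(rss, n, k):
    # Two-part code length (in nats): a data-fit term plus a parameter-cost
    # term. This is a crude BIC-like approximation, not the refined NML form.
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)

def fit_constant(y):
    # RSS of the mean-only model (one parameter).
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def fit_line(x, y):
    # RSS of a simple linear model (two parameters), closed-form fit.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.1, 1.0, 2.1, 2.9, 4.2, 5.0, 5.9, 7.1]  # roughly linear toy data
scores = {
    "constant": mdl_score(fit_constant(y), len(y), k=1),
    "line": mdl_score(fit_line(x, y), len(y), k=2),
}
best = min(scores, key=scores.get)  # the line wins despite its extra parameter
```

The fit term rewards models that compress the data well, while the (k/2)·log(n) term charges for each extra parameter, which is the trade-off the article formalizes.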
MDL Denoising
IEEE Transactions on Information Theory, 1999
Cited by 49 (9 self)
The so-called denoising problem, relative to normal models for noise, is formalized such that 'noise' is defined as the incompressible part of the data, while the compressible part defines the meaningful, information-bearing signal. Such a decomposition is effected by minimization of the ideal code length called for by the Minimum Description Length (MDL) principle, obtained by applying the normalized maximum likelihood technique to the primary parameters, their range, and their number. For any orthonormal regression matrix, such as one defined by wavelet transforms, the minimization can be done with a threshold on the squared coefficients resulting from the expansion of the data sequence in the basis vectors defined by the matrix.
Keywords: linear regression, wavelet transforms, threshold, stochastic complexity, Kolmogorov sufficient statistics
1 Introduction. Intuitively speaking, the so-called 'denoising' problem is to separate an observed data sequence x_1, x_2, …
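As a toy illustration of the thresholding result, the sketch below applies a one-level orthonormal Haar transform and zeroes detail coefficients whose square falls below a hand-picked threshold; the paper, by contrast, derives the threshold from the NML code length. The signal values here are hypothetical:

```python
import math

def haar(x):
    # One-level orthonormal Haar transform; len(x) must be even.
    s = 1 / math.sqrt(2)
    approx = [s * (a + b) for a, b in zip(x[0::2], x[1::2])]
    detail = [s * (a - b) for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def inverse_haar(approx, detail):
    # Exact inverse of the one-level transform above.
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out

def denoise(x, threshold):
    # Hard thresholding in the spirit of the abstract: detail coefficients
    # whose square falls below the threshold are declared "noise" and zeroed.
    approx, detail = haar(x)
    detail = [d if d * d >= threshold else 0.0 for d in detail]
    return inverse_haar(approx, detail)

noisy = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
clean = denoise(noisy, threshold=0.5)  # small pairwise jitter is smoothed away
```

With threshold 0.0 the round trip reproduces the data exactly, which is what makes an orthonormal basis convenient for this decomposition.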
Discovering Clusters in Motion Time-Series Data
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003
Cited by 40 (1 self)
A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than one HMM with some probability, and the hard decision about the sequence's class membership can be deferred until a later time, when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.
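The soft-versus-hard assignment contrast drawn in this abstract can be shown with a much simpler model than a mixture of HMMs: posterior membership probabilities ("responsibilities") for a two-component scalar Gaussian mixture with made-up parameters:

```python
import math

def responsibilities(x, components):
    # components: list of (weight, mean, std). Returns the posterior
    # membership probabilities of x under each component, i.e. the "soft"
    # assignment the abstract contrasts with k-means-style hard assignment.
    def gauss(v, mu, sd):
        return math.exp(-0.5 * ((v - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    joint = [w * gauss(x, mu, sd) for w, mu, sd in components]
    z = sum(joint)
    return [j / z for j in joint]

# Two hypothetical components, equally weighted.
mix = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
r = responsibilities(2.0, mix)  # a point midway between the two means
```

A point midway between the means gets responsibility 0.5 from each component; a hard k-means-style rule would have to commit it arbitrarily to one cluster, which is exactly the decision the paper's formulation defers.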
Theoretical Framework for Data Mining
 SIGKDD Explorations
Cited by 19 (1 self)
Research in data mining and knowledge discovery in databases has mostly concentrated on developing good algorithms for various data mining tasks (see, for example, the recent proceedings of the KDD conferences). Some parts of the research effort have gone …
Efficient Computation of Stochastic Complexity
Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics, 2003
Cited by 15 (11 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form.
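For the two-category (Bernoulli) case, the naive exponential sum over all data sets collapses into a linear sum over counts, a small instance of the kind of simplification the paper pursues for multinomial data. A minimal sketch (the function name is ours):

```python
import math

def nml_normalizer(n):
    # Exact NML normalizing sum C(n) for a Bernoulli (two-category
    # multinomial) model: instead of summing the maximized likelihood over
    # all 2^n sequences, group sequences by the count k of ones, each group
    # containing C(n, k) sequences with maximized likelihood
    # (k/n)^k * ((n-k)/n)^(n-k).  (Python evaluates 0**0 as 1, which is
    # the convention needed here.)
    total = 0.0
    for k in range(n + 1):
        total += math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
    return total
```

The stochastic complexity of a particular data set is then its negative maximized log-likelihood plus log C(n); only the normalizer is expensive, which is why removing the exponentiality matters.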
Streamwise Feature Selection
Journal of Machine Learning Research, 2006
Cited by 15 (6 self)
In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression, in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection evaluates each feature only once, when it is generated. We describe information-investing and α-investing, two adaptive complexity-penalty methods for streamwise feature selection that dynamically adjust the threshold on the error reduction required for adding a new feature. These two methods give false-discovery-rate-style guarantees against overfitting. They differ …
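A heavily simplified sketch of the wealth-based idea behind this kind of adaptive thresholding: each arriving feature is tested at a threshold proportional to the current "wealth", and a successful addition earns wealth back, loosening the threshold for later features. The bidding rule and constants below are illustrative only, not the paper's actual investing rules or guarantees:

```python
def streamwise_select(p_values, wealth=0.5, payout=0.25):
    # Simplified investing-style sketch. Features arrive as a stream of
    # p-values; we spend a bid from the current wealth on each one, and a
    # discovery replenishes the wealth. All parameter values are made up.
    selected = []
    for i, p in enumerate(p_values, start=1):
        alpha = wealth / (2 * i)   # bid (threshold) for the i-th feature
        wealth -= alpha            # the bid is spent whether or not we accept
        if p <= alpha:
            selected.append(i)
            wealth += payout       # a discovery earns wealth back
    return selected
```

Because wealth can never be overspent, the number of spurious features admitted stays controlled even over an unbounded feature stream, which is the intuition behind the paper's false-discovery-rate-style guarantee.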
Learning probabilistic tree grammars for genetic programming
In Parallel Problem Solving from Nature, 2004
Cited by 10 (1 self)
Genetic Programming (GP) provides evolutionary methods for problems with tree representations. A recent development in Genetic Algorithms (GAs) has led to principled algorithms called Estimation-of-Distribution Algorithms (EDAs). EDAs identify and exploit structural features of a problem during optimization. Here, we investigate the use of a specific EDA for GP. We develop a probabilistic model that employs transformations of production rules in a context-free grammar to represent local structures. The results of experiments on two benchmark problems demonstrate the feasibility of the approach.
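To make the grammar-based model concrete, here is a sketch that samples expression strings from a hypothetical probabilistic context-free grammar with hand-set production probabilities; in an EDA like the paper's, those probabilities would instead be estimated from selected individuals:

```python
import random

# A hypothetical PCFG over arithmetic expressions. Each nonterminal maps to
# (probability, right-hand side) rules; the last rule is non-recursive so a
# depth limit can always terminate the expansion.
GRAMMAR = {
    "EXPR": [(0.4, ["(", "EXPR", "+", "EXPR", ")"]),
             (0.2, ["(", "EXPR", "*", "EXPR", ")"]),
             (0.4, ["TERM"])],
    "TERM": [(0.5, ["x"]), (0.5, ["1"])],
}

def sample(symbol, rng, depth=0, max_depth=4):
    # Recursively expand a symbol into a string, drawing rules according to
    # their probabilities; beyond max_depth, force the non-recursive rule.
    if symbol not in GRAMMAR:
        return symbol                    # terminal symbol
    rules = GRAMMAR[symbol]
    rhs = rules[-1][1]                   # default: last (non-recursive) rule
    if depth < max_depth:
        r, acc = rng.random(), 0.0
        for prob, candidate in rules:
            acc += prob
            if r <= acc:
                rhs = candidate
                break
    return "".join(sample(s, rng, depth + 1, max_depth) for s in rhs)

expr = sample("EXPR", random.Random(0))  # e.g. a well-parenthesized expression
```

Sampling and probability estimation over such production rules is the core loop of a grammar-based EDA: sample a population, select the fittest, re-estimate rule probabilities, repeat.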
Univariate Polynomial Inference by Monte Carlo Message Length Approximation
In International Conference on Machine Learning, 2002
Cited by 9 (5 self)
We apply the Message from Monte Carlo (MMC) algorithm to inference of univariate polynomials. MMC is an algorithm for point estimation from a Bayesian posterior sample.
Data Smoothing Regularization, Multi-Sets-Learning, and Problem Solving Strategies
2003
Cited by 9 (9 self)
First, we briefly introduce the basic idea of data smoothing regularization, which was first proposed by Xu [Brain-like computing and intelligent information systems (1997) 241] for parameter learning in a way similar to Tikhonov regularization, but with an easy solution to the difficulty of determining an appropriate hyperparameter. The roles of this regularization are demonstrated on Gaussian mixtures via smoothed versions of the EM algorithm, the BYY model selection criterion, the adaptive harmony algorithm, and the related rival penalized competitive learning. Second, these studies are extended to a mixture of reconstruction errors of Gaussian types, which provides a new probabilistic formulation for the multi-sets learning approach [Proc. IEEE ICNN94 I (1994) 315] that learns multiple objects in typical geometrical structures such as points, lines, hyperplanes, circles, ellipses, and templates of given shapes. Finally, insights are provided on three problem-solving strategies, namely competition-penalty adaptation based learning, global evidence accumulation based selection, and guess-test based decision, with a general problem-solving paradigm suggested.
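For reference, the Tikhonov-style penalty that data smoothing regularization is compared against has a simple closed form in one dimension; the sketch below (toy data, no-intercept model) only shows how the penalty shrinks the estimate, not the automatic smoothing-parameter selection the abstract describes:

```python
def ridge_1d(x, y, lam):
    # Closed-form Tikhonov-regularized (ridge) slope for a no-intercept 1-D
    # linear model: minimizes sum((y_i - b*x_i)^2) + lam * b^2, giving
    # b = sum(x*y) / (sum(x*x) + lam). lam = 0 recovers least squares.
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]            # toy data, exactly y = 2x
b0 = ridge_1d(x, y, lam=0.0)   # ordinary least squares
b1 = ridge_1d(x, y, lam=14.0)  # penalized: shrunk toward zero
```

The difficulty the abstract refers to is choosing lam well; the smoothed-EM machinery it describes is aimed precisely at avoiding that manual tuning.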
Independent component analysis and extensions with noise and time: A Bayesian Ying–Yang learning perspective
Neural Information Processing Letters and Reviews, 2003
Cited by 8 (7 self)
After summarizing typical approaches for solving independent component analysis (ICA) problems, advances in ICA studies that consider hybrid sources of both sub-Gaussians and super-Gaussians, and ICA extensions that consider noise and temporal dependence among observations, are reviewed from the perspective of Bayesian Ying-Yang independence learning. Not only are new insights provided on existing results in the literature, but a number of further results are also presented.
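A classical rule of thumb for the sub-/super-Gaussian distinction mentioned above uses the sign of the excess kurtosis; this heuristic is standard ICA practice, not the BYY criterion of the paper:

```python
def excess_kurtosis(xs):
    # Sample excess kurtosis: fourth central moment over squared variance,
    # minus 3 (so a Gaussian scores approximately zero).
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (var * var) - 3.0

def source_type(xs):
    # Common ICA heuristic: positive excess kurtosis -> super-Gaussian
    # (heavy-tailed, e.g. speech); negative -> sub-Gaussian (light-tailed,
    # e.g. uniform noise).
    return "super-Gaussian" if excess_kurtosis(xs) > 0 else "sub-Gaussian"
```

The hybrid-source setting the abstract highlights is exactly the case where a single fixed nonlinearity tied to one of these two types fails, motivating the adaptive treatment it reviews.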