Toward a method of selecting among computational models of cognition
Psychological Review, 2002
Cited by 80 (4 self)
The question of how one should decide among competing explanations of data is at the heart of the scientific enterprise. Computational models of cognition are increasingly being advanced as explanations of behavior. The success of this line of inquiry depends on the development of robust methods to guide the evaluation and selection of these models. This article introduces a method of selecting among mathematical models of cognition known as minimum description length, which provides an intuitive and theoretically well-grounded understanding of why one model should be chosen. A central but elusive concept in model selection, complexity, can also be derived with the method. The adequacy of the method is demonstrated in 3 areas of cognitive modeling: psychophysics, information integration, and categorization.
How should one choose among competing theoretical explanations of data? This question is at the heart of the scientific enterprise, regardless of whether verbal models are being tested in an experimental setting or computational models are being evaluated in simulations. A number of criteria have been proposed to assist in this endeavor, summarized nicely by Jacobs and Grainger
MDL Denoising
IEEE Transactions on Information Theory, 1999
Cited by 50 (9 self)
The so-called denoising problem, relative to normal models for noise, is formalized such that 'noise' is defined as the incompressible part in the data while the compressible part defines the meaningful information-bearing signal. Such a decomposition is effected by minimization of the ideal code length, called for by the Minimum Description Length (MDL) principle, and obtained by an application of the normalized maximum likelihood technique to the primary parameters, their range, and their number. For any orthonormal regression matrix, such as defined by wavelet transforms, the minimization can be done with a threshold for the squared coefficients resulting from the expansion of the data sequence in the basis vectors defined by the matrix.
Keywords: linear regression, wavelet transforms, threshold, stochastic complexity, Kolmogorov sufficient statistics
1 Introduction
Intuitively speaking, the so-called 'denoising' problem is to separate an observed data sequence x_1, x_2, ...
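The thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes the caller supplies the threshold; the MDL-derived threshold value from the paper is not reproduced here. The function name `threshold_denoise` and the two-point Haar basis `HAAR2` are hypothetical, chosen only for illustration.

```python
import math

def threshold_denoise(x, basis, threshold):
    """Keep only coefficients whose square exceeds `threshold`, then reconstruct.

    `basis` is a list of orthonormal row vectors (an assumption of this sketch);
    `threshold` is supplied by the caller, not derived from MDL here.
    """
    # Expand the data in the orthonormal basis: c_k = <basis_k, x>
    coeffs = [sum(b * v for b, v in zip(row, x)) for row in basis]
    # Treat coefficients with small squared magnitude as noise and zero them
    kept = [c if c * c > threshold else 0.0 for c in coeffs]
    # Reconstruct the signal from the retained coefficients
    return [sum(kept[k] * basis[k][i] for k in range(len(basis)))
            for i in range(len(x))]

# Hypothetical example basis: the normalized two-point Haar transform
_s = 1.0 / math.sqrt(2.0)
HAAR2 = [[_s, _s], [_s, -_s]]

denoised = threshold_denoise([1.0, 1.1], HAAR2, threshold=0.1)
```

With the inputs above, the small difference coefficient is discarded and both samples are replaced by their average.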
Discovering Clusters in Motion Time-Series Data
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003
Cited by 42 (1 self)
A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than a single HMM with some probability, and the hard decision about the sequence class membership can be deferred until a later time when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.
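The contrast between hard (k-means style) and soft (EM style) assignment can be sketched as follows. This is only an illustration of the membership computation, assuming the per-model log-likelihoods of a sequence have already been computed (for example by an HMM forward pass, which is not shown). The function names are hypothetical and do not come from the paper.

```python
import math

def responsibilities(log_liks, log_priors):
    """Soft assignment: posterior probability that each model generated the sequence."""
    scores = [ll + lp for ll, lp in zip(log_liks, log_priors)]
    # Log-sum-exp for numerical stability with very negative log-likelihoods
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [math.exp(s - log_norm) for s in scores]

def hard_assignment(log_liks, log_priors):
    """Hard assignment: index of the single most probable model."""
    scores = [ll + lp for ll, lp in zip(log_liks, log_priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# A sequence that two equally weighted models explain almost equally well
log_priors = [math.log(0.5), math.log(0.5)]
soft = responsibilities([-10.0, -10.5], log_priors)
hard = hard_assignment([-10.0, -10.5], log_priors)
```

In this example the hard rule commits the sequence entirely to the first model, while the soft rule keeps a substantial share of the membership on the second model, which is the "overlap" situation the abstract refers to.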
Theoretical Framework for Data Mining
SIGKDD Explorations
Cited by 19 (1 self)
Research in data mining and knowledge discovery in databases has mostly concentrated on developing good algorithms for various data mining tasks (see for example the recent proceedings of KDD conferences). Some parts of the research effort have gone
Efficient Computation of Stochastic Complexity
Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics, 2003
Cited by 16 (11 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form.
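To make the "exponential number of terms" concrete, here is a small sketch for the simplest multinomial case, binary sequences. It compares the brute-force NML normalizer, summed over all 2^n sequences, with the same quantity obtained by grouping sequences by their count of ones. This is an illustration of why the exponentiality is avoidable in this special case, not the algorithm or the approximation scheme of the paper; all function names are hypothetical.

```python
import math
from itertools import product

def max_likelihood(ones, n):
    """Maximized Bernoulli likelihood of any sequence with `ones` ones out of n."""
    p = ones / n
    # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here
    return (p ** ones) * ((1.0 - p) ** (n - ones))

def normalizer_bruteforce(n):
    """Sum of maximized likelihoods over all 2**n binary sequences."""
    return sum(max_likelihood(sum(seq), n) for seq in product((0, 1), repeat=n))

def normalizer_by_counts(n):
    """Same sum with sequences grouped by count of ones: only n + 1 terms."""
    return sum(math.comb(n, h) * max_likelihood(h, n) for h in range(n + 1))

def stochastic_complexity_bits(seq):
    """NML code length in bits: -log2 P(seq | ML parameter) + log2 normalizer."""
    n, ones = len(seq), sum(seq)
    return -math.log2(max_likelihood(ones, n)) + math.log2(normalizer_by_counts(n))
```

Both normalizers agree, but the grouped version needs n + 1 terms instead of 2^n, because the maximized likelihood depends on a sequence only through its counts.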
Streamwise Feature Selection
Journal of Machine Learning Research, 2006
Cited by 15 (6 self)
In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression, in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection only evaluates each feature once when it is generated. We describe information-investing and α-investing, two adaptive complexity penalty methods for streamwise feature selection which dynamically adjust the threshold on the error reduction required for adding a new feature. These two methods give false discovery rate style guarantees against overfitting. They differ
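The idea of a dynamically adjusted threshold can be sketched with a simple wealth-based rule. This is a hedged illustration only: the bidding rule (`wealth / (2 * i)`), the default constants, and the use of precomputed p-values as input are assumptions of this sketch, and the abstract itself does not specify the update. The function name is hypothetical.

```python
def wealth_based_selection(p_values, initial_wealth=0.5, payout=0.5):
    """Consider each feature once, in arrival order, against an adaptive threshold.

    `p_values[i]` is assumed to be the significance of feature i when it is
    tested against the current model; computing it is outside this sketch.
    """
    wealth = initial_wealth
    selected = []
    for i, p in enumerate(p_values, start=1):
        # Bid a shrinking fraction of the current wealth as this step's threshold
        alpha = wealth / (2 * i)
        if p <= alpha:
            # Feature accepted: earn the payout, pay the bid
            selected.append(i - 1)
            wealth += payout - alpha
        else:
            # Feature rejected: the bid is lost, so later thresholds tighten
            wealth -= alpha
    return selected, wealth

chosen, remaining = wealth_based_selection([0.001, 0.9, 0.04, 0.5])
```

Each feature is evaluated exactly once, and the threshold loosens after an accepted feature and tightens after a rejected one.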
Learning probabilistic tree grammars for genetic programming
In Parallel Problem Solving from Nature, 2004
Cited by 10 (1 self)
Genetic Programming (GP) provides evolutionary methods for problems with tree representations. A recent development in Genetic Algorithms (GAs) has led to principled algorithms called Estimation-of-Distribution Algorithms (EDAs). EDAs identify and exploit structural features of a problem's structure during optimization. Here, we investigate the use of a specific EDA for GP. We develop a probabilistic model that employs transformations of production rules in a context-free grammar to represent local structures. The results of performing experiments on two benchmark problems demonstrate the feasibility of the approach.
Data Smoothing Regularization, Multi-Sets-Learning, and Problem Solving Strategies
2003
Cited by 10 (10 self)
First, we briefly introduce the basic idea of data smoothing regularization, which was first proposed by Xu [Brain-like computing and intelligent information systems (1997) 241] for parameter learning in a way similar to Tikhonov regularization but with an easy solution to the difficulty of determining an appropriate hyperparameter. Also, the roles of this regularization are demonstrated on Gaussian mixtures via smoothed versions of the EM algorithm, the BYY model selection criterion, and the adaptive harmony algorithm, as well as its related rival penalized competitive learning. Second, these studies are extended to a mixture of reconstruction errors of Gaussian types, which provides a new probabilistic formulation for the multi-sets learning approach [Proc. IEEE ICNN94 I (1994) 315] that learns multiple objects in typical geometrical structures such as points, lines, hyperplanes, circles, ellipses, and templates of given shapes. Finally, insights are provided on three problem solving strategies, namely competition-penalty adaptation based learning, global evidence accumulation based selection, and guess-test based decision, with a general problem solving paradigm suggested.
Univariate Polynomial Inference by Monte Carlo Message Length Approximation
In Int. Conf. Machine Learning, 2002
Cited by 9 (5 self)
We apply the Message from Monte Carlo (MMC) algorithm to inference of univariate polynomials. MMC is an algorithm for point estimation from a Bayesian posterior sample.
A Matrix Approach for Finding Extrema: PROBLEMS WITH MODULARITY, HIERARCHY, AND OVERLAP
2006
Cited by 8 (0 self)
Unlike most simple textbook examples, the real world is full of complex systems, and researchers in many different fields are often confronted by problems arising from such systems. Simple heuristics or even enumeration work quite well on small and easy problems; however, to efficiently solve large and difficult problems, proper decomposition according to the complex system is the key. In this research project, investigating and analyzing interactions between components of complex systems sheds some light on problem decomposition. By recognizing three bare-bone types of interactions (modularity, hierarchy, and overlap), theories and models are developed to dissect and inspect problem decomposition in the context of genetic algorithms. This dissertation presents a research project to develop a competent optimization method to solve boundedly difficult problems with modularity, hierarchy, and overlap by explicit problem decomposition. The proposed genetic algorithm design utilizes a matrix representation of an interaction graph to analyze and decompose the problem. The results from this thesis should benefit research both technically and scientifically. Technically, this thesis develops an automated dependency structure matrix clustering technique and utilizes it to design a competent black-box problem solver. Scientifically, the explicit interaction