The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery, 1999
- Cited by 70 (1 self)
Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
Occam's Two Razors: The Sharp and the Blunt
- In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998
- Cited by 23 (3 self)
Occam's razor has been the subject of much controversy. This paper argues that this is partly because it has been interpreted in two quite different ways, the first of which (simplicity is a goal in itself) is essentially correct, while the second (simplicity leads to greater accuracy) is not. The paper reviews the large variety of theoretical arguments and empirical evidence for and against the "second razor," and concludes that the balance is strongly against it. In particular, it builds on the case of Schaffer (1993) and Webb (1996) by considering additional theoretical arguments and recent empirical evidence that the second razor fails in most domains. A version of the first razor more appropriate to KDD is proposed, and we argue that continuing to apply the second razor risks causing significant opportunities to be missed. 1 Occam's Two Razors William of Occam's famous razor states that "Nunquam ponenda est pluralitas sine necessitate," which, approximately translated, means "En...
Applications of recursive segmentation to the analysis of DNA sequences
- 2002
- Cited by 21 (2 self)
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong (G+C)/weak (A+T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions. © 2002 Elsevier Science Ltd. All rights reserved.
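The abstract above does not say how cut points are chosen or when recursion stops, so the following is only a minimal sketch of the general idea under stated assumptions: the sequence is first converted to the strong/weak alphabet, each cut maximizes the Jensen-Shannon divergence between the two sides, and recursion stops at a fixed divergence threshold. The function names, the `threshold` and `min_len` parameters, and the stopping rule are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def to_strong_weak(dna):
    """Map a DNA string to the binary strong (G/C) vs. weak (everything else) alphabet."""
    return "".join("S" if base in "GC" else "W" for base in dna.upper())


def entropy(counts, total):
    """Shannon entropy in bits of a symbol-count table holding `total` symbols."""
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def best_split(seq, min_len=1):
    """Return (index, divergence) of the cut maximizing the Jensen-Shannon
    divergence between the left and right parts, or (None, 0.0) if no cut helps."""
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter()
    best_i, best_d = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        if i < min_len or n - i < min_len:
            continue
        right = total - left  # Counter subtraction keeps positive counts only
        d = h_all - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def segment(seq, threshold=0.1, min_len=1, offset=0):
    """Recursively cut `seq`; return the sorted list of domain boundaries.
    Assumed stopping rule: stop when the best divergence is below `threshold` bits."""
    i, d = best_split(seq, min_len)
    if i is None or d < threshold:
        return []
    return (segment(seq[:i], threshold, min_len, offset)
            + [offset + i]
            + segment(seq[i:], threshold, min_len, offset + i))


# Toy sequence with three compositional domains: weak, strong, weak.
boundaries = segment(to_strong_weak("AT" * 10 + "GC" * 10 + "AT" * 10))
print(boundaries)
```

A fixed threshold is the simplest possible stopping rule; on real, noisy sequences a significance test or a model selection criterion would likely be a better choice than the constant assumed here.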
A Process-Oriented Heuristic for Model Selection
- 1998
- Cited by 15 (5 self)
Current methods to avoid overfitting are either data-oriented (using separate data for validation) or representation-oriented (penalizing complexity in the model). This paper proposes process-oriented evaluation, where a model's expected generalization error is computed as a function of the search process that led to it. The paper develops the necessary theoretical framework, and applies it to one type of learning: rule induction. A process-oriented version of the CN2 rule learner is empirically compared with the default CN2. The process-oriented version is more accurate in a large majority of the datasets, with high significance, and also produces simpler models. Experiments in artificial domains suggest that process-oriented evaluation is particularly useful in high-dimensional domains. 1 INTRODUCTION Overfitting avoidance is often considered the central problem of machine learning (e.g., Cheeseman & Oldford, 1994). If a learner is sufficiently powerful, it must guard against selec...
Range Image Segmentation by an Effective Jump-Diffusion Method
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
- Cited by 14 (0 self)
Abstract—This paper presents an effective jump-diffusion method for segmenting a range image and its associated reflectance image in the Bayesian framework. The algorithm works on complex real-world scenes (indoor and outdoor), which consist of an unknown number of objects (or surfaces) of various sizes and types, such as planes, conics, smooth surfaces, and cluttered objects (like trees and bushes). Formulated in the Bayesian framework, the posterior probability is distributed over a solution space with a countable number of subspaces of varying dimensions. The algorithm simulates Markov chains with both reversible jumps and stochastic diffusions to traverse the solution space. The reversible jumps realize the moves between subspaces of different dimensions, such as switching surface models and changing the number of objects. The stochastic Langevin equation realizes diffusions within each subspace. To achieve effective computation, the algorithm precomputes some importance proposal probabilities over multiple scales through Hough transforms, edge detection, and data clustering. The latter are used by the Markov chains for fast mixing. The algorithm is tested on 100 1D simulated data sets for performance analysis on both accuracy and speed. Then, the algorithm is applied to three data sets of range images under the same parameter setting. The results are satisfactory in comparison with manual segmentations.
Kinky Tomographic Reconstruction
- 1996
- Cited by 13 (10 self)
We address the issue of how to make decisions about the degree of smoothness demanded of a flexible contour used to model the boundary of a 2D object. We demonstrate the use of a Bayesian approach to set the strength of the smoothness prior for a tomographic reconstruction problem. The Akaike Information Criterion is used to determine whether to allow a kink in the contour.
Robust model-based vasculature detection in noisy biomedical images
- IEEE Transactions on Information Technology in Biomedicine, 2004
- Cited by 10 (3 self)
Abstract. This paper presents a set of algorithms for robust detection of vasculature in noisy retinal video images. Three methods are studied for effective handling of outliers. The first method is based on Huber’s censored likelihood ratio test. The second is based on the use of an α-trimmed test statistic. The third is based on robust model selection algorithms. All of these algorithms rely on a mathematical model for the vasculature that accounts for the expected variations in intensity/texture profile, width, orientation, scale, and imaging noise. These unknown parameters are estimated implicitly within a robust detection and estimation framework. The proposed algorithms are also useful as nonlinear vessel enhancement filters. The proposed algorithms were evaluated over carefully constructed phantom images, where the ground truth is known a priori, as well as clinically recorded images for which the ground truth was manually compiled. A comparative evaluation of the proposed approaches is presented. Collectively, these methods outperformed prior approaches based on the matched filtering of Chaudhuri et al. (1989), as well as the verification methods used by prior exploratory tracing algorithms, such as the work of Can et al. (1999). The Huber censored likelihood test yielded the best overall improvement, with a 145.7% improvement over the exploratory tracing algorithm, and a 43.7% improvement in detection rates over the matched filter. Index Terms: Hypothesis testing, mathematical models of vasculature, retinal fundus images, robust model selection, vasculature detection and segmentation, vessel enhancement, vessel segmentation.
Gaussian Model Selection with an Unknown Variance
- Submitted to the Annals of Statistics, 2007
- Cited by 8 (5 self)
Let Y be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean µ of Y by model selection. More precisely, we start with a collection S = {S_m, m ∈ M} of linear subspaces of R^n and associate with each of these the least-squares estimator of µ on S_m. Then, we use a data-driven penalized criterion in order to select one estimator among these. Our first objective is to analyze the performance of estimators associated with classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection S and the sample size. Then we apply those to solve various statistical problems such as variable selection, change-point detection and signal estimation, among others. Our results are based on a non-asymptotic risk bound with respect to the Euclidean loss for the selected estimator. Some analogous results are also established for the Kullback loss.
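To make the setting concrete, here is a minimal sketch of the classical AIC and BIC criteria in their textbook unknown-variance form, n·log(RSS/n) plus a penalty per dimension. It assumes a deliberately simple nested collection in which S_m is spanned by the first m coordinates of R^n, so the least-squares estimator on S_m keeps the first m entries of Y and zeroes the rest. The function name and the toy data are illustrative assumptions; the improved penalties proposed in the paper are not reproduced here.

```python
import math


def select_dimension(y, max_dim, criterion="aic"):
    """Pick m in 0..max_dim minimizing n*log(RSS_m/n) + penalty*m, where
    S_m is assumed to be the span of the first m coordinates of R^n."""
    n = len(y)
    if not 0 <= max_dim < n:
        raise ValueError("max_dim must satisfy 0 <= max_dim < len(y)")
    # Classical penalties per dimension: 2 for AIC, log(n) for BIC.
    penalty_per_dim = {"aic": 2.0, "bic": math.log(n)}[criterion]
    best_m, best_score = 0, math.inf
    for m in range(max_dim + 1):
        # Residual sum of squares of the least-squares fit on S_m.
        rss = sum(v * v for v in y[m:])
        if rss == 0.0:
            raise ValueError("zero residual: the variance cannot be estimated")
        score = n * math.log(rss / n) + penalty_per_dim * m
        if score < best_score:
            best_m, best_score = m, score
    return best_m


# Toy data: three large mean components followed by small fluctuations.
y = [10.0, -8.0, 6.0] + [0.2, -0.2] * 4 + [0.2]
print(select_dimension(y, max_dim=6, criterion="aic"))
print(select_dimension(y, max_dim=6, criterion="bic"))
```

The variance itself counts as one extra parameter in every model, so it shifts all scores by the same constant and is omitted. Note that the log(RSS/n) term falls quickly as m approaches n, which is one reason the largest candidate dimension is capped here.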
Likelihood inference in nearest-neighbour classification models
- 2003
- Cited by 5 (0 self)
Traditionally the neighbourhood size k in the k-nearest-neighbour algorithm is either fixed at the first nearest neighbour or is selected on the basis of a cross-validation study. In this paper we present an alternative approach that develops the k-nearest-neighbour algorithm using likelihood-based inference. Our method takes the form of a generalised linear regression on a set of k-nearest-neighbour autocovariates. By defining the k-nearest-neighbour algorithm in this way we are able to extend the method to accommodate the original predictor variables as possible linear effects as well as allowing for the inclusion of multiple nearest-neighbour terms. The choice of the final model proceeds via a stepwise regression procedure. It is shown that our method incorporates a conventional generalised linear model and a conventional k-nearest-neighbour algorithm as special cases. Empirical results suggest that the method outperforms the standard k-nearest-neighbour method in terms of misclassification rate on a wide variety of datasets.
An improved Akaike Information Criterion for State-space Model Selection
- Comput. Stat. Data An, 2006
- Cited by 3 (0 self)
Following the work of Hurvich, Shumway, and Tsai (1990), we propose an “improved” variant of the Akaike information criterion, AICi, for state-space model selection. The variant is based on Akaike’s (1973) objective of estimating the Kullback-Leibler information (Kullback 1968) between the densities corresponding to the fitted model and the generating or true model. The development of AICi proceeds by decomposing the expected information into two terms. The first term suggests that the empirical log likelihood can be used to form a biased estimator of the information; the second term provides the bias adjustment. Exact computation of the bias adjustment requires the values of the true model parameters, which are inaccessible in practical applications. Yet for fitted models in the candidate class that are correctly specified or overfit, the adjustment is asymptotically independent of the true parameters. Thus, in certain settings, the adjustment may be estimated via Monte Carlo simulations by using conveniently chosen simulation parameters as proxies for the true parameters. We present simulation results to evaluate the performance of AICi both as an estimator of the Kullback-Leibler information and as a model selection criterion. Our results indicate that AICi estimates the information with less bias than traditional AIC. Furthermore, AICi serves as an effective tool for selecting a model of appropriate dimension. Keywords: AIC, Kullback-Leibler information, Kullback’s directed divergence, state-space model, time series analysis.

