Results 1–10 of 25
The role of Occam’s Razor in knowledge discovery
 Data Mining and Knowledge Discovery
, 1999
Cited by 78 (3 self)
Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility tradeoff.
Applications of recursive segmentation to the analysis of DNA sequences
, 2002
Cited by 32 (3 self)
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as a binary strong(G+C)/weak(A+T) sequence, a binary sequence indicating the presence or absence of the dinucleotide CpG, or a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
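Recursive segmentation of this kind is commonly driven by the Jensen-Shannon divergence between the nucleotide compositions of the two candidate halves: split where the divergence is maximal, and recurse while it exceeds a threshold. A minimal sketch of that idea follows; the threshold, minimum segment length, and function names are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
from math import log2

def _entropy(counts, n):
    """Shannon entropy (bits) of a composition given by a Counter over n symbols."""
    return -sum(c / n * log2(c / n) for c in counts.values() if c)

def js_divergence(seq, i):
    """Weighted Jensen-Shannon divergence between compositions of seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = Counter(seq[:i]), Counter(seq[i:])
    whole = left + right
    return (_entropy(whole, n)
            - (i / n) * _entropy(left, i)
            - ((n - i) / n) * _entropy(right, n - i))

def segment(seq, threshold=0.05, offset=0, min_len=10):
    """Recursively split seq at the maximum-divergence point while above threshold.

    Returns the list of domain-border positions (absolute indices).
    """
    if len(seq) < 2 * min_len:
        return []
    best_i = max(range(min_len, len(seq) - min_len + 1),
                 key=lambda i: js_divergence(seq, i))
    if js_divergence(seq, best_i) < threshold:
        return []
    return (segment(seq[:best_i], threshold, offset, min_len)
            + [offset + best_i]
            + segment(seq[best_i:], threshold, offset + best_i, min_len))

# Toy sequence: an A/T domain followed by a G/C domain; the border is at 100.
print(segment("AT" * 50 + "GC" * 50))  # → [100]
```

The same routine applies unchanged to the converted binary sequences the abstract mentions, since it only looks at symbol compositions.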
Occam's Two Razors: The Sharp and the Blunt
 In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining
, 1998
Cited by 27 (3 self)
Occam's razor has been the subject of much controversy. This paper argues that this is partly because it has been interpreted in two quite different ways, the first of which (simplicity is a goal in itself) is essentially correct, while the second (simplicity leads to greater accuracy) is not. The paper reviews the large variety of theoretical arguments and empirical evidence for and against the "second razor," and concludes that the balance is strongly against it. In particular, it builds on the cases made by Schaffer (1993) and Webb (1996) by considering additional theoretical arguments and recent empirical evidence that the second razor fails in most domains. A version of the first razor more appropriate to KDD is proposed, and we argue that continuing to apply the second razor risks causing significant opportunities to be missed.
1 Occam's Two Razors
William of Occam's famous razor states that "Nunquam ponenda est pluralitas sin necesitate," which, approximately translated, means "En...
Range Image Segmentation by an Effective Jump-Diffusion Method
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Cited by 24 (1 self)
Abstract—This paper presents an effective jump-diffusion method for segmenting a range image and its associated reflectance image in the Bayesian framework. The algorithm works on complex real-world scenes (indoor and outdoor), which consist of an unknown number of objects (or surfaces) of various sizes and types, such as planes, conics, smooth surfaces, and cluttered objects (like trees and bushes). Formulated in the Bayesian framework, the posterior probability is distributed over a solution space with a countable number of subspaces of varying dimensions. The algorithm simulates Markov chains with both reversible jumps and stochastic diffusions to traverse the solution space. The reversible jumps realize the moves between subspaces of different dimensions, such as switching surface models and changing the number of objects. The stochastic Langevin equation realizes diffusions within each subspace. To achieve effective computation, the algorithm precomputes some importance proposal probabilities over multiple scales through Hough transforms, edge detection, and data clustering. The latter are used by the Markov chains for fast mixing. The algorithm is tested on 100 1-D simulated data sets for performance analysis on both accuracy and speed. Then, the algorithm is applied to three data sets of range images under the same parameter setting. The results are satisfactory in comparison with manual segmentations.
Kinky Tomographic Reconstruction
, 1996
Cited by 17 (10 self)
We address the issue of how to make decisions about the degree of smoothness demanded of a flexible contour used to model the boundary of a 2D object. We demonstrate the use of a Bayesian approach to set the strength of the smoothness prior for a tomographic reconstruction problem. The Akaike Information Criterion is used to determine whether to allow a kink in the contour.
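The AIC-based decision the abstract describes reduces to comparing AIC = 2k − 2 ln L between a smooth model and a kinked one, and allowing the kink when it lowers AIC. The sketch below uses invented 1-D data with a genuine kink and piecewise-linear fits as stand-ins for the contour models; none of the specifics are from the paper.

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L. Lower is better."""
    return 2 * k - 2 * log_likelihood

def gaussian_log_likelihood(residuals, sigma):
    """Log-likelihood of residuals under i.i.d. Gaussian noise of known sigma."""
    n = residuals.size
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum(residuals**2) / (2 * sigma**2))

rng = np.random.default_rng(1)
sigma = 0.02
x = np.linspace(0, 1, 100)
y = np.abs(x - 0.5) + rng.normal(0, sigma, x.size)  # profile with a real kink

# Smooth model: one line (2 params). Kinked model: two lines meeting at 0.5 (4 params).
smooth = np.polyval(np.polyfit(x, y, 1), x)
left, right = x < 0.5, x >= 0.5
kinked = np.empty_like(y)
kinked[left] = np.polyval(np.polyfit(x[left], y[left], 1), x[left])
kinked[right] = np.polyval(np.polyfit(x[right], y[right], 1), x[right])

aic_smooth = aic(gaussian_log_likelihood(y - smooth, sigma), 2)
aic_kinked = aic(gaussian_log_likelihood(y - kinked, sigma), 4)
print(f"AIC smooth: {aic_smooth:.1f}, AIC kinked: {aic_kinked:.1f}")
```

Because the data really contain a kink, the kinked model's likelihood gain dwarfs its two extra parameters and its AIC is lower, so the kink is accepted.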
A Process-Oriented Heuristic for Model Selection
, 1998
Cited by 16 (5 self)
Current methods to avoid overfitting are either data-oriented (using separate data for validation) or representation-oriented (penalizing complexity in the model). This paper proposes process-oriented evaluation, where a model's expected generalization error is computed as a function of the search process that led to it. The paper develops the necessary theoretical framework, and applies it to one type of learning: rule induction. A process-oriented version of the CN2 rule learner is empirically compared with the default CN2. The process-oriented version is more accurate in a large majority of the datasets, with high significance, and also produces simpler models. Experiments in artificial domains suggest that process-oriented evaluation is particularly useful in high-dimensional domains.
1 INTRODUCTION
Overfitting avoidance is often considered the central problem of machine learning (e.g., Cheeseman & Oldford, 1994). If a learner is sufficiently powerful, it must guard against selec...
GAUSSIAN MODEL SELECTION WITH AN UNKNOWN VARIANCE
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2007
Cited by 15 (9 self)
Let Y be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean µ of Y by model selection. More precisely, we start with a collection S = {Sm, m ∈ M} of linear subspaces of R^n and associate to each of these the least-squares estimator of µ on Sm. Then, we use a data-driven penalized criterion in order to select one estimator among these. Our first objective is to analyze the performance of estimators associated to classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection S and the sample size. Then we apply those to solve various statistical problems such as variable selection, change-point detection and signal estimation, among others. Our results are based on a non-asymptotic risk bound with respect to the Euclidean loss for the selected estimator. Some analogous results are also established for the Kullback loss.
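With the variance unknown, the classical criteria the abstract analyzes take the penalized form n·log(RSS_m/n) + pen(D_m), where RSS_m is the residual sum of squares on Sm and D_m its dimension (pen = 2·D_m for AIC, log(n)·D_m for BIC). The sketch below runs such a criterion over a nested collection of regression subspaces; the design matrix, coefficients, and nesting are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p_true = 200, 3
X = rng.normal(size=(n, 10))           # 10 candidate predictors
beta = np.zeros(10)
beta[:p_true] = [2.0, -1.5, 1.0]       # only the first three matter
y = X @ beta + rng.normal(size=n)      # noise variance unknown to the selector

def rss(cols):
    """Residual sum of squares of the least-squares fit on the given columns."""
    if not cols:
        return float(y @ y)
    Xm = X[:, cols]
    resid = y - Xm @ np.linalg.lstsq(Xm, y, rcond=None)[0]
    return float(resid @ resid)

def criterion(m, penalty_per_dim):
    """Unknown-variance criterion n*log(RSS/n) + pen for the nested model S_m."""
    return n * np.log(rss(list(range(m))) / n) + penalty_per_dim * m

bic_choice = min(range(11), key=lambda m: criterion(m, np.log(n)))
aic_choice = min(range(11), key=lambda m: criterion(m, 2.0))
print(f"BIC-type selects {bic_choice} predictors, AIC-type selects {aic_choice}")
```

Both criteria keep the three strong true predictors; the lighter AIC penalty may additionally admit a few spurious ones, which is the kind of behaviour the paper's sharper penalties are designed to control.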
Robust model-based vasculature detection in noisy biomedical images
 IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE
, 2004
Cited by 11 (4 self)
This paper presents a set of algorithms for robust detection of vasculature in noisy retinal video images. Three methods are studied for effective handling of outliers. The first method is based on Huber's censored likelihood ratio test. The second is based on the use of an α-trimmed test statistic. The third is based on robust model selection algorithms. All of these algorithms rely on a mathematical model for the vasculature that accounts for the expected variations in intensity/texture profile, width, orientation, scale, and imaging noise. These unknown parameters are estimated implicitly within a robust detection and estimation framework. The proposed algorithms are also useful as nonlinear vessel enhancement filters. The proposed algorithms were evaluated over carefully constructed phantom images, where the ground truth is known a priori, as well as clinically recorded images for which the ground truth was manually compiled. A comparative evaluation of the proposed approaches is presented. Collectively, these methods outperformed prior approaches based on Chaudhuri et al. (1989) matched filtering, as well as the verification methods used by prior exploratory tracing algorithms, such as the work of Can et al. (1999). The Huber censored likelihood test yielded the best overall improvement, with a 145.7% improvement over the exploratory tracing algorithm, and a 43.7% improvement in detection rates over the matched filter.
Likelihood inference in nearest-neighbour classification models
, 2003
Cited by 6 (0 self)
Traditionally the neighbourhood size k in the k-nearest-neighbour algorithm is either fixed at the first nearest neighbour or is selected on the basis of a cross-validation study. In this paper we present an alternative approach that develops the k-nearest-neighbour algorithm using likelihood-based inference. Our method takes the form of a generalised linear regression on a set of k-nearest-neighbour autocovariates. By defining the k-nearest-neighbour algorithm in this way we are able to extend the method to accommodate the original predictor variables as possible linear effects as well as allowing for the inclusion of multiple nearest-neighbour terms. The choice of the final model proceeds via a stepwise regression procedure. It is shown that our method incorporates a conventional generalised linear model and a conventional k-nearest-neighbour algorithm as special cases. Empirical results suggest that the method outperforms the standard k-nearest-neighbour method in terms of misclassification rate on a wide variety of datasets.
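The k-nearest-neighbour autocovariate the abstract builds on — the proportion of a point's k nearest neighbours carrying a given label — can be sketched directly; thresholding it at 0.5 recovers majority-vote k-NN as the special case the paper mentions. The data and parameter choices below are assumptions for illustration, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two Gaussian classes in 2D, 100 points each.
n = 100
X = np.vstack([rng.normal(0.0, 1, (n, 2)), rng.normal(2.5, 1, (n, 2))])
y = np.repeat([0, 1], n)

def knn_autocovariate(X_ref, y_ref, X_query, k=5, skip_self=False):
    """Proportion of each query point's k nearest reference neighbours with label 1.

    skip_self=True drops the nearest neighbour (the point itself) when the
    query set equals the reference set, giving a leave-one-out flavour.
    """
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    nearest = order[:, 1:k + 1] if skip_self else order[:, :k]
    return y_ref[nearest].mean(axis=1)

autocov = knn_autocovariate(X, y, X, k=5, skip_self=True)
pred = (autocov > 0.5).astype(int)   # majority vote = threshold at 0.5
acc = float(np.mean(pred == y))
print(f"majority-vote 5-NN accuracy: {acc:.2f}")
```

In the paper's framework this autocovariate would instead enter a generalised linear regression (e.g. logistic) alongside the original predictors, rather than being thresholded directly.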
The Value of Shared Visual Information for Task-Oriented Collaboration
, 2006
Cited by 5 (1 self)
and by an IBM Ph.D. Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this material