Results 11–20 of 85
Supervised model-based visualization of high-dimensional data
, 2000
Abstract

Cited by 18 (9 self)
When high-dimensional data vectors are visualized on a two- or three-dimensional display, the goal is that two vectors close to each other in the multidimensional space should also be close to each other in the low-dimensional space. Traditionally, closeness is defined in terms of some standard geometric distance measure, such as the Euclidean distance, based on a more or less straightforward comparison between the contents of the data vectors. However, such distances generally fail to reflect the properties of complex problem domains, where changing one bit in a vector may completely change the relevance of the vector. What is more, in real-world situations the similarity of two vectors is not a universal property: even if two vectors can be regarded as similar from one point of view, from another point of view they may appear quite dissimilar. In order to capture these requirements for building a pragmatic and flexible similarity measure, we propose a data visualization scheme where the similarity of two vectors is determined indirectly by using a formal model of the problem domain; in our case, a Bayesian network model. In this scheme, two vectors are considered similar if they lead to similar predictions when given as input to a Bayesian network model. The scheme is supervised in the sense that different perspectives can be taken into account by using different predictive distributions, i.e., by changing what is to be predicted. In addition, the modeling framework can also be used for validating the rationality of the resulting visualization. This model-based visualization scheme has been implemented and tested on real-world domains with encouraging results.
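The prediction-based similarity the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it replaces the Bayesian network with an arbitrary, made-up logistic predictor and measures similarity as the symmetrized Kullback-Leibler divergence between predictive distributions. The resulting distance matrix could then be fed to any distance-preserving projection (e.g. MDS or a Sammon mapping) to obtain the 2-D display.

```python
import math

# Stand-in "model": a hand-specified predictive distribution p(class | x).
# In the paper's scheme this is a Bayesian network; the logistic weights
# below are assumptions chosen purely for illustration.
def predictive(x):
    z = 1.5 * x[0] - 2.0 * x[1]
    p1 = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p1, p1]

def model_distance(x, y):
    """Symmetrized KL divergence between the two predictive distributions."""
    p, q = predictive(x), predictive(y)
    eps = 1e-12
    kl = lambda a, b: sum(ai * math.log((ai + eps) / (bi + eps))
                          for ai, bi in zip(a, b))
    return 0.5 * (kl(p, q) + kl(q, p))

# Two nearby vectors and one outlier; the model-based distance reflects
# how differently the model treats them, not their geometric distance.
vectors = [(0.0, 0.0), (0.1, 0.05), (3.0, -1.0)]
D = [[model_distance(a, b) for b in vectors] for a in vectors]
```

Note that two geometrically distant vectors would get distance near zero if the model assigns them the same predictive distribution, which is exactly the supervised behavior the abstract argues for.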
Predicting the Stock Market
, 1998
Abstract

Cited by 16 (1 self)
This paper presents a tutorial introduction to predictions of stock time series. The various approaches of technical and fundamental analysis are presented, and the prediction problem is formulated as a special case of inductive learning. The problems with performance evaluation of near-random-walk processes are illustrated with examples, together with guidelines for avoiding the risk of data snooping. The connections to concepts like the "bias-variance dilemma", overtraining, and model complexity are further covered. Existing benchmarks and testing metrics are surveyed and some new measures are introduced.
Consistency of cross validation for comparing regression procedures. Annals of Statistics, Accepted paper
Abstract

Cited by 15 (1 self)
Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when applied to compare parametric and nonparametric methods, or to compare different nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to dominate in size. Furthermore, it can even be of a smaller order than the proportion used for estimation without affecting the consistency property.
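The selection protocol being analyzed can be sketched with a single-split comparison. Everything below (the toy data, the two competing procedures, the even splitting ratio) is made up for illustration; the point is only the mechanism: fit each procedure on the estimation portion and select the one with smaller error on the held-out evaluation portion.

```python
import random

def fit_mean(train):
    """Trivial 'nonparametric' baseline: predict the training mean."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def fit_linear(train):
    """Ordinary least squares for y = a*x + b."""
    n = len(train)
    sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def cv_select(data, procedures, eval_ratio=0.5, seed=0):
    """Single-split CV: fit on one part, compare squared error on the other."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_eval = int(len(data) * eval_ratio)
    eval_set = [data[i] for i in idx[:n_eval]]
    train_set = [data[i] for i in idx[n_eval:]]
    errs = []
    for fit in procedures:
        f = fit(train_set)
        errs.append(sum((y - f(x)) ** 2 for x, y in eval_set) / n_eval)
    return min(range(len(errs)), key=errs.__getitem__)

rng = random.Random(1)
data = [(x, 2.0 * x + rng.gauss(0, 0.3))
        for x in [rng.uniform(0, 1) for _ in range(200)]]
best = cv_select(data, [fit_mean, fit_linear])  # index of selected procedure
```

The paper's result concerns how `eval_ratio` may be chosen as a function of the convergence rates of the procedures while still selecting the better one with probability tending to 1.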
Adaptive Regularization in Neural Network Modeling
, 1997
Abstract

Cited by 14 (2 self)
In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [24]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done by employing a simple iterative gradient descent scheme with virtually no additional programming overhead compared to standard training. Experiments with feed-forward neural network models for time series prediction and classification tasks showed the viability and robustness of the algorithm. Moreover, we provide some simple theoretical examples to illustrate the potential and limitations of the proposed regularization framework.

1 Introduction

Neural networks are flexible tools for time series processing and pattern recognition. By increasing the number of hidden neurons in a 2-layer architec...
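The scheme of minimizing an empirical generalization estimate with respect to a regularization parameter can be sketched on a toy ridge problem. This is not the paper's algorithm (which differentiates through network training analytically); it is a hedged stand-in that descends on a validation-error estimate using a numeric derivative, with all data sizes, noise levels, and step sizes made up.

```python
import math, random

rng = random.Random(42)
make = lambda n: [(x, 1.5 * x + rng.gauss(0, 0.2))
                  for x in [rng.uniform(-1, 1) for _ in range(n)]]
train, val = make(30), make(30)

def fit_ridge(data, lam):
    """Closed-form 1-D ridge fit through the origin: w = Sxy / (Sxx + lam)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, y in data)
    return sxy / (sxx + lam)

def val_error(lam):
    """Empirical generalization estimate: mean squared validation error."""
    w = fit_ridge(train, lam)
    return sum((y - w * x) ** 2 for x, y in val) / len(val)

# Gradient descent on log(lam) keeps the parameter positive; the
# derivative of the validation error w.r.t. log(lam) is taken numerically.
log_lam, step, h = 0.0, 0.5, 1e-4
for _ in range(200):
    g = (val_error(math.exp(log_lam + h))
         - val_error(math.exp(log_lam - h))) / (2 * h)
    log_lam -= step * g
lam_opt = math.exp(log_lam)
```

The descent starts at lam = 1 and can only improve the validation estimate; in the paper the same idea is applied per-parameter inside network training rather than around a closed-form fit.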
Asymptotic optimality of likelihood-based cross-validation
 STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY
, 2003
Abstract

Cited by 13 (2 self)
Likelihood-based cross-validation is a statistical tool for selecting a density estimate, based on n i.i.d. observations from the true density, among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (w.r.t. the Kullback-Leibler distance to the true density) as a benchmark model selector which is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.
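The V-fold likelihood-based procedure for bandwidth selection can be illustrated directly. The settings below (Gaussian kernel, 5 folds, a small bandwidth grid, standard-normal data) are assumptions chosen for the sketch and follow the paper's simulation setting only in spirit: the selected bandwidth is the one maximizing the summed held-out log-likelihood across folds.

```python
import math, random

rng = random.Random(7)
# i.i.d. sample from a standard normal: the "true density" of a simulation.
xs = [rng.gauss(0, 1) for _ in range(200)]

def kde_logdensity(point, train, h):
    """Gaussian kernel density estimate at `point`, evaluated in log space."""
    s = sum(math.exp(-0.5 * ((point - t) / h) ** 2) for t in train)
    return math.log(max(s, 1e-300) / (len(train) * h * math.sqrt(2 * math.pi)))

def vfold_cv_loglik(xs, h, V=5):
    """Summed validation log-likelihood over V folds for bandwidth h."""
    total = 0.0
    for v in range(V):
        valid = xs[v::V]
        train = [x for i, x in enumerate(xs) if i % V != v]
        total += sum(kde_logdensity(p, train, h) for p in valid)
    return total

bandwidths = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best_h = max(bandwidths, key=lambda h: vfold_cv_loglik(xs, h))
```

The theorem's condition that the validation sample grow with n is visible here: each fold holds out n/V points, whereas leave-one-out would hold out a single point.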
Predictive Approaches For Choosing Hyperparameters in Gaussian Processes
 Neural Computation
, 1999
Abstract

Cited by 12 (1 self)
Gaussian Processes are powerful regression models specified by parametrized mean and covariance functions. Standard approaches to estimating these parameters (known as hyperparameters) are the Maximum Likelihood (ML) and Maximum A Posteriori (MAP) approaches. In this paper, we propose and investigate predictive approaches, namely maximization of Geisser's Surrogate Predictive Probability (GPP) and minimization of mean square error with respect to GPP (referred to as Geisser's Predictive mean square Error, GPE), to estimate the hyperparameters. We also derive results for the standard Cross-Validation (CV) error and make a comparison. These approaches are tested on a number of problems, and experimental results show that they are strongly competitive with existing approaches.
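A minimal numerical comparison of a likelihood-based and a predictive criterion can be sketched as follows. The RBF covariance, fixed noise level, data, and lengthscale grid are all assumptions for illustration. The leave-one-out predictive mean and variance use the standard closed-form identities mu_i = y_i - [K^-1 y]_i / [K^-1]_ii and sigma_i^2 = 1 / [K^-1]_ii, so a Geisser-style surrogate predictive probability needs only one matrix inversion per candidate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 40)
y = np.sin(X) + 0.1 * rng.standard_normal(40)

def kernel(X, ls):
    """RBF covariance with a fixed (assumed) noise term on the diagonal."""
    d = X[:, None] - X[None, :]
    return np.exp(-0.5 * (d / ls) ** 2) + 1e-2 * np.eye(len(X))

def log_marginal(ls):
    """Standard GP log marginal likelihood (the ML criterion)."""
    K = kernel(X, ls)
    alpha = np.linalg.solve(K, y)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(X) * np.log(2 * np.pi)

def loo_log_pred(ls):
    """Leave-one-out predictive log probability via the closed-form identities."""
    K = kernel(X, ls)
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    var = 1.0 / np.diag(Kinv)
    mu = y - alpha * var
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (y - mu) ** 2 / (2 * var))

grid = [0.1, 0.3, 1.0, 3.0]
ls_ml = max(grid, key=log_marginal)    # ML choice of lengthscale
ls_gpp = max(grid, key=loo_log_pred)   # predictive choice of lengthscale
```

A grid search stands in here for the gradient-based optimization one would normally use for either criterion.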
Predictive Data Mining with Finite Mixtures
 In Proceedings of The Second International Conference on Knowledge Discovery and Data Mining
, 1996
Abstract

Cited by 10 (5 self)
In data mining the goal is to develop methods for discovering previously unknown regularities from databases. The resulting models are interpreted and evaluated by domain experts, but some model evaluation criterion is also needed for the model construction process. The optimal choice would be to use the same criterion as the human experts, but this is usually impossible, as the experts are not capable of expressing their evaluation criteria formally. On the other hand, it seems reasonable to assume that any model possessing the capability of making good predictions also captures some structure of the reality. For this reason, in predictive data mining the search for good models is guided by the expected predictive error of the models. In this paper we describe the Bayesian approach to predictive data mining in the finite mixture modeling framework. The finite mixture model family is a natural choice for domains where the data exhibits a clustering structure. In many real-world domains ...
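The idea of guiding mixture-model search by expected predictive performance can be sketched as follows. This stand-in fits 1-D Gaussian mixtures by plain EM (not the Bayesian machinery of the paper) and scores the number of components by held-out predictive log-likelihood; the data, component counts, and iteration budget are all made up.

```python
import math

def em_gmm(xs, k, iters=60):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    srt = sorted(xs)
    mu = [srt[int((j + 0.5) * len(xs) / k)] for j in range(k)]  # quantile init
    var = [1.0] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: component responsibilities for each point.
        resp = []
        for x in xs:
            ps = [w[j] * math.exp(-0.5 * (x - mu[j]) ** 2 / var[j])
                  / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            s = sum(ps) or 1e-300
            resp.append([p / s for p in ps])
        # M-step: reestimate weights, means, variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, xs)) / nj, 1e-3)
            w[j] = nj / len(xs)
    return w, mu, var

def loglik(xs, model):
    """Predictive log-likelihood of held-out data under the mixture."""
    w, mu, var = model
    ll = 0.0
    for x in xs:
        p = sum(wj * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for wj, m, v in zip(w, mu, var))
        ll += math.log(max(p, 1e-300))
    return ll

import random
rng = random.Random(1)
train = [rng.gauss(-2, 0.5) for _ in range(100)] + [rng.gauss(2, 0.5) for _ in range(100)]
held = [rng.gauss(-2, 0.5) for _ in range(50)] + [rng.gauss(2, 0.5) for _ in range(50)]
scores = {k: loglik(held, em_gmm(train, k)) for k in (1, 2, 3)}
best_k = max(scores, key=scores.get)
```

On this clearly bimodal toy data the predictive criterion rejects the single-component model, which is the mechanism the abstract describes for steering model construction.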
Performance Prediction for Exponential Language Models
Abstract

Cited by 10 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
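The regression step described above can be reproduced in miniature. The rows below are synthetic numbers, constructed so that test cross-entropy equals training cross-entropy plus half the parameters-per-token ratio (the form of the relationship and the coefficient are made up for the sketch; the paper discovers its own formula empirically). Fitting test_CE = a * train_CE + b * ratio by the normal equations then recovers a = 1 and b = 0.5.

```python
# Hypothetical (train_CE, params_per_token, test_CE) triples; synthetic
# data satisfying test = train + 0.5 * ratio exactly, by construction.
rows = [
    (6.20, 0.10, 6.25),
    (5.80, 0.30, 5.95),
    (5.50, 0.60, 5.80),
    (5.30, 1.00, 5.80),
]

# Two-variable least squares via the 2x2 normal equations.
s_tt = sum(t * t for t, r, y in rows)
s_tr = sum(t * r for t, r, y in rows)
s_rr = sum(r * r for t, r, y in rows)
s_ty = sum(t * y for t, r, y in rows)
s_ry = sum(r * y for t, r, y in rows)
det = s_tt * s_rr - s_tr * s_tr
a = (s_rr * s_ty - s_tr * s_ry) / det
b = (s_tt * s_ry - s_tr * s_ty) / det

def predict_test_ce(train_ce, ratio):
    """Predicted test cross-entropy from the fitted linear relationship."""
    return a * train_ce + b * ratio
```

With real measurements the fit would of course not be exact; the paper's point is that a relationship of roughly this shape holds with remarkably high correlation across model families.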
Evaluation of Learning Schemes Used in Information Retrieval
, 1996
Abstract

Cited by 9 (1 self)
Searching within the context of information retrieval may be viewed as a communication process between the users and the indexers (or the authors). It is known that in expressing the same concept or idea, different people tend to use different words or phrases, and also that the meaning of words attached to document surrogates tends to change over time. To overcome these phenomena, various learning schemes have been designed so as to automatically infer knowledge about document content from the relevance assessments of past queries. Thus, in contrast to most retrieval models that represent the semantic content of documents as static entities, these adaptive search models might change the descriptions of documents through an inductive learning scheme. The evaluation of such dynamic document space strategies may be based on retrospective tests within which the same set of queries is applied to train and test the system. Based on cross-validation principles, this paper suggests a more "ho...