Results 1–10 of 23
Subspace information criterion for model selection
 Neural Computation
, 2001
"... The problem of model selection is considerably important for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows ’ C L. It is a ..."
Abstract

Cited by 41 (28 self)
 Add to MetaCart
The problem of model selection is of considerable importance for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows' C_L. It is assumed that the learning target function belongs to a specified functional Hilbert space and that the generalization error is defined as the squared Hilbert space norm of the difference between the learning result function and the target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and of the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least mean squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations on two examples show that SIC works well even when the number of training examples is small.
Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network
 Proc. 1st IEEE Computer Society Bioinformatics Conference
, 2002
"... We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is in the estimation of the conditional distribution of each random variable. We consider fitting nonparametric ..."
Abstract

Cited by 41 (18 self)
 Add to MetaCart
We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. A problem still remains to be solved in selecting an optimal graph, which gives the best representation of the system among genes. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.
A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering
 NeuroImage
"... www.elsevier.com/locate/ynimg ..."
Algebraic Geometrical Methods for Hierarchical Learning Machines
, 2001
"... Hierarchical learning machines such as layered perceptrons, radial basis functions, gaussian mixtures are nonidentifiable learning machines, whose Fisher information matrices are not positive definite. This fact shows that conventional statistical asymptotic theory can not be applied to the neural ..."
Abstract

Cited by 16 (8 self)
 Add to MetaCart
Hierarchical learning machines such as layered perceptrons, radial basis function networks, and Gaussian mixtures are nonidentifiable learning machines whose Fisher information matrices are not positive definite. This fact shows that conventional statistical asymptotic theory cannot be applied to neural network learning theory: for example, the Bayesian a posteriori probability distribution does not converge to a Gaussian distribution, and the generalization error is not proportional to the number of parameters. The purpose of this paper is to overcome this problem and to clarify the relation between the learning curve of a hierarchical learning machine and the algebraic geometrical structure of its parameter space. We establish an algorithm to calculate the Bayesian stochastic complexity based on blow-up technology in algebraic geometry and prove that the Bayesian generalization error of a hierarchical learning machine is smaller than that of a regular statistical model, even if the true distribution is not contained in the parametric model.
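The learning-curve result this abstract refers to can be stated compactly. The formulas below are from Watanabe's singular learning theory as generally published, not reproduced from this paper, so treat them as background rather than as the paper's own statements:

```latex
% The Bayesian stochastic complexity of a singular model behaves as
F(n) = \lambda \log n - (m - 1) \log\log n + O(1),
% where the rational number \lambda and its multiplicity m are given by
% the largest pole z = -\lambda of the zeta function
\zeta(z) = \int K(w)^{z}\,\varphi(w)\,dw,
% with K(w) the Kullback divergence to the true distribution and
% \varphi(w) the prior; blow-ups (resolution of singularities) are the
% tool used to locate this pole. The Bayes generalization error satisfies
G(n) \cong \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
% and for singular models \lambda \le d/2, whereas a regular d-parameter
% model gives \lambda = d/2: this is the sense in which the singular
% generalization error is smaller.
```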
Theoretical and Experimental Evaluation of Subspace Information Criterion
, 2001
"... Recently, a new model selection criterion called the subspace information criterion (SIC) was proposed. SIC works well with small samples since it gives an unbiased estimate of the generalization error with finite samples. In this paper, we theoretically and experimentally evaluate the e#ectiveness ..."
Abstract

Cited by 14 (11 self)
 Add to MetaCart
Recently, a new model selection criterion called the subspace information criterion (SIC) was proposed. SIC works well with small samples since it gives an unbiased estimate of the generalization error with finite samples. In this paper, we theoretically and experimentally evaluate the effectiveness of SIC in comparison with existing model selection techniques, including the traditional leave-one-out cross-validation (CV), Mallows' C_P, Akaike's information criterion (AIC), Sugiura's corrected AIC (cAIC), Schwarz's Bayesian information criterion (BIC), Rissanen's minimum description length criterion (MDL), and Vapnik's measure (VM). The theoretical evaluation includes a comparison of the generalization measure, the approximation method, and the restrictions on model candidates and learning methods. Experimentally, the performance of SIC in various situations is investigated. The simulations show that SIC outperforms existing techniques especially when the number of training examples is small and the noise variance is large.

Keywords: supervised learning, generalization capability, model selection, subspace information criterion, small samples
Optimal design of regularization term and regularization parameter by subspace information criterion
 Neural Networks
, 2000
"... The problem of designing the regularization term and regularization parameter for linear regression models is discussed. Previously, we derived an approximation to the generalization error called the subspace information criterion (SIC), which is an unbiased estimator of the generalization error wit ..."
Abstract

Cited by 12 (6 self)
 Add to MetaCart
The problem of designing the regularization term and regularization parameter for linear regression models is discussed. Previously, we derived an approximation to the generalization error called the subspace information criterion (SIC), which is an unbiased estimator of the generalization error with finite samples under certain conditions. In this paper, we apply SIC to regularization learning and use it for (a) choosing the optimal regularization term and regularization parameter from given candidates, and (b) obtaining the closed form of the optimal regularization parameter for a fixed regularization term. The effectiveness of SIC is demonstrated through computer simulations with artificial and real data.

Keywords: supervised learning, generalization error, linear regression, regularization learning, ridge regression, model selection, regularization parameter, subspace information criterion

Nomenclature:
f(x): learning target function
D: domain of f(x)
x_m: m-th sample point
y_m: m-th sample value
ε_m: m-th noise
(x_m, y_m): m-th training example
M: number of training examples
y: M-dimensional vector consisting of {y_m}_{m=1}^M
ε: M-dimensional vector consisting of {ε_m}_{m=1}^M
φ_p(x): p-th basis function
θ_p: p-th coefficient
µ: number of basis functions
J_G: generalization error
J_TE: training error
J_R: regularized training error
T: regularization matrix
α: regularization parameter
A: design matrix
X_{T,α}: regularization learning matrix
U: µ-dimensional matrix
θ: true parameter
θ̂_{T,α}: regularization estimate
θ̂_u: unbiased estimate
σ²: noise variance
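For the ridge special case of point (a), selecting the regularization parameter with SIC can be sketched as follows. The SIC formula used here and every variable name are assumptions made for illustration, loosely following the entry's nomenclature (A, T, α, σ²); this is not the paper's closed-form result (b):

```python
import numpy as np

rng = np.random.default_rng(1)

# Ridge regression on a random design; we scan the regularization
# parameter alpha and keep the value that minimizes SIC.
M, mu = 40, 8                      # training examples, basis functions (mu < M)
A = rng.normal(size=(M, mu))       # design matrix
theta = rng.normal(size=mu)        # true parameter
sigma2 = 0.5 ** 2                  # noise variance, assumed known
y = A @ theta + rng.normal(0.0, np.sqrt(sigma2), M)

X_u = np.linalg.pinv(A)            # unbiased (ordinary least squares) learning matrix
theta_u = X_u @ y
T = np.eye(mu)                     # ridge regression: identity regularization matrix

def sic_ridge(alpha):
    # Regularization learning matrix X_{T,alpha} = (A^T A + alpha T)^{-1} A^T.
    X = np.linalg.solve(A.T @ A + alpha * T, A.T)
    D = X - X_u
    # Unbiased estimate of the generalization error of X @ y.
    return (np.sum((X @ y - theta_u) ** 2)
            - sigma2 * np.trace(D @ D.T)
            + sigma2 * np.trace(X @ X.T))

alphas = np.logspace(-4, 2, 25)
best_alpha = alphas[int(np.argmin([sic_ridge(a) for a in alphas]))]
print("alpha chosen by SIC:", best_alpha)
```

Scanning a grid stands in for the paper's candidate-selection use of SIC; the paper itself additionally derives the optimizing alpha in closed form for a fixed T.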
Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression
Neural Computation
, 2004
"... A wellknown result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This paper follows the same spirit as we will stabilize the unbiased generalization error estimates by regularizati ..."
Abstract

Cited by 8 (8 self)
 Add to MetaCart
A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This paper follows the same spirit: we stabilize the unbiased generalization error estimates by regularization and thereby obtain more robust model selection criteria for learning. We trade a small bias against a larger variance reduction, which has the beneficial effect of being more precise on a single training set. We focus on the subspace information criterion (SIC), which is an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and it was shown in earlier experiments that a small regularization of SIC has a stabilizing effect. However, ...
Functional Analytic Approach to Model Selection: Subspace Information Criterion
In Proceedings of the 1999 Workshop on Information-Based Induction Sciences (IBIS'99)
, 1999
"... : The problem of model selection is considerably important for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC). Computer simulations show that SIC works well ..."
Abstract

Cited by 7 (4 self)
 Add to MetaCart
The problem of model selection is of considerable importance for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC). Computer simulations show that SIC works well even when the number of training examples is small.

Keywords: supervised learning, generalization capability, model selection, Hilbert space, projection learning

1 Introduction

Supervised learning means obtaining an underlying rule from given training examples, and it can be regarded as a function approximation problem. So far, many learning methods for supervised learning have been developed, including the backpropagation algorithm [4, 19], projection learning [13], Bayesian inference [10], and support vector regression [25, 21]. In these learning methods, the quality of the learning results depends heavily on the complexity of the models. Here, models refer to, for example, Hilbert spaces to w...
Weighted Lasso in Graphical Gaussian Modeling for Large Gene Network Estimation Based on Microarray Data
"... We propose a statistical method based on graphical Gaussian models for estimating large gene networks from DNA microarray data. In estimating large gene networks, the number of genes is larger than the number of samples, we need to consider some restrictions for model building. We propose weighted l ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
We propose a statistical method based on graphical Gaussian models for estimating large gene networks from DNA microarray data. In estimating large gene networks, where the number of genes is larger than the number of samples, we need to impose some restrictions for model building. We propose weighted lasso estimation for graphical Gaussian models as a model of large gene networks. In the proposed method, the structural learning of gene networks is equivalent to the selection of the regularization parameters included in the weighted lasso estimation. We investigate this problem from a Bayes approach and derive an empirical Bayesian information criterion for choosing them. Unlike the Bayesian network approach, our method can find the optimal network structure and does not require a heuristic structural learning algorithm. We conduct Monte Carlo simulations to show the effectiveness of the proposed method. We also analyze Arabidopsis thaliana microarray data and estimate gene networks.
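A minimal sketch of the neighborhood-selection flavor of this idea follows. It is not the authors' method: the coordinate-descent solver, the uniform weights, and the fixed regularization level are stand-ins for the weighted lasso with empirical-Bayes-selected parameters described in the abstract, and all names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "expression" data: p genes, n samples, drawn from a Gaussian whose
# sparse precision (inverse covariance) matrix encodes the true network.
n, p = 50, 10
Omega = np.eye(p)
Omega[0, 1] = Omega[1, 0] = 0.4     # one true edge, between genes 0 and 1
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n)
X = (X - X.mean(0)) / X.std(0)      # standardize each gene

def weighted_lasso(A, y, lam, w, n_iter=200):
    """Coordinate descent for min_b 0.5*||y - A b||^2 + lam * sum_j w_j |b_j|."""
    b = np.zeros(A.shape[1])
    col_sq = (A ** 2).sum(0)
    for _ in range(n_iter):
        for j in range(A.shape[1]):
            r = y - A @ b + A[:, j] * b[j]          # partial residual
            rho = A[:, j] @ r
            thr = lam * w[j]
            b[j] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
    return b

# Neighborhood selection: regress each gene on all others with a weighted
# lasso; an edge i-j is drawn when either coefficient is nonzero.
lam = 0.3 * n
adj = np.zeros((p, p), dtype=bool)
for i in range(p):
    others = [j for j in range(p) if j != i]
    w = np.ones(p - 1)              # uniform weights here; data-driven in the paper
    b = weighted_lasso(X[:, others], X[:, i], lam, w)
    for j, coef in zip(others, b):
        if coef != 0.0:
            adj[i, j] = adj[j, i] = True

print("estimated edges:", int(adj.sum()) // 2)
```

The paper's contribution sits exactly where this sketch is crudest: lam and w are fixed by hand here, whereas the proposed empirical Bayesian information criterion chooses the regularization parameters, which is what determines the network structure.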
Learning Coefficients of Layered Models when the True Distribution Mismatches the Singularities
, 2003
"... Hierarchical learning machines such as layered neural networks have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degen erate, resulting that the conventional learning theory of regular statistical models does not hold. Recently, it was proven that ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
Hierarchical learning machines such as layered neural networks have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, with the result that the conventional learning theory of regular statistical models does not hold. Recently, it was proven that, if the parameter of the true distribution is contained in the singularities of the learning machine, then the generalization error in Bayes estimation is asymptotically equal to λ/n, where λ is smaller than the dimension of the parameter and n is the number of training samples. However, the constant λ strongly depends on the local geometrical structure of the singularities; hence the generalization error is not yet clarified when the true distribution is almost but not completely contained in the singularities. In this paper, in order to analyze such cases, we study the Bayes generalization error under the condition that the Kullback distance of the true distribution from the distribution represented by the singularities is in proportion to 1/n, and show two results. (1) If the dimension of the parameter from inputs to hidden units is not larger than three, then there exists a region of true parameters such that the generalization error is larger than that of the corresponding regular model. (2) However, if the dimension from inputs to hidden units is larger than three, then for an arbitrary true distribution, the generalization error is smaller than that of the corresponding regular model.