Results 1 - 4 of 4
Building Cost Functions Minimizing to Some Summary Statistics
, 2000
Abstract

Cited by 2 (2 self)
A learning machine, or a model, is usually trained by minimizing a given criterion (the expectation of the cost function), measuring the discrepancy between the model output and the desired output. As is already well known, the choice of the cost function has a profound impact on the probabilistic interpretation of the output of the model, after training. In this work, we use the calculus of variations in order to tackle this problem. In particular, we derive necessary and sufficient conditions on the cost function ensuring that the output of the trained model approximates 1) the conditional expectation of the desired output given the explanatory variables; 2) the conditional median (and, more generally, the quantile); 3) the conditional geometric mean; and 4) the conditional variance. The same method could be applied to the estimation of other summary statistics as well. We also argue that the least absolute deviations criterion could, in some cases, act as an alternative to the ordinary least squares criterion for nonlinear regression. In the same vein, the concept of "regression quantile" is briefly discussed.
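The correspondence the abstract describes can be illustrated numerically: for a constant predictor, the squared-error criterion is minimized by the mean, absolute error by the median, and the pinball (quantile) loss by the corresponding quantile. A minimal sketch (not from the paper; the data, grid, and helper names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=20_000)  # skewed data, so mean != median

# Grid of candidate constant predictors c.
candidates = np.linspace(0.0, 5.0, 501)

def best_constant(loss):
    # Return the constant c minimizing the empirical risk E[loss(y, c)].
    risks = [loss(y, c).mean() for c in candidates]
    return candidates[int(np.argmin(risks))]

sq = lambda y, c: (y - c) ** 2                     # ordinary least squares
lad = lambda y, c: np.abs(y - c)                   # least absolute deviations
pinball = lambda tau: (
    lambda y, c: np.maximum(tau * (y - c), (tau - 1) * (y - c))
)

print(best_constant(sq), y.mean())                        # ≈ sample mean
print(best_constant(lad), np.median(y))                   # ≈ sample median
print(best_constant(pinball(0.9)), np.quantile(y, 0.9))   # ≈ 90% quantile
```

The grid search deliberately avoids using the closed-form answers, so the three minimizers landing on the mean, median, and quantile demonstrates the property rather than assuming it.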
NEURAL ARCHITECTURES FOR PARAMETRIC ESTIMATION OF A POSTERIORI PROBABILITIES BY CONSTRAINED CONDITIONAL DENSITY FUNCTIONS
Abstract
Abstract. A new approach to the estimation of ‘a posteriori’ class probabilities using neural networks, the Joint Network and Data Density Estimation (JNDDE), is presented in this paper. It is based on the estimation of the conditional data density functions, with some restrictions imposed by the classifier structure; the Bayes rule is used to obtain the ‘a posteriori’ probabilities from these densities. The proposed method is applied to three different network structures: the logistic perceptron (for the binary case), the softmax perceptron (for multiclass problems) and a generalized softmax perceptron (that can be used to map arbitrarily complex probability functions). Gaussian mixture models are used for the conditional densities. The method has the advantage of establishing a distinction between the network architecture constraints and the model of the data, separating network parameters and model parameters. The complexity of either can be fixed as desired. Maximum likelihood gradient-based rules for the estimation of the parameters can be obtained. It is shown that JNDDE exhibits more robust convergence characteristics than other methods of a posteriori probability estimation, such as those based on the minimization of a Strict Sense Bayesian (SSB) cost function.
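The core step the abstract relies on, turning class-conditional densities into posterior probabilities via the Bayes rule, can be sketched in a few lines. This is a generic one-dimensional illustration, not the JNDDE method itself; the Gaussian parameters and priors are made up:

```python
import numpy as np

def gauss(x, mu, sigma):
    # Univariate Gaussian density p(x | class) -- a 1-component "mixture".
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

priors = np.array([0.6, 0.4])          # P(class k), assumed known
mus = np.array([-1.0, 2.0])            # class-conditional means (illustrative)
sigmas = np.array([1.0, 1.5])          # class-conditional std devs

def posterior(x):
    # Bayes rule: P(k | x) = p(x | k) P(k) / sum_j p(x | j) P(j)
    joint = np.array([gauss(x, m, s) * p for m, s, p in zip(mus, sigmas, priors)])
    return joint / joint.sum()

print(posterior(0.0))  # two posteriors, summing to 1
```

The same normalization applies unchanged when each conditional density is a full Gaussian mixture, as in the paper.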
Controls, Bipolar Disorder, and Schizophrenia Using Intrinsic Connectivity Maps From fMRI Data
, 2010
Abstract
Abstract—We present a method for supervised, automatic, and reliable classification of healthy controls, patients with bipolar disorder, and patients with schizophrenia using brain imaging data. The method uses four supervised classification learning machines trained with a stochastic gradient learning rule based on the minimization of Kullback–Leibler divergence and an optimal model complexity search through posterior probability estimation. Prior to classification, given the high dimensionality of functional MRI (fMRI) data, a dimension reduction stage comprising two steps is performed: first, a one-sample univariate t-test mean-difference T-score approach is used to reduce the number of significant discriminative functional activated voxels, and then singular value decomposition is performed to further reduce the dimension of the input patterns to a number comparable to the limited number of subjects available for each of the three classes. Experimental results
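The two-step dimension reduction described above (voxel screening by t-score, then SVD down to roughly the number of subjects) can be sketched on synthetic data. This is a toy stand-in, not the paper's pipeline; the array sizes and cutoffs are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_voxels = 30, 5000
X = rng.normal(size=(n_subjects, n_voxels))   # toy stand-in for fMRI maps

# Step 1: one-sample t-score per voxel; keep the highest-magnitude voxels.
t = X.mean(axis=0) / (X.std(axis=0, ddof=1) / np.sqrt(n_subjects))
keep = np.argsort(-np.abs(t))[:500]
X_sel = X[:, keep]                            # (30, 500)

# Step 2: SVD of the centered data; project onto a number of components
# comparable to the number of subjects.
U, S, Vt = np.linalg.svd(X_sel - X_sel.mean(axis=0), full_matrices=False)
X_red = U[:, :20] * S[:20]                    # reduced input patterns (30, 20)
print(X_red.shape)
```

Reducing to fewer dimensions than subjects keeps the subsequent classifiers from being hopelessly underdetermined, which is the point of the second step.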
DETECTION IN BREAST CANCER DIAGNOSIS
Abstract
Neural networks (NNs) are customarily used as classifiers aimed at minimizing classification error rates. However, it is known that the NN architectures that compute soft decisions can be used to estimate posterior class probabilities; sometimes, it could be useful to implement general decision rules other than the maximum a posteriori
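One standard example of a decision rule other than maximum a posteriori, of the kind this abstract alludes to, is a cost-sensitive threshold on the estimated posterior. This is a generic sketch (not from the paper); the cost values are illustrative:

```python
# Hypothetical cost-sensitive rule: let p = estimated P(disease | x).
# MAP flags the positive class only when p > 0.5. With an asymmetric cost
# matrix where a miss (cost 10) is worse than a false alarm (cost 1), the
# Bayes-optimal threshold shifts to cost_fa / (cost_fa + cost_miss).
cost_miss, cost_fa = 10.0, 1.0
threshold = cost_fa / (cost_fa + cost_miss)   # ≈ 0.091

def decide(p):
    # Declare positive when the expected cost of a negative decision,
    # p * cost_miss, exceeds that of a positive one, (1 - p) * cost_fa.
    return "positive" if p > threshold else "negative"

print(decide(0.2))  # MAP would say negative; the cost-sensitive rule flags it
```

This is exactly why posterior estimates are more useful than hard classifier outputs: the same trained network supports any cost matrix chosen after training.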