Results 1–2 of 2
Statistical Learning Algorithms Based on Bregman Distances, 1997
Abstract

Cited by 23 (1 self)
We present a class of statistical learning algorithms formulated in terms of minimizing Bregman distances, a family of generalized entropy measures associated with convex functions. The inductive learning scheme is akin to growing a decision tree, with the Bregman distance filling the role of the impurity function in tree-based classifiers. Our approach is based on two components. In the feature selection step, each linear constraint in a pool of candidate features is evaluated by the reduction in Bregman distance that would result from adding it to the model. In the constraint satisfaction step, all of the parameters are adjusted to minimize the Bregman distance subject to the chosen constraints. We introduce a new iterative estimation algorithm for carrying out both the feature selection and constraint satisfaction steps, and outline a proof of the convergence of these algorithms.
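As a concrete illustration (not the paper's own code), a Bregman distance is defined by B_F(p, q) = F(p) − F(q) − ⟨∇F(q), p − q⟩ for a convex F. The sketch below, which assumes NumPy and hypothetical helper names, shows how two familiar measures arise from this one formula: the squared Euclidean distance (F = ‖·‖²) and the KL divergence (F = negative entropy):

```python
import numpy as np

def bregman(F, gradF, p, q):
    """Bregman distance B_F(p, q) = F(p) - F(q) - <gradF(q), p - q>."""
    return F(p) - F(q) - np.dot(gradF(q), p - q)

# F(x) = ||x||^2 recovers the squared Euclidean distance.
sq = lambda x: np.dot(x, x)
grad_sq = lambda x: 2.0 * x

# F(x) = sum_i x_i log x_i (negative entropy) recovers the KL divergence
# for distributions p, q on the simplex.
negent = lambda x: np.sum(x * np.log(x))
grad_negent = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

d_euclid = bregman(sq, grad_sq, p, q)        # equals ||p - q||^2
d_kl = bregman(negent, grad_negent, p, q)    # equals KL(p || q)
```

The feature selection step in the abstract then amounts to scoring each candidate constraint by how much such a distance would decrease if the constraint were added to the model.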
Cost functions to estimate a posteriori probabilities in multiclass problems, IEEE Trans. Neural Networks, 1999
Abstract

Cited by 4 (2 self)
Abstract—The problem of designing cost functions to estimate a posteriori probabilities in multiclass problems is addressed in this paper. We establish necessary and sufficient conditions that these costs must satisfy in one-class one-output networks whose outputs are consistent with probability laws. We focus our attention on a particular subset of the corresponding cost functions: those which satisfy two usually desirable properties, symmetry and separability (well-known cost functions, such as the quadratic cost or the cross entropy, are particular cases in this subset). Finally, we present a universal stochastic gradient learning rule for single-layer networks, in the sense of minimizing a general version of these cost functions for a wide family of nonlinear activation functions.

Index Terms—Neural networks, pattern classification, probability estimation.
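A minimal sketch of the kind of rule the abstract describes, using one member of the cost family it characterizes (the cross entropy) on a single-layer sigmoid network. The synthetic data, learning-rate schedule, and network size here are illustrative assumptions, not the paper's experiments; the point is that stochastic gradient descent on this cost drives the output toward P(y = 1 | x):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic two-class data whose labels follow a known posterior.
n, d = 2000, 2
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -1.0])
y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)

# Cross-entropy cost C(y, o) = -y log o - (1 - y) log(1 - o) with a
# sigmoid output gives the stochastic gradient rule dC/dw = (o - y) x.
w = np.zeros(d)
for epoch in range(20):
    lr = 0.5 / (1.0 + epoch)  # decaying step size for stability
    for i in rng.permutation(n):
        o = sigmoid(X[i] @ w)
        w -= lr * (o - y[i]) * X[i]

# After training, sigmoid(X @ w) approximates the posterior P(y=1 | x).
```

The quadratic cost would yield a different update but, per the abstract's conditions, the same posterior-estimation property; cross entropy is used here only because its gradient is especially simple.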