Results 1–7 of 7
Theoretical analysis of Bayesian matrix factorization
 Journal of Machine Learning Research
Abstract

Cited by 9 (6 self)
Recently, variational Bayesian (VB) techniques have been applied to probabilistic matrix factorization and shown to perform very well in experiments. In this paper, we theoretically elucidate properties of the VB matrix factorization (VBMF) method. Through finite-sample analysis of the VBMF estimator, we show that two types of shrinkage factors exist in the VBMF estimator: positive-part James-Stein (PJS) shrinkage and trace-norm shrinkage, both acting on each singular component separately to produce low-rank solutions. The trace-norm shrinkage is simply induced by non-flat prior information, similarly to the maximum a posteriori (MAP) approach. Thus, no trace-norm shrinkage remains when priors are non-informative. On the other hand, we show the counter-intuitive fact that the PJS shrinkage factor remains activated even with flat priors. This is shown to be induced by the non-identifiability of the matrix factorization model, that is, the mapping between the target matrix and the factorized matrices is not one-to-one. We call this model-induced regularization. We further extend our analysis to empirical Bayes scenarios where hyperparameters are also learned based on the VB free energy. Throughout the paper, we assume no missing entries in the observed matrix, and therefore collaborative filtering is out of scope.
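The two shrinkage mechanisms described in this abstract act on each singular component separately. The following is a minimal numerical sketch of PJS-style shrinkage applied to singular values; the constant `c` and the function names are illustrative choices, not the paper's derived quantities:

```python
import numpy as np

def pjs_shrink(gamma, c):
    """Positive-part James-Stein-style shrinkage of one singular value.
    The factor max(0, 1 - c / gamma**2) zeroes out small components,
    which is what produces low-rank solutions. `c` is an illustrative
    constant, not the threshold derived in the paper."""
    return max(0.0, 1.0 - c / gamma**2) * gamma

def lowrank_estimate(V, c=4.0):
    """Shrink each singular component of V separately."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    s_shrunk = np.array([pjs_shrink(g, c) for g in s])
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 5))
V_hat = lowrank_estimate(V)
# Components with gamma**2 <= c are truncated to exactly zero,
# so the estimate's rank can only shrink.
print(np.linalg.matrix_rank(V_hat) <= np.linalg.matrix_rank(V))  # → True
```

Because the shrinkage factor is exactly zero below the threshold (rather than merely small, as in plain trace-norm shrinkage with a weak prior), the rank of the estimate drops discretely as components fall below it.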
Implicit Regularization in Variational Bayesian Matrix Factorization
Abstract

Cited by 6 (1 self)
Matrix factorization into a product of low-rank matrices induces non-identifiability, i.e., the mapping between the target matrix and the factorized matrices is not one-to-one. In this paper, we theoretically investigate the influence of non-identifiability on Bayesian matrix factorization. More specifically, we show that a variational Bayesian method involves a regularization effect even when the prior is non-informative, which is intrinsically different from the maximum a posteriori approach. We also extend our analysis to empirical Bayes scenarios where hyperparameters are also learned from data.
A Formula of Equations of States in Singular Learning Machines
Abstract

Cited by 2 (2 self)
Almost all learning machines used in computational intelligence are not regular but singular statistical models, because they are non-identifiable and their Fisher information matrices are singular. In singular learning machines, neither does the Bayes a posteriori distribution converge to the normal distribution, nor does the maximum likelihood estimator satisfy asymptotic normality, with the result that it has been difficult to estimate generalization performance. In this paper, we establish a formula of equations of states that holds among the Bayes and Gibbs generalization and training errors, and show that the two generalization errors can be estimated from the two training errors. The equations of states proved in this paper hold for any true distribution, any learning machine, any a priori distribution, and any singularities; hence they define widely applicable information criteria.
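As a hedged sketch of the general shape such "equations of states" take (written here from the abstract's description, with $B_g, B_t$ denoting the Bayes generalization and training errors, $G_g, G_t$ their Gibbs counterparts, and $\beta$ an inverse temperature; the exact form and constants should be checked against the paper itself), the four observable quantities are linked linearly:

$$
\mathbb{E}[B_g] \;\approx\; \mathbb{E}[B_t] + 2\beta\bigl(\mathbb{E}[G_t] - \mathbb{E}[B_t]\bigr),
\qquad
\mathbb{E}[G_g] \;\approx\; \mathbb{E}[G_t] + 2\beta\bigl(\mathbb{E}[G_t] - \mathbb{E}[B_t]\bigr).
$$

Relations of this kind let the two generalization errors on the left be estimated from the two training errors on the right, which is exactly the use the abstract describes.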
Generalization Error of Linear Neural Networks in an Empirical Bayes Approach
In Proc. of IJCAI, 2005
Abstract

Cited by 1 (1 self)
It is well known that in unidentifiable models, Bayes estimation has an advantage in generalization performance over maximum likelihood estimation. However, accurate approximation of the posterior distribution requires huge computational costs. In this paper, we consider an empirical Bayes approach in which some of the parameters are regarded as hyperparameters, which we call a subspace Bayes approach, and theoretically analyze the generalization error of three-layer linear neural networks. We show that a subspace Bayes approach is asymptotically equivalent to a positive-part James-Stein-type shrinkage estimation, and behaves similarly to Bayes estimation in typical cases.
Analysis of Variational Bayesian Matrix Factorization
Abstract
Recently, the variational Bayesian approximation was applied to probabilistic matrix factorization and shown to perform very well in experiments. However, its good performance was not completely understood beyond its experimental success. The purpose of this paper is to theoretically elucidate properties of a variational Bayesian matrix factorization method. In particular, its mechanism of avoiding overfitting is analyzed. Our analysis relies on the key fact that the matrix factorization model induces non-identifiability, i.e., the mapping between the factorized matrices and the original matrix is not one-to-one. The positive-part James-Stein shrinkage operator and the Marcenko-Pastur law (the limiting distribution of the eigenvalues of the central Wishart distribution) play important roles in our analysis.
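The Marcenko-Pastur law mentioned in this abstract is easy to check numerically: for a p x n matrix with i.i.d. unit-variance entries, the eigenvalues of the sample covariance concentrate on the support [(1 - sqrt(c))^2, (1 + sqrt(c))^2], where c = p/n. A minimal sketch (the dimensions, seed, and tolerance are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 200, 1000                        # aspect ratio c = p/n = 0.2
c = p / n
X = rng.normal(size=(p, n))             # i.i.d. N(0, 1) entries
eigs = np.linalg.eigvalsh(X @ X.T / n)  # sample-covariance eigenvalues

# Marcenko-Pastur support edges for unit-variance entries:
lower = (1 - np.sqrt(c)) ** 2
upper = (1 + np.sqrt(c)) ** 2

# Fraction of eigenvalues inside the MP support (with a small slack
# for finite-sample fluctuation); this should be close to 1.
inside = np.mean((eigs > lower - 0.1) & (eigs < upper + 0.1))
print(inside)
```

The point relevant to the analysis above is that even pure-noise singular components spread out over a bulk of this known shape, which is what makes it possible to separate noise components from signal when studying overfitting.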
Almost All Learning Machines are Singular
Abstract
A learning machine is called singular if its Fisher information matrix is singular. Almost all learning machines used in information processing are singular; for example, layered neural networks, normal mixtures, binomial mixtures, Bayes networks, hidden Markov models, Boltzmann machines, stochastic context-free grammars, and reduced-rank regressions are singular. In singular learning machines, the likelihood function cannot be approximated by any quadratic form of the parameters. Moreover, neither the distribution of the maximum likelihood estimator nor the Bayes a posteriori distribution converges to the normal distribution, even if the number of training samples tends to infinity. Therefore, conventional statistical learning theory does not hold for singular learning machines. This paper establishes a new mathematical foundation for singular learning machines. We show that, by using resolution of singularities, the likelihood function can be represented in a standard form, from which we can prove the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation. This result will be a base on which training algorithms for singular learning machines are devised and optimized.