Results 1 - 10 of 11
Algebraic analysis for nonidentifiable learning machines
 Neural Computation
Abstract

Cited by 46 (14 self)
This paper clarifies the relation between the learning curve and the algebraic geometrical structure of a nonidentifiable learning machine, such as a multilayer neural network, whose true parameter set is an analytic set with singular points. Using a concept from algebraic analysis, we rigorously prove that the Bayesian stochastic complexity, or free energy, is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are a rational number and a natural number determined as birational invariants of the singularities in the parameter space. We also show an algorithm for calculating λ1 and m1 based on resolution of singularities in algebraic geometry. In regular statistical models, 2λ1 equals the number of parameters and m1 = 1, whereas in nonregular models such as multilayer networks, 2λ1 is not larger than the number of parameters and m1 ≥ 1. Since the increase of the stochastic complexity equals the learning curve, i.e., the generalization error, nonidentifiable learning machines are better models than regular ones when Bayesian ensemble learning is applied.
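The asymptotic formula can be illustrated numerically. A minimal sketch follows; the singular model's λ1 = 2 and m1 = 2 are made-up values here, since the true ones come from resolving the singularities of the specific machine:

```python
import math

def stochastic_complexity(n, lam, m, const=0.0):
    """Asymptotic Bayesian stochastic complexity (free energy):
    F(n) ~ lam * log n - (m - 1) * log log n + const."""
    return lam * math.log(n) - (m - 1) * math.log(math.log(n)) + const

n = 10_000
d = 10                                             # parameter count (illustrative)
f_regular = stochastic_complexity(n, d / 2, 1)     # regular model: 2*lam = d, m = 1
f_singular = stochastic_complexity(n, 2.0, 2)      # singular model: 2*lam <= d, m >= 1
assert f_singular < f_regular                      # smaller complexity for the singular model
```

The smaller leading coefficient λ1 is what makes the singular model's stochastic complexity, and hence its Bayesian generalization error, smaller than the regular model's for large n.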
Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network
 Proc. 1st IEEE Computer Society Bioinformatics Conference
, 2002
Abstract

Cited by 41 (18 self)
We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. A problem that remains to be solved is selecting an optimal graph, which gives the best representation of the system among genes. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.
Singularities In Mixture Models And Upper Bounds Of Stochastic Complexity
 International Journal of Neural Networks
, 2003
Abstract

Cited by 19 (6 self)
A learning machine that is a mixture of several distributions, for example a Gaussian mixture or a mixture of experts, has a wide range of applications. However, such a machine is a nonidentifiable statistical model with many singularities in the parameter space, hence its generalization properties have been unknown. Recently an algebraic geometrical method has been developed that enables us to treat such learning machines mathematically. Based on this method, this paper rigorously proves that a mixture learning machine has a smaller Bayesian stochastic complexity than regular statistical models. Since the generalization error of a learning machine equals the increase of the stochastic complexity, this result shows that mixture models can attain more precise predictions than regular statistical models when Bayesian estimation is applied in statistical inference.
The Nishimori line and Bayesian Statistics
 J. Phys. A: Math. Gen
, 1999
Abstract

Cited by 11 (1 self)
Abstract. The “Nishimori line” is a line or hypersurface in the parameter space of systems with quenched disorder on which simple expressions for the averages of physical quantities over the quenched random variables are obtained. It has played an important role in theoretical studies of random frustrated systems since its discovery around 1980. In this paper, a novel interpretation of the Nishimori line from the viewpoint of statistical information processing is presented. Our main aim is the reconstruction of the whole theory of the Nishimori line from the viewpoint of Bayesian statistics or, almost equivalently, from the viewpoint of the theory of error-correcting codes. As a byproduct of our interpretation, counterparts of the Nishimori line in models without gauge invariance are given. We also discuss issues concerning the “finite temperature decoding” of error-correcting codes in connection with our theme and clarify the role of gauge invariance in this topic.
Subspace Information Criterion for Non-Quadratic Regularizers: Model Selection for Sparse Regressors
 IEEE Transactions on Neural Networks
, 2002
Abstract

Cited by 9 (7 self)
Non-quadratic regularizers, in particular the ℓ1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC), which makes it possible to predict the generalization error for this useful family of regularizers. We show that under some technical assumptions GSIC is an asymptotically unbiased estimator of the generalization error. GSIC is demonstrated to perform well in experiments with the ℓ1-norm regularizer, compared with the Network Information Criterion and cross-validation, in relatively large-sample cases. However, in the small-sample case GSIC tends to fail to capture the optimal model due to its large variance. Therefore, a biased version of GSIC is also introduced, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
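The sparsity claim for the ℓ1-norm regularizer can be made concrete with its proximal operator, soft-thresholding, which sets small coefficients exactly to zero. This is a generic sketch of why ℓ1 penalties produce sparse regressors, not the paper's GSIC machinery:

```python
import numpy as np

def soft_threshold(w, lam):
    # Proximal operator of lam * ||w||_1: argmin_z 0.5*||z - w||^2 + lam*||z||_1.
    # Coefficients with |w_i| <= lam are set exactly to zero, producing sparsity.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.05, -0.3, 1.2, -0.02])
w_sparse = soft_threshold(w, lam=0.1)
assert np.count_nonzero(w_sparse) == 2   # the two small coefficients vanished
```

A quadratic (ridge) penalty, by contrast, only shrinks coefficients toward zero without ever making them exactly zero, which is why sparsity-inducing regularizers need the non-quadratic treatment the paper develops.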
Algebraic Information Geometry for Learning Machines with Singularities
, 2001
Abstract

Cited by 8 (6 self)
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied. (1) If the prior is positive, then the stochastic complexity is far smaller than BIC, resulting in a smaller generalization error than regular statistical models, even when the true distribution is not contained in the parametric model. (2) If Jeffreys' prior, which is coordinate-free and equal to zero at singularities, is employed, then the stochastic complexity has the same form as BIC. It is useful for model selection, but not for generalization.
Stochastic filtering for motion trajectory in image sequences using a Monte Carlo filter with estimation of hyperparameters
 Proc. 16th Int. Conf. Pattern Recog
, 2002
Abstract

Cited by 5 (0 self)
False matching due to errors in feature extraction and changes in illumination between frames may occur in feature tracking in image sequences. False matching leads to outliers in the feature motion trajectory. One way of reducing the effect of outliers is stochastic filtering using a state space model for the motion trajectory. Hyperparameters in the state space model, e.g., the variances of the noise distributions, must be determined appropriately to control tracking motion and outlier rejection properly. Likelihood can be used to estimate hyperparameters, but it is difficult to apply to online tracking due to computational cost. To estimate hyperparameters online, we include the hyperparameters in the state vector and estimate feature coordinates and hyperparameters simultaneously. A Monte Carlo filter is used for state estimation, because adding hyperparameters to the state vector makes the state space model nonlinear. Experimental results using synthetic and real data show that the proposed method can estimate appropriate hyperparameters for tracking motion and reducing the effect of outliers.
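The core idea, appending a noise hyperparameter to the state vector and estimating it jointly with position via a particle (Monte Carlo) filter, can be sketched in one dimension. The model, noise levels, and particle counts below are invented for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D feature trajectory with occasional false matches (outliers).
T = 50
x_true = np.cumsum(rng.normal(0.0, 0.1, T))
y = x_true + rng.normal(0.0, 0.2, T)
y[10::10] += 3.0                               # outliers from false matching

# Particle filter with the log observation-noise scale appended to the
# state vector, so position and hyperparameter are estimated jointly.
N = 2000
pos = y[0] + rng.normal(0.0, 0.5, N)
log_s = rng.normal(np.log(0.5), 0.5, N)

est = []
for t in range(T):
    pos = pos + rng.normal(0.0, 0.1, N)        # system model: random walk
    log_s = log_s + rng.normal(0.0, 0.02, N)   # slow walk on the hyperparameter
    s = np.exp(log_s)
    w = np.exp(-0.5 * ((y[t] - pos) / s) ** 2) / s   # Gaussian likelihood
    w /= w.sum()
    idx = rng.choice(N, N, p=w)                # resampling step
    pos, log_s = pos[idx], log_s[idx]
    est.append(pos.mean())

rmse = np.sqrt(np.mean((np.array(est) - x_true) ** 2))
```

Once the particle cloud is tight around the true position, a distant outlier makes all particles comparably unlikely, so the weights stay nearly uniform and the estimate barely moves, which is the outlier-rejection behavior the abstract describes.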
Point Process Models in Asthma Attacks for Assessing Environmental Risk Factors
Abstract
Point process models are reviewed and discussed for assessing the effects of environmental risk factors on asthma attacks. It is pointed out that the logit model and the proportional intensity model are useful for analyzing data from diaries recorded consecutively over several months or a few years. Some covariates that seem to influence asthmatics are explored using these models. Further work on estimating the smoothed baseline intensity function is briefly discussed in terms of the Bayes model.
Fault friction parameters inferred from the early stages
Abstract
We use sub-daily GPS time series of positions in the first 5 hours following the 2003 Tokachi-oki earthquake (Mw = 8.0), located offshore of Hokkaido, Japan, to estimate frictional parameters for the afterslip zone on the subduction interface. The data show little motion immediately after the earthquake, with sudden acceleration about 1.2 hours after the main shock. This coincides with the largest aftershock (M = 7.4) and is followed by gradual deceleration. We assume that early afterslip is the response of a fault patch to instantaneous stress perturbations caused by the main shock and the largest aftershock. Early afterslip is modeled with a spring-slider system obeying a rate- and state-dependent friction law. We develop and apply an inversion method to estimate the friction parameters Dc, aσ, and (a − b)σ, where σ is the effective normal stress. The estimated 95% confidence intervals of Dc, aσ, and (a − b)σ are 2.6 × 10^-4 to 1.8 × 10^-3 m, 0.29 to 0.43 MPa, and 0.214 to 0.220 MPa, respectively. The estimated Dc is 10 to 10^3 times larger than typical laboratory values. The estimated aσ and (a − b)σ values suggest that a and a − b are smaller than typical laboratory values and/or that the pore pressure on the plate boundary is significantly elevated above the hydrostatic value. Our analyses show that the model can reproduce the observed GPS data and that the timing of the rapid acceleration of postseismic deformation is controlled by the frictional properties of the fault and the stress change from the main shock, not by the timing of the largest aftershock.
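The forward model behind such an inversion, a spring-slider obeying rate- and state-dependent friction, can be sketched roughly as follows. All parameter values are illustrative, not the paper's estimates, and the aging law with a quasi-static spring is one common formulation:

```python
import numpy as np

# Illustrative values only (not the paper's estimates).
a_sig, b_sig = 0.35, 0.13    # a*sigma and b*sigma in MPa, so (a - b)*sigma = 0.22
Dc = 1e-3                    # critical slip distance, m
V0 = 1e-9                    # reference slip rate, m/s
k = 0.01                     # spring stiffness, MPa/m
tau = 0.5                    # coseismic stress step on the patch, MPa

theta = Dc / V0              # state variable, starting at steady state for V0
dt, T = 1.0, 5 * 3600        # 1 s steps over 5 hours
slip, slips = 0.0, []
for _ in range(int(T / dt)):
    # Friction law mu = mu0 + a*ln(V/V0) + b*ln(V0*theta/Dc), inverted for V,
    # with tau measured relative to mu0*sigma:
    V = V0 * np.exp((tau - b_sig * np.log(V0 * theta / Dc)) / a_sig)
    theta += (1.0 - V * theta / Dc) * dt   # aging law: d(theta)/dt = 1 - V*theta/Dc
    slip += V * dt
    tau -= k * V * dt                      # slip relaxes the spring stress
    slips.append(slip)
```

Because a − b > 0 here (velocity strengthening), the patch responds to the stress step with accelerating then steadily decaying afterslip rather than unstable sliding, which is the qualitative behavior the inversion exploits.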
Bayesian Approaches to Acoustic Modeling: A Review
, 2012
Abstract
This paper focuses on applications of Bayesian approaches to acoustic modeling for speech recognition and related speech processing applications. Bayesian approaches have been widely studied in the fields of statistics and machine learning, and one of their advantages is that their generalization capability is better than that of conventional approaches (e.g., maximum likelihood). On the other hand, since inference in Bayesian approaches involves integrals and expectations that are mathematically intractable in most cases and require heavy numerical computations, it is generally difficult to apply them to practical speech recognition problems. However, there have been many such attempts, and this paper aims to summarize these attempts to encourage further progress on Bayesian approaches in the speech processing field. This paper describes various applications of Bayesian approaches to speech processing in terms of the four typical ways of approximating Bayesian inferences, i.e., maximum a posteriori approximation, model complexity control using a Bayesian information criterion based on asymptotic approximation, variational approximation, and Markov chain Monte Carlo based sampling techniques.
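Of the four approximation families listed, Markov chain Monte Carlo is the easiest to illustrate in a few lines. Here is a toy random-walk Metropolis sampler for the posterior mean of a Gaussian; the synthetic data and all settings are invented for illustration and have nothing to do with acoustic models specifically:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, 200)           # synthetic observations

def log_post(mu):
    # N(0, 10^2) prior on the mean, known unit observation variance.
    return -0.5 * (mu / 10.0) ** 2 - 0.5 * np.sum((data - mu) ** 2)

mu, samples = 0.0, []
for _ in range(5000):
    prop = mu + rng.normal(0.0, 0.3)       # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop                          # Metropolis accept
    samples.append(mu)
post_mean = np.mean(samples[1000:])        # discard burn-in
```

The sample average after burn-in approximates the intractable posterior expectation; the practical difficulty the review highlights is that real acoustic models require this kind of sampling over vastly larger parameter spaces.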