Results 1–10 of 23
Algebraic analysis for nonidentifiable learning machines
 Neural Computation
"... This paper clarifies the relation between the learning curve and the algebraic geometrical structure of a nonidentifiable learning machine such as a multilayer neural network whose true parameter set is an analytic set with singular points. By using a concept in algebraic analysis, we rigorously pr ..."
Abstract

Cited by 47 (16 self)
 Add to MetaCart
This paper clarifies the relation between the learning curve and the algebraic geometrical structure of a nonidentifiable learning machine, such as a multilayer neural network, whose true parameter set is an analytic set with singular points. Using a concept from algebraic analysis, we rigorously prove that the Bayesian stochastic complexity, or free energy, is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are a rational number and a natural number determined as birational invariants of the singularities in the parameter space. We also give an algorithm for calculating λ1 and m1 based on resolution of singularities in algebraic geometry. In regular statistical models, 2λ1 equals the number of parameters and m1 = 1, whereas in nonregular models such as multilayer networks, 2λ1 is not larger than the number of parameters and m1 ≥ 1. Since the increase of the stochastic complexity equals the learning curve, i.e., the generalization error, nonidentifiable learning machines are better models than regular ones when Bayesian ensemble learning is applied.
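The expansion quoted above, together with the learning-curve relation stated at the end of the abstract, can be written compactly (a restatement of the abstract's claims; F(n) denotes the stochastic complexity and G(n) the generalization error):

```latex
F(n) = \lambda_1 \log n - (m_1 - 1)\log\log n + O(1),
\qquad
G(n) \simeq F(n+1) - F(n) \simeq \frac{\lambda_1}{n}.
```

For a regular d-parameter model, λ1 = d/2 and m1 = 1, recovering the familiar (d/2) log n (BIC-type) behavior; since singular models have 2λ1 no larger than d, their leading term, and hence their learning curve, is smaller.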
Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network
 Proc. 1st IEEE Computer Society Bioinformatics Conference
, 2002
"... We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is in the estimation of the conditional distribution of each random variable. We consider fitting nonparametric ..."
Abstract

Cited by 38 (18 self)
 Add to MetaCart
We propose a new statistical method for constructing a genetic network from microarray gene expression data using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We fit nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structure between genes. A problem remains in selecting an optimal graph, which gives the best representation of the system of genes. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.
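The pipeline the abstract describes (fit a nonlinear conditional distribution for each gene, score candidate parent sets, select the best graph) can be sketched as follows. This is not the paper's criterion with B-splines and heteroscedastic variances; it is a loose stand-in using a polynomial basis and BIC as the score, and all names and the toy data are ours:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def fit_score(y, X, degree=3):
    """Fit a polynomial regression of y on the parents in X,
    then score with BIC (a stand-in for the paper's Bayes criterion)."""
    n = len(y)
    if X.shape[1] == 0:
        resid = y - y.mean()
        k = 1
    else:
        cols = [np.ones(n)]                       # intercept
        for j in range(X.shape[1]):
            for d in range(1, degree + 1):        # polynomial basis per parent
                cols.append(X[:, j] ** d)
        A = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        k = A.shape[1]
    sigma2 = max(resid.var(), 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + k * np.log(n)            # lower is better

def best_parents(data, target, max_parents=2):
    """Exhaustively score small parent sets for one node."""
    candidates = [j for j in range(data.shape[1]) if j != target]
    best = (fit_score(data[:, target], data[:, []]), ())
    for size in range(1, max_parents + 1):
        for parents in combinations(candidates, size):
            s = fit_score(data[:, target], data[:, list(parents)])
            best = min(best, (s, parents))
    return best

# toy "expression" data: gene 2 depends nonlinearly on gene 0 only
n = 200
g0 = rng.normal(size=n)
g1 = rng.normal(size=n)
g2 = np.tanh(2 * g0) + 0.1 * rng.normal(size=n)
data = np.column_stack([g0, g1, g2])
score, parents = best_parents(data, target=2)
print(parents)   # should recover gene 0 as the parent
```

Scoring every parent set is only feasible for a handful of genes; the paper's setting (hundreds of genes) requires heuristic graph search on top of such a criterion.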
Singularities In Mixture Models And Upper Bounds Of Stochastic Complexity
 INTERNATIONAL JOURNAL OF NEURAL NETWORKS
, 2003
"... A learning machine which is a mixture of several distributions, for example, a gaussian mixture or a mixture of experts, has a wide range of applications. However, such a machine is a nonidentifiable statistical model with a lot of singularities in the parameter space, hence its generalization prop ..."
Abstract

Cited by 19 (6 self)
 Add to MetaCart
A learning machine that is a mixture of several distributions, for example a Gaussian mixture or a mixture of experts, has a wide range of applications. However, such a machine is a nonidentifiable statistical model with many singularities in the parameter space, so its generalization properties have remained unknown. Recently, an algebraic geometrical method has been developed that enables us to treat such learning machines mathematically. Based on this method, this paper rigorously proves that a mixture learning machine has a smaller Bayesian stochastic complexity than regular statistical models. Since the generalization error of a learning machine equals the increase of the stochastic complexity, this result shows that a mixture model can attain more precise predictions than regular statistical models when Bayesian estimation is applied in statistical inference.
The Nishimori line and Bayesian Statistics
 J. Phys. A: Math. Gen
, 1999
"... Abstract. “Nishimori line ” is a line or hypersurface in the parameter space of systems with quenched disorder, where simple expressions of the averages of physical quantities over the quenched random variables are obtained. It has been playing an important role in the theoretical studies of the ran ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
The “Nishimori line” is a line or hypersurface in the parameter space of systems with quenched disorder on which simple expressions for the averages of physical quantities over the quenched random variables are obtained. It has played an important role in theoretical studies of random frustrated systems since its discovery around 1980. In this paper, a novel interpretation of the Nishimori line from the viewpoint of statistical information processing is presented. Our main aim is the reconstruction of the whole theory of the Nishimori line from the viewpoint of Bayesian statistics or, almost equivalently, from the viewpoint of the theory of error-correcting codes. As a byproduct of our interpretation, counterparts of the Nishimori line in models without gauge invariance are given. We also discuss issues concerning the “finite temperature decoding” of error-correcting codes in connection with our theme and clarify the role of gauge invariance in this topic.
Subspace Information Criterion for Non-Quadratic Regularizers: Model Selection for Sparse Regressors
 IEEE Transactions on Neural Networks
, 2002
"... Nonquadratic regularizers, in particular the # 1 norm regularizer can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC) that allows to predict the generalization error for this useful family of regularizers. We show that un ..."
Abstract

Cited by 9 (7 self)
 Add to MetaCart
Non-quadratic regularizers, in particular the ℓ1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC), which allows one to predict the generalization error for this useful family of regularizers. We show that, under some technical assumptions, GSIC is an asymptotically unbiased estimator of the generalization error. GSIC is demonstrated to perform well in experiments with the ℓ1-norm regularizer compared with the Network Information Criterion and cross-validation in relatively large-sample cases. In the small-sample case, however, GSIC tends to fail to identify the optimal model due to its large variance. Therefore, a biased version of GSIC is also introduced, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
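As background on why an ℓ1-norm regularizer yields sparse solutions (the setting GSIC targets), here is a minimal lasso solved by ISTA, i.e., proximal gradient descent where the proximal operator of the ℓ1 norm is soft-thresholding. The data and parameter choices are illustrative, and this sketch does not implement GSIC itself:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink toward zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||A w - y||^2 + lam*||w||_1 by ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(1)
n, d = 100, 20
A = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[[0, 3, 7]] = [2.0, -1.5, 1.0]       # sparse ground truth
y = A @ w_true + 0.1 * rng.normal(size=n)
w = lasso_ista(A, y, lam=20.0)
print(np.nonzero(w)[0])                    # sparse support (truth is {0, 3, 7})
```

The kink of the ℓ1 norm at zero is what sets coordinates exactly to zero; a quadratic (ridge) penalty only shrinks them, which is why this family of regularizers needs model-selection tools beyond the classical quadratic theory.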
Algebraic Information Geometry for Learning Machines with Singularities
, 2001
"... Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and gaussian mixtures, the asymptotic normality does not hold, since Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexit ..."
Abstract

Cited by 8 (6 self)
 Add to MetaCart
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied. (1) If the prior is positive, then the stochastic complexity is far smaller than BIC, resulting in a smaller generalization error than regular statistical models, even when the true distribution is not contained in the parametric model. (2) If Jeffreys' prior, which is coordinate-free and equal to zero at singularities, is employed, then the stochastic complexity has the same form as BIC. It is useful for model selection, but not for generalization.
Stochastic filtering for motion trajectory in image sequences using a Monte Carlo filter with estimation of hyperparameters
 Proc. 16th Int. Conf. Pattern Recog
, 2002
"... False matching due to errors in feature extraction and changes in illumination between frames may occur in feature tracking in image sequences. False matching leads to outliers in feature motion trajectory. One way of reducing the effect of outliers is stochastic filtering using a state space model ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
False matching due to errors in feature extraction and changes in illumination between frames may occur when tracking features in image sequences. False matches lead to outliers in the feature motion trajectory. One way of reducing the effect of outliers is stochastic filtering using a state space model for the motion trajectory. Hyperparameters in the state space model, e.g., the variances of the noise distributions, must be set appropriately to control tracking motion and outlier rejection properly. The likelihood can be used to estimate the hyperparameters, but it is difficult to apply to online tracking because of its computational cost. To estimate hyperparameters online, we include them in the state vector and estimate feature coordinates and hyperparameters simultaneously. A Monte Carlo filter is used for state estimation, because adding hyperparameters to the state vector makes the state space model nonlinear. Experimental results on synthetic and real data show that the proposed method can estimate appropriate hyperparameters for tracking motion and reducing the effect of outliers.
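The idea of appending hyperparameters to the state vector and letting a Monte Carlo (particle) filter estimate them jointly can be sketched as follows. The model (a 1-D random-walk trajectory observed with Gaussian noise of unknown, slowly varying scale) and all parameter values are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

def particle_filter(obs, n_particles=2000):
    """Self-organizing state space model: the unknown observation-noise
    scale (a hyperparameter) is appended to the state vector and estimated
    jointly with the position by a Monte Carlo filter."""
    # particles: column 0 = position, column 1 = log observation-noise std
    x = np.zeros((n_particles, 2))
    x[:, 0] = obs[0] + rng.normal(0, 1, n_particles)
    x[:, 1] = rng.normal(0, 1, n_particles)         # prior on log sigma
    est_pos = []
    for y in obs:
        # system model: random-walk position, slowly varying log sigma
        x[:, 0] += rng.normal(0, 0.5, n_particles)
        x[:, 1] += rng.normal(0, 0.02, n_particles)
        sigma = np.exp(x[:, 1])
        # observation model: Gaussian likelihood with per-particle sigma
        logw = -0.5 * ((y - x[:, 0]) / sigma) ** 2 - np.log(sigma)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est_pos.append(w @ x[:, 0])
        # multinomial resampling
        idx = rng.choice(n_particles, n_particles, p=w)
        x = x[idx]
    return np.array(est_pos)

# synthetic trajectory with a few outliers from false matches
t = np.arange(100)
truth = 0.1 * t
obs = truth + rng.normal(0, 0.3, 100)
obs[[20, 50, 80]] += 8.0                            # outliers
pos = particle_filter(obs)
print(np.abs(pos - truth).mean())
```

Note the side effect that motivates joint estimation: at an outlier, particles carrying a larger noise scale receive relatively higher weight, so the position estimate is pulled toward the outlier less strongly than with a fixed small variance.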
Fault friction parameters inferred from the early stages
"... [1] We use subdaily GPS time series of positions in the first 5 hours following the 2003 Tokachioki earthquake (Mw = 8.0) located offshore of Hokkaido, Japan, to estimate frictional parameters for the afterslip zone on the subduction interface. The data show little motion immediately after the eart ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
We use subdaily GPS time series of positions in the first 5 hours following the 2003 Tokachi-oki earthquake (Mw = 8.0), located offshore of Hokkaido, Japan, to estimate frictional parameters for the afterslip zone on the subduction interface. The data show little motion immediately after the earthquake, with a sudden acceleration about 1.2 hours after the main shock. This coincides with the largest aftershock (M = 7.4) and is followed by gradual deceleration. We assume that early afterslip is the response of a fault patch to instantaneous stress perturbations caused by the main shock and the largest aftershock. Early afterslip is modeled with a spring-slider system obeying a rate- and state-dependent friction law. We develop and apply an inversion method to estimate the friction parameters Dc, aσ, and (a − b)σ, where σ is the effective normal stress. The estimated 95% confidence intervals of Dc, aσ, and (a − b)σ are 2.6 × 10^-4 to 1.8 × 10^-3 m, 0.29 to 0.43 MPa, and 0.214 to 0.220 MPa, respectively. The estimated Dc is 10 to 10^3 times larger than typical laboratory values. The estimated aσ and (a − b)σ values suggest that a and a − b are smaller than typical laboratory values and/or that the pore pressure on the plate boundary is significantly elevated above the hydrostatic value. Our analyses show that the model can reproduce the observed GPS data and that the timing of the rapid acceleration of postseismic deformation is controlled by the frictional properties of the fault and the stress change from the main shock, not by the timing of the largest aftershock.
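A minimal forward model of the kind such an inversion fits (a single spring-slider with one-state-variable rate- and state-dependent friction responding to a coseismic stress step) can be sketched as follows. All parameter values are hypothetical choices within the ranges quoted above, and the loading and stiffness setup is our simplification:

```python
import numpy as np

# Hypothetical parameters (aσ and (a−b)σ chosen inside the abstract's ranges)
a_sig = 0.35e6      # a*sigma [Pa]
ab_sig = 0.217e6    # (a-b)*sigma [Pa]
b_sig = a_sig - ab_sig
Dc = 1e-3           # critical slip distance [m]
V0 = 1e-6           # reference slip rate [m/s]
k = 1e7             # elastic (spring) stiffness [Pa/m]
dtau = 0.5e6        # instantaneous coseismic stress step [Pa]

def simulate(t_end=18000.0, dt=1.0):
    """Euler integration of slip delta and state theta. Velocity follows
    from matching the remaining driving stress (dtau - k*delta, which
    relaxes as slip accumulates) to the frictional strength:
        dtau - k*delta = a*sig*ln(V/V0) + b*sig*ln(V0*theta/Dc)."""
    delta, theta = 0.0, Dc / V0          # start at steady state, V = V0
    t, out_v = 0.0, []
    while t < t_end:
        V = V0 * np.exp((dtau - k * delta
                         - b_sig * np.log(V0 * theta / Dc)) / a_sig)
        delta += V * dt
        theta += (1.0 - V * theta / Dc) * dt     # aging law for the state
        theta = max(theta, 1e-9)
        t += dt
        out_v.append(V)
    return np.array(out_v)

v = simulate()
print(v[0], v[-1])   # slip rate jumps with the stress step, then decays
```

With (a − b)σ > 0 (velocity strengthening), the slider accelerates instantly with the stress step and then decelerates as the spring unloads, which is the qualitative afterslip behavior the abstract describes.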
Significant improvements of the space-time ETAS model for forecasting of accurate baseline seismicity, Earth Planets Space, 63, this issue
, 2011
"... accurate baseline seismicity ..."
Coseismic and early postseismic slip for the 2003 Tokachi-oki earthquake sequence inferred from GPS data
"... earthquake is investigated using subdaily GPS time series. Afterslip results are compared with the coseismic slip for the M8 mainshock and the M7.4 aftershock. Afterslip between those two earthquakes is inferred at the southwestern adjacent region of the mainshock, between two epicentral regions, wh ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Afterslip following the 2003 Tokachi-oki earthquake is investigated using subdaily GPS time series. The afterslip results are compared with the coseismic slip for the M8 mainshock and the M7.4 aftershock. Afterslip between those two earthquakes is inferred in the region adjacent to the mainshock on the southwest, between the two epicentral regions, which possibly triggered the aftershock to the southwest. Subsequently, deeper slip occurs. The afterslip loci are distinct from the rupture regions. The non-uniform propagation of afterslip may reflect the depth dependence of the effective normal stress and the distance between the closest unstable slip patches. Citation: Miyazaki, S., and K. M. Larson (2008), Coseismic and early postseismic slip for the 2003 Tokachi-oki earthquake sequence inferred from GPS data, Geophys. Res. Lett., 35,