Results 1–10 of 36
Algebraic analysis for nonidentifiable learning machines
 Neural Computation
Abstract

Cited by 58 (18 self)
This paper clarifies the relation between the learning curve and the algebraic geometrical structure of a nonidentifiable learning machine such as a multilayer neural network whose true parameter set is an analytic set with singular points. By using a concept in algebraic analysis, we rigorously prove that the Bayesian stochastic complexity or the free energy is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are the rational number and the natural number determined as birational invariants of the singularities in the parameter space. We also give an algorithm to calculate λ1 and m1 based on the resolution of singularities in algebraic geometry. In regular statistical models, 2λ1 is equal to the number of parameters and m1 = 1, whereas in nonregular models such as multilayer networks, 2λ1 is not larger than the number of parameters and m1 ≥ 1. Since the increase of the stochastic complexity is equal to the learning curve or the generalization error, nonidentifiable learning machines are better models than regular ones when Bayesian ensemble learning is applied.
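Written out, the expansion and its learning-curve consequence read as follows (a standard restatement of the result summarized above, with the difference identity that links stochastic complexity to generalization):

```latex
F(n) = \lambda_1 \log n - (m_1 - 1)\log\log n + O(1),
\qquad
G(n) = F(n+1) - F(n) \simeq \frac{\lambda_1}{n} - \frac{m_1 - 1}{n \log n}.
```

For a regular model, 2λ1 = d (the parameter count) and m1 = 1, recovering the classical d/(2n) learning curve; a singular model with 2λ1 < d therefore attains a smaller generalization error under Bayesian estimation.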
Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network
 Proc. 1st IEEE Computer Society Bioinformatics Conference
, 2002
Abstract

Cited by 45 (18 self)
We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. A problem still remains in selecting the optimal graph, which gives the best representation of the system among genes. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.
Singularities In Mixture Models And Upper Bounds Of Stochastic Complexity
 INTERNATIONAL JOURNAL OF NEURAL NETWORKS
, 2003
Abstract

Cited by 30 (10 self)
A learning machine which is a mixture of several distributions, for example, a Gaussian mixture or a mixture of experts, has a wide range of applications. However, such a machine is a nonidentifiable statistical model with many singularities in the parameter space, hence its generalization property has been left unknown. Recently an algebraic geometrical method has been developed which enables us to treat such learning machines mathematically. Based on this method, this paper rigorously proves that a mixture learning machine has smaller Bayesian stochastic complexity than regular statistical models. Since the generalization error of a learning machine is equal to the increase of the stochastic complexity, this result shows that the mixture model can attain more precise prediction than regular statistical models when Bayesian estimation is applied in statistical inference.
Source rupture process of the 2003 Tokachi-oki earthquake determined by joint inversion of teleseismic body wave and strong ground motion data, Earth Planets Space
, 2004
Abstract

Cited by 18 (1 self)
The spatiotemporal slip distribution of the 2003 Tokachi-oki, Japan, earthquake was estimated from teleseismic body wave and strong ground motion data. To perform stable inversion, we applied smoothing constraints to the slip distribution with respect to time and space, and determined the optimal weights of the constraints by minimizing Akaike's Bayesian Information Criterion (ABIC). We found that the rupture propagates mainly along the dip direction, and the length of the rupture area is shorter than its width. The mean rise time in the shallow asperity is significantly longer than that in the deep asperity, which might be attributed to variable frictional properties or lower strength of the plate interface at shallower depths. The average rupture velocity in the deep asperity approaches the shear-wave velocity. The derived source parameters are as follows: seismic moment M0 = 1.7×10^21 Nm (Mw 8.0); source duration = 50 s. We also estimated the shear stress change due to the mainshock on and around the major fault zone. It appears that many aftershocks on the plate boundary took place in and adjacent to the zones of stress increase due to the rupture of the mainshock.
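The role ABIC plays here, selecting the weight of a smoothness constraint by maximizing the marginal likelihood, can be sketched numerically. Everything below is an illustrative assumption (a toy linear inversion with a random forward operator and a first-difference roughness penalty), not the authors' actual fault parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 30, 60                                # model size, data size
x = np.linspace(0.0, 1.0, M)
m_true = np.exp(-((x - 0.5) / 0.15) ** 2)    # a smooth "slip"-like profile
G = rng.normal(size=(N, M))                  # toy forward operator
y = G @ m_true + 0.5 * rng.normal(size=N)    # noisy synthetic data

# First-difference smoothness operator, made full rank by pinning m[0]
D = np.vstack([np.eye(1, M), np.diff(np.eye(M), axis=0)])

def abic(alpha):
    """ABIC(alpha) up to an additive constant, with sigma^2 profiled out.

    For y | m ~ N(Gm, s2 I) and prior m ~ N(0, s2 (alpha D^T D)^{-1}):
      -2 max log-likelihood = N log S(alpha)
                              + logdet(G^T G + alpha D^T D)
                              - logdet(alpha D^T D) + const.
    """
    A = G.T @ G + alpha * (D.T @ D)
    m_hat = np.linalg.solve(A, G.T @ y)
    S = np.sum((y - G @ m_hat) ** 2) + alpha * np.sum((D @ m_hat) ** 2)
    _, ld1 = np.linalg.slogdet(A)
    _, ld2 = np.linalg.slogdet(alpha * (D.T @ D))
    return N * np.log(S) + ld1 - ld2 + 2.0   # +2 for one hyperparameter

alphas = 10.0 ** np.arange(-2, 4)
best = min(alphas, key=abic)                 # ABIC-selected smoothing weight
```

Larger `alpha` yields a smoother model; ABIC picks the trade-off from the data itself rather than by hand.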
The Nishimori line and Bayesian Statistics
 J. Phys. A: Math. Gen
, 1999
Abstract

Cited by 17 (1 self)
Abstract. The “Nishimori line” is a line or hypersurface in the parameter space of systems with quenched disorder on which simple expressions for the averages of physical quantities over the quenched random variables are obtained. It has played an important role in theoretical studies of random frustrated systems since its discovery around 1980. In this paper, a novel interpretation of the Nishimori line from the viewpoint of statistical information processing is presented. Our main aim is the reconstruction of the whole theory of the Nishimori line from the viewpoint of Bayesian statistics, or, almost equivalently, from the viewpoint of the theory of error-correcting codes. As a byproduct of our interpretation, counterparts of the Nishimori line in models without gauge invariance are given. We also discuss issues concerning the “finite temperature decoding” of error-correcting codes in connection with our theme and clarify the role of gauge invariance in this topic.
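As a concrete instance (the standard ±J spin-glass example, stated here for illustration; the paper develops the general theory): for bonds taking the value +J with probability p and −J with probability 1 − p, the Nishimori line is

```latex
e^{2\beta_p J} = \frac{p}{1-p}
\quad\Longleftrightarrow\quad
\beta_p = \frac{1}{2J}\,\ln\frac{p}{1-p},
```

i.e. the inverse temperature of the Boltzmann posterior is matched to the noise level of the quenched disorder, which is precisely the Bayesian condition that the Gibbs measure at β_p coincides with the posterior at the true channel parameter.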
Fault geometry at the rupture termination of the 1995 Hyogo-ken Nanbu earthquake
, 2000
Abstract

Cited by 16 (5 self)
Abstract The source geometry and slip distribution at rupture termination of the 1995 Hyogo-ken Nanbu earthquake were investigated using waveform inversion on the assumption of fault branching in the northeastern part of the rupture model. Possible branching of the Okamoto fault is suggested both by the static-displacement distribution and by the extent of damage east of Kobe (Nishinomiya area). To exclude data contaminated by the basin-edge-diffracted wave in the waveform-inversion process, we examined the spatiotemporal variation of its influence and, from a comparison with a flat model, determined windows appropriate for the data. Three subevents were identified, as reported in previous works. The largest was in the shallow part on the Awaji side, whereas the smaller two occurred at greater depths (7 km) on the Kobe side. We found a smaller subevent in the deep part of the branch. Total variance reduction was larger, and the ABIC value was smaller, when we assumed the branch than when we did not, which shows the superiority of the branching fault model. Resolution checks showed that the slips on the proposed branched portions are physical and not caused by random data noise or systematic errors in Green's functions.
A procedure for tidal analysis with a Bayesian information criterion
 Geophys. J. Int
, 1991
Abstract

Cited by 10 (0 self)
A computer algorithm for tidal analysis is developed, based on a Bayesian method proposed by Ishiguro et al. (1983). The basic assumption of the method is smoothness of the drift. This assumption is represented in the form of a prior probability in the Bayesian model. Once the prior distribution is determined, the parameters used in the analysis model are obtained by maximizing the posterior distribution of the parameters. For the given data, ABIC (Akaike's Bayesian Information Criterion, Akaike 1980) is used to select the optimum values of the hyperparameters of the prior distribution and the combination of parameters. The program, BAYTAP-G, can be adapted to tidal data which include such irregularities as drift, occasional steps, and disturbances caused by meteorological influences. The applicability of this program is examined using simulated data and real strain data.
Subspace Information Criterion for Non-Quadratic Regularizers: Model Selection for Sparse Regressors
 IEEE Transactions on Neural Networks
, 2002
Abstract

Cited by 9 (7 self)
Non-quadratic regularizers, in particular the ℓ1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC), which allows prediction of the generalization error for this useful family of regularizers. We show that under some technical assumptions GSIC is an asymptotically unbiased estimator of the generalization error. GSIC is demonstrated to perform well in experiments with the ℓ1-norm regularizer, compared with the Network Information Criterion and cross-validation, in relatively large-sample cases. In the small-sample case, however, GSIC tends to fail to capture the optimal model due to its large variance. Therefore a biased version of GSIC is also introduced, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
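The sparsity that the ℓ1-norm regularizer induces can be seen with a generic proximal-gradient (ISTA) solver. The data sizes, regularization strength, and solver below are illustrative assumptions for the generic lasso problem, not the GSIC procedure itself:

```python
import numpy as np

# ISTA for  min_w  0.5 * ||y - X w||^2 + lam * ||w||_1
rng = np.random.default_rng(1)
n, d = 80, 40
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]            # truly sparse regressor
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 40.0
L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the gradient
w = np.zeros(d)
for _ in range(500):
    g = X.T @ (X @ w - y)                # gradient of the quadratic part
    z = w - g / L                        # gradient step
    w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold

sparsity = int(np.count_nonzero(w))      # most coefficients end up exactly 0
```

The soft-threshold step is what sets small coefficients exactly to zero, which is the feature that makes per-model generalization-error estimates such as GSIC attractive for selecting `lam`.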
Algebraic Information Geometry for Learning Machines with Singularities
, 2001
Abstract

Cited by 8 (6 self)
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied. (1) If the prior is positive, then the stochastic complexity is far smaller than BIC, resulting in a smaller generalization error than regular statistical models, even when the true distribution is not contained in the parametric model. (2) If Jeffreys' prior, which is coordinate free and equal to zero at singularities, is employed, then the stochastic complexity has the same form as BIC. It is useful for model selection, but not for generalization.
Earthquake source parameters determined by the SAFOD pilot hole seismic array
, 2004
Abstract

Cited by 6 (2 self)
[1] We estimate the source parameters of #3 microearthquakes by jointly analyzing seismograms recorded by the 32-level, 3-component seismic array installed in the SAFOD Pilot Hole. We applied an inversion procedure to estimate spectral parameters for the omega-square model (spectral level and corner frequency) and Q from displacement amplitude spectra. Because we expect spectral parameters and Q to vary slowly with depth in the well, we impose a smoothness constraint on those parameters as a function of depth using a linear first-difference operator. This method correctly resolves corner frequency and Q, which leads to a more accurate estimation of source parameters than can be obtained from single sensors. The stress drop of one example of the SAFOD target repeating earthquake falls in ...
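The spectral fit described above, omega-square shape plus attenuation, can be sketched on a single synthetic spectrum. The synthetic data, travel time, and plain grid search are all illustrative assumptions; the study inverts real multi-sensor array data with depth-smoothing constraints:

```python
import numpy as np

# Omega-square model with attenuation:
#   A(f) = Omega0 / (1 + (f/fc)^2) * exp(-pi * f * t / Q)
f = np.linspace(0.5, 50.0, 200)          # frequency band (Hz), assumed
t = 1.2                                   # travel time (s), assumed

def model(f, omega0, fc, Q):
    return omega0 / (1.0 + (f / fc) ** 2) * np.exp(-np.pi * f * t / Q)

rng = np.random.default_rng(2)
A_obs = model(f, 1e-6, 8.0, 150.0) * np.exp(0.05 * rng.normal(size=f.size))

# Grid search over (fc, Q); Omega0 has a closed form in log space.
best = None
for fc in np.arange(2.0, 20.0, 0.25):
    for Q in np.arange(50.0, 400.0, 10.0):
        shape = model(f, 1.0, fc, Q)
        log_omega0 = np.mean(np.log(A_obs) - np.log(shape))
        resid = np.sum((np.log(A_obs) - log_omega0 - np.log(shape)) ** 2)
        if best is None or resid < best[0]:
            best = (resid, np.exp(log_omega0), fc, Q)

_, omega0_hat, fc_hat, Q_hat = best
```

The trade-off between fc and Q visible in such a fit is exactly why the study ties the parameters together across array levels: joint inversion with a smoothness constraint resolves them far better than any single sensor can.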