Results 11–20 of 310
Self-training with Products of Latent Variable Grammars
Abstract

Cited by 11 (1 self)
We study self-training with products of latent variable grammars in this paper. We show that increasing the quality of the automatically parsed data used for self-training gives higher-accuracy self-trained grammars. Our generative self-trained grammars reach F-scores of 91.6 on the WSJ test set and surpass even discriminative reranking systems without self-training. Additionally, we show that multiple self-trained grammars can be combined in a product model to achieve even higher accuracy. The product model is most effective when the individual underlying grammars are most diverse. Combining multiple grammars that were self-trained on disjoint sets of unlabeled data results in a final test accuracy of 92.5% on the WSJ test set and 89.6% on our Broadcast News test set.
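The generic self-training recipe this abstract builds on can be sketched as follows; the names are mine, and the paper's grammar trainer and parser are assumed as black boxes `train` and `parse`:

```python
# Hedged sketch of the generic self-training loop (not the paper's code):
# train on gold data, auto-parse unlabeled text, then retrain on the union.
def self_train(labeled, unlabeled, train, parse):
    """Train a parser, label the unlabeled data with it, then retrain on both."""
    model = train(labeled)                       # grammar from the gold treebank
    auto_parsed = [parse(model, s) for s in unlabeled]
    return train(labeled + auto_parsed)          # self-trained grammar
```

The abstract's point is that the quality of `auto_parsed` (and, for the product model, the diversity of the unlabeled sets) drives the accuracy of the final grammar.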
Why do we still use stepwise modelling in ecology and behaviour?
Abstract

Cited by 10 (0 self)
1. The biases and shortcomings of stepwise multiple regression are well established within the statistical literature. However, an examination of papers published in 2004 by three leading ecological and behavioural journals suggested that the use of this technique remains widespread: of 65 papers in which a multiple regression approach was used, 57% of studies used a stepwise procedure. 2. The principal drawbacks of stepwise multiple regression include bias in parameter estimation, inconsistencies among model selection algorithms, an inherent (but often overlooked) problem of multiple hypothesis testing, and an inappropriate focus or reliance on a single best model. We discuss each of these issues with examples. 3. We use a worked example of data on yellowhammer distribution collected over four years to highlight the pitfalls of stepwise regression. We show that stepwise regression allows models containing significant predictors to be obtained from each year’s data. In spite of the significance of the selected models, they vary substantially between years and suggest patterns that are at odds with those determined by analysing the full, four-year data set. 4. An Information Theoretic (IT) analysis of the yellowhammer data set illustrates why the varying outcomes of stepwise analyses arise. In particular, the IT approach identifies large numbers of competing models that could describe the data equally well, showing that no one model should be relied upon for inference.
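The multiple-hypothesis-testing problem the authors describe can be reproduced in a few lines; this is a minimal illustration of my own (not the yellowhammer analysis): forward stepwise selection run on pure noise will often still return "significant" predictors.

```python
# Hypothetical sketch: forward stepwise selection on data where y is
# independent of every predictor. Any variable it "selects" is spurious.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                  # y is unrelated to every column of X

selected, remaining = [], list(range(p))
while remaining:
    best_j, best_t = None, 0.0
    for j in remaining:
        A = np.column_stack([np.ones(n), X[:, selected + [j]]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        s2 = resid @ resid / (n - A.shape[1])
        cov = s2 * np.linalg.inv(A.T @ A)
        t = abs(beta[-1]) / np.sqrt(cov[-1, -1])   # t-stat of candidate column
        if t > best_t:
            best_j, best_t = j, t
    if best_t > 2.0:                    # roughly p < 0.05: "significant"
        selected.append(best_j)
        remaining.remove(best_j)
    else:
        break

print("predictors 'selected' from pure noise:", selected)
```

Because the procedure takes the maximum of many t-statistics at each step, the nominal 5% threshold is exceeded far more often than 5% of the time, which is exactly the bias the abstract lists.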
Performance Prediction for Exponential Language Models
Abstract

Cited by 10 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
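The kind of fit the abstract describes can be sketched as follows. The data here are synthetic and the functional form is an assumption of mine, not the paper's formula: regress test cross-entropy on training cross-entropy plus a model-size statistic, then check how well the fit correlates with the actual values.

```python
# Illustrative sketch (synthetic "models", not the paper's data or formula):
# predict test cross-entropy from training cross-entropy and a size statistic.
import numpy as np

rng = np.random.default_rng(1)
m = 40                                         # number of synthetic models
H_train = rng.uniform(5.0, 9.0, size=m)        # training cross-entropy
ratio = rng.uniform(0.01, 0.5, size=m)         # e.g. parameters per training event
# Assumed generating relationship: test entropy = train entropy + c*ratio + noise
H_test = H_train + 0.9 * ratio + rng.normal(scale=0.01, size=m)

A = np.column_stack([np.ones(m), H_train, ratio])
coef, *_ = np.linalg.lstsq(A, H_test, rcond=None)
pred = A @ coef
corr = np.corrcoef(pred, H_test)[0, 1]
print(f"correlation of predicted vs. actual test cross-entropy: {corr:.4f}")
```

When a relationship this simple truly holds, the regression recovers it with correlation near 1, which is the shape of the paper's empirical finding.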
Order selection for samerealization predictions in autoregressive processes
 Annals of Statistics
, 2005
Abstract

Cited by 10 (3 self)
Assume that observations are generated from an infinite-order ...
Penalized loss functions for Bayesian model comparison
Abstract

Cited by 10 (0 self)
The deviance information criterion (DIC) is widely used for Bayesian model comparison, despite the lack of a clear theoretical foundation. DIC is shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. This approximation is valid only when the effective number of parameters in the model is much smaller than the number of independent observations. In disease mapping, a typical application of DIC, this assumption does not hold and DIC under-penalizes more complex models. Another deviance-based loss function, derived from the same decision-theoretic framework, is applied to mixture models, which have previously been considered an unsuitable application for DIC.
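For reference, the standard definition of DIC (due to Spiegelhalter et al., not restated in the abstract) that the penalized-loss argument examines is:

```latex
% DIC: posterior-mean deviance plus the effective number of parameters p_D
\mathrm{DIC} \;=\; \overline{D(\theta)} + p_D,
\qquad
p_D \;=\; \overline{D(\theta)} - D(\bar{\theta}),
\qquad
D(\theta) \;=\; -2 \log p(y \mid \theta) + C,
```

where the bar denotes a posterior mean and C is a model-independent constant. The abstract's claim is that this quantity approximates a cross-validatory loss only when p_D is much smaller than the number of independent observations.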
When do stepwise algorithms meet subset selection criteria?
, 2007
Abstract

Cited by 9 (3 self)
Recent results in homotopy and solution paths demonstrate that certain well-designed greedy algorithms, with a range of values of the algorithmic parameter, can provide solution paths to a sequence of convex optimization problems. On the other hand, in regression many existing criteria in subset selection (including Cp, AIC, BIC, MDL, RIC, etc.) involve optimizing an objective function that contains a counting measure. The two optimization problems are formulated as (P1) and (P0) in the present paper. The latter is generally combinatoric and has been proven to be NP-hard. We study the conditions under which the two optimization problems have common solutions. Hence, in these situations a stepwise algorithm can be used to solve the seemingly unsolvable problem. Our main result is motivated by recent work in sparse representation, while two others emerge from different angles: a direct analysis of sufficiency and necessity and a condition on the mostly correlated covariates.
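In a common formulation of this pairing (notation mine; the paper's exact statement of (P1) and (P0) may differ), the convex problem and the counting-measure problem are:

```latex
% (P1): l1-penalized regression, solvable along a homotopy path (e.g. LARS)
\text{(P1)}\quad \min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1

% (P0): subset-selection criterion with a counting measure; NP-hard in general
\text{(P0)}\quad \min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_0,
\qquad \|\beta\|_0 = \#\{\, j : \beta_j \neq 0 \,\}
```

Classical criteria correspond to particular penalty levels in (P0), e.g. λ = 2σ² for Cp/AIC and λ = σ² log n for BIC; the question the abstract poses is when the cheap path-following solution of (P1) also solves (P0).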
Stochastic and deterministic models for agricultural production networks
 Math. Biosci. Eng
Abstract

Cited by 9 (6 self)
An approach to modeling the impact of disturbances in an agricultural production network is presented. A stochastic model and its approximate deterministic model for averages over sample paths of the stochastic system are developed. Simulations, sensitivity and generalized sensitivity analyses are given. Finally, it is shown how diseases may be introduced into the network and corresponding simulations are discussed.
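The stochastic-versus-deterministic pairing the abstract mentions can be illustrated on a toy example of mine (the paper's network and rates are not given here): a single production node simulated stochastically, Gillespie-style, next to the deterministic equation for its mean.

```python
# Illustrative sketch only (model and rates are assumptions, not the paper's):
# a one-node production queue with arrival rate a and removal rate b*x,
# whose deterministic mean model is dx/dt = a - b*x with equilibrium a/b.
import numpy as np

rng = np.random.default_rng(2)
a, b, T = 5.0, 0.5, 20.0            # production rate, removal rate, horizon

def gillespie():
    t, x = 0.0, 0
    while True:
        rates = np.array([a, b * x])        # production event, removal event
        total = rates.sum()
        dt = rng.exponential(1.0 / total)
        if t + dt > T:
            return x                         # state at time T
        t += dt
        if rng.random() < rates[0] / total:
            x += 1
        else:
            x -= 1

samples = [gillespie() for _ in range(500)]
print("stochastic mean:", np.mean(samples), "deterministic equilibrium:", a / b)
```

Averaging many stochastic sample paths recovers the deterministic model's equilibrium, which is the relationship between the two model classes the abstract develops for the full network.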
Human Cognition and a Pile of Sand: A Discussion on Serial Correlations and Self-organized Criticality
, 2005
Abstract

Cited by 9 (2 self)
... framework of cognitive psychology in favor of the framework of nonlinear dynamical systems theory. Van Orden et al. presented evidence that "purposive behavior originates in self-organized criticality" (p. 333). Here, the authors show that Van Orden et al.'s analyses do not test their hypotheses. Further, the authors argue that a confirmation of Van Orden et al.'s hypotheses would not have constituted firm evidence in support of their framework. Finally, the absence of a specific model for how self-organized criticality produces the observed behavior makes it very difficult to derive testable predictions. The authors conclude that the proposed paradigm shift is presently unwarranted.
Statistical model selection methods applied to biological networks
 Transactions in Computational Systems Biology
, 2005
Abstract

Cited by 7 (2 self)
Many biological networks have been labelled scale-free as their degree distribution can be approximately described by a power-law distribution. While the degree distribution does not summarize all aspects of a network, it has often been suggested that its functional form contains important clues as to the underlying evolutionary processes that have shaped the network. In practice, however, the functional form of the degree distribution has generally been determined in an ad hoc fashion. Here we apply formal statistical model selection methods to determine which functional form best describes the degree distributions of protein interaction and metabolic networks. We interpret the degree distribution as belonging to a class of probability models and determine which of these models provides the best description for the empirical data using maximum likelihood inference, composite likelihood methods, the Akaike information criterion and goodness-of-fit tests. The whole data set is used in order to determine the parameter that best explains the data under a given model (e.g. scale-free or random graph). As we will show, present protein interaction and metabolic network data from different organisms suggest that simple scale-free models do not provide an adequate description of real network data.
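The recipe the abstract outlines — fit each candidate distribution by maximum likelihood, then compare with AIC — can be sketched on synthetic degrees (the data, candidate pair, and parameterizations below are mine, not the paper's):

```python
# Hedged sketch of ML fitting + AIC comparison on synthetic "degrees".
# Candidates: a continuous power law vs. a shifted exponential, both on [1, inf).
import numpy as np

rng = np.random.default_rng(3)
deg = rng.pareto(2.5, size=2000) + 1.0       # heavy-tailed synthetic degrees

n = len(deg)
# Power law p(x) = (alpha-1) * x^(-alpha): closed-form MLE for alpha
alpha = 1.0 + n / np.log(deg).sum()
ll_pl = n * np.log(alpha - 1.0) - alpha * np.log(deg).sum()

# Shifted exponential p(x) = lam * exp(-lam*(x-1)): MLE lam = 1 / mean(x-1)
lam = 1.0 / (deg - 1.0).mean()
ll_exp = n * np.log(lam) - lam * (deg - 1.0).sum()

# AIC = 2k - 2*log-likelihood; each model has k = 1 parameter here
aic_pl, aic_exp = 2 - 2 * ll_pl, 2 - 2 * ll_exp
print("AIC power law:", aic_pl, "AIC exponential:", aic_exp)
```

The model with the lower AIC is preferred; since these degrees were drawn from a power law, the power-law fit wins here. The paper's point is that on real protein interaction and metabolic network data, the same comparison goes against the simple scale-free models.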
Fragility of Asymptotic Agreement under Bayesian Learning
, 2009
Abstract

Cited by 7 (1 self)
Under the assumption that individuals know the conditional distributions of signals given the payoff-relevant parameters, existing results conclude that as individuals observe infinitely many signals, their beliefs about the parameters will eventually merge. We first show that these results are fragile when individuals are uncertain about the signal distributions: given any such model, vanishingly small individual uncertainty about the signal distributions can lead to substantial (non-vanishing) differences in asymptotic beliefs. Under a uniform convergence assumption, we then characterize the conditions under which a small amount of uncertainty leads to significant asymptotic disagreement.