Results 11 – 20 of 408
An improved algorithm for estimating incident daily solar radiation from measurements of temperature, humidity, and precipitation
 Agric. For. Meteorol.
, 1999
"... We present a reformulation of the Bristow–Campbell model for daily solar radiation, developed using daily observations of radiation, temperature, humidity, and precipitation, from 40 stations in contrasting climates. By expanding the original model to include a spatially and temporally variable esti ..."
Abstract

Cited by 87 (5 self)
 Add to MetaCart
(Show Context)
We present a reformulation of the Bristow–Campbell model for daily solar radiation, developed using daily observations of radiation, temperature, humidity, and precipitation from 40 stations in contrasting climates. By expanding the original model to include a spatially and temporally variable estimate of clear-sky transmittance, and applying a small number of other minor modifications, the new model produces better results than the original over a wider range of climates. Our method does not require reparameterization on a site-by-site basis, a distinct advantage over the original approach. We do require observations of dewpoint temperature, which the original model does not, but we suggest a method that could eliminate this dependency. Mean absolute error (MAE) for predictions of clear-sky transmittance was improved by 28% compared to the original model formulation. Aerosols and snow cover probably contribute to variation in clear-sky transmittance that remains unexplained by our method. MAE and bias for prediction of daily incident radiation were about 2.4 MJ m⁻² day⁻¹ and 0.5 MJ m⁻² day⁻¹, respectively. As a percent of the average observed values of incident radiation, MAE and bias are about 15% and 4%, respectively. The lowest errors and smallest biases (percent basis) occurred during the summer. The highest prediction biases were associated with stations having a strong seasonal concentration of precipitation, with underpredictions at summer-precipitation stations and overpredictions at winter-precipitation stations. Further study is required to characterize the behavior of this method for tropical climates. © 1999 Elsevier Science B.V. All rights reserved.
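As a hedged sketch, the general Bristow–Campbell form that this abstract builds on estimates daily transmittance from the diurnal temperature range; the coefficients `b`, `c`, and `tau_max` below are illustrative textbook-style defaults, not the paper's fitted, spatially and temporally variable values:

```python
import math

def bristow_campbell_radiation(t_max, t_min, rad_potential,
                               b=0.004, c=2.4, tau_max=0.75):
    """Estimate daily incident solar radiation (same units as
    rad_potential, e.g. MJ m^-2 day^-1) from the diurnal
    temperature range, using the general Bristow-Campbell form
    tau = tau_max * (1 - exp(-b * dT**c)).

    b, c, tau_max are illustrative coefficients only; the paper
    replaces the fixed tau_max with a spatially and temporally
    variable clear-sky transmittance.
    """
    dt = t_max - t_min  # diurnal temperature range (deg C)
    transmittance = tau_max * (1.0 - math.exp(-b * dt ** c))
    return transmittance * rad_potential
```

A larger diurnal range implies a clearer day and hence a larger fraction of the potential (top-of-atmosphere) radiation reaching the ground.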
Classifying gene expression profiles from pairwise mRNA comparisons
 Stat. Appl. Genet. Mol. Biol.
, 2004
"... Copyright c○2004 by the authors. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepres ..."
Abstract

Cited by 71 (6 self)
 Add to MetaCart
Copyright © 2004 by the authors. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepress, which has been given certain exclusive rights by the author. Statistical Applications in Genetics and Molecular Biology is produced by The Berkeley Electronic Press (bepress).
A Comparison of Dynamic and Non-Dynamic Rough Set Methods for Extracting Laws from Decision Tables
, 1998
"... We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor  see [30]) and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), ..."
Abstract

Cited by 68 (7 self)
 Add to MetaCart
We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor; see [30]), and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), based on rough sets (see [42]) and Boolean reasoning (see [8]), with the method based on dynamic reducts and dynamic rules (see [3], [4], [5], [6]). We also compare the results of computer experiments on those data sets obtained by applying our system based on rough set methods with the results on the same data sets obtained with the help of several data analysis systems known from the literature.
The variable selection problem
 Journal of the American Statistical Association
, 2000
"... The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables ..."
Abstract

Cited by 64 (3 self)
 Add to MetaCart
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments that have led to the wide variety of approaches for this problem.
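The subset-selection problem this abstract describes can be illustrated with a brute-force search over candidate subsets scored by AIC; this is a toy sketch of the generic problem, not a method from the paper:

```python
import itertools
import numpy as np

def best_subset(X, y, max_size=None):
    """Exhaustive best-subset selection for linear regression.

    Scores each candidate subset of columns of X by the AIC-style
    criterion n*log(RSS/n) + 2k and returns the best subset found.
    Illustrative only: exhaustive search is exponential in the
    number of predictors, which is exactly why the literature
    reviewed in the abstract exists.
    """
    n, p = X.shape
    max_size = max_size or p
    best_score, best_cols = np.inf, ()
    for k in range(1, max_size + 1):
        for cols in itertools.combinations(range(p), k):
            Xs = X[:, cols]
            coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ coef) ** 2))
            score = n * np.log(rss / n + 1e-12) + 2 * k
            if score < best_score:
                best_score, best_cols = score, cols
    return best_cols
```

With a strong signal confined to a few predictors, the search recovers those columns; the uncertainty the abstract emphasizes arises when signals are weak relative to noise.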
Data-driven calibration of penalties for least-squares regression
, 2009
"... Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely datadriven calibration algorithm for these parameters in the leastsquares regression framework, without assuming a parti ..."
Abstract

Cited by 56 (13 self)
 Add to MetaCart
Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely data-driven calibration algorithm for these parameters in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristic for designing a data-driven penalty (the slope heuristic) and proving that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results will be proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.
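A minimal sketch of the slope heuristic this abstract describes, under the simplifying assumption that the empirical risk of overly complex models decreases roughly linearly in model dimension; the function names and the fitting fraction are illustrative, not from the paper:

```python
import numpy as np

def slope_heuristics_select(dims, risks, fit_fraction=0.5):
    """Select a model index via a toy slope heuristic.

    Assumption (illustrative): for overly complex models the
    empirical risk decreases roughly linearly in the model
    dimension D_m with slope -s_min (the minimal penalty slope).
    Taking pen(m) = 2 * s_min * D_m and minimizing
    risk + penalty selects a model of intermediate complexity.
    """
    dims = np.asarray(dims, dtype=float)
    risks = np.asarray(risks, dtype=float)
    # Estimate the minimal-penalty slope on the largest models only.
    cut = int(len(dims) * (1.0 - fit_fraction))
    slope, _ = np.polyfit(dims[cut:], risks[cut:], 1)
    s_min = max(-slope, 0.0)
    penalized = risks + 2.0 * s_min * dims
    return int(np.argmin(penalized))
```

The factor 2 is the "slope heuristics" rule: the optimal penalty is taken to be twice the minimal one estimated from the data.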
An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker’s Yeast, Saccharomyces cerevisiae
 PLoS
, 2007
"... Background. Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA micr ..."
Abstract

Cited by 55 (16 self)
 Add to MetaCart
(Show Context)
Background. Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. Methodology/Principal Findings. We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker’s yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pairwise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA coexpression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance. YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95 % of the validated proteome). YeastNet is available from
Connections between the Lines: Augmenting Social Networks with Text
"... Network data is ubiquitous, encoding collections of relationships between entities such as people, places, genes, or corporations. While many resources for networks of interesting entities are emerging, most of these can only annotate connections in a limited fashion. Although relationships between ..."
Abstract

Cited by 46 (3 self)
 Add to MetaCart
(Show Context)
Network data is ubiquitous, encoding collections of relationships between entities such as people, places, genes, or corporations. While many resources for networks of interesting entities are emerging, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora. In this paper we present a novel probabilistic topic model to analyze text corpora and infer descriptions of their entities and of relationships between those entities. We develop variational methods for performing approximate inference on our model and demonstrate that our model can be practically deployed on large corpora such as Wikipedia. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.
On Growing Better Decision Trees from Data
, 1995
"... This thesis investigates the problem of growing decision trees from data, for the purposes of classification and prediction. ..."
Abstract

Cited by 40 (0 self)
 Add to MetaCart
This thesis investigates the problem of growing decision trees from data, for the purposes of classification and prediction.
Penalized loss functions for Bayesian model comparison
"... The deviance information criterion (DIC) is widely used for Bayesian model comparison, despite the lack of a clear theoretical foundation. DIC is shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a crossvalidation argument. This approximati ..."
Abstract

Cited by 37 (2 self)
 Add to MetaCart
The deviance information criterion (DIC) is widely used for Bayesian model comparison, despite the lack of a clear theoretical foundation. DIC is shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. This approximation is valid only when the effective number of parameters in the model is much smaller than the number of independent observations. In disease mapping, a typical application of DIC, this assumption does not hold and DIC under-penalizes more complex models. Another deviance-based loss function, derived from the same decision-theoretic framework, is applied to mixture models, which have previously been considered an unsuitable application for DIC.
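The DIC this abstract analyzes is conventionally computed from posterior samples of the deviance; a generic sketch of that standard computation (not the paper's alternative loss function):

```python
import numpy as np

def dic(deviance_samples, deviance_at_posterior_mean):
    """Compute the deviance information criterion.

    DIC = D_bar + p_D, where D_bar is the posterior mean deviance
    and p_D = D_bar - D(theta_bar) is the effective number of
    parameters (deviance at the posterior mean of the parameters).
    Returns (DIC, p_D). The abstract's point is that DIC is only a
    valid approximation when p_D is much smaller than the number
    of independent observations.
    """
    d_bar = float(np.mean(deviance_samples))
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d
```

Equivalently, DIC = 2·D_bar − D(θ̄), which makes the under-penalization issue visible: when p_D is large relative to the data, the cross-validation argument behind the penalty breaks down.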
2004b. An evaluation of multipass electrofishing for estimating the abundance of stream-dwelling salmonids. Transactions of the American Fisheries Society 133:462–475
"... Abstract.—Failure to estimate capture efficiency, defined as the probability of capturing individual fish, can introduce a systematic error or bias into estimates of fish abundance. We evaluated the efficacy of multipass electrofishing removal methods for estimating fish abundance by comparing estim ..."
Abstract

Cited by 36 (7 self)
 Add to MetaCart
(Show Context)
Abstract.—Failure to estimate capture efficiency, defined as the probability of capturing individual fish, can introduce a systematic error or bias into estimates of fish abundance. We evaluated the efficacy of multipass electrofishing removal methods for estimating fish abundance by comparing estimates of capture efficiency from multipass removal estimates to capture efficiencies measured by the recapture of known numbers of marked individuals for bull trout Salvelinus confluentus and westslope cutthroat trout Oncorhynchus clarki lewisi. Electrofishing capture efficiency measured by the recapture of marked fish was greatest for westslope cutthroat trout and for the largest size-classes of both species. Capture efficiency measured by the recapture of marked fish also was low for the first electrofishing pass (mean, 28%) and decreased considerably (mean, 1.71 times lower) with successive passes, which suggested that fish were responding to the electrofishing procedures. On average, the removal methods overestimated three-pass capture efficiency by 39% and underestimated fish abundance by 88%, across both species and all size-classes. The overestimates of efficiency were positively related to the cross-sectional area of the stream and the amount of undercut banks and negatively related to the number of removal passes for bull trout, whereas for westslope cutthroat trout, the overestimates were positively related to the amount of cobble sub
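The removal estimator whose assumptions this abstract tests can be illustrated, for the two-pass case, with the classical Seber–Le Cren formulas; this is a textbook sketch of the constant-efficiency assumption, not the authors' multipass procedure:

```python
def two_pass_removal(c1, c2):
    """Two-pass removal (Seber-Le Cren) abundance estimate.

    Assumes the same capture probability p on both passes -- the
    assumption the abstract shows is violated when efficiency
    declines between successive electrofishing passes.
    c1, c2: catches on the first and second pass (c1 > c2).
    Returns (N_hat, p_hat): estimated abundance and capture
    probability, from p_hat = (c1 - c2) / c1 and
    N_hat = c1**2 / (c1 - c2).
    """
    if c1 <= c2:
        raise ValueError("removal estimator requires c1 > c2")
    p_hat = (c1 - c2) / c1
    n_hat = c1 ** 2 / (c1 - c2)
    return n_hat, p_hat
```

If efficiency actually drops on the second pass, c2 shrinks, p_hat is inflated, and N_hat is biased low relative to the truth, which is the direction of bias the abstract reports.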