Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values (2001)

by Tapio Schneider
Venue: Journal of Climate

Results 1 - 10 of 14

Second Order Cone Programming Approaches for Handling Missing and Uncertain Data

by Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, Alexander J. Smola - Journal of Machine Learning Research, 2006
Cited by 22 (6 self)
We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.

An integrated machine learning approach to stroke prediction

by Aditya Khosla, Hsu-kuang Chiu, Yu Cao, Junling Hu, Cliff Chiung-yu Lin, Honglak Lee - In KDD, 2010
Cited by 2 (0 self)
Stroke is the third leading cause of death and the principal cause of serious long-term disability in the United States. Accurate prediction of stroke is highly valuable for early intervention and treatment. In this study, we compare the Cox proportional hazards model with a machine learning approach for stroke prediction on the Cardiovascular Health Study (CHS) dataset. Specifically, we consider the common problems of data imputation, feature selection, and prediction in medical datasets. We propose a novel automatic feature selection algorithm that selects robust features based on our proposed heuristic: conservative mean. Combined with Support Vector Machines (SVMs), our proposed feature selection algorithm achieves a greater area under the ROC curve (AUC) as compared to the Cox proportional hazards

Discriminants of Twentieth-Century Changes in Earth Surface Temperatures

by Tapio Schneider, Isaac M. Held, 2000
Cited by 1 (0 self)
An approach to identifying climate changes is presented that does not hinge on simulations of natural climate variations or anthropogenic changes. Observed interdecadal climate variations are decomposed into several discriminants, mutually uncorrelated spatiotemporal components with a maximal ratio of interdecadal-to-intradecadal variance. The dominant discriminants of twentieth-century variations in surface temperature exhibit large-scale warming in which, particularly in the Northern Hemisphere summer months, localized cooling is embedded. The structure of the large-scale warming is consistent with expected effects of increases in greenhouse gas concentrations. The localized cooling, with maxima on scales of 1000–2000 km over East Asia, eastern Europe, and North America, is suggestive of radiative effects of anthropogenic sulfate aerosols.

J3.9 Multiple Imputation through Machine Learning Algorithms

by Michael B. Richman, Theodore B. Trafalis, Indra Adrianto
Cited by 1 (0 self)
A problem common to meteorological and climatological datasets is how to address missing data. The majority of multivariate analysis techniques require that all variables be represented for each observation; hence, some action is required in the presence of missing data. In cases where the individual observations are thought not important, deletion of every observation missing one or more pieces of data (complete case deletion) is common. As the amount of missing data increases, tacit deletion can lead to bias in the first two statistical moments of the remaining data as population estimators and inaccuracies in subsequent analyses. What is desired is a principled method that uses information available in the remaining data to predict the missing values. Such techniques include substituting nearby data, interpolation techniques and linear regression using nearby sites as predictors. One class of technique that uses the information available in an iterative manner is known as multiple imputation. In this work, different types of machine learning techniques, such as support vector machines (SVMs) and artificial neural networks (ANNs) are tested against standard imputation methods (e.g., multiple regression), simple regression, mean substitution, and casewise deletion. All methods are used to predict the known values of climatological data which have been altered to produce missing data. These data sets are on the order of 400 variables (data station sites) and a large number of observations. Both precipitation and air temperature data are used to provide a range of inherent spatial coherence seen by analysts. The MSE of the prediction and the MAE of the variance are presented to assess the efficacy of each technique. Results indicate that the non-iterative methods, such as casewise deletion and mean substitution, lead to the largest errors and iterative imputation has considerably lower errors. 
Within the iterative techniques, SVMs are most promising in reducing error.

Spatio-temporal filling of missing points in geophysical data sets

by D. Kondrashov, M. Ghil - Nonlinear Processes in Geophysics
The majority of data sets in the geosciences are obtained from observations and measurements of natural systems, rather than in the laboratory. These data sets are often full of gaps, due to the conditions under which the measurements are made. Missing data give rise to various problems, for example in spectral estimation or in specifying boundary conditions for numerical models. Here we use Singular Spectrum Analysis (SSA) to fill the gaps in several types of data sets. For a univariate record, our procedure uses only temporal correlations in the data to fill in the missing points. For a multivariate record, multi-channel SSA (M-SSA) takes advantage of both spatial and temporal correlations. We iteratively produce estimates of missing data points, which are then used to compute a self-consistent lag-covariance matrix; cross-validation allows us to optimize the window width and number of dominant SSA or M-SSA modes to fill the gaps. The optimal parameters of our procedure depend on the distribution in time (and space) of the missing data, as well as on the variance distribution between oscillatory modes and noise. The algorithm is demonstrated on synthetic examples, as well as on data sets from oceanography, hydrology, atmospheric sciences, and space physics: global sea-surface temperature, flood-water records of the Nile River, the Southern Oscillation Index (SOI), and satellite observations of relativistic electrons.

Discriminants of twentieth-century changes in Earth surface temperatures

by Tapio Schneider, Isaac M. Held, 2000
We present an approach to identifying climate changes that does not hinge on simulations of natural climate variations or anthropogenic changes. We decompose observed interdecadal climate variations into several discriminants, mutually uncorrelated spatiotemporal components with maximal ratio of interdecadal to intradecadal variance. The dominant discriminants of twentieth-century variations in surface temperature exhibit large-scale warming in which, particularly in the Northern Hemisphere summer months, localized cooling is embedded. The structure of the large-scale warming is consistent with expected effects of increases in greenhouse gas concentrations. The localized cooling, with maxima on scales of 1,000–2,000 km over East Asia, eastern Europe, and North America, is suggestive of radiative effects of anthropogenic sulphate aerosols.

Climate Field Reconstruction under Stationary and Nonstationary Forcing

by S. Rutherford, M. E. Mann, T. L. Delworth, R. J. Stouffer - Journal of Climate, vol. 16, p. 462, 2001
The fidelity of climate reconstructions employing covariance-based calibration techniques is tested with varying levels of sparseness of available data during intervals of relatively constant (stationary) and increasing (nonstationary) forcing. These tests employ a regularized expectation-maximization algorithm using surface temperature data from both the instrumental record and coupled ocean–atmosphere model integrations. The results indicate that if radiative forcing is relatively constant over a data-rich calibration period and increases over a data-sparse reconstruction period, the imputed temperatures in the reconstruction period may be biased and may underestimate the true temperature trend. However, if radiative forcing is stationary over a data-sparse reconstruction period and increases over a data-rich calibration period, the imputed values in the reconstruction period are nearly unbiased. These results indicate that using the data-rich part of the twentieth-century instrumental record (which contains an increasing temperature trend plausibly associated with increasing radiative forcing) for calibration does not significantly bias reconstructions of prior climate.

Applying Machine Learning Techniques To Ecological Data

by Georgios Petkos, 2003
This thesis is about modelling carbon flux in forests based on meteorological variables using modern machine learning techniques. The motivation is to better understand the carbon uptake process of trees and to find its driving factors, using fully automated techniques. Data from two British forests (Griffin and Harwood) were used, but results were ultimately obtained only for Harwood because the Griffin data contained spurious variables. Both data sets presented significant challenges: missing values, noise, and the need for dimensionality reduction. The missing value problem was addressed with the regularized EM algorithm, whereas n-step moving averages were used to filter out noise. A range of "semi-wrapper" methods and one filter method were used for dimensionality reduction: forward selection, backward elimination, best ascent hill climbing, genetic algorithms, evolutionary strategies and correlation-based feature selection. Modelling was done with Multiple Linear Regression, Multilayer Perceptrons and Support Vector Regression. The best model found had at most 83% explained variance. Support Vector Regression and Multilayer Perceptrons had almost the same performance and were better than Multiple Linear Regression, since they managed to capture non-linear details of the process.

Proxy-Based Northern Hemisphere Surface Temperature Reconstructions: Sensitivity to Method, Predictor Network, Target Season, and Target Domain

by S. Rutherford, M. E. Mann, T. J. Osborn, R. S. Bradley, K. R. Briffa, M. K. Hughes, P. D. Jones - Journal of Climate, vol. 18, p. 2308, 2003
Results are presented from a set of experiments designed to investigate factors that may influence proxy-based reconstructions of large-scale temperature patterns in past centuries. The factors investigated include 1) the method used to assimilate proxy data into a climate reconstruction, 2) the proxy data network used, 3) the target season, and 4) the spatial domain of the reconstruction. Estimates of hemispheric-mean temperature are formed through spatial averaging of reconstructed temperature patterns that are based on either the local calibration of proxy and instrumental data or a more elaborate multivariate climate field reconstruction approach. The experiments compare results based on the global multiproxy dataset used by Mann and coworkers, with results obtained using the extratropical Northern Hemisphere (NH) maximum latewood tree-ring density set used by Briffa and coworkers. Mean temperature reconstructions are compared for the full NH (Tropics and extratropics, land and ocean) and extratropical continents only, with varying target seasons (cold-season half year, warm-season half year, and annual mean). The comparisons demonstrate dependence of reconstructions on seasonal, spatial, and methodological considerations, emphasizing the primary importance of the target region and seasonal window of the reconstruction. The comparisons support the generally robust nature of several previously published estimates of NH mean

paleoclimatic reconstructions

by Martin P. Tingley, Peter F. Craigmile, Murali Haran, Bo Li, Bala Rajaratnam, 2010