Results 11 - 20 of 917
Linkage and autocorrelation cause feature selection bias in relational learning
In Proc. of the 19th Intl Conference on Machine Learning, 2002
Cited by 95 (32 self)
Two common characteristics of relational data sets — concentrated linkage and relational autocorrelation — can cause learning algorithms to be strongly biased toward certain features, irrespective of their predictive power. We identify these characteristics, define quantitative measures of their severity, and explain how they produce this bias. We show how linkage and autocorrelation affect a representative algorithm for feature selection by applying the algorithm to synthetic data and to data drawn from the Internet Movie Database. 1.1 Relational Data and Statistical Dependence Figure 1 presents two simple relational data sets. In each
A generalized moments estimator for the autoregressive parameter in a spatial model
International Economic Review, 1999
Cited by 95 (12 self)
This paper is concerned with the estimation of the autoregressive parameter in a widely considered spatial autocorrelation model. The typical estimator for this parameter considered in the literature is the (quasi) maximum likelihood estimator corresponding to a normal density. However, as discussed in the paper, the (quasi) maximum likelihood estimator may not be computationally feasible in many cases involving moderate or large sized samples. In this paper we suggest a generalized moments estimator that is computationally simple irrespective of the sample size. We provide results concerning the large and small sample properties of this estimator. 1 Introduction 1 There exists a large body of literature that considers autocorrelation of the disturbances across cross sectional units for panel data, i.e., data which are observed both across cross sectional units and over time. However, the estimation of models that permit for autocorrelation of the disturbances across
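For orientation, the kind of model this abstract refers to is commonly written as a first-order spatial autoregressive process for the disturbances. The notation below (u, W, ρ, ε) is an assumption chosen for illustration and is not taken from the abstract itself:

```latex
% Assumed notation: u is the n x 1 disturbance vector, W a known n x n
% spatial weights matrix, \rho the scalar autoregressive parameter,
% and \varepsilon an i.i.d. innovation vector.
\begin{align}
  u &= \rho W u + \varepsilon, \qquad |\rho| < 1, \\
  u &= (I_n - \rho W)^{-1} \varepsilon .
\end{align}
```

Under this formulation a Gaussian (quasi) likelihood involves the determinant of I_n - ρW, which is one common reason likelihood-based estimation becomes expensive as n grows.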
Near-optimal sensor placements: Maximizing information while minimizing communication cost
In IPSN, 2006
Cited by 89 (16 self)
When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this paper, we present a data-driven approach that addresses the three central aspects of this problem: measuring the predictive quality of a set of sensor locations (regardless of whether sensors were ever placed at these locations), predicting the communication cost involved with these placements, and designing an algorithm with provable quality guarantees that optimizes the NP-hard tradeoff. Specifically, we use data from a pilot deployment to build nonparametric probabilistic models called Gaussian Processes (GPs) both for the spatial phenomena of interest and for the spatial variability of link qualities, which allows us to estimate predictive power and communication cost of unsensed locations. Surprisingly, uncertainty in the representation of link qualities plays an important role in estimating communication costs. Using these models, we present a novel, polynomial-time, data-driven algorithm, pSPIEL, which selects Sensor Placements at Informative and cost-Effective Locations. Our approach exploits two important properties of this problem: submodularity, formalizing the intuition that adding a node to a small deployment can help more than adding a node to a large deployment; and locality, under which nodes that are far from each other provide almost independent information. Exploiting these properties, we prove strong approximation guarantees for our pSPIEL approach. We also provide extensive experimental validation of this practical approach on several real-world placement problems, and built a complete system implementation on 46 Tmote Sky motes, demonstrating significant advantages over existing methods.
Detecting Features in Spatial Point Processes with . . .
1995
Cited by 81 (31 self)
We consider the problem of detecting features in spatial point processes in the presence of substantial clutter. One example is the detection of minefields using reconnaissance aircraft images that erroneously identify many objects that are not mines. Another is the detection of seismic faults on the basis of earthquake catalogs: earthquakes tend to be clustered close to the faults, but there are many that are farther away. Our solution uses model-based clustering based on a mixture model for the process, in which features are assumed to generate points according to highly linear multivariate normal densities, and the clutter arises according to a spatial Poisson process. Very nonlinear features are represented by several highly linear multivariate normal densities, giving a piecewise linear representation. The model is estimated in two stages. In the first stage, hierarchical model-based clustering is used to provide a first estimate of the features. In the second stage, this clustering is refined using the EM algorithm. The number of features is found using an approximation to the posterior probability of each number of features. For the minefield
On Conditional and Intrinsic Autoregressions
1995
Cited by 75 (6 self)
This paper discusses standard and intrinsic autoregressions and describes how the problems that arise can be alleviated using Dempster's (1972) algorithm or an appropriate modification. The approach partly represents a synthesis of standard geostatistical and Gaussian Markov random field formulations. Some nonspatial applications are also mentioned. Some key words: Agricultural experiments; Bayesian image analysis; Conditional autoregressions; Dempster's algorithm; Geographical epidemiology; Geostatistics; Intrinsic autoregressions; Multiway tables; Prior distributions; Spatial statistics; Surface reconstruction; Texture analysis. 1 Introduction
Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes
2000
Cited by 74 (8 self)
Finding approximate answers to multidimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique finds buckets of variable size, and allows the buckets to overlap. Overlapping buckets allow more efficient approximation of the density. The size of the cells is based on the local density of the data. This technique leads to a faster and more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them on the multidimensional query approxim...
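As a rough illustration of the problem stated in this abstract (estimating how many records fall inside a query range from a compact summary), here is a minimal Python sketch. It deliberately uses a plain one-dimensional equi-width histogram with a uniform-within-bucket assumption, not the variable-size overlapping buckets the abstract describes. The function names `build_histogram` and `estimate_range_count` and the sample data are hypothetical.

```python
def build_histogram(values, lo, hi, n_buckets):
    """Count values into n_buckets equal-width buckets over [lo, hi].

    Assumes every value lies in [lo, hi]; this is a simplification.
    """
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # Clamp so that v == hi lands in the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range_count(counts, lo, width, q_lo, q_hi):
    """Estimate the number of values in [q_lo, q_hi].

    Each bucket contributes its count scaled by the fraction of the
    bucket that the query range overlaps (uniformity assumption).
    """
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(q_hi, b_hi) - max(q_lo, b_lo))
        total += c * overlap / width
    return total


# Hypothetical sample data over the domain [0, 10].
values = [0.5, 1.5, 1.7, 2.2, 3.9, 4.1, 7.3, 8.8]
counts, width = build_histogram(values, 0.0, 10.0, 5)
estimate = estimate_range_count(counts, 0.0, width, 1.0, 5.0)
exact = sum(1 for v in values if 1.0 <= v <= 5.0)
print(counts, estimate, exact)
```

The gap between the estimate and the exact count comes from the uniformity assumption inside each bucket, which is the kind of error that density-adaptive bucket sizes are meant to reduce.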
Efficient Implementation of Gaussian Processes
1997
Cited by 72 (4 self)
Neural networks and Bayesian inference provide a useful framework within which to solve regression problems. However their parameterization means that the Bayesian analysis of neural networks can be difficult. In this paper, we investigate a method for regression using Gaussian process priors which allows exact Bayesian analysis using matrix manipulations. We discuss the workings of the method in detail. We will also detail a range of mathematical and numerical techniques that are useful in applying Gaussian processes to general problems including efficient approximate matrix inversion methods developed by Skilling. 1 Introduction Neural networks and Bayesian inference have provided a useful framework within which to solve regression problems (MacKay 1992a) (MacKay 1992b). However due to the parameterization of a neural network, implementations of the Bayesian analysis of a neural network require either maximum a posteriori approximations (MacKay 1992b) or the evaluation of integrals u...
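The phrase "exact Bayesian analysis using matrix manipulations" can be made concrete with a small sketch of Gaussian process regression. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a squared-exponential covariance with unit signal variance, Gaussian observation noise, and an exact Cholesky solve rather than the approximate inversion methods the abstract attributes to Skilling. All names (`rbf`, `cholesky`, `solve_cholesky`, `gp_predict`) are hypothetical.

```python
import math


def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential covariance with unit signal variance (assumed)."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_cholesky(l, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def gp_predict(x_train, y_train, x_star, noise_var=1e-2, length_scale=1.0):
    """Posterior mean and variance of the latent function at x_star."""
    n = len(x_train)
    # Covariance of the noisy training targets: K + noise_var * I.
    k = [[rbf(x_train[i], x_train[j], length_scale)
          + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    l = cholesky(k)
    k_star = [rbf(x_star, xi, length_scale) for xi in x_train]
    alpha = solve_cholesky(l, y_train)   # (K + noise_var I)^{-1} y
    v = solve_cholesky(l, k_star)        # (K + noise_var I)^{-1} k_star
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(x_star, x_star, length_scale) - sum(
        ks * vi for ks, vi in zip(k_star, v))
    return mean, var


# Hypothetical toy data: three noisy-free samples of sin(x).
x_train = [-1.0, 0.0, 1.0]
y_train = [math.sin(x) for x in x_train]
mean_near, var_near = gp_predict(x_train, y_train, 0.0)
mean_far, var_far = gp_predict(x_train, y_train, 10.0)
print(mean_near, var_near, mean_far, var_far)
```

The predictive variance is small next to the training inputs and returns to the prior variance far from them, which is the uncertainty behaviour that makes the exact treatment attractive. The cubic cost of the Cholesky factorization is the scaling issue that motivates approximate inversion.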
Gaussian processes for machine learning
International Journal of Neural Systems, 2004
Cited by 66 (15 self)
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible nonparametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep
Spatial Econometrics
PALGRAVE HANDBOOK OF ECONOMETRICS: VOLUME 1, ECONOMETRIC THEORY, 2001
Cited by 64 (5 self)
Spatial econometric methods deal with the incorporation of spatial interaction and spatial structure into regression analysis. The field has seen a recent and rapid growth spurred both by theoretical concerns as well as by the need to be able to apply econometric models to emerging large geocoded data bases. The review presented in this chapter outlines the basic terminology and discusses in some detail the specification of spatial effects, estimation of spatial regression models, and specification tests for spatial effects.
Bayesian Calibration of Computer Models
Journal of the Royal Statistical Society, Series B, Methodological, 2000
Cited by 62 (1 self)
this paper a Bayesian approach to the calibration of computer models. We represent the unknown inputs as a parameter vector θ. Using the observed data we derive the posterior distribution of θ, which in particular quantifies the 'residual uncertainty' about