Results 1 - 9 of 9
Rapid Detection of Significant Spatial Clusters
- In KDD, 2004
Cited by 23 (8 self)
Given an N × N grid of squares, where each square has a count c_ij and an underlying population p_ij, our goal is to find the rectangular region with the highest density, and to calculate its significance by randomization. An arbitrary density function D, dependent on a region's total count C and total population P, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic D_K to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N^4) time, and is generally computationally infeasible. We present a multiresolution algorithm which partitions the grid into overlapping regions using a novel overlap-kd tree data structure, bounds the maximum score of subregions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in O((N log N)^2) time, in practice resulting in significant (20-2000x) speedups on both real and simulated datasets.
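The naive baseline mentioned in this abstract can be sketched as follows. This is only an illustration of exhaustive search, not the paper's overlap-kd tree algorithm: it scores every axis-aligned rectangle (there are O(N^4) of them in an N × N grid) using summed-area tables. The function names are hypothetical, and the density function is assumed to be the Poisson log-likelihood-ratio form commonly given for Kulldorff's statistic.

```python
import math

def prefix_sums(grid):
    """Summed-area table s, where s[i][j] is the sum of grid[0:i][0:j]."""
    n, m = len(grid), len(grid[0])
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            s[i + 1][j + 1] = grid[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s

def rect_sum(s, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and columns c0..c1-1, in O(1)."""
    return s[r1][c1] - s[r0][c1] - s[r1][c0] + s[r0][c0]

def _xlog(k, q):
    # k * log(k / q), with the convention 0 * log(0) = 0.
    return k * math.log(k / q) if k > 0 else 0.0

def kulldorff(c, p, c_tot, p_tot):
    """Assumed form of D_K: log-likelihood ratio of 'rate inside > rate outside'."""
    c_out, p_out = c_tot - c, p_tot - p
    if p <= 0 or p_out <= 0:
        return 0.0  # empty region or whole grid: no inside/outside contrast
    if c / p <= c_out / p_out:
        return 0.0  # region is not denser than its complement
    return _xlog(c, p) + _xlog(c_out, p_out) - _xlog(c_tot, p_tot)

def naive_scan(counts, pops, density=kulldorff):
    """Score every rectangle; return (best_score, (r0, c0, r1, c1)), half-open."""
    n, m = len(counts), len(counts[0])
    sc, sp = prefix_sums(counts), prefix_sums(pops)
    c_tot, p_tot = sc[n][m], sp[n][m]
    best_score, best_rect = 0.0, None
    for r0 in range(n):
        for r1 in range(r0 + 1, n + 1):
            for c0 in range(m):
                for c1 in range(c0 + 1, m + 1):
                    score = density(rect_sum(sc, r0, c0, r1, c1),
                                    rect_sum(sp, r0, c0, r1, c1),
                                    c_tot, p_tot)
                    if score > best_score:
                        best_score, best_rect = score, (r0, c0, r1, c1)
    return best_score, best_rect
```

The four nested loops make the O(N^4) cost visible; with the summed-area tables each rectangle is scored in constant time, so the enumeration itself is the bottleneck that a pruning method would have to avoid.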
A Fast Multi-Resolution Method for Detection of Significant Spatial Overdensities
- Advances in Neural Information Processing Systems 16, 2003
Cited by 19 (6 self)
Given an N × N grid of squares, where each square s_ij has a count c_ij and an underlying population p_ij, our goal is to find the square region S with the highest density, and to calculate the significance of this region by Monte Carlo testing. Any density measure D, which depends on the total count and total population of the region, can be used. For example, if each count c_ij represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic D_K to find the most significant spatial disease cluster. A naive approach to finding the region of maximum density would be to calculate the density measure for every square region: this requires O(RN^3) calculations, where R is the number of Monte Carlo replications, and hence is generally computationally infeasible. We present a novel multi-resolution algorithm which partitions the grid into overlapping regions, bounds the maximum score of subregions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in optimal O(RN^2) time, and in practice it results in significant (10-200x) speedups as compared to the naive approach.
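The Monte Carlo testing step described in this abstract can be illustrated with a small sketch. It is not the paper's multi-resolution algorithm: it runs the naive scan over all square regions on the observed grid and on R replica grids, then reports the fraction of replicas that score at least as high. Two things here are assumptions made for illustration: the density measure is the plain ratio C/P, and replicas are drawn by placing the observed total count into cells with probability proportional to population. All function names are hypothetical.

```python
import random

def max_square_density(counts, pops):
    """Naive scan: highest C/P over all k x k square regions (assumed density)."""
    n = len(counts)
    best = 0.0
    for k in range(1, n + 1):
        for r in range(n - k + 1):
            for c in range(n - k + 1):
                cells = [(i, j) for i in range(r, r + k) for j in range(c, c + k)]
                C = sum(counts[i][j] for i, j in cells)
                P = sum(pops[i][j] for i, j in cells)
                if P > 0:
                    best = max(best, C / P)
    return best

def null_replica(pops, total_count, rng):
    """Assumed null model: each case lands in a cell with probability ~ population."""
    n = len(pops)
    cells = [(i, j) for i in range(n) for j in range(n)]
    weights = [pops[i][j] for i, j in cells]
    replica = [[0] * n for _ in range(n)]
    for i, j in rng.choices(cells, weights=weights, k=total_count):
        replica[i][j] += 1
    return replica

def monte_carlo_p_value(counts, pops, replications=199, seed=0):
    """p = (number of replicas scoring >= observed, plus 1) / (R + 1)."""
    rng = random.Random(seed)
    observed = max_square_density(counts, pops)
    total = sum(map(sum, counts))
    beat = sum(
        max_square_density(null_replica(pops, total, rng), pops) >= observed
        for _ in range(replications)
    )
    return (beat + 1) / (replications + 1)
```

Because the full scan is repeated for every replica, the cost of the search is multiplied by R, which is where the factor R in the stated complexity comes from.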
Residual analysis for spatial point processes (with discussion)
- Journal of the Royal Statistical Society (Series B), 2005
Cited by 15 (5 self)
[Read before The Royal Statistical Society at a meeting organized by the Research Section on
Detecting significant multidimensional spatial clusters
- Advances in Neural Information Processing Systems 17, 2005
Cited by 14 (7 self)
Assume a uniform, multidimensional grid of bivariate data, where each cell of the grid has a count c_i and a baseline b_i. Our goal is to find spatial regions (d-dimensional rectangles) where the c_i are significantly higher than expected given b_i. We focus on two applications: detection of clusters of disease cases from epidemiological data (emergency department visits, over-the-counter drug sales), and discovery of regions of increased brain activity corresponding to given cognitive tasks (from fMRI data). Each of these problems can be solved using a spatial scan statistic (Kulldorff, 1997), where we compute the maximum of a likelihood ratio statistic over all spatial regions, and find the significance of this region by randomization. However, computing the scan statistic for all spatial regions is generally computationally infeasible, so we introduce a novel fast spatial scan algorithm, generalizing the 2D scan algorithm of Neill and Moore (2004) to arbitrary dimensions. Our new multidimensional multiresolution algorithm allows us to find spatial clusters up to 1400x faster than the naive spatial scan, without any loss of accuracy.
Methods for Extracting Place Semantics from Flickr Tags
Cited by 4 (0 self)
We describe an approach for extracting semantics for tags, unstructured text labels assigned to resources on the Web, based on each tag’s usage patterns. In particular, we focus on the problem of extracting place semantics for tags that are assigned to photos on Flickr, a popular photo-sharing Web site that supports location (latitude/longitude) metadata for photos. We propose the adaptation of two baseline methods, inspired by well-known burst-analysis techniques, for the task; we also describe two novel methods, TagMaps and scale-structure identification. We evaluate the methods on a subset of Flickr data. We show that our scale-structure identification method outperforms existing techniques and that a hybrid approach generates further improvements (achieving 85% precision at 81% recall). The approach and methods described in this work can be used in other domains such as geo-annotated Web pages, where text terms can be extracted and associated with usage patterns.
Spatial Scan Statistics for Graph Clustering
Cited by 2 (1 self)
In this paper, we present a measure associated with detection and inference of statistically anomalous clusters of a graph based on the likelihood test of observed and expected edges in a subgraph. This measure is adapted from spatial scan statistics for point sets and provides quantitative assessment for clusters. We discuss some important properties of this statistic and its relation to modularity and Bregman divergences. We apply a simple clustering algorithm to find clusters with large values of this measure in a variety of real-world data sets, and we illustrate its ability to identify statistically significant clusters of selected granularity.
Towards Extracting Flickr Tag Semantics
Cited by 1 (0 self)
We address the problem of extracting semantics of tags – short, unstructured text-labels assigned to resources on the Web – based on each tag’s metadata patterns. In particular, we describe an approach for extracting place and event semantics for tags that are assigned to photos on Flickr, a popular photo sharing website supporting time and location (latitude/longitude) metadata. The approach can be generalized to other domains where text terms can be extracted and associated with metadata patterns, such as geoannotated web pages.
Spatial Point Process Models of Defensive Strategies: Detecting Changes
- 2004
The study of stochastic processes can take many forms. Theoretical properties are important to ensure consistent model definition. Statistical inference on unknown parameters is equally important but can be difficult. This is principally because many of the standard assumptions for proving consistency and asymptotic normality of estimators involve independence and homogeneity. In the case where inference is concerned with detecting change in a spatial process from one time point to another, a statistical-computing approach can be rewarding. Regardless of the complexity of the stochastic process, if simulating from it is relatively easy, then detecting change is possible using a Monte Carlo approach. The methodology is applied in a military scenario, where a country’s defensive posture changes as a function of its perceived threat. For tactical-decision purposes, it is extremely important to know whether the country’s perceived threat level has changed.
Discretized Spatio-Temporal Scan Window
The focus of this paper is the discovery of anomalous spatio-temporal windows. We propose a Discretized Spatio-Temporal Scan Window approach to address the question of how we can treat space and time together without compromising on the properties of each and their impact on each other. In doing so, we discover anomalous spatio-temporal windows, identify at what point in time the window changes, identify the spatial patterns of change over time, and identify a spatial extent in time which is completely deviant with respect to the rest of the anomalous spatio-temporal windows. None of the current approaches addresses all these issues in combination. Subsequently, we perform experiments on several real-world datasets to validate our approach, comparing it with the established approach of discovering a cylindrical spatio-temporal scan window.

