Results 1-10 of 35
Mining of concurrent text and time series
 In Proceedings of the 6th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining Workshop on Text Mining
, 2000
"... ..."
Markov Random Fields in Statistics
, 1990
"... For nearly a century, statisticians have been intrigued by the problems of developing a satisfactory methodology for the analysis of spatial data; see Student (1914), for an early example. It is only since the early 1970's, however, that the statistical analysis of large data sets, using flexible pa ..."
Abstract

Cited by 43 (0 self)
For nearly a century, statisticians have been intrigued by the problem of developing a satisfactory methodology for the analysis of spatial data; see Student (1914) for an early example. It is only since the early 1970s, however, that the statistical analysis of large data sets using flexible parametric models has become a feasible proposition. On the practical side, progress has been made possible by the availability of relatively cheap, computerised resources for the collection and analysis of data. The study of digital images and the use of satellite data for remote sensing are prominent examples in this respect. On the methodological side, substantial progress is associated with the introduction of Markov random fields (MRFs) as a class of parametric models for spatial data (Besag 1974). Shaped by these developments, spatial statistics has emerged as perhaps the most dynamic and computer-intensive of all the areas of statistical endeavour, building upon models used originally...
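The defining feature of the MRF class credited above to Besag (1974) — that each site's conditional distribution depends only on its neighbours — can be illustrated with a tiny Gibbs sampler for the Ising model, the canonical MRF. This is a generic sketch, not code from the paper; the lattice size, interaction strength `beta`, and sweep count are arbitrary illustrative choices:

```python
import numpy as np

def gibbs_ising(n=16, beta=0.4, sweeps=50, seed=0):
    """Gibbs sampling from an Ising MRF on an n x n lattice with
    periodic boundaries (beta is a hypothetical interaction strength)."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # Sum of the four nearest neighbours (periodic boundary).
                s = (x[(i - 1) % n, j] + x[(i + 1) % n, j]
                     + x[i, (j - 1) % n] + x[i, (j + 1) % n])
                # The conditional law of one site given everything else
                # depends only on the neighbour sum: the Markov property.
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
                x[i, j] = 1 if rng.random() < p_up else -1
    return x

field = gibbs_ising()
```

Because each update touches only a four-site neighbourhood, the sampler never needs the (intractable) normalising constant of the joint distribution — the practical appeal of the MRF parametrisation.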
Overfitting Explained
, 1997
"... Overfitting arises when model components are evaluated against the wrong reference distribution. Most modeling algorithms iteratively find the best of several components and then test whether this component is good enough to add to the model. We show that for independently distributed random variabl ..."
Abstract

Cited by 19 (2 self)
Overfitting arises when model components are evaluated against the wrong reference distribution. Most modeling algorithms iteratively find the best of several components and then test whether this component is good enough to add to the model. We show that for independently distributed random variables, the reference distribution for any one variable underestimates the reference distribution for the highest-valued variable; thus variate values will appear significant when they are not, and model components will be added when they should not be added. We relate this problem to the well-known statistical theory of multiple comparisons, or simultaneous inference.
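The abstract's central point — that the best of several candidate scores follows the distribution of a maximum, not of a single draw — is easy to see by simulation. A minimal sketch, with an arbitrary candidate count and trial count:

```python
import numpy as np

# The best of k independent candidate scores follows the distribution of
# a maximum, so a single-variable significance threshold is too lenient.
rng = np.random.default_rng(0)
k = 20                      # hypothetical number of candidate components
n_trials = 20000
z = rng.standard_normal((n_trials, k))

thresh = 1.645              # one-sided 5% cutoff for ONE standard normal
p_single = np.mean(z[:, 0] > thresh)        # close to 0.05, as intended
p_best = np.mean(z.max(axis=1) > thresh)    # roughly 1 - 0.95**20, far larger

assert p_best > 5 * p_single
```

A threshold calibrated for one variable is exceeded by the best of twenty most of the time, which is exactly how spurious components get admitted to a model.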
Wavelet Analysis of Covariance with Application to Atmospheric Time Series
 J. Geophys. Res.
, 2000
"... Introduction The bivariate relationship between two time series is often of crucial interest in atmospheric science. For example, the MaddenJulian oscillation (MJO) [3] was found using bivariate spectral analysis between the station pressure and zonal wind components at Canton Island  specifica ..."
Abstract

Cited by 17 (1 self)
The bivariate relationship between two time series is often of crucial interest in atmospheric science. For example, the Madden-Julian oscillation (MJO) [3] was found using bivariate spectral analysis between the station pressure and zonal wind components at Canton Island, specifically the cospectrum and magnitude squared coherence. We introduce the wavelet covariance and correlation between two time series based upon the maximal overlap DWT [6] and Daubechies families of wavelets [1, Sec. 6.2]. Key points include: (i) the wavelet covariance decomposes the usual covariance on a scale-by-scale basis; (ii) approximate confidence intervals may be calculated for estimators of the wavelet covariance and correlation; and (iii) the wavelet cross-covariance and cross-correlation are used to investigate lead/lag relationships. We apply these wavelet estimators to the bivariate analysis of the Southern Oscillation Index (SOI), an...
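The exact scale-by-scale decomposition in point (i) follows from the orthonormality of the wavelet transform. A minimal sketch using a hand-rolled orthonormal Haar DWT rather than the paper's maximal overlap DWT (the MODWT version is analogous), on synthetic data:

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar DWT (length must be a power of two); returns the
    list of detail bands plus the final smooth coefficient."""
    bands, a = [], np.asarray(x, dtype=float)
    while len(a) > 1:
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
        bands.append(d)
    bands.append(a)
    return bands

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
y = 0.5 * x + rng.standard_normal(64)      # a correlated pair
xc, yc = x - x.mean(), y - y.mean()

# Orthonormality preserves inner products, so the sample covariance
# splits exactly into per-scale contributions.
per_scale = [np.dot(dx, dy) for dx, dy in zip(haar_dwt(xc), haar_dwt(yc))]
total = np.dot(xc, yc)
assert np.isclose(sum(per_scale), total)
```

Each entry of `per_scale` is the covariance attributable to one dyadic scale, which is what lets the estimators above localise a bivariate relationship in scale.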
Adjusting for multiple comparisons in decision tree pruning
 Proc. 3rd Int. Conf. on Knowledge Discovery & Data Mining (KDD-97)
, 1997
"... Pruning is a common technique to avoid over tting in decision trees. Most pruning techniques do not account for one important factor  multiple comparisons. Multiple comparisons occur when an induction algorithm examines several candidate models and selects the one that best accords with the data. M ..."
Abstract

Cited by 16 (4 self)
Pruning is a common technique to avoid overfitting in decision trees. Most pruning techniques do not account for one important factor: multiple comparisons. Multiple comparisons occur when an induction algorithm examines several candidate models and selects the one that best accords with the data. Making multiple comparisons produces incorrect inferences about model accuracy. We examine a method that adjusts for multiple comparisons when pruning decision trees: Bonferroni pruning. In experiments with artificial and realistic datasets, Bonferroni pruning produces smaller trees that are at least as accurate as trees pruned using other common approaches.
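The Bonferroni adjustment the abstract refers to is standard: when the pruning test selects the best of n candidate splits, the per-test significance level is divided by n. A hedged sketch of the idea, not the paper's actual pruning procedure, with hypothetical p-values:

```python
# Bonferroni idea: the best of n candidate splits must pass the
# stricter threshold alpha / n, not the single-test threshold alpha.
def keep_split(p_values, alpha=0.05):
    """Return True if the best candidate survives a Bonferroni-adjusted
    significance test; p_values are hypothetical per-split p-values."""
    n = len(p_values)
    return min(p_values) <= alpha / n

# A genuinely strong candidate among 10 survives the stricter cutoff.
assert keep_split([0.001, 0.3, 0.4, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.25])
# A marginally "significant" best of 10 (p = 0.03 > 0.05/10) does not.
assert not keep_split([0.03, 0.3, 0.4, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.25])
```

Refusing the marginal split is what yields the smaller trees the experiments report: candidates that look significant only because many were tried are rejected.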
Tensor diagonalization, a useful tool in signal processing
 IFAC Symposium on System Identification
, 1994
"... Tensors appear more and more often in signal processing problems, and especially spatial processing, which typically involves multichannel modeling. Even if it is not always obvious that tensor algebra is the best framework to address a problem, there are cases where no choice is left. Blind identif ..."
Abstract

Cited by 15 (5 self)
Tensors appear more and more often in signal processing problems, especially spatial processing, which typically involves multichannel modeling. Even if it is not always obvious that tensor algebra is the best framework to address a problem, there are cases where no choice is left. Blind identification of multichannel non-monic MA models is given as an illustrative example of this claim.
Web Proxy Workload Characterisation And Modelling
, 1999
"... Understanding WWW traffic characteristics is key to improving the performance and scalability of the Web. In the first part of this thesis, Web proxy workloads from different levels of a caching hierarchy are used to understand how the workload characteristics change across different levels of a cac ..."
Abstract

Cited by 10 (2 self)
Understanding WWW traffic characteristics is key to improving the performance and scalability of the Web. In the first part of this thesis, Web proxy workloads from different levels of a caching hierarchy are used to understand how workload characteristics change across those levels. The main observations of this study are: HTML and image documents account for 95% of the documents seen in the workload; the distribution of document transfer sizes is heavy-tailed, with the tails becoming heavier as one moves from the client side to the server side of the network; the popularity profile of documents does not precisely follow the Zipf distribution; one-timers account for approximately 70% of the documents referenced; concentration of references is less at proxy caches than at servers, and concentration of references is higher at lower-level proxies than at higher-level proxies; there appears to be no correlation between document modification rate and document pop...
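Two of the statistics listed above — the one-timer fraction and the concentration of references — are straightforward to compute from a request log. A minimal sketch on a made-up request stream (the document names are hypothetical, not from the thesis):

```python
from collections import Counter

# Hypothetical request stream standing in for a proxy access log.
requests = ["a.html", "b.gif", "a.html", "c.html", "a.html",
            "d.gif", "b.gif", "e.html", "f.gif", "a.html"]
counts = Counter(requests)

# One-timers: distinct documents referenced exactly once.
one_timers = sum(1 for c in counts.values() if c == 1)
frac_one_timers = one_timers / len(counts)   # 4 of 6 documents

# Concentration of references: share of all requests going to the
# single most popular document (4 of 10 requests here).
top_share = counts.most_common(1)[0][1] / len(requests)
```

Comparing `top_share` computed at different levels of a hierarchy is one simple way to express the observation that references are less concentrated at proxies than at servers.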
Assessing Nonstationary Time Series Using Wavelets
, 1998
"... The discrete wavelet transform has be used extensively in the field of Statistics, mostly in the area of "denoising signals" or nonparametric regression. This thesis provides a new application for the discrete wavelet transform, assessing nonstationary events in time series  especially long memory ..."
Abstract

Cited by 9 (4 self)
The discrete wavelet transform has been used extensively in the field of statistics, mostly in the area of "denoising signals" or nonparametric regression. This thesis provides a new application for the discrete wavelet transform: assessing nonstationary events in time series, especially long memory processes. Long memory processes are those which exhibit substantial correlations between events separated by a long period of time. Departures from stationarity in these heavily autocorrelated time series, such as an abrupt change in the variance at an unknown location or "bursts" of increased variability, can be detected and accurately located using discrete wavelet transforms, both orthogonal and overcomplete. A cumulative sum of squares method, utilizing a Kolmogorov-Smirnov-type...
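The cumulative-sum-of-squares device the abstract ends on can be sketched in a few lines: normalise the running sum of squares and look for the largest deviation from the diagonal. This is a generic version applied directly to a synthetic series (the thesis applies it to wavelet coefficients); the series length and variance jump are arbitrary illustrative choices:

```python
import numpy as np

def cusum_of_squares_changepoint(x):
    """Return the index maximising |P_k - k/N|, where P_k is the
    normalised cumulative sum of squares up to time k."""
    x = np.asarray(x, dtype=float)
    p = np.cumsum(x ** 2) / np.sum(x ** 2)
    k = np.arange(1, len(x) + 1)
    return int(np.argmax(np.abs(p - k / len(x))))

rng = np.random.default_rng(2)
# Abrupt variance change at t = 200: the standard deviation jumps 1 -> 3.
x = np.concatenate([rng.standard_normal(200), 3 * rng.standard_normal(200)])
loc = cusum_of_squares_changepoint(x)   # lands near the true change point
```

Before the change the sum of squares grows slowly relative to k/N and afterwards quickly, so the absolute deviation peaks near the true location; the Kolmogorov-Smirnov-type step is judging whether that peak exceeds what stationarity allows.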
Measuring Performance of Ad Hoc Networks Using Timescales for Information Flow
, 2003
"... We define metrics to characterize the performance of ad hoc networks based on timescales for information flow, power consumption and interference. The statistical distribution of timescales has not been previously considered. Yet, it is important for understanding the feasibility of communicating ov ..."
Abstract

Cited by 6 (0 self)
We define metrics to characterize the performance of ad hoc networks based on timescales for information flow, power consumption and interference. The statistical distribution of timescales has not been previously considered, yet it is important for understanding the feasibility of communicating over such networks, for comparing different algorithms for building up network topology and for distinguishing regimes of routing. We quantify the longest timescale for information flow and estimate its distribution. We also introduce a decentralized adaptive power algorithm, using only information local to each device, for building ad hoc networks. This algorithm is shown to perform significantly better by all our metrics when compared with a standard constant-power algorithm.
Tracking land cover change in a mixed logit model: recognizing temporal and spatial effects
, 2006
"... As an essential part of integrated land usetransport models, prediction of land cover changes and illumination of the many factors behind such change are always of interest to planners, policy makers, developers and others. Using a mixed logit framework, this paper studies land cover evolution in t ..."
Abstract

Cited by 6 (6 self)
As an essential part of integrated land use-transport models, prediction of land cover changes and illumination of the many factors behind such change are always of interest to planners, policy makers, developers and others. Using a mixed logit framework, this paper studies land cover evolution in the Austin, Texas region, recognizing distance-dependent correlations, both observed and unobserved, over space and time, in a sea of satellite image pixels. The paper describes the computational methods used for model estimation and application, including generalized Cholesky decomposition and likelihood simulation. Results indicate that neighborhood characteristics have strong effects on land cover evolution: clustering is significant over time, but high residential densities can impede future development. Model application produces graphic predictions, allowing one to visually confirm these results and appreciate the variability in potential urban futures.
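The likelihood simulation mentioned above has a standard core in the mixed logit literature: average plain-logit choice probabilities over random draws of the taste coefficients. A minimal sketch under assumed values — the attribute matrix, the normal mean and standard deviation of the coefficients, and the draw count are all hypothetical, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.array([[1.0, 0.2],      # attributes of 3 hypothetical alternatives
              [0.5, 0.9],
              [0.0, 0.4]])
# Random taste coefficients: assumed normal mixing distribution.
beta_draws = rng.normal(loc=[1.0, -0.5], scale=[0.5, 0.3], size=(1000, 2))

u = X @ beta_draws.T                 # utilities, shape (3 alts, 1000 draws)
p = np.exp(u - u.max(axis=0))        # numerically stable softmax per draw
p /= p.sum(axis=0)
simulated_prob = p.mean(axis=1)      # simulated mixed logit probabilities
assert np.isclose(simulated_prob.sum(), 1.0)
```

Averaging over draws is what lets the model capture correlation in unobserved tastes that a fixed-coefficient logit cannot; in the paper's spatial setting the draws are additionally correlated across pixels, which is where the generalized Cholesky decomposition comes in.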