Results 1–10 of 50
Online Aggregation
, 1997
Abstract

Cited by 378 (44 self)
Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in postgres.
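The core interface idea (a running aggregate plus a shrinking confidence interval, updated as records stream by) can be sketched in a few lines. This is a minimal illustration, not the paper's POSTGRES implementation: it assumes i.i.d. records, uses Welford's online variance, and a CLT-based normal interval.

```python
import math
import random

def online_avg(stream, z=1.96):
    """Yield (n, running_mean, half_width) after each record, using
    Welford's online variance and a CLT-based confidence interval."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        var = m2 / (n - 1) if n > 1 else 0.0
        half = z * math.sqrt(var / n) if n > 1 else float("inf")
        yield n, mean, half

random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(10_000)]
for n, mean, half in online_avg(data):
    if n in (100, 1_000, 10_000):
        print(f"n={n:>6}  AVG ~= {mean:.2f} +/- {half:.2f}")
```

The half-width shrinks like 1/sqrt(n), which is why the user can stop a query early once the estimate is precise enough; the paper's random-order retrieval exists precisely so this i.i.d. assumption is defensible.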
Ripple Joins for Online Aggregation
Abstract

Cited by 183 (11 self)
We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications. Traditional offline join algorithms are designed to minimize the time to completion of the query. In contrast, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. Ripple joins are adaptive, adjusting their behavior during processing in accordance with the statistical properties of the data. Ripple joins also permit the user to dynamically trade off the two key performance factors of online aggregation: the time between successive updates of the running aggregate, and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join "aspect-ratio" parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.
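The square (1:1 aspect-ratio) variant of the ripple join can be sketched directly from the abstract: after k rows have been read from each shuffled input, the partial sum over the k x k "corner" of the cross product is scaled up to the full cross product. This is an illustrative sketch only; the table names, the recomputed inner sums, and the SUM-over-cross-product query are assumptions, and the paper's adaptive aspect ratios and confidence intervals are not implemented.

```python
import random

def ripple_join_sum(R, S, expr, steps=None):
    """Square ripple join sketch: after k rows from each (shuffled) input,
    scale the partial sum over the k x k corner up to the full cross
    product. Yields a running estimate of sum(expr(r, s)) per step."""
    R, S = R[:], S[:]
    random.shuffle(R)
    random.shuffle(S)
    k_max = steps or min(len(R), len(S))
    partial = 0.0
    for k in range(1, k_max + 1):
        r, s = R[k - 1], S[k - 1]
        # New "ripple": row r against previously seen S rows, row s
        # against previously seen R rows, plus the new corner (r, s).
        partial += sum(expr(r, s2) for s2 in S[:k - 1])
        partial += sum(expr(r2, s) for r2 in R[:k - 1])
        partial += expr(r, s)
        scale = (len(R) * len(S)) / (k * k)
        yield k, partial * scale

R = [1, 2, 3, 4]
S = [10, 20, 30, 40]
for k, est in ripple_join_sum(R, S, lambda r, s: r * s):
    print(k, est)
```

Once k reaches min(|R|, |S|) the sample is the whole cross product and the estimate becomes exact; the interesting regime is the early steps, where the scaled estimate is already usable.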
A Bi-Level Bernoulli Scheme for Database Sampling
 In Proceedings of ACM SIGMOD
, 2004
Abstract

Cited by 20 (2 self)
Current database sampling methods give the user insufficient control when processing ISO-style sampling queries. To address this problem, we provide a bi-level Bernoulli sampling scheme that combines the row-level and page-level sampling methods currently used in most commercial systems. By adjusting the parameters of the method, the user can systematically trade off processing speed and statistical precision—the appropriate choice of parameter settings becomes a query optimization problem. We indicate the SQL extensions needed to support bi-level sampling and determine the optimal parameter settings for an important class of sampling queries with explicit time or accuracy constraints. As might be expected, row-level sampling is preferable when data values on each page are homogeneous, whereas page-level sampling should be used when data values on a page vary widely. Perhaps surprisingly, we show that in many cases the optimal sampling policy is of the “bang-bang” type: we identify a “page-heterogeneity index” (PHI) such that optimal sampling is as “row-like” as possible if the PHI is less than 1 and as “page-like” as possible otherwise. The PHI depends upon both the query and the data, and can be estimated by means of a pilot sample. Because pilot sampling can be nontrivial to implement in commercial database systems, we also give a heuristic method for setting the sampling parameters; the method avoids pilot sampling by using a small number of summary statistics that are maintained in the system catalog. Results from over 1100 experiments on 372 real and synthetic data sets show that the heuristic method performs optimally about half of the time, and yields sampling errors within a factor of 2.2 of optimal about 93% of the time. The heuristic method is stable over a wide range of sampling rates and performs best in the most critical cases, where the data is highly clustered or skewed.
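The two-stage structure of the scheme is easy to sketch: a Bernoulli draw per page, then a Bernoulli draw per row on included pages, with a Horvitz-Thompson scale-up by the overall inclusion probability. This is a minimal sketch of the sampling mechanism only (the page layout and SUM query are assumptions); the paper's PHI-based parameter optimization is not shown.

```python
import random

def bilevel_bernoulli_sum(pages, p_page, q_row, rng):
    """Bi-level Bernoulli sketch: include each page with prob p_page,
    then each row on an included page with prob q_row; scale the sampled
    sum by 1/(p_page*q_row) (Horvitz-Thompson) to estimate the true SUM."""
    sampled = 0.0
    for page in pages:
        if rng.random() < p_page:          # page-level Bernoulli draw
            for row in page:
                if rng.random() < q_row:   # row-level Bernoulli draw
                    sampled += row
    return sampled / (p_page * q_row)

rng = random.Random(42)
# 1,000 hypothetical pages of 100 rows each; true SUM = 100,000.
pages = [[1.0] * 100 for _ in range(1000)]
est = bilevel_bernoulli_sum(pages, p_page=0.5, q_row=0.5, rng=rng)
print(f"estimate ~= {est:.0f} (true 100000)")
```

The trade-off from the abstract shows up here: with p_page near 1 the scan touches almost every page (slow but precise on heterogeneous pages), while with q_row near 1 whole pages are taken or skipped (fast, and fine when pages are homogeneous, as in this example).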
Is Your Model Susceptible to Floating-Point Errors?
Abstract

Cited by 8 (2 self)
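No abstract is available for this entry, but the class of error the title asks about is easy to demonstrate: decimal values like 0.1 have no exact binary representation, so repeated accumulation drifts. A standard illustration (not from the article itself):

```python
import math

# 0.1 has no exact binary representation, so repeated addition drifts.
naive = sum([0.1] * 10)
print(naive == 1.0)            # False
print(naive)                   # 0.9999999999999999

# math.fsum tracks exact partial sums and returns the correctly
# rounded result.
print(math.fsum([0.1] * 10))   # 1.0
```

In simulation models this kind of drift matters most in equality tests and long-running accumulators, where comparing against a tolerance (or using a compensated sum) is the usual remedy.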
Assessing Linearity in High Dimensions
, 2000
Abstract

Cited by 7 (3 self)
This paper presents a quasi-regression method for determining the degree of linearity in a function, where the cost grows only as nd. A bias-corrected version of quasi-regression is able to estimate the degree of linearity with a sample size of order d
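The quasi-regression idea can be sketched as follows: with inputs uniform on [-1,1]^d, the functions sqrt(3)*x_j are orthonormal, so the best linear coefficients are plain Monte Carlo inner products, costing one pass of O(n·d) work. This is an illustrative sketch under those assumptions (uniform inputs, this particular basis, no bias correction), not the paper's method verbatim.

```python
import math
import random

def degree_of_linearity(f, d, n, rng):
    """Quasi-regression sketch: with X uniform on [-1,1]^d, the functions
    sqrt(3)*x_j are orthonormal, so beta_j = E[f(X)*sqrt(3)*X_j] is the
    coefficient of the best linear fit. The ratio sum(beta_j^2)/Var(f)
    estimates the fraction of variance explained by the linear part.
    Cost is O(n*d): one pass, d running inner products."""
    beta = [0.0] * d
    mean = 0.0
    sq = 0.0
    for i in range(1, n + 1):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = f(x)
        mean += (y - mean) / i
        sq += y * y
        for j in range(d):
            beta[j] += math.sqrt(3.0) * x[j] * y
    beta = [b / n for b in beta]
    var = sq / n - mean * mean
    return sum(b * b for b in beta) / var

rng = random.Random(1)
# A purely linear test function: the linearity estimate should be near 1.
lin = degree_of_linearity(lambda x: 2 * x[0] - x[1], d=5, n=20000, rng=rng)
# A purely quadratic one: near 0.
quad = degree_of_linearity(lambda x: x[0] * x[0], d=5, n=20000, rng=rng)
print(f"linear f: {lin:.3f}   quadratic f: {quad:.3f}")
```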
Efficient Risk Estimation via Nested Sequential Simulation
, 2010
Abstract

Cited by 7 (1 self)
We analyze the computational problem of estimating financial risk in a nested simulation. In this approach, an outer simulation is used to generate financial scenarios and an inner simulation is used to estimate future portfolio values in each scenario. We focus on one risk measure, the probability of a large loss, and we propose a new algorithm to estimate this risk. Our algorithm sequentially allocates computational effort in the inner simulation based on marginal changes in the risk estimator in each scenario. Theoretical results are given to show that the risk estimator has a faster convergence order compared to the conventional uniform inner sampling approach. Numerical results consistent with the theory are presented.
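The baseline the paper improves on, uniform inner allocation, is simple to sketch: draw scenarios in an outer loop, estimate each scenario's loss with a fixed number of noisy inner samples, and count threshold exceedances. The toy portfolio below (true scenario loss z², Gaussian inner noise) is an assumption for illustration; the paper's sequential allocation of inner samples is deliberately not implemented here.

```python
import random

def prob_large_loss(n_outer, n_inner, threshold, rng):
    """Nested simulation sketch with uniform inner allocation.
    Outer loop: draw a scenario (here, a market factor Z).
    Inner loop: estimate the scenario's portfolio loss by averaging
    noisy inner samples, then compare against the loss threshold."""
    exceed = 0
    for _ in range(n_outer):
        z = rng.gauss(0.0, 1.0)              # scenario factor
        # Inner samples: the true scenario loss is z**2 (toy portfolio),
        # observed through zero-mean simulation noise.
        est_loss = sum(z * z + rng.gauss(0.0, 0.5)
                       for _ in range(n_inner)) / n_inner
        if est_loss > threshold:
            exceed += 1
    return exceed / n_outer

rng = random.Random(7)
p_hat = prob_large_loss(n_outer=2000, n_inner=64, threshold=2.0, rng=rng)
print(f"P(loss > 2.0) ~= {p_hat:.3f}")
```

The weakness of this baseline is visible in the structure: scenarios far from the threshold waste inner samples, which is exactly the slack the paper's sequential allocation exploits.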
ChipWhisperer: An Open-Source Platform for Hardware Embedded Security Research
 In: Constructive Side-Channel Analysis and Secure Design (COSADE)
, 2014
Abstract

Cited by 5 (1 self)
This paper introduces a complete side channel analysis toolbox, inclusive of the analog capture hardware, target device, capture software, and analysis software. The highly modular design allows use of the hardware and software with a variety of existing systems. The hardware uses a synchronous capture method which greatly reduces the required sample rate, while also reducing the data storage requirement, and improving synchronization of traces. The synchronous nature of the hardware lends itself to fault injection, and a module to generate glitches of programmable width is also provided. The entire design (hardware and software) is open-source, and maintained in a publicly available repository. Several long example capture traces are provided for researchers looking to evaluate standard cryptographic implementations.
Random Alpha PageRank
 Internet Mathematics
, 2009
Abstract

Cited by 5 (0 self)
We suggest a revision to the PageRank random surfer model that considers the influence of a population of random surfers on the PageRank vector. In the revised model, each member of the population has its own teleportation parameter chosen from a probability distribution, and consequently, the ranking vector is random. We propose three algorithms for computing the statistics of the random ranking vector based respectively on (i) random sampling, (ii) paths along the links of the underlying graph, and (iii) quadrature formulas. We find that the expectation of the random ranking vector produces similar rankings to its deterministic analogue, but the standard deviation gives uncorrelated information (under a Kendall-tau metric) with myriad potential uses. We examine applications of this model to web spam.
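Algorithm (i) from the abstract, random sampling, is straightforward to sketch: draw a teleportation parameter per surfer, run standard power iteration for each, and report the mean and standard deviation of the ranking vector. The graph, the Beta(2,1)-scaled distribution for alpha, and the iteration count below are illustrative assumptions, not the paper's experimental setup.

```python
import random

def pagerank(links, alpha, iters=100):
    """Standard power iteration for PageRank with teleportation alpha.
    links[i] is the list of pages that page i links to."""
    n = len(links)
    x = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - alpha) / n] * n
        for i, outs in enumerate(links):
            if outs:
                share = alpha * x[i] / len(outs)
                for j in outs:
                    nxt[j] += share
            else:  # dangling node: spread its mass uniformly
                for j in range(n):
                    nxt[j] += alpha * x[i] / n
        x = nxt
    return x

def random_alpha_pagerank(links, samples, rng):
    """Monte Carlo version of the random-surfer-population model: draw a
    teleportation parameter per surfer from a Beta distribution, and
    report the mean and standard deviation of the ranking vector."""
    n = len(links)
    runs = [pagerank(links, rng.betavariate(2.0, 1.0) * 0.99)
            for _ in range(samples)]
    mean = [sum(r[i] for r in runs) / samples for i in range(n)]
    std = [(sum((r[i] - mean[i]) ** 2 for r in runs) / samples) ** 0.5
           for i in range(n)]
    return mean, std

# Tiny 4-page web: 0 -> 1,2 ; 1 -> 2 ; 2 -> 0 ; 3 -> 2 (3 has no in-links).
links = [[1, 2], [2], [0], [2]]
rng = random.Random(0)
mean, std = random_alpha_pagerank(links, samples=50, rng=rng)
print("mean:", [round(m, 3) for m in mean])
print("std: ", [round(s, 3) for s in std])
```

The mean vector plays the role of the deterministic ranking, while the per-page standard deviation is the extra signal the abstract highlights.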
A grid-based algorithm for on-device GSM positioning
 In Ubicomp ’10: Proceedings of the 12th ACM international conference on Ubiquitous computing
, 2010
Abstract

Cited by 4 (1 self)
We propose a grid-based GSM positioning algorithm that can be deployed entirely on mobile devices. The algorithm uses Gaussian distributions to model signal intensity variations within each grid cell. Position estimates are calculated by combining a probabilistic centroid algorithm with particle filtering. In addition to presenting the positioning algorithm, we describe methods that can be used to create, update and maintain radio maps on a mobile device. We have implemented the positioning algorithm on Nokia S60 and Nokia N900 devices and we evaluate the algorithm using a combination of offline and real-world tests. The results indicate that the accuracy of our method is comparable to state-of-the-art methods, while at the same time having significantly smaller storage requirements.
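The probabilistic-centroid part of the pipeline can be sketched from the abstract: each grid cell stores a Gaussian (mean, deviation) per cell tower's signal strength, an observed RSSI vector is scored against every cell, and the position estimate is the likelihood-weighted average of cell centers. The radio map below is hypothetical, and the particle-filtering stage the paper combines this with is omitted.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def probabilistic_centroid(grid, observation):
    """Probabilistic-centroid sketch: weight each grid cell by the
    likelihood of the observed RSSI vector under that cell's per-tower
    Gaussians, then average the cell centers with those weights."""
    total = 0.0
    est_x = est_y = 0.0
    for (cx, cy), model in grid:
        w = 1.0
        for tower, rssi in observation.items():
            mu, sigma = model[tower]
            w *= gaussian_pdf(rssi, mu, sigma)
        total += w
        est_x += w * cx
        est_y += w * cy
    return est_x / total, est_y / total

# Hypothetical radio map: two towers A and B, three 100 m grid cells
# laid out left to right; cell centers in meters.
grid = [
    ((50.0, 50.0),  {"A": (-60.0, 6.0), "B": (-90.0, 6.0)}),
    ((150.0, 50.0), {"A": (-75.0, 6.0), "B": (-75.0, 6.0)}),
    ((250.0, 50.0), {"A": (-90.0, 6.0), "B": (-60.0, 6.0)}),
]
obs = {"A": -74.0, "B": -76.0}   # best matched by the middle cell
x, y = probabilistic_centroid(grid, obs)
print(f"estimated position: ({x:.1f}, {y:.1f})")
```

Because every cell contributes in proportion to its likelihood, the estimate degrades gracefully between cells instead of snapping to the nearest one, which suits the coarse grids that keep on-device storage small.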