Results 1–10 of 97
The History of Histograms (abridged)
 Proc. of VLDB Conference
, 2003
Abstract

Cited by 84 (0 self)
The history of histograms is long and rich, full of detailed information in every step. It includes the course of histograms in different scientific fields, the successes and failures of histograms in approximating and compressing information, their adoption by industry, and solutions that have been given on a great variety of histogram-related problems. In this paper and in the same spirit of the histogram techniques themselves, we compress their entire history (including their "future history" as currently anticipated) in the given/fixed space budget, mostly recording details for the periods, events, and results with the highest (personally-biased) interest. In a limited set of experiments, the semantic distance between the compressed and the full form of the history was found relatively small!
Models of integration given multiple sources of information
 Psychol. Rev.
, 1990
Abstract

Cited by 49 (17 self)
Several models of information integration are developed and analyzed within the context of a prototypical pattern-recognition task. The central concerns are whether the models prescribe maximally efficient (optimal) integration and to what extent the models are psychologically valid. Evaluation, integration, and decision processes are specified for each model. Important features are whether evaluation is noisy, whether integration follows Bayes's theorem, and whether decision consists of a criterion rule or a relative goodness rule. Simulations of the models and predictions of the results by the same models are carried out to provide a measure of identifiability or the extent to which the models can be distinguished from one another. The models are also contrasted against empirical results from tasks with 2 and 4 response alternatives and with graded responses. Conceptual Framework There is a growing consensus that behavior reflects the influence of multiple sources of information. Auditory and visual perception, reading and speech perception, and decision making and judgment are modulated by a wide variety of influences
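The contrast the abstract draws between a criterion rule and a relative goodness rule can be illustrated with a minimal simulation. All of the numbers below (category means, unit noise, equal priors, two cues) are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-alternative task: category A has mean +1, category B mean -1.
# Each trial yields two noisy cues (e.g., one auditory, one visual).
MU = {"A": 1.0, "B": -1.0}
SIGMA = 1.0  # assumed cue noise

def posterior_A(x1, x2):
    """Bayes-optimal posterior P(A | x1, x2) under equal priors:
    independent Gaussian likelihoods multiply."""
    def lik(x, mu):
        return np.exp(-(x - mu) ** 2 / (2 * SIGMA ** 2))
    num = lik(x1, MU["A"]) * lik(x2, MU["A"])
    return num / (num + lik(x1, MU["B"]) * lik(x2, MU["B"]))

def decide_criterion(p):
    # Criterion rule: always pick the more probable alternative.
    return "A" if p > 0.5 else "B"

def decide_rgr(p):
    # Relative goodness rule: respond "A" with probability p
    # (probability matching on the posterior).
    return "A" if rng.random() < p else "B"

# Simulate accuracy of both decision rules on trials where A is true.
n = 10_000
x1 = rng.normal(MU["A"], SIGMA, n)
x2 = rng.normal(MU["A"], SIGMA, n)
p = posterior_A(x1, x2)
acc_criterion = np.mean([decide_criterion(q) == "A" for q in p])
acc_rgr = np.mean([decide_rgr(q) == "A" for q in p])
print(acc_criterion, acc_rgr)
```

Under this setup the criterion rule is more accurate than the relative goodness rule, which is exactly the kind of optimality question the models are meant to probe.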
The unicorn, the normal curve, and other improbable creatures
 Psychological Bulletin
, 1989
Abstract

Cited by 41 (0 self)
An investigation of the distributional characteristics of 440 large-sample achievement and psychometric measures found all to be significantly nonnormal at the .01 alpha significance level. Several classes of contamination were found, including tail weights from the uniform to the double exponential, exponential-level asymmetry, severe digit preferences, multimodalities, and modes external to the mean/median interval. Thus, the underlying tenets of normality-assuming statistics appear fallacious for these commonly used types of data. However, findings here also fail to support the types of distributions used in most prior robustness research suggesting the failure of such statistics under nonnormal conditions. A reevaluation of the statistical robustness literature appears appropriate in light of these findings. During recent years a considerable literature devoted to robust statistics has appeared. This research reflects a growing concern among statisticians regarding the robustness, or insensitivity, of parametric statistics to violations of their underlying assumptions. Recent findings suggest that the most commonly used of these statistics exhibit varying degrees of nonrobustness to certain violations of the normality assumption. Although the importance of such findings is underscored by numerous empirical studies documenting nonnormality in a variety of fields, a startling lack of such evidence exists for achievement
Statistical Themes and Lessons for Data Mining
, 1997
Abstract

Cited by 32 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
Modern statistical estimation via oracle inequalities
, 2006
Abstract

Cited by 22 (0 self)
A number of fundamental results in modern statistical theory involve thresholding estimators. This survey paper aims at reconstructing the history of how thresholding rules came to be popular in statistics and describing, in a not overly technical way, the domain of their application. Two notions play a fundamental role in our narrative: sparsity and oracle inequalities. Sparsity is a property of the object to estimate, which seems to be characteristic of many modern problems, in statistics as well as applied mathematics and theoretical computer science, to name a few. ‘Oracle inequalities’ are a powerful decision-theoretic tool which has served to understand the optimality of thresholding rules, but which has many other potential applications, some of which we will discuss. Our story is also the story of the dialogue between statistics and applied harmonic analysis. Starting with the work of Wiener, we will see that certain representations emerge as being optimal for estimation. A leitmotif throughout
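The thresholding rules the survey discusses can be sketched with the classic soft-thresholding estimator applied to a sparse signal in Gaussian noise. The signal size, sparsity level, and the choice of the Donoho–Johnstone universal threshold are illustrative, not details taken from this abstract:

```python
import numpy as np

def soft_threshold(y, lam):
    """Soft-thresholding rule: shrink each coordinate toward zero by lam,
    setting small coefficients exactly to zero (exploits sparsity)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(1)
n = 1000
theta = np.zeros(n)
theta[:20] = 5.0                  # sparse signal: only 20 nonzero coefficients
y = theta + rng.normal(0, 1, n)   # observe signal plus unit Gaussian noise

lam = np.sqrt(2 * np.log(n))      # Donoho-Johnstone universal threshold
est = soft_threshold(y, lam)

# Thresholding beats the raw observations in mean squared error
# on a sparse signal like this one.
mse_raw = np.mean((y - theta) ** 2)   # roughly the noise variance, about 1
mse_thr = np.mean((est - theta) ** 2)
print(mse_raw, mse_thr)
```

The gain comes from zeroing the many pure-noise coordinates, at the cost of a small bias on the few truly nonzero ones; an oracle inequality bounds how far this estimator can fall behind an oracle that knows the support in advance.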
Milestones in the history of thematic cartography, statistical graphics, and data visualization
 13th International Conference on Database and Expert Systems Applications (DEXA 2002), Aix-en-Provence
, 1995
One and done? Optimal decisions from very few samples
 Cognitive Science Society
, 2009
Abstract

Cited by 17 (5 self)
In many situations human behavior approximates that of a Bayesian ideal observer, suggesting that, at some level, cognition can be described as Bayesian inference. However, a number of findings have highlighted an intriguing mismatch between human behavior and that predicted by Bayesian inference: people often appear to make judgments based on a few samples from a probability distribution, rather than the full distribution. Although sample-based approximations are a common implementation of Bayesian inference, the very limited number of samples used by humans seems to be insufficient to approximate the required probability distributions. Here we consider this discrepancy in the broader framework of statistical decision theory, and ask: if people were making decisions based on samples, but samples were costly, how many samples should people use? We find that under reasonable assumptions about how long it takes to produce a sample, locally suboptimal decisions based on few samples are globally optimal. These results reconcile a large body of work showing sampling, or probability-matching, behavior with the hypothesis that human cognition is well described as Bayesian inference, and suggest promising future directions for studies of resource-constrained cognition.
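The cost-benefit argument can be sketched numerically: decide by majority vote over k posterior samples and measure reward per unit time. The posterior probability, per-sample cost, and per-decision time below are made-up illustrative parameters, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)

def accuracy_from_k_samples(p_correct, k, n_trials=100_000):
    """Decide by majority vote over k samples drawn from the posterior,
    where the posterior puts probability p_correct on the better option."""
    samples = rng.random((n_trials, k)) < p_correct
    votes = samples.sum(axis=1)
    wins = votes * 2 > k          # clear majority for the better option
    ties = votes * 2 == k         # break ties at random
    return np.mean(wins + 0.5 * ties)

# Reward rate when each sample costs `cost` time units and the
# decision itself takes 1 unit (illustrative numbers).
p = 0.7
cost = 0.5
rates = {}
for k in (1, 2, 5, 10, 100):
    acc = accuracy_from_k_samples(p, k)
    rates[k] = acc / (1 + cost * k)
    print(k, round(acc, 3), round(rates[k], 3))
```

Accuracy per decision grows with k, but once sampling time is charged, the reward *rate* peaks at very small k: locally suboptimal one-sample decisions win globally, which is the paper's central claim.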
Faster least squares approximation
 Numerische Mathematik
Abstract

Cited by 17 (4 self)
Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. Methods dating back to Gauss and Legendre find a solution in O(nd²) time, where n is the number of constraints and d is the number of variables. We present two randomized algorithms that provide very accurate relative-error approximations to the solution of a least squares approximation problem more rapidly than existing exact algorithms. Both of our algorithms preprocess the data with a randomized Hadamard transform. One then uniformly randomly samples constraints and solves the smaller problem on those constraints, and the other performs a sparse random projection and solves the smaller problem on those projected coordinates. In both cases, the solution to the smaller problem provides a relative-error approximation to the exact solution and can be computed in o(nd²) time.
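The sketch-and-solve idea can be illustrated in a few lines. Note that the paper's algorithms use a randomized Hadamard transform followed by uniform sampling or a sparse projection; the dense Gaussian sketch below is a simpler stand-in with the same relative-error flavor, and all problem sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def sketched_least_squares(A, b, m):
    """Sketch-and-solve: compress n constraints down to m << n with a
    random sketching matrix, then solve the small least squares problem.
    (A dense Gaussian sketch stands in for the paper's randomized
    Hadamard preprocessing.)"""
    n, d = A.shape
    S = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))  # m x n sketching matrix
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

n, d = 20_000, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_approx = sketched_least_squares(A, b, m=500)

# The sketched solution's residual norm is close to the optimal one.
r_exact = np.linalg.norm(A @ x_exact - b)
r_approx = np.linalg.norm(A @ x_approx - b)
print(r_approx / r_exact)
```

The exact solve touches all n rows; the sketched solve works on an m-row problem, so the residual ratio staying near 1 is the relative-error guarantee in miniature.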
The null ritual: What you always wanted to know about null hypothesis testing but were afraid to ask
 Handbook on Quantitative Methods in the Social Sciences. Sage, Thousand Oaks, CA
, 2004
Abstract

Cited by 11 (1 self)
No scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. (Ronald A. Fisher, 1956, p. 42) It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail. (A. H. Maslow, 1966, pp. 15–16) One of us once had a student who ran an experiment for his thesis. Let us call him Pogo. Pogo had an experimental group and a control group and found that the means of both groups were exactly the same. He believed it would be unscientific to simply state this result; he was anxious to do a significance test. The result of the test was that the two means did not differ significantly, which Pogo reported in his thesis. In 1962, Jacob Cohen reported that the experiments published in a major psychology journal had, on average, only a 50:50 chance of detecting a medium-sized effect if there was one. That is, the statistical power was as low as 50%. This result was widely cited, but did it change researchers’ practice? Sedlmeier and Gigerenzer (1989) checked the studies in the same journal, 24 years later, a time period that should allow for change. Yet only 2 out of 64 researchers mentioned power,
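Cohen's 50% figure is easy to reproduce with a standard normal-approximation power calculation; the sample size of about 30 per group is an illustrative assumption chosen to match his reported power, not a number from this abstract:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test for a
    standardized effect size d with n observations per group
    (normal approximation to the t-test)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    return 1 - NormalDist().cdf(z_crit - noncentrality)

# Cohen's "medium" effect, d = 0.5: with roughly 30 per group,
# power is only about 50% -- a coin flip, as the abstract notes.
print(power_two_sample(0.5, 30))
```

The same function shows why the finding matters: pushing power to conventional levels (say 0.9) for a medium effect requires far larger groups than were typical in the surveyed journal.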