## Aide-Memoire. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality (2000)

### BibTeX

@MISC{Donoho00aide-memoire.high-dimensional,
  author = {David L. Donoho},
  title = {Aide-Memoire. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality},
  year = {2000}
}

### Abstract

The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral Imagery, Internet Portals, Financial tick-by-tick data, and DNA Microarrays are just a few of the better-known sources, feeding data in torrential streams into scientific and business databases worldwide. In traditional statistical data analysis, we think of observations of instances of particular phenomena (e.g. instance ↔ human being), these observations being a vector of values we measured on several variables (e.g. blood pressure, weight, height, ...). In traditional statistical methodology, we assumed many observations and a few, well-chosen variables. The trend today is towards more observations but even more so, to radically larger numbers of variables – voracious, automatic, systematic collection of hyper-informative detail about each observed instance. We are seeing examples where the observations gathered on individual instances are curves, or spectra, or images, or

### Citations

2339 |
A wavelet tour of signal processing
- Mallat
- 1998
Citation Context ...t years, the discipline of computational harmonic analysis has developed large collections of bases and frames such as wavelets, wavelet packets, cosine packets, Wilson bases, brushlets, and so on =-=[16, 39]-=-. As we have seen, in two cases, that a basis being sought numerically by a procedure like principal components analysis or independent components analysis, will resemble some previously-known basis d... |

1445 |
Independent component analysis, a new concept
- Comon
- 1994
Citation Context ...r large entries in the output of the rank-k approximation for appropriate k, in UDkV ′ α. In the last decade, an important alternative to PCA has been developed: ICA – independent components analysis =-=[13, 2, 11]-=-. It is valuable when, for physical reasons, we really expect the model X = AS to hold for an unknown A and a sparse or non-Gaussian S. The matrix A need not be orthogonal. An example where this occurs... |

1212 |
Pattern Recognition and Neural Networks
- Ripley
- 1996
Citation Context ...imensional – because we take the view of N points in a D-dimensional space. In this section we describe a number of fundamental tasks of data analysis. Good references on some of these issues include =-=[41, 51, 66]-=-; I use these often in teaching. 4.1 Classification In classification, one of the D variables is an indicator of class membership. Examples include: in a consumer financial data base, most of the vari... |

1180 | An information-maximization approach to blind separation and blind deconvolution
- Bell, Sejnowski
- 1995
Citation Context ... will be a basis of imagelets (2-d image patches) or movielets (if we were analyzing 3-d image patches). Interesting examples of this work include work by Olshausen and Field [47], Bell and Sejnowski =-=[2]-=- van Hateren and Ruderman [25], and Donato et al. [17]. Recall that principal components of image data typically produce sinusoidal patterns. ICA in contrast, typically produces basis functions unlike... |

1119 | Exploratory data analysis - Tukey - 1977 |

1038 |
Multivariate Analysis
- Mardia, Kent, et al.
- 1979
Citation Context ...imensional – because we take the view of N points in a D-dimensional space. In this section we describe a number of fundamental tasks of data analysis. Good references on some of these issues include =-=[41, 51, 66]-=-; I use these often in teaching. 4.1 Classification In classification, one of the D variables is an indicator of class membership. Examples include: in a consumer financial data base, most of the vari... |

997 |
Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature
- Olshausen, Field
- 1996
Citation Context ...ents analysis. The result will be a basis of imagelets (2-d image patches) or movielets (if we were analyzing 3-d image patches). Interesting examples of this work include work by Olshausen and Field =-=[47]-=-, Bell and Sejnowski [2] van Hateren and Ruderman [25], and Donato et al. [17]. Recall that principal components of image data typically produce sinusoidal patterns. ICA in contrast, typically produce... |

604 |
Analysis of a complex statistical variable into principal components
- Hotelling
Citation Context ...nderlying latent variables are responsible for essentially the structure we see in the array X, and by uncovering those variables, we have achieved important insights. Principal Component Analysis =-=[28, 35, 38]-=- is an early example of this. One takes the covariance matrix C of the observable X, obtains the eigenvectors, which will be orthogonal, bundles them as columns in an orthogonal matrix U and defines S... |

578 | Functional Data Analysis
- Ramsay, Silverman
- 1997
Citation Context ...he right variables in advance. But we shouldn’t read anything pejorative into the fact that we don’t know which variables to measure in advance. For example consider the functional data analysis case =-=[49]-=-, where the data are curves (for example, in the hyperspectral imaging case where the data are spectra). The ideal variables might then turn out to be the position of certain peaks in those curves. Wh... |

574 | Using linear algebra for intelligent information retrieval
- Berry, Dumais, et al.
- 1996
Citation Context ...given document, in a suitable normalization. Each search request may be viewed as a vector of term frequencies, and the Matrix-Vector product of the Term-Document matrix by the search vector measures. =-=[6, 45]-=- 3.2 Sensor Array Data In many fields we are seeing the use of sensor arrays generating vector-valued observations as a function of time. For example, consider a problem in study of evoked potential ... |

539 | Blind beamforming for non Gaussian signals
- Cardoso, Souloumiac
- 1993
Citation Context ...r large entries in the output of the rank-k approximation for appropriate k, in UDkV ′ α. In the last decade, an important alternative to PCA has been developed: ICA – independent components analysis =-=[13, 2, 11]-=-. It is valuable when, for physical reasons, we really expect the model X = AS to hold for an unknown A and a sparse or non-Gaussian S. The matrix A need not be orthogonal. An example where this occurs... |

505 |
Adaptive control processes: a guided tour
- Bellman
- 1961
Citation Context ...skills and will attract different kinds of scientists. 6 Curse of Dimensionality 6.1 Origins of the Phrase The colorful phrase the ‘curse of dimensionality’ was apparently coined by Richard Bellman in =-=[3]-=-, in connection with the difficulty of optimization by exhaustive enumeration on product spaces. Bellman reminded us that, if we consider a cartesian grid of spacing 1/10 on the unit cube in 10 dimens... |

444 | Projection pursuit regression - Friedman, Stuetzle - 1981 |

406 |
Universal Approximation Bounds for Superpositions of a Sigmoid Function
- Barron
- 1993
Citation Context ...ussian with mean 0 and variance 1. How does the accuracy of estimation depend on N, the number of observations in our dataset? Let F be the functional class of all functions f which are Lipschitz on $[0, 1]^d$. A now-standard calculation in minimax decision theory [30] shows that for any estimator $\hat{f}$ of any kind, we have $\sup_{f \in F} E(\hat{f} - f(x))^2 \ge \mathrm{Const} \cdot N^{-2/(2+D)}$, $n \to \infty$. This lower bound is nonasymptotic... |

401 | Multivariate Density Estimation - Scott - 1992 |

360 |
Wavelets and Operators
- Meyer
- 1992
Citation Context ...ne for a different orthant. We can easily see derive equivalent conditions. For example, the class of superpositions of Gaussian bumps studied by Niyogi and Girosi is simply Yves Meyer’s bump algebra =-=[43]-=-; this corresponds to a ball in a Besov space $B^{D}_{1,1}(\mathbb{R}^D)$, so that the functions in F(M, Gaussians) are getting increasingly smooth in high dimensions [23, 43]. In short, one does not really crack the ... |

279 | Classifying facial actions
- Donato, Bartlett, et al.
- 1999
Citation Context ...ess with discontinuities is almost diagonalized by the Wavelet basis. There are numerous other examples where a fixed basis known in harmonic analysis does as well as a general procedure. The article =-=[17]-=- shows that in recognizing facial gestures, the Gabor basis gives better classification than techniques based on general multivariate analysis – such as Fisher scores and Principal component scores. T... |

268 |
Statistical Estimation: Asymptotic Theory
- Ibragimov, Hasminskii
- 1981
Citation Context ...mation depend on N, the number of observations in our dataset? Let F be the functional class of all functions f which are Lipschitz on $[0, 1]^d$. A now-standard calculation in minimax decision theory =-=[30]-=- shows that for any estimator $\hat{f}$ of any kind, we have $\sup_{f \in F} E(\hat{f} - f(x))^2 \ge \mathrm{Const} \cdot N^{-2/(2+D)}$, $n \to \infty$. This lower bound is nonasymptotic. How much data do we need in order to obtain an estimate ... |

165 |
Littlewood-Paley Theory and the Study of Function Spaces
- Frazier, Jawerth, et al.
- 1991
Citation Context ...nd Girosi is simply Yves Meyer’s bump algebra [43]; this corresponds to a ball in a Besov space $B^{D}_{1,1}(\mathbb{R}^D)$, so that the functions in F(M, Gaussians) are getting increasingly smooth in high dimensions =-=[23, 43]-=-. In short, one does not really crack the curse of dimensionality in Barron’s sense; one simply works with changing amounts of smoothness in different dimensions. The ratio S/D between the smoothness ... |

156 | On orthogonal and symplectic matrix ensembles - Tracy, Widom - 1996 |

149 | Data compression and harmonic analysis
- Donoho, Vetterli, et al.
- 1998
Citation Context ...t years, the discipline of computational harmonic analysis has developed large collections of bases and frames such as wavelets, wavelet packets, cosine packets, Wilson bases, brushlets, and so on =-=[16, 39]-=-. As we have seen, in two cases, that a basis being sought numerically by a procedure like principal components analysis or independent components analysis, will resemble some previously-known basis d... |

137 | Plaid models for gene expression data
- Lazzeroni, Owen
- 2002
Citation Context ...aking values 0 and 1, and in addition is sparse (relatively few 1’s). An iterative, heuristic algorithm is used to fit layers k = 1, 2, ..., K of the gene expression array. The second, Plaid Modelling, =-=[36]-=- has been developed by my Stanford colleagues Lazzeroni and Owen. It seeks in addition to constrain each vector $\beta_k$ to have entries either 0 or 1. $X_{i,j} = \mu_0 + \sum_{k=1}^{K} \mu_k \alpha_k \beta_k^T$ Again an iterative, heur... |

116 | Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex
- Hateren, Ruderman
- 1998
Citation Context ... (2-d image patches) or movielets (if we were analyzing 3-d image patches). Interesting examples of this work include work by Olshausen and Field [47], Bell and Sejnowski [2] van Hateren and Ruderman =-=[25]-=-, and Donato et al. [17]. Recall that principal components of image data typically produce sinusoidal patterns. ICA in contrast, typically produces basis functions unlike classical bases. In the talk ... |

95 |
On the volume of tubes
- Weyl
- 1939
Citation Context ...ceton which attracted Weyl’s interest; out of this grew simultaneous publication in a single issue of Amer. J. Math. of a statistical paper by Hotelling [29] and a differential-geometric paper by Weyl =-=[68]-=-. Out of this grew a substantial amount of modern differential geometry. Even if we focus attention on the basic tools of modern data analysis, from regression to principal components, we find they we... |

75 | Adaptive covariance estimation of locally stationary processes
- Mallat, Papanicolaou, et al.
- 1998
Citation Context ...pectrum, and learn the full probability distribution. In effect, spectrum estimation is relying heavily on the fact that we know the covariance to be almost diagonal in the Fourier basis. The article =-=[40]-=- develops a full machinery based on this insight. A dataset is analyzed to see which out of a massive library of bases comes closest to diagonalizing its empirical covariance, the best basis in the li... |

73 |
Rectifiable sets and the traveling salesman problem
- Jones
- 1990
Citation Context ...-D biological data. A third set of figures may suggest the problem of finding a curve embedded in extremely noisy data. These problems suggest the relevance of work in harmonic analysis by Peter Jones =-=[33]-=- and later by Guy David and Stephen Semmes [14]. Jones’ traveling salesman theorem considers the problem: given a discrete set S of points in $[0, 1]^2$, when is there a rectifiable curve passing th... |

67 |
Analysis of and on Uniformly Rectifiable Sets
- David, Semmes
- 1993
Citation Context ...uggest the problem of finding a curve embedded in extremely noisy data. These problems suggest the relevance of work in harmonic analysis by Peter Jones [33] and later by Guy David and Stephen Semmes =-=[14]-=-. Jones’ traveling salesman theorem considers the problem: given a discrete set S of points in $[0, 1]^2$, when is there a rectifiable curve passing through the points? His theorem says that this ca... |

64 |
In data smog: Surviving the information glut
- Shenk
- 1997
Citation Context ...ate an amazing demand for data. 2.6 How Useful is all This? One can easily make the case that we are gathering too much data already, and that fewer data would lead to better decisions and better lives =-=[57]-=-. But one also has to be very naive to imagine that such wistful concerns amount to much against the onslaught of the forces I have mentioned. Reiterating: throughout science, engineering, government ... |

52 |
Ridgelets: the key to high dimensional intermittency
- Candès, Donoho
- 1999
Citation Context ...lysis which ought to have been built by harmonic analysts, but which were not. At the moment, it appears that these empirical results can best be explained in terms of two recent systems. • Ridgelets =-=[8]-=-. One can build a frame or a basis consisting of highly directional elements, roughly speaking ridge functions with a wavelet profile. With a and b scalars and u a unit vector (direction) the simplest... |

49 | On the distribution of the largest principal component
- Johnstone
- 2000
Citation Context ...bles. What is the behavior of the top eigenvalue of $C_{D,N}$? Consider a sequence of problems where D/N → β – large dimension, large sample size. This is a problem which has been studied for decades; see =-=[32]-=- for references. Classical results in random matrix theory – in the spirit of the Wigner semicircle law – study infinite matrices, and in accord with our “second blessing of dimensionality”, give info... |

46 |
Gene shaving: a new class of clustering methods for expression arrays
- Hastie, Tibshirani
- 2000
Citation Context ...re similar. See for example [45]. Because we have mentioned gene expression data earlier, we briefly mention a figure presented earlier in the talk showing a gene expression array, (figure taken from =-=[26]-=-) while a figure based on a modern clustering method shows the array after suitable permutation of entries according to cluster analysis. Recently, more quantitative approaches have been developed, of... |

40 |
Tubes and spheres in n-space and a class of statistical problems
- Hotelling
- 1939
Citation Context ...ng gave a talk on a statistical problem at Princeton which attracted Weyl’s interest; out of this grew simultaneous publication in a single issue of Amer. J. Math. of a statistical paper by Hotelling =-=[29]-=- and a differential-geometric paper by Weyl [68]. Out of this grew a substantial amount of modern differential geometry. Even if we focus attention on the basic tools of modern data analysis, from regr... |

37 |
Fonctions aléatoires de second ordre
- Loève
- 1946
Citation Context ...nderlying latent variables are responsible for essentially the structure we see in the array X, and by uncovering those variables, we have achieved important insights. Principal Component Analysis =-=[28, 35, 38]-=- is an early example of this. One takes the covariance matrix C of the observable X, obtains the eigenvectors, which will be orthogonal, bundles them as columns in an orthogonal matrix U and defines S... |

32 | On the geometry of similarity search: dimensionality curse and concentration of measure - Pestov |

32 | Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot - Wainer - 1997 |

11 |
Oracle inequalities and nonparametric function estimation
- Johnstone
- 1998
Citation Context ...he true effects of searching for variables to be included among many variables. A variety of results indicated that this form of logarithmic penalty is both necessary and sufficient, for a survey see =-=[31]-=-. That is, with this logarithmic penalty, one can mine one’s data to one’s taste, while controlling the risk of finding spurious structure. The form of the logarithmic penalty is quite fortunate. The ... |

11 | Overcoming the curse of dimensionality in clustering by means of the wavelet transform
- Murtagh, Starck, et al.
Citation Context ...given document, in a suitable normalization. Each search request may be viewed as a vector of term frequencies, and the Matrix-Vector product of the Term-Document matrix by the search vector measures. =-=[6, 45]-=- 3.2 Sensor Array Data In many fields we are seeing the use of sensor arrays generating vector-valued observations as a function of time. For example, consider a problem in study of evoked potential ... |

9 |
Gene expression informatics - it’s all in your mine. Nature Genetics
- Bassett, Eisen, et al.
- 1999
Citation Context ...at the moment involves gene expression data. Here we obtain data on the relative abundance of D genes in each of N different cell lines. The details of how the experiment works are pointed to from =-=[5]-=-. The goal is to learn which genes are associated with the various diseases or other states associated with the cell lines. 3.4 Consumer Preferences Data Recently on the world-wide-web we see the rise... |

7 | Assessing Linearity in High Dimensions
- Owen
- 1999
Citation Context ...ut 2.7, but that is still heavy. Hence, we are entitled to dream that these fundamental components can be sped up. I will mention two recent articles that give an idea that something can be done. Owen =-=[48]-=- considered the problem of determining if a dataset satisfied an approximate linear relation of the form (1). With N > D observations, the usual method of assessing linearity would require order $ND^3$ oper... |

6 |
Computing 2010: from black holes to biology. Nature C67–C70
- Butler
- 1999
Citation Context ...ast numbers of others will be deeply moved by this vision. Recent initiatives to build an e-cell [18] has attracted substantial amounts of attention from science journalists and science policy makers =-=[7]-=-. Can one build computer models which simulate the basic processes of life by building a mathematical model of the basic cycles in the cell and watching the evolution of what amounts to a system of co... |

5 | Eye of the Hurricane, an Autobiography - Bellman - 1984 |

5 |
Curvelets: a surprisingly effective nonadaptive representation of objects with edges
- Candès, Donoho
- 2000
Citation Context ... roughly speaking ridge functions with a wavelet profile. With a and b scalars and u a unit vector (direction) the simplest ridgelets take the form $\rho(x; a, b, u) = \psi((u'x - b)/a)/a^{1/2}$. • Curvelets =-=[9]-=-. One can build a frame consisting of highly directional elements, where the width and length are in relation $\text{width} = \text{length}^2$. The construction is a kind of elaborate multiresolution deployment of r... |

5 | The Visual Display of Quantitative Information - Edward - 641 |

4 | The International Congress of Mathematicians in - Scott - 1900 |

3 | The Heritage of P. Levy in Geometrical Functional-Analysis Asterisque - Milman |

3 | Tukey (1977) Exploratory Data Analysis - John |

2 |
Challenges in Analysis I, address at Mathematical Visions towards the year 2000, Tel Aviv
- Coifman
- 1999
Citation Context ...s. Perhaps we need a different way to measure degree of differentiability, in which discontinuous functions are not claimed to possess large numbers of derivatives. R.R. Coifman and J.O. Stromberg =-=[12]-=- have gone in exactly this direction. Viewing the condition of bounded mixed variation as a natural low order smoothness condition, they have explored its consequences. Roughly speaking, functions obe... |

2 |
The formation of dark halos in a universe dominated by cold dark matter
- Frenk, White, et al.
- 1988
Citation Context ...ause of the imminent arrival of data from the XMM satellite observatory, which will shed light on inhomogeneities in the distribution of matter at the very earliest detectable moments in the universe =-=[19, 52]-=-. I hope also to present figures from work by Arne Stoschek, illustrating the problem of recovering filamentary structure in 3-D biological data. A third set of figures may suggest the problem of findi... |

2 |
Trevor Hastie and Robert Tibshirani (2001
- Friedman
Citation Context ...ange rates today, given recent exchange rates; in a hyperspectral database an indicator of chemical composition. There is a well-known and widely used collection of tools for regression modeling; see =-=[66, 22]-=-. In linear regression modeling, we assume that the response depends on the predictors linearly, $X_{i,1} = a_0 + a_2 X_{i,2} + \dots + a_D X_{i,D} + Z_i$; (1) the idea goes back to Gauss, if not earlier. In nonlinear re... |

2 |
A sideways look at Hilbert’s Twenty-Three Problems of
- Grattan-Guinness, Ivor
- 1900
Citation Context ... this talk. 1 Introduction 1.1 August 8, 2000 The morning of August 8, 1900, David Hilbert gave an address at the International Congress of Mathematicians in Paris, entitled ‘Mathematical Problems’ =-=[50, 24]-=-. Despite the fact that this was not a plenary address and that it was delivered as part of the relatively nonprestigious section of History and Philosophy of Mathematics, Hilbert’s lecture eventually... |