## Random projection in dimensionality reduction: Applications to image and text data (2001)

Venue: Knowledge Discovery and Data Mining (KDD)

Citations: 156 (0 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Bingham01randomprojection,
  author    = {Ella Bingham and Heikki Mannila},
  title     = {Random projection in dimensionality reduction: Applications to image and text data},
  booktitle = {Knowledge Discovery and Data Mining},
  year      = {2001},
  pages     = {245--250},
  publisher = {ACM Press}
}
```

### Abstract

Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases, where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
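The abstract's central claim, that pairwise distances survive a random projection, is easy to check numerically. Below is a minimal numpy sketch, not code from the paper; the dimensions, the random seed, and the Gaussian projection matrix are illustrative assumptions.

```python
# Minimal sketch (not from the paper): project data onto a random
# k-dimensional subspace and check that pairwise Euclidean distances
# between the columns are roughly preserved.
import numpy as np

rng = np.random.default_rng(0)
d, N, k = 1000, 50, 200            # original dim, number of points, reduced dim

X = rng.standard_normal((d, N))     # data matrix, one observation per column

# Gaussian random projection matrix; dividing by sqrt(k) keeps the
# expected squared norms of the projected vectors equal to the originals.
R = rng.standard_normal((k, d)) / np.sqrt(k)
X_rp = R @ X                        # k x N projected data

def pairwise_dists(M):
    # Euclidean distances between all columns of M via the Gram matrix.
    G = M.T @ M
    sq = np.diag(G)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * G, 0.0))

orig_d = pairwise_dists(X)
proj_d = pairwise_dists(X_rp)
mask = ~np.eye(N, dtype=bool)       # ignore the zero diagonal
ratios = proj_d[mask] / orig_d[mask]
print(ratios.min(), ratios.max())   # ratios cluster near 1
```

With k = 200 the projected-to-original distance ratios stay close to 1, while forming and applying R costs only O(dkN) operations.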

### Citations

3393 | Introduction to Modern Information Retrieval
- Salton, McGill
- 1983
Citation Context: ...An image is presented as a matrix of pixel brightness values, the distribution of which is generally approximately Gaussian: symmetric and bell-shaped. Text document data is presented in vector space [25], in which each document forms one d-dimensional vector where d is the vocabulary size. The i-th element of the vector indicates (some function of) the frequency of the i-th vocabulary term in the doc...

2985 | Indexing by latent semantic analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context: ...ontext of sparse text document data. For a sparse data matrix X (of size d × N) with about c nonzero entries per column, the computational complexity of SVD is of order O(dcN) [22]. Latent semantic indexing (LSI) [9, 22] is a dimensionality reduction method for text document data. Using LSI, the document data is presented in a lower-dimensional "topic" space: the documents are characterized by some underlying (latent...

760 | Approximate nearest neighbors: towards removing the curse of dimensionality
- Indyk, Motwani
- 1998
Citation Context: ...ng the random mapping in the context of the WEBSOM system. Kurimo [20] applies random projection to the indexing of audio documents, prior to using LSI and SOM. Kleinberg [19] and Indyk and Motwani [14] use random projections in nearest-neighbor search in a high-dimensional Euclidean space, and also present theoretical insights. Dasgupta [6, 7] has used random projections in learning high-dimensiona...

439 | Efficient similarity search in sequence databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context: ...n retrieval literature, and other approaches than random projection have been presented. Ostrovsky and Rabani [21] give a dimension reduction operation that is suitable for clustering. Agrawal et al. [3] map time series into the frequency domain by the discrete Fourier transform and only retain the first few frequencies. Keogh and Pazzani [18] reduce the dimension of time series data by segmenting the time...

380 | Discrete Cosine Transform: Algorithms, Advantages, Applications
- Rao, Yip
- 1990
Citation Context: ...CT is also optimal for the human eye: the distortions introduced occur at the highest frequencies only, and the human eye tends to neglect these as noise. DCT can be performed by simple matrix operations [23, 27]: an image is transformed to the DCT space and dimensionality reduction is done in the inverse transform by discarding the transform coefficients corresponding to the highest frequencies. Computing the D...
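The quoted context notes that the DCT is a simple matrix operation and that dimensionality reduction amounts to discarding high-frequency coefficients before the inverse transform. A self-contained sketch of that idea (not from the paper; the signal, signal length, and cutoff are illustrative assumptions):

```python
# Sketch (not from the paper): DCT-II as an explicit orthonormal matrix,
# compression by zeroing high-frequency coefficients, inverse via transpose.
import numpy as np

d = 64
n = np.arange(d)

# Orthonormal DCT-II matrix: row i, column j is cos(pi * (2j + 1) * i / (2d)),
# scaled so that C @ C.T is the identity.
C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * d))
C *= np.sqrt(2.0 / d)
C[0, :] *= 1.0 / np.sqrt(2.0)

# A smooth test signal: a Gaussian bump (its energy sits at low frequencies).
x = np.exp(-((n - d / 2) ** 2) / (2 * 8.0 ** 2))

coeffs = C @ x              # forward DCT
cut = 16
coeffs[cut:] = 0.0          # keep only the lowest 16 frequencies
x_rec = C.T @ coeffs        # inverse transform (C is orthogonal)

err = np.linalg.norm(x - x_rec) / np.linalg.norm(x)
print(err)                  # small for a smooth, low-frequency signal
```

Discarding 48 of 64 coefficients barely changes a smooth signal, which is exactly why the inverse-transform step in the quoted context can drop the highest frequencies.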

269 | Latent semantic indexing: A proba- bilistic analysis
- Papadimitriou, Raghavan, et al.
- 1997
Citation Context: ...n methods. Section 3 gives the experimental results of dimensionality reduction on image data, and Section 4 on text data. Finally, Section 5 gives a conclusion. 1.1 Related work Papadimitriou et al. [22] use random projection in the preprocessing of textual data, prior to applying LSI. They present experimental results on an artificially generated set of documents. In their approach, the columns of th...

204 | An Introduction to Wavelets
- Graps
- 1995
Citation Context: ...g the time series into sections and indexing only the section means. Aggarwal et al. [2] index market basket data by a specific signature table, which eases the similarity search. Wavelet transforms ([12, 27] etc.) are a common method of signal compression. 2. METHODS FOR DIMENSIONALITY REDUCTION 2.1 Random projection In random projection, the original d-dimensional data is projected to a k-dimensional (k ≪ d)...

178 | Two algorithms for nearest-neighbor search in high dimensions
- Kleinberg
- 1997
Citation Context: ...experimental results in using the random mapping in the context of the WEBSOM system. Kurimo [20] applies random projection to the indexing of audio documents, prior to using LSI and SOM. Kleinberg [19] and Indyk and Motwani [14] use random projections in nearest-neighbor search in a high-dimensional Euclidean space, and also present theoretical insights. Dasgupta [6, 7] has used random projections ...

175 |
Learning mixtures of Gaussians
- Dasgupta
- 1999
Citation Context: ...o using LSI and SOM. Kleinberg [19] and Indyk and Motwani [14] use random projections in nearest-neighbor search in a high-dimensional Euclidean space, and also present theoretical insights. Dasgupta [6, 7] has used random projections in learning high-dimensional Gaussian mixture models. Other applications of random projection include e.g. [4, 28]. The problems of dimensionality reduction and similarity...

167 | Database-friendly Random Projections
- Achlioptas
- 2001
Citation Context: ...ough the origin is √(k/d) [15]. The choice of the random matrix R is one of the key points of interest. The elements r_ij of R are often Gaussian distributed, but this need not be the case. Achlioptas [1] has recently shown that the Gaussian distribution can be replaced by a much simpler distribution such as r_ij = √3 · (+1 with probability 1/6; 0 with probability 2/3; -1 with probability 1/6). (...
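The sparse distribution quoted above is straightforward to sample. A small sketch (not from the paper; matrix shape and seed are arbitrary) drawing entries as √3 · {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}, so about two thirds of the matrix is zero:

```python
# Sketch (not from the paper): sampling the Achlioptas sparse random matrix.
import numpy as np

def sparse_random_matrix(k, d, rng):
    # Each entry is independently +sqrt(3), 0, or -sqrt(3) with
    # probabilities 1/6, 2/3, 1/6; this gives mean 0 and variance 1.
    return rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)],
                      size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])

rng = np.random.default_rng(1)
R = sparse_random_matrix(100, 2000, rng)

print((R == 0).mean())   # roughly 2/3 of the entries are zero
print(R.var())           # entry variance close to 3 * (1/6 + 1/6) = 1
```

Because only a third of the entries are nonzero, multiplying by R needs about a third of the arithmetic of a dense Gaussian projection, which is the computational saving the abstract refers to.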

161 | Image processing, analysis, and machine vision
- Sonka, Hlavac, et al.
- 1993
Citation Context: ...g the time series into sections and indexing only the section means. Aggarwal et al. [2] index market basket data by a specific signature table, which eases the similarity search. Wavelet transforms ([12, 27] etc.) are a common method of signal compression. 2. METHODS FOR DIMENSIONALITY REDUCTION 2.1 Random projection In random projection, the original d-dimensional data is projected to a k-dimensional (k ≪ d)...

119 | An elementary proof of the johnson-lindenstrauss lemma
- Dasgupta, Gupta
- 1999
Citation Context: ...in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved. For a simple proof of this result, see [10, 8]. Random projection is computationally very simple: forming the random matrix R and projecting the d × N data matrix X into k dimensions is of order O(dkN), and if the data matrix X is sparse with ab...

110 | Extensions of Lipschitz mappings into a Hilbert space
- Johnson, Lindenstrauss
- 1984
Citation Context: ...of N d-dimensional observations, the projection of the data onto a lower k-dimensional subspace is X^RP (of size k × N) = R (k × d) · X (d × N) (1). The key idea of random mapping arises from the Johnson-Lindenstrauss lemma [15]: if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved. For a simple proof of this ...

107 | Data Exploration Using Self-Organizing Maps
- Kaski
- 1997

102 | The Johnson-Lindenstrauss lemma and the sphericity of some graphs
- Frankl, Maehara
- 1988
Citation Context: ...in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved. For a simple proof of this result, see [10, 8]. Random projection is computationally very simple: forming the random matrix R and projecting the d × N data matrix X into k dimensions is of order O(dkN), and if the data matrix X is sparse with ab...

100 | EM algorithms for PCA and SPCA
- Roweis
- 1997
Citation Context: ...se size is d × d for d-dimensional data) is very expensive to compute. The computational complexity of estimating the PCA is O(d²N) + O(d³) [11]. There exist computationally less expensive methods [26, 24] for finding only a few eigenvectors and eigenvalues of a large matrix; in our experiments, we use appropriate Matlab routines to realize these. A closely related method is singular value decomposition ...
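The PCA pipeline the quoted context describes (form the d × d covariance, eigendecompose it, keep the leading eigenvectors) can be sketched in a few lines. This is an illustration, not the paper's code; the dimensions and seed are assumptions, and a full eigendecomposition is used where sparse routines would be cheaper.

```python
# Sketch (not from the paper): PCA via eigendecomposition of the covariance.
import numpy as np

rng = np.random.default_rng(2)
d, N, k = 50, 500, 5

X = rng.standard_normal((d, N))
Xc = X - X.mean(axis=1, keepdims=True)   # center each dimension

cov = (Xc @ Xc.T) / N                    # d x d covariance, O(d^2 N) to form
evals, evecs = np.linalg.eigh(cov)       # full decomposition, O(d^3)
E_k = evecs[:, -k:]                      # top-k eigenvectors (eigh sorts ascending)

X_pca = E_k.T @ Xc                       # k x N reduced representation
print(X_pca.shape)
```

The O(d²N) + O(d³) cost quoted above corresponds to the two commented steps, which is precisely what random projection avoids.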

97 | An algorithmic theory of learning: Robust concepts and random projection
- Arriaga, Vempala
- 1999
Citation Context: ...an space, and also present theoretical insights. Dasgupta [6, 7] has used random projections in learning high-dimensional Gaussian mixture models. Other applications of random projection include e.g. [4, 28]. The problems of dimensionality reduction and similarity search have often been addressed in the information retrieval literature, and other approaches than random projection have been presented. Ost...

96 |
Experiments with random projection
- Dasgupta
- 2000
Citation Context: ...o using LSI and SOM. Kleinberg [19] and Indyk and Motwani [14] use random projections in nearest-neighbor search in a high-dimensional Euclidean space, and also present theoretical insights. Dasgupta [6, 7] has used random projections in learning high-dimensional Gaussian mixture models. Other applications of random projection include e.g. [4, 28]. The problems of dimensionality reduction and similarity...

82 |
Large-scale sparse singular value computations
- Berry
- 1992
Citation Context: ...X^SVD = U_k^T X (5), where U_k is of size d × k and contains these k singular vectors. Like PCA, SVD is also expensive to compute. There exist numerical routines such as the power or the Lanczos method [5] that are more efficient than PCA for sparse data matrices X, and that is why we shall use SVD instead of PCA in the context of sparse text document data. For a sparse data matrix X (of size d × N) with about c nonze...
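The truncated-SVD mapping X^SVD = U_k^T X from the quoted context can be sketched on a toy term-document matrix. Not code from the paper: the matrix is synthetic, and numpy's dense SVD stands in for the sparse Lanczos-based routines the context mentions.

```python
# Sketch (not from the paper): LSI-style truncated SVD on a toy
# term-document matrix with a few nonzero term counts per document.
import numpy as np

rng = np.random.default_rng(3)
d, N, k = 200, 80, 10                 # vocabulary size, documents, "topics"

X = np.zeros((d, N))
for j in range(N):
    terms = rng.choice(d, size=5, replace=False)   # ~5 terms per document
    X[terms, j] = rng.integers(1, 4, size=5)       # small term counts

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # s is nonincreasing
U_k = U[:, :k]                        # d x k leading left singular vectors
X_svd = U_k.T @ X                     # k x N topic-space representation
print(X_svd.shape)
```

Each document is now a k-vector of topic weights, and for sparse X with c nonzeros per column the sparse-aware version of this computation is the O(dcN) step quoted above.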

49 | A simple dimensionality reduction technique for fast similarity search in large time series databases
- Keogh, Pazzani
- 2000
Citation Context: ...uction operation that is suitable for clustering. Agrawal et al. [3] map time series into the frequency domain by the discrete Fourier transform and only retain the first few frequencies. Keogh and Pazzani [18] reduce the dimension of time series data by segmenting the time series into sections and indexing only the section means. Aggarwal et al. [2] index market basket data by a specific signature table, wh...

32 | Context vectors: general purpose approximate meaning representations self-organized from raw data
- Hecht-Nielsen
- 1994
Citation Context: .../websom/ cause significant distortions in the data set if R is not orthogonal. Orthogonalizing R is unfortunately computationally expensive. Instead, we can rely on a result presented by Hecht-Nielsen [13]: in a high-dimensional space, there exists a much larger number of almost orthogonal than orthogonal directions. Thus vectors having random directions might be sufficiently close to orthogonal, and equi...
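The Hecht-Nielsen observation quoted above is easy to reproduce: independently drawn directions in a high-dimensional space are nearly orthogonal, which is why R need not be orthogonalized. A small sketch (not from the paper; the dimension and seed are arbitrary):

```python
# Sketch (not from the paper): two random directions in high dimension
# have a cosine of magnitude about 1/sqrt(d), i.e. they are nearly orthogonal.
import numpy as np

rng = np.random.default_rng(4)
d = 10000

u = rng.standard_normal(d)
v = rng.standard_normal(d)
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)

cos_angle = abs(u @ v)   # concentrates near 0 as d grows
print(cos_angle)
```

For d = 10000 the cosine is on the order of 0.01, so skipping the expensive orthogonalization of R introduces only a small distortion.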

32 | Random projection: A new approach to VLSI layout
- Vempala
- 1998
Citation Context: ...an space, and also present theoretical insights. Dasgupta [6, 7] has used random projections in learning high-dimensional Gaussian mixture models. Other applications of random projection include e.g. [4, 28]. The problems of dimensionality reduction and similarity search have often been addressed in the information retrieval literature, and other approaches than random projection have been presented. Ost...

23 | A new method for similarity indexing of market basket data
- Aggarwal, Wolf, et al.
- 1999
Citation Context: ...d only retain the first few frequencies. Keogh and Pazzani [18] reduce the dimension of time series data by segmenting the time series into sections and indexing only the section means. Aggarwal et al. [2] index market basket data by a specific signature table, which eases the similarity search. Wavelet transforms ([12, 27] etc.) are a common method of signal compression. 2. METHODS FOR DIMENSIONALITY RED...

21 | Dimensionality reduction by random mapping
- Kaski
- 1998
Citation Context: ...ted set of documents. In their approach, the columns of the random projection matrix are assumed strictly orthogonal, but actually this need not be the case, as we shall see in our experiments. Kaski [17, 16] has presented experimental results in using the random mapping in the context of the WEBSOM system. Kurimo [20] applies random projection to the indexing of audio documents, prior to using LSI and ...

19 | Indexing audio documents by using latent semantic analysis and SOM
- Kurimo
- 1999
Citation Context: ...but actually this need not be the case, as we shall see in our experiments. Kaski [17, 16] has presented experimental results in using the random mapping in the context of the WEBSOM system. Kurimo [20] applies random projection to the indexing of audio documents, prior to using LSI and SOM. Kleinberg [19] and Indyk and Motwani [14] use random projections in nearest-neighbor search in a high-dimensi...

3 | Management and analysis of large scientific datasets
- Sirovich, Everson
- 1992
Citation Context: ...se size is d × d for d-dimensional data) is very expensive to compute. The computational complexity of estimating the PCA is O(d²N) + O(d³) [11]. There exist computationally less expensive methods [26, 24] for finding only a few eigenvectors and eigenvalues of a large matrix; in our experiments, we use appropriate Matlab routines to realize these. A closely related method is singular value decomposition ...

2 | Polynomial time approximation schemes for geometric k-clustering
- Ostrovsky, Rabani
- 2000
Citation Context: ...dimensionality reduction and similarity search have often been addressed in the information retrieval literature, and other approaches than random projection have been presented. Ostrovsky and Rabani [21] give a dimension reduction operation that is suitable for clustering. Agrawal et al. [3] map time series into the frequency domain by the discrete Fourier transform and only retain the first few frequencie...