## Approximating Matrix Multiplication for Pattern Recognition Tasks (1997)

Venue: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms

Citations: 33 (0 self)

### BibTeX

```bibtex
@inproceedings{Cohen97approximatingmatrix,
  author    = {Edith Cohen and David D. Lewis},
  title     = {Approximating Matrix Multiplication for Pattern Recognition Tasks},
  booktitle = {Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms},
  year      = {1997},
  pages     = {682--691}
}
```


### Abstract

Many pattern recognition tasks, including estimation, classification, and the finding of similar objects, make use of linear models. The fundamental operation in such tasks is the computation of the dot product between a query vector and a large database of instance vectors. Often we are interested primarily in those instance vectors which have high dot products with the query. We present a random-sampling-based algorithm that enables us to identify, for any given query vector, those instance vectors which have large dot products, while avoiding explicit computation of all dot products. We provide experimental results that demonstrate considerable speedups for text retrieval tasks.

1 Introduction. In pattern recognition tasks, a database of instances to be processed (images, signals, documents, ...) is commonly represented as a set of vectors x_1, ..., x_n of numeric feature values. Examples of feature values include the number of times a word occurs in a document, the coordinates...
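The sampling idea summarized in the abstract can be sketched as follows, assuming nonnegative query and instance weights (a minimal illustration with made-up data, not the paper's exact algorithm): sample a feature with probability proportional to its query weight times its column sum, then sample an instance with probability proportional to its weight for that feature; the hit counts then yield unbiased estimates of the dot products.

```python
import numpy as np

def sample_scores(q, A, num_samples, rng=np.random.default_rng(0)):
    """Estimate the dot products q . x_j for all rows x_j of A by random
    sampling, without computing q @ A.T exactly.

    Assumes q and A are nonnegative. Draws a feature i with probability
    proportional to q_i * (column sum of A), then an instance j with
    probability proportional to A[j, i]. The count S_j is Binomial with
    success probability (q . x_j) / Z, so (S_j / S) * Z is unbiased.
    """
    col_sums = A.sum(axis=0)          # c_i = sum_j A[j, i]
    weights = q * col_sums            # unnormalized feature weights
    Z = weights.sum()                 # normalizing constant
    feat_probs = weights / Z
    counts = np.zeros(A.shape[0])
    feats = rng.choice(len(q), size=num_samples, p=feat_probs)
    for i in feats:
        j = rng.choice(A.shape[0], p=A[:, i] / col_sums[i])
        counts[j] += 1
    return counts / num_samples * Z   # unbiased dot-product estimates

# Tiny illustrative data: 2 instances over 3 features.
q = np.array([1.0, 2.0, 0.5])
A = np.array([[0.2, 1.0, 0.0],
              [0.9, 0.1, 0.3]])
est = sample_scores(q, A, num_samples=20000)
exact = q @ A.T
```

With enough samples the estimates concentrate around the exact products; in the paper's setting only the instances with large dot products need to be resolved, so far fewer samples than full multiplication suffice.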

### Citations

3124 | Introduction to Modern Information Retrieval - Salton, McGill - 1983 |

2044 | An introduction to probability theory and its applications, volume I - Feller - 1968 |
Citation Context: ...binomial distribution with parameters S and p_j. The expected value of S_j is S p_j. The probability that S_j = t (j is sampled t times) is

$$\binom{S}{t}\, p_j^{\,t} (1 - p_j)^{S - t}$$

(see, e.g., Feller [9] for background). Using the normal approximation,

$$\Pr\{S_j \ge T\} = 1 - \Phi\!\left(\frac{T - 0.5 - S p_j}{\sqrt{S p_j (1 - p_j)}}\right), \qquad \Pr\{S_j < T\} = \Phi\!\left(\frac{T + 0.5 - S p_j}{\sqrt{S p_j (1 - p_j)}}\right) \tag{2.1}$$

3 Imple...
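The continuity-corrected normal approximation in (2.1) can be checked numerically against the exact binomial tail; a small sketch with illustrative parameter values (not taken from the paper):

```python
import math

def normal_cdf(x):
    # Phi(x) computed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_at_least(S, p, T):
    """Normal approximation with continuity correction to
    Pr{S_j >= T}, where S_j ~ Binomial(S, p)."""
    mu = S * p
    sigma = math.sqrt(S * p * (1.0 - p))
    return 1.0 - normal_cdf((T - 0.5 - mu) / sigma)

def prob_at_least_exact(S, p, T):
    """Exact binomial upper tail, for comparison."""
    return sum(math.comb(S, t) * p**t * (1 - p)**(S - t)
               for t in range(T, S + 1))

# Illustrative values: S = 1000 samples, hit probability p = 0.02,
# threshold T = 25 hits.
approx = prob_at_least(1000, 0.02, 25)
exact = prob_at_least_exact(1000, 0.02, 25)
```

For moderately large S p_j the two agree closely, which is what makes the normal approximation usable for setting the sampling threshold T.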

796 | Managing Gigabytes: Compressing and Indexing Documents and Images - Witten, Moffat, et al. - 1999 |
Citation Context: ... of instances (documents) can be 10^4 to 10^6 and the number of features (words or phrases) may be 100,000 or more. This is an expensive task even when utilizing sparse matrix and indexing techniques [18]. Dense instances with hundreds of features are also common, for example, with factor analytic text representations (Sec. 4.1) or in image retrieval. We propose a random-sampling based algorithm that ...

773 | An optimal algorithm for approximate nearest neighbor searching in fixed dimensions - Arya, Mount, et al. - 1998 |
Citation Context: ...s very small. Bentley et al. [3] showed better expected asymptotic performance for instances where the dimension is constant and the vectors are randomly sampled from certain distributions. Arya et al. [1] established better worst-case performance for constant dimension, by allowing approximation. They presented an O(n log n) time algorithm that computes (1 + ε)-nearest neighbors (for any fixed cons...

597 | Relevance weighting of search terms - Robertson, Sparck Jones - 1976 |
Citation Context: ...ty to database documents must be found [14, ch. 4], or it can be used as evidence toward setting the parameters of a probabilistic classification function, which is then applied to database documents [13, 5]. In both cases, linear models are widely used and continue to be developed [12]. Applying a weight vector q to a database of instance vectors means computing the vector-by-matrix product q^T A^T, wh...

309 | An Introduction to Numerical Analysis - Atkinson - 1989 |
Citation Context: ...ying structure in terms of document clusters, word clusters, or latent factors. The most popular such approach is Latent Semantic Indexing (LSI) [7]. LSI is based on Singular Value Decomposition (see [2] for background). The sparse document vectors x_i ∈ R^t_+ are replaced by dense d-dimensional vectors x̂_i ∈ R^d where d ≪ t (typically, d is of order 10^2). The coordinates of x̂_i are linear combina...
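The LSI construction described in this context (replacing sparse t-dimensional document vectors with dense d-dimensional ones via SVD) can be sketched as follows; the matrix sizes, the random toy data, and the convention of projecting onto U_d diag(S_d) are illustrative assumptions:

```python
import numpy as np

# Toy term-document setup: n documents over t terms, d latent dimensions.
rng = np.random.default_rng(0)
n, t, d = 50, 200, 10
# Sparse, nonnegative document-by-term matrix (~5% nonzeros).
A = rng.random((n, t)) * (rng.random((n, t)) < 0.05)

# Truncated SVD: A ≈ U_d S_d V_d^T. One common LSI convention takes the
# dense d-dimensional document vectors as the rows of U_d @ diag(S_d).
U, S, Vt = np.linalg.svd(A, full_matrices=False)
X_hat = U[:, :d] * S[:d]          # shape (n, d): dense LSI document vectors
```

The rank-d factorization is the best rank-d approximation to A in the Frobenius norm, which is why a small d can preserve most of the similarity structure while making every document vector dense and short.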

269 | Nearest neighbor (NN) norms: NN pattern classification techniques - Dasarathy - 1991 |
Citation Context: ...Relation to Euclidean proximity problems. A variety of tasks involve finding examples in close proximity to other examples. This may be a goal in itself, or may be a means of performing classification [6] or regression [11]. Frequently seen proximity problems are closest pair, nearest neighbors, and bichromatic nearest neighbors. A variety of proximity measures can be used in such tasks. As mentioned ...

244 | Training algorithms for linear text classifiers - Lewis, Schapire, et al. - 1996 |
Citation Context: ...oward setting the parameters of a probabilistic classification function, which is then applied to database documents [13, 5]. In both cases, linear models are widely used and continue to be developed [12]. Applying a weight vector q to a database of instance vectors means computing the vector-by-matrix product q^T A^T, where A is a matrix whose rows {x_1, ..., x_n} are the instance vectors. The va...

203 | Recent Trends in Hierarchic Document Clustering: A - Willett |
Citation Context: ...hbors, under a dot-product-based proximity function. In information retrieval, such sets of nearest neighbors can be used to support hypertext browsing [15] or in doing a cluster analysis of the data [17]. Finding nearest neighbors has similar computational characteristics to running large numbers of queries or other linear classifiers against the document database. Different methods of representing t...

166 | Using Probabilistic Models of Document Retrieval without Relevance Information - Croft, Harper - 1997 |
Citation Context: ...ty to database documents must be found [14, ch. 4], or it can be used as evidence toward setting the parameters of a probabilistic classification function, which is then applied to database documents [13, 5]. In both cases, linear models are widely used and continue to be developed [12]. Applying a weight vector q to a database of instance vectors means computing the vector-by-matrix product q^T A^T, wh...

160 | The effect of adding relevance information in a relevance feedback environment - Buckley, Salton, et al. - 1994 |
Citation Context: ...he document, computed using the Cornell "ltc" weighting. This weighting formula has been shown to be effective in making proximity of document vectors correspond more closely to similarity of meaning [4]. The resulting matrix is sparse, with about 2 × 10^6 nonzeros. The vector for each document is normalized to have Euclidean norm of 1.0, so that the dot product between two document vectors is their...

142 | Matrix Computations - Golub, Van Loan - 1983 |
Citation Context: ...r with a collection of sparse document vectors by making use of an inverted file [18] of document vectors. The inverted file is the appropriate storage for efficient sparse matrix multiplication (see [10]). For each indexing term, a list of the ids of all documents with a nonzero weight for that term, plus the weights themselves, is stored. The dot product of a query vector with every document vector...
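The inverted-file evaluation described in this context (scoring every document by touching only the postings lists of the query's nonzero terms) can be sketched as follows; the dictionary-of-postings layout and all names are illustrative assumptions:

```python
from collections import defaultdict

def build_inverted_file(docs):
    """docs: list of {term: weight} sparse document vectors.
    Returns a mapping term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def score_all(query, index, num_docs):
    """Dot product of the query with every document vector, visiting
    only the postings for the query's nonzero terms (this is sparse
    vector-by-matrix multiplication over the inverted file)."""
    scores = [0.0] * num_docs
    for term, qw in query.items():
        for doc_id, w in index.get(term, ()):
            scores[doc_id] += qw * w
    return scores

docs = [{"cat": 0.5, "dog": 0.5}, {"dog": 1.0}, {"fish": 1.0}]
index = build_inverted_file(docs)
scores = score_all({"dog": 2.0, "fish": 1.0}, index, len(docs))
# scores: doc0 = 2*0.5 = 1.0, doc1 = 2*1.0 = 2.0, doc2 = 1*1.0 = 1.0
```

The cost is proportional to the total length of the visited postings lists rather than to the full n × t product, which is the baseline the paper's sampling algorithm is compared against.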

88 | Optimal expected-time algorithms for closest point problems - Bentley, Weide, et al. - 1980 |
Citation Context: ...trivially solvable in O(dn^2) time for dense data (for sparse data the time depends on the nonzero structure). Some known algorithms perform better when the dimension d is very small. Bentley et al. [3] showed better expected asymptotic performance for instances where the dimension is constant and the vectors are randomly sampled from certain distributions. Arya et al. [1] established better worst-ca...

85 | Query evaluation: Strategies and optimizations - Turtle, Flood - 1995 |
Citation Context: ...ed entries of nonnegative matrix products, without full computation of the product. Our algorithm assigns scores to each entry of q^T A^T. In contrast to existing approximate scoring techniques (see [16]), the expected value of the scores is equal to the true value that would be obtained with the full computation. Furthermore, the variance of the scores is independent of the weight distribution of th...

58 | Indexing by latent semantic analysis - Deerwester |
Citation Context: ...rix of document/word values as having a simpler underlying structure in terms of document clusters, word clusters, or latent factors. The most popular such approach is Latent Semantic Indexing (LSI) [7]. LSI is based on Singular Value Decomposition (see [2] for background). The sparse document vectors x_i ∈ R^t_+ are replaced by dense d-dimensional vectors x̂_i ∈ R^d where d ≪ t (typically, d is of ...

54 | Support for browsing in an intelligent text retrieval system - Thompson, Croft - 1989 |
Citation Context: ...inds, for each document, its set of nearest neighbors, under a dot-product-based proximity function. In information retrieval, such sets of nearest neighbors can be used to support hypertext browsing [15] or in doing a cluster analysis of the data [17]. Finding nearest neighbors has similar computational characteristics to running large numbers of queries or other linear classifiers against the docume...

52 | Latent Semantic Indexing (LSI): TREC-3 report - Dumais - 1994 |
Citation Context: ...sampling approach. There has been little progress in speeding up retrieval with LSI document representations, beyond the obvious expedients of using parallel hardware or using fewer latent dimensions [8]. To achieve our speedup of 5- to 10-fold by dropping dimensions would require using only 32–64 dimensions, which would substantially impact effectiveness. We deemphasize our sparse matrix results, si...

31 | Generalized Additive Models, volume 43 - Hastie, Tibshirani - 1990 |
Citation Context: ...ean proximity problems. A variety of tasks involve finding examples in close proximity to other examples. This may be a goal in itself, or may be a means of performing classification [6] or regression [11]. Frequently seen proximity problems are closest pair, nearest neighbors, and bichromatic nearest neighbors. A variety of proximity measures can be used in such tasks. As mentioned earlier, the cosine...