Fast Computation of Low Rank Matrix Approximations
, 2001
Cited by 96 (2 self)
In many practical applications, given an m × n matrix A it is of interest to find an approximation to A that has low rank. We introduce a technique that exploits spectral structure in A to accelerate Orthogonal Iteration and Lanczos Iteration, the two most common methods for computing such approximations. Our technique amounts to independently sampling and/or quantizing the entries of the input matrix A, thus speeding up computation by reducing the number of non-zero entries and/or the length of their representation. Our analysis is based on observing that both sampling and quantization can be viewed as adding a random matrix E to A, where the entries of E are independent, zero-mean random variables of bounded variance. Such random matrices possess no significant linear structure, and we can thus prove that the effect of sampling and quantization nearly vanishes when a low rank approximation to A is computed. In fact, the more prominent the linear structure in A is, the more data we can afford to discard and, ultimately, the faster we can discover it. We give bounds on the quality of our approximation both in the L2 and in the Frobenius norm.
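The sampling and quantization steps described in this abstract can be illustrated with a minimal sketch. The function names, the keep-probability p, and the choice of rounding to ±b with b = max |A_ij| are assumptions made for illustration; the abstract does not fix these details. Both functions are built so that each output entry has the same expectation as the input entry, which is what makes the perturbation E zero-mean.

```python
import random

def sample_entries(A, p, rng):
    """Keep each entry independently with probability p, rescaled by 1/p.

    Rescaling keeps the expectation of every entry equal to the original,
    so the difference from A is a zero-mean random matrix.
    """
    return [[(a / p if rng.random() < p else 0.0) for a in row] for row in A]

def quantize_entries(A, rng):
    """Round each entry to +b or -b (b = largest absolute entry, an assumed choice).

    Entry a becomes +b with probability 1/2 + a/(2b), so its expectation is a.
    """
    b = max(abs(a) for row in A for a in row)
    if b == 0:
        return [[0.0 for _ in row] for row in A]
    return [[(b if rng.random() < 0.5 + a / (2 * b) else -b) for a in row]
            for row in A]

if __name__ == "__main__":
    rng = random.Random(0)
    A = [[1.0, -2.0], [0.5, 4.0]]
    print(sample_entries(A, 0.5, rng))   # roughly half the entries zeroed
    print(quantize_entries(A, rng))      # every entry is +4.0 or -4.0
```

The sparsified or quantized matrix would then be handed to whatever low-rank routine is in use (Orthogonal Iteration or Lanczos in the abstract); that step is not shown here.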
pSearch: Information Retrieval in Structured Overlays
, 2002
Cited by 68 (6 self)
We describe an efficient peer-to-peer information retrieval system, pSearch, that supports state-of-the-art content- and semantic-based full-text searches. pSearch avoids the scalability problem of existing systems that employ centralized indexing, or index/query flooding. It also avoids the nondeterminism that is exhibited by heuristic-based approaches. In pSearch, documents in the network are organized around their vector representations (based on modern document ranking algorithms) such that the search space for a given query is organized around related documents, achieving both efficiency and accuracy.
A survey of eigenvector methods of web information retrieval
- SIAM Rev
Cited by 46 (5 self)
Web information retrieval is significantly more challenging than traditional well-controlled, small document collection information retrieval. One main difference between traditional information retrieval and Web information retrieval is the Web’s hyperlink structure. This structure has been exploited by several of today’s leading Web search engines, particularly Google and Teoma. In this survey paper, we focus on Web information retrieval methods that use eigenvector computations, presenting the three popular methods of HITS, PageRank, and SALSA.
Sampling from large matrices: an approach through geometric functional analysis
- Journal of the ACM
, 2006
Cited by 43 (4 self)
We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r = ‖A‖_F^2 / ‖A‖_2^2 is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
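The numerical rank used in this abstract, the squared Frobenius norm of A divided by its squared spectral norm, can be computed with a short sketch. The function name and the use of power iteration on AᵀA to estimate the spectral norm are assumptions made for illustration, not something the abstract prescribes; any routine that returns the largest singular value would serve.

```python
import math
import random

def numerical_rank(A, iters=100, seed=0):
    """Estimate ||A||_F^2 / ||A||_2^2 for a matrix given as a list of rows."""
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    frob_sq = sum(x * x for row in A for x in row)
    if frob_sq == 0.0:
        raise ValueError("numerical rank is undefined for the zero matrix")
    # Random unit start vector, so it is almost surely not orthogonal
    # to the top right singular vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma_sq = 0.0
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        # For unit v, ||Av||^2 is the Rayleigh quotient of A^T A at v.
        sigma_sq = sum(y * y for y in Av)
        w = [sum(A[i][j] * Av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return frob_sq / sigma_sq

if __name__ == "__main__":
    print(numerical_rank([[1.0, 0.0], [0.0, 1.0]]))  # close to 2, equal to the rank
    print(numerical_rank([[2.0, 0.0], [0.0, 1.0]]))  # close to 1.25, below the rank
```

The second call shows the "stable relaxation" behaviour: the matrix has rank 2, but because one singular value dominates, its numerical rank is only 5/4.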
A Two-Dimensional Data Distribution Method For Parallel Sparse Matrix-Vector Multiplication
- SIAM REVIEW
Cited by 37 (3 self)
A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimised. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimising the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication compared to one-dimensional methods, and in general a good balance in the communication work.
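The recursive bipartitioning idea in this abstract can be sketched in a heavily simplified form. The code below is not the Mondriaan implementation: it assumes each split is a contiguous cut of the row or column indices, chosen to balance the nonzero counts, and it counts the communication volume of a split as the number of lines in the other direction that end up with nonzeros in both parts. The names `best_split` and `bipartition` are hypothetical.

```python
from collections import Counter

def best_split(nonzeros, axis):
    """Most balanced contiguous split along one axis (0 = rows, 1 = columns).

    Returns (volume, part_a, part_b), or None if the axis has a single line.
    """
    counts = Counter(nz[axis] for nz in nonzeros)
    keys = sorted(counts)
    if len(keys) < 2:
        return None
    half = len(nonzeros) / 2
    running, cut_key, best_gap = 0, keys[0], None
    for k in keys[:-1]:  # keep at least one line for the second part
        running += counts[k]
        gap = abs(running - half)
        if best_gap is None or gap < best_gap:
            best_gap, cut_key = gap, k
    part_a = [nz for nz in nonzeros if nz[axis] <= cut_key]
    part_b = [nz for nz in nonzeros if nz[axis] > cut_key]
    other = 1 - axis
    # Lines in the other direction that are shared by both parts.
    volume = len({nz[other] for nz in part_a} & {nz[other] for nz in part_b})
    return volume, part_a, part_b

def bipartition(nonzeros, parts):
    """Recursively split (row, col) nonzeros into up to `parts` groups.

    `parts` is assumed to be a power of two; fewer groups are returned
    when a group can no longer be split.
    """
    if parts == 1 or len(nonzeros) < 2:
        return [nonzeros]
    candidates = [c for c in (best_split(nonzeros, 0), best_split(nonzeros, 1))
                  if c is not None]
    if not candidates:
        return [nonzeros]
    # Prefer the lower communication volume, then the better balance.
    _, a, b = min(candidates, key=lambda c: (c[0], abs(len(c[1]) - len(c[2]))))
    return bipartition(a, parts // 2) + bipartition(b, parts // 2)

if __name__ == "__main__":
    block_diagonal = [(0, 0), (0, 1), (1, 0), (1, 1),
                      (2, 2), (2, 3), (3, 2), (3, 3)]
    for group in bipartition(block_diagonal, 2):
        print(group)
```

On the block-diagonal example the two diagonal blocks are separated with zero shared columns, which is the kind of split a communication-minimising partitioner would aim for. The subsequent vector partitioning step mentioned in the abstract is not covered by this sketch.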
PeerSearch: Efficient Information Retrieval in Peer-to-Peer Networks
- In Proceedings of HotNets-I, ACM SIGCOMM
, 2002
Cited by 29 (0 self)
In this paper, we propose an efficient peer-to-peer information retrieval system PeerSearch that supports state-of-the-art content and semantic searches. PeerSearch avoids the scalability problem of existing systems that employ centralized indexing, index flooding, or query flooding. It also avoids the non-determinism that is exhibited by heuristic-based approaches. PeerSearch ...
Matrix completion from a few entries
Cited by 27 (3 self)
Let M be a random nα × n matrix of rank r ≪ n, and assume that a uniformly random subset E of its entries is observed. We describe an efficient algorithm that reconstructs M from |E| = O(rn) observed entries with relative root mean square error RMSE ≤ C(α) ...
On Scaling Latent Semantic Indexing for Large Peer-To-Peer Systems
- Proc. 27th Annual International ACM SIGIR Conference
, 2004
Cited by 26 (0 self)
The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine information retrieval with peer-to-peer technology for scalability, fault-tolerance, and low administration cost. One pioneering work along this direction is pSearch [32, 33]. pSearch places documents onto a peer-to-peer overlay network according to semantic vectors produced using Latent Semantic Indexing (LSI). The search cost for a query is reduced since documents related to the query are likely to be co-located on a small number of nodes. Unfortunately, because of its reliance on LSI, pSearch also inherits the limitations of LSI. (1) When the corpus is large and heterogeneous, LSI's retrieval quality is inferior to methods such as Okapi. (2) The Singular Value Decomposition (SVD) used in LSI is unscalable in terms of both memory consumption and computation time.
Seurat: A Pointillist Approach to Anomaly Detection
- in Proc. 7th International Symposium on Recent Advances in Intrusion Detection (RAID’07), Sophia Antipolis, France
Cited by 21 (1 self)
This paper proposes a new approach to detecting aggregated anomalous events by correlating host file system changes across space and time. Our approach is based on a key observation that many host state transitions of interest have both temporal and spatial locality. Abnormal state changes, which may be hard to detect in isolation, become apparent when they are correlated with similar changes on other hosts. Based on this intuition, we have developed a method to detect similar, coincident changes to the patterns of file updates that are shared across multiple hosts. We have implemented this approach in a prototype system called Seurat and demonstrated its effectiveness using a combination of real workstation cluster traces, simulated attacks, and a manually launched Linux worm.
Towards a Semantic-Aware File Store
- In the Proceedings of HotOS-IX, May 2003
, 2003
Cited by 18 (0 self)
Permission is granted for noncommercial reproduction of the work for educational or research purposes.

