Results 1-10 of 81
Fast Computation of Low Rank Matrix Approximations
, 2001
Abstract

Cited by 128 (3 self)
In many practical applications, given an m × n matrix A, it is of interest to find an approximation to A that has low rank. We introduce a technique that exploits spectral structure in A to accelerate Orthogonal Iteration and Lanczos Iteration, the two most common methods for computing such approximations. Our technique amounts to independently sampling and/or quantizing the entries of the input matrix A, thus speeding up computation by reducing the number of nonzero entries and/or the length of their representation. Our analysis is based on observing that both sampling and quantization can be viewed as adding a random matrix E to A, where the entries of E are independent, zero-mean random variables of bounded variance. Such random matrices possess no significant linear structure, and we can thus prove that the effect of sampling and quantization nearly vanishes when a low-rank approximation to A is computed. In fact, the more prominent the linear structure in A is, the more data we can afford to discard and, ultimately, the faster we can discover it. We give bounds on the quality of our approximation both in the L2 and in the Frobenius norm.
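The observation that sampling and quantization both amount to adding a zero-mean random matrix E can be checked directly. The sketch below is illustrative only: `sample_entries` keeps each entry with probability p and rescales it by 1/p, and `quantize_entries` uses unbiased randomized rounding to the neighbouring integers. Both function names and the integer rounding scheme are assumptions chosen for clarity, not the paper's exact procedures.

```python
import math
import random

def sample_entries(A, p, rng):
    """Keep each entry independently with probability p, rescaled by 1/p.

    E[kept entry] = p * (a / p) = a, so the result equals A + E where
    E has independent, zero-mean entries.
    """
    return [[a / p if rng.random() < p else 0.0 for a in row] for row in A]

def quantize_entries(A, rng):
    """Unbiased randomized rounding of each entry to floor(a) or floor(a) + 1.

    The entry is rounded up with probability a - floor(a), so E[q] = a and
    the rounding error is again a zero-mean perturbation.
    """
    out = []
    for row in A:
        new_row = []
        for a in row:
            lo = math.floor(a)
            round_up = rng.random() < a - lo
            new_row.append(float(lo + 1) if round_up else float(lo))
        out.append(new_row)
    return out
```

Averaging many calls to `sample_entries(A, 0.25, rng)` approaches A, even though each call keeps only about a quarter of the nonzeros; this is the "zero-mean noise" view that the analysis builds on.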
pSearch: Information Retrieval in Structured Overlays
, 2002
Abstract

Cited by 76 (6 self)
We describe an efficient peer-to-peer information retrieval system, pSearch, that supports state-of-the-art content- and semantic-based full-text searches. pSearch avoids the scalability problem of existing systems that employ centralized indexing or index/query flooding. It also avoids the nondeterminism exhibited by heuristic-based approaches. In pSearch, documents in the network are organized around their vector representations (based on modern document ranking algorithms) such that the search space for a given query is organized around related documents, achieving both efficiency and accuracy.
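The idea of organizing documents around their vectors so that a query only reaches related documents can be sketched in a few lines. This is a hypothetical simplification, not pSearch's overlay: each node is assumed to own a fixed "centroid" vector, and documents and queries are both routed to the node whose centroid has the highest cosine similarity. The names `owner`, `place`, and `search` are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def owner(vec, node_centroids):
    """Index of the node whose (hypothetical) centroid is most similar to vec."""
    return max(range(len(node_centroids)), key=lambda k: cosine(vec, node_centroids[k]))

def place(docs, node_centroids):
    """Store each document (id -> vector) on the node nearest to its vector."""
    store = {k: [] for k in range(len(node_centroids))}
    for doc_id, vec in docs.items():
        store[owner(vec, node_centroids)].append(doc_id)
    return store

def search(query_vec, store, docs, node_centroids, top=3):
    """Contact only the query's owner node and rank its documents locally."""
    node = owner(query_vec, node_centroids)
    ranked = sorted(store[node], key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return node, ranked[:top]
```

Because semantically close documents land on the same node, a query touches one node instead of flooding the network; the trade-off is that relevant documents stored elsewhere are missed.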
Sampling from large matrices: an approach through geometric functional analysis
 Journal of the ACM
, 2006
Abstract

Cited by 69 (4 self)
Abstract. We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where $r = \|A\|_F^2 / \|A\|_2^2$ is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2-CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
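The numerical rank $r = \|A\|_F^2 / \|A\|_2^2$ that governs the sample size is cheap to estimate. The sketch below computes the squared Frobenius norm directly and the squared spectral norm by power iteration on $A^T A$; it assumes the all-ones start vector is not orthogonal to the top right singular vector, which holds for generic matrices. Function names are illustrative.

```python
import math

def frobenius_norm_sq(A):
    """||A||_F^2: the sum of squared entries."""
    return sum(a * a for row in A for a in row)

def spectral_norm_sq(A, iters=500):
    """||A||_2^2, the largest eigenvalue of A^T A, estimated by power iteration."""
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # unit start vector (assumed not orthogonal to the top one)
    lam = 0.0
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T A v
        lam = sum(x * y for x, y in zip(v, w))  # Rayleigh quotient v^T A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return lam

def numerical_rank(A):
    """r = ||A||_F^2 / ||A||_2^2; lies between 1 and rank(A) for nonzero A."""
    return frobenius_norm_sq(A) / spectral_norm_sq(A)
```

For example, diag(3, 4) has rank 2 but numerical rank 25/16, reflecting that one direction dominates; a rank-1 matrix has numerical rank exactly 1.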
A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication
 SIAM Review
Abstract

Cited by 68 (9 self)
A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimised. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimising the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication compared to one-dimensional methods, and in general a good balance in the communication work.
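The recursive bipartitioning step can be illustrated with a deliberately simplified sketch. It assumes the matrix is given as a list of nonzero positions (i, j), and it counts communication volume as the sum over rows and columns of (number of parts owning a nonzero in that row or column) minus 1. Each split is a single cut at a row or column boundary that balances the nonzero counts, and the direction with the lower volume is kept. This is not Mondriaan's actual partitioner, which minimises volume far more carefully; the function names are hypothetical.

```python
from collections import defaultdict

def split(nonzeros, axis):
    """Cut nonzeros (i, j) at a row (axis=0) or column (axis=1) boundary so the
    two halves hold as equal a number of nonzeros as possible."""
    if not nonzeros:
        return [], []
    counts = defaultdict(int)
    for nz in nonzeros:
        counts[nz[axis]] += 1
    keys = sorted(counts)
    total = len(nonzeros)
    best_cut, best_diff, running = keys[0], total, 0
    for k in keys[:-1]:
        running += counts[k]
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_cut, best_diff = k, diff
    left = [nz for nz in nonzeros if nz[axis] <= best_cut]
    right = [nz for nz in nonzeros if nz[axis] > best_cut]
    return left, right

def comm_volume(parts):
    """Sum over rows and columns of (number of parts touching it) - 1."""
    row_owners, col_owners = defaultdict(set), defaultdict(set)
    for p, nzs in enumerate(parts):
        for i, j in nzs:
            row_owners[i].add(p)
            col_owners[j].add(p)
    return (sum(len(s) - 1 for s in row_owners.values())
            + sum(len(s) - 1 for s in col_owners.values()))

def recursive_bipartition(nonzeros, levels):
    """Split into 2**levels parts; at each split try both directions and keep
    the one whose two halves cause less communication."""
    parts = [list(nonzeros)]
    for _ in range(levels):
        next_parts = []
        for nzs in parts:
            candidates = [split(nzs, axis) for axis in (0, 1)]
            left, right = min(candidates, key=lambda pair: comm_volume(list(pair)))
            next_parts.extend([left, right])
        parts = next_parts
    return parts
```

On a block-diagonal matrix with two dense 2 × 2 blocks, one level of bipartitioning separates the blocks with zero communication volume and four nonzeros per part.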
Matrix completion from a few entries
Abstract

Cited by 68 (5 self)
Let M be a random nα × n matrix of rank r ≪ n, and assume that a uniformly random subset E of its entries is observed. We describe an efficient algorithm that reconstructs M from |E| = O(rn) observed entries with relative root mean square error RMSE ≤ C(α)
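A common spectral baseline for this setting fills unobserved entries with zero, rescales the observed ones by (number of entries) / |E|, and keeps the top singular triple. The sketch below does this for rank r = 1 using power iteration. It is an illustrative assumption, not the authors' algorithm, and it omits any trimming or refinement a full method would add. `rank1_complete` and its arguments are hypothetical names.

```python
import math

def rank1_complete(n_rows, n_cols, observed, iters=100):
    """Rank-1 spectral estimate of M from observed entries.

    observed maps (i, j) -> M[i][j]. Unobserved entries are treated as 0 and
    observed ones are scaled by n_rows * n_cols / |E|, so that under uniform
    sampling the sparse matrix is an unbiased estimate of M.
    """
    scale = n_rows * n_cols / len(observed)
    S = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, j), val in observed.items():
        S[i][j] = val * scale
    u, v, sigma = [0.0] * n_rows, [1.0] * n_cols, 0.0
    for _ in range(iters):
        u = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        nu = math.sqrt(sum(x * x for x in u))
        if nu == 0.0:
            break
        u = [x / nu for x in u]
        w = [sum(S[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in w))  # top singular value estimate
        if sigma == 0.0:
            break
        v = [x / sigma for x in w]
    # Reconstruct sigma * u v^T as the completed matrix.
    return [[sigma * u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]
```

With every entry observed the estimate reproduces a rank-1 matrix exactly; with few observations it is only a starting point whose error shrinks as |E| grows.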
A survey of eigenvector methods of web information retrieval
 SIAM Rev
Abstract

Cited by 66 (6 self)
Abstract. Web information retrieval is significantly more challenging than traditional well-controlled, small document collection information retrieval. One main difference between traditional information retrieval and Web information retrieval is the Web’s hyperlink structure. This structure has been exploited by several of today’s leading Web search engines, particularly Google and Teoma. In this survey paper, we focus on Web information retrieval methods that use eigenvector computations, presenting the three popular methods of HITS, PageRank, and SALSA.
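Of the three methods named, PageRank is the simplest to show as an eigenvector computation: power iteration on the link graph converges to the dominant eigenvector of the damped transition matrix. The sketch below assumes a damping factor of 0.85, uniform teleportation, and that pages without outlinks spread their rank uniformly. These are common conventions, not details taken from the survey.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank. links maps each page to the pages it links to."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Teleportation: every page receives (1 - damping) / n.
        new = {v: (1.0 - damping) / n for v in nodes}
        # Pages with no outlinks (assumed) spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        for v in nodes:
            new[v] += damping * dangling / n
        # Each page splits its rank evenly among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank
```

The ranks always sum to 1. In the graph a → c, b → c, c → a, page c collects rank from two pages and ends up highest, a inherits most of c's rank, and b, with no inlinks, receives only the teleportation share.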
Alignment-free sequence comparison: a review
 Bioinformatics
, 2003
Abstract

Cited by 43 (5 self)
Motivation: Genetic recombination and, in particular, genetic shuffling are at odds with sequence comparison by alignment, which assumes conservation of contiguity between homologous segments. A variety of theoretical foundations are being used to derive alignment-free methods that overcome this limitation. The formulation of alternative metrics for dissimilarity between sequences and their algorithmic implementations are reviewed. Results: The overwhelming majority of work on alignment-free sequence comparison has taken place in the past two decades, with most reports published in the past 5 years. Two main categories of methods have been proposed: methods based on word (oligomer) frequency, and methods that do not require resolving the sequence with fixed word length segments. The first category is based on the statistics of word frequency, on the distances defined in a Cartesian space defined by the frequency vectors, and on the information content of frequency distribution. The second category includes the use of Kolmogorov complexity and Chaos Theory. Despite their low visibility, alignment-free metrics are in fact already widely used as pre-selection filters for alignment-based querying of large applications. Recent work is furthering their usage as a scale-independent methodology that is capable of recognizing homology when loss of contiguity is beyond the possibility of alignment. Availability: Most of the alignment-free algorithms reviewed were implemented in MATLAB code and are available
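The first category, distances in the Cartesian space of word-frequency vectors, can be sketched as follows. The review states that most implementations were in MATLAB, so this Python version is only an illustration. It assumes a DNA alphabet, overlapping k-mers, relative frequencies, and Euclidean distance; the reviewed methods use a variety of other metrics.

```python
import math
from itertools import product

def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequencies of all overlapping k-mers, in a fixed word order."""
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if w in counts:  # skip words containing symbols outside the alphabet
            counts[w] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in words]

def kmer_distance(seq_a, seq_b, k):
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

No alignment is computed: "AACC" and "CCAA" are at distance 0 for k = 1 even though their segments appear in different order, which is the kind of rearrangement that alignment handles poorly.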
PeerSearch: Efficient Information Retrieval in Peer-to-Peer Networks
 In Proceedings of HotNets-I, ACM SIGCOMM
, 2002
Abstract

Cited by 33 (0 self)
In this paper, we propose an efficient peer-to-peer information retrieval system, PeerSearch, that supports state-of-the-art content and semantic searches. PeerSearch avoids the scalability problem of existing systems that employ centralized indexing, index flooding, or query flooding. It also avoids the nondeterminism exhibited by heuristic-based approaches. PeerSearch
On Scaling Latent Semantic Indexing for Large Peer-to-Peer Systems
 Proc. 27th Annual International ACM SIGIR Conference
, 2004
Abstract

Cited by 33 (0 self)
The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine information retrieval with peer-to-peer technology for scalability, fault tolerance, and low administration cost. One pioneering work along this direction is pSearch [32, 33]. pSearch places documents onto a peer-to-peer overlay network according to semantic vectors produced using Latent Semantic Indexing (LSI). The search cost for a query is reduced since documents related to the query are likely to be co-located on a small number of nodes. Unfortunately, because of its reliance on LSI, pSearch also inherits the limitations of LSI. (1) When the corpus is large and heterogeneous, LSI's retrieval quality is inferior to methods such as Okapi. (2) The Singular Value Decomposition (SVD) used in LSI is unscalable in terms of both memory consumption and computation time.
Semantic Small World: An overlay network for peer-to-peer search
, 2004
Abstract

Cited by 30 (5 self)
For a peer-to-peer (P2P) system holding a massive amount of data, efficient semantic-based search for resources (such as data or services) is a key determinant of its scalability. This paper presents the design of an overlay network, namely semantic small world (SSW), that facilitates efficient semantic-based search in P2P systems. SSW is based on three innovative ideas: 1) small world network; 2) semantic clustering; 3) dimension reduction. Peers in SSW are clustered according to the semantics of their local data and self-organized as a small world overlay network. To address the maintenance issue of high-dimensional overlay networks, a dynamic dimension reduction method, called adaptive space linearization, is used to construct a one-dimensional SSW that supports operations in the high-dimensional semantic space. SSW achieves a very competitive trade-off between search latency/traffic and maintenance overhead. Through extensive simulations, we show that SSW is much more scalable to very large network sizes and very large numbers of data objects than pSearch, the state-of-the-art semantic-based search technique for P2P systems. In addition, SSW adapts to the distribution of data and locality of interest, is very resilient to failures, and has good load-balancing properties.
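To see how a one-dimensional structure can serve a multi-dimensional semantic space, consider a space-filling curve. The sketch below uses a fixed Z-order (Morton) curve as a stand-in. It is not SSW's adaptive space linearization, which adapts to the data distribution. Semantic vectors are assumed to lie in [0, 1]^d; they are quantized to a grid, and the grid coordinates' bits are interleaved into a single key that orders peers along a line. All function names are hypothetical.

```python
def quantize(vec, bits):
    """Map coordinates in [0, 1] to integer grid cells 0 .. 2**bits - 1."""
    cells = 1 << bits
    return [min(int(x * cells), cells - 1) for x in vec]

def z_order_key(coords, bits):
    """Interleave the bits of integer grid coordinates into one Morton key."""
    key = 0
    d = len(coords)
    for b in range(bits):
        for dim, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * d + dim)
    return key

def linear_order(vectors, bits=4):
    """Order items (name -> vector in [0, 1]^d) along the Z-order curve."""
    return sorted(vectors, key=lambda name: z_order_key(quantize(vectors[name], bits), bits))
```

Nearby vectors usually receive nearby keys, so a range of positions on the line approximates a region of the semantic space. A fixed curve cannot rebalance when data is skewed, which is the problem an adaptive scheme targets.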