Matrix completion from a few entries
Abstract

Cited by 68 (5 self)
Let M be a random nα × n matrix of rank r ≪ n, and assume that a uniformly random subset E of its entries is observed. We describe an efficient algorithm that reconstructs M from |E| = O(rn) observed entries with relative root mean square error RMSE ≤ C(α)
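As a rough illustration of recovering a low-rank matrix from a subset of its entries, the Python sketch below rescales the zero-filled observed matrix by the empirical sampling rate |E|/(mn) and keeps its best rank-r approximation. This is only a simplified spectral step under stated assumptions, not the algorithm of the paper: any further steps the paper's method uses are omitted, plain power iteration with deflation stands in for a library SVD, and the names `spectral_complete` and `top_singular_triplets` are hypothetical.

```python
import random


def matvec(A, x):
    """A x for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def norm(v):
    return sum(x * x for x in v) ** 0.5


def top_singular_triplets(A, r, iters=200, seed=0):
    """Top-r singular triplets (s, u, v) of A by power iteration with deflation."""
    rng = random.Random(seed)
    A = [list(row) for row in A]
    triplets = []
    for _ in range(r):
        v = [rng.gauss(0, 1) for _ in range(len(A[0]))]
        for _ in range(iters):
            w = rmatvec(A, matvec(A, v))  # one power step on A^T A
            nw = norm(w)
            if nw == 0:
                break
            v = [x / nw for x in w]
        u = matvec(A, v)
        s = norm(u)
        if s == 0:
            break
        u = [x / s for x in u]
        triplets.append((s, u, v))
        # Deflate: A <- A - s u v^T, so the next pass finds the next triplet.
        A = [[a - s * ui * vj for a, vj in zip(row, v)] for row, ui in zip(A, u)]
    return triplets


def spectral_complete(m, n, observed, r):
    """Rank-r estimate of an m x n matrix from observed entries {(i, j): value}.

    Assumption: a bare spectral estimate, with no trimming or refinement.
    """
    p = len(observed) / (m * n)  # empirical sampling rate |E| / (mn)
    A = [[0.0] * n for _ in range(m)]
    for (i, j), val in observed.items():
        A[i][j] = val / p  # rescale so that the zero-filled matrix is unbiased for M
    est = [[0.0] * n for _ in range(m)]
    for s, u, v in top_singular_triplets(A, r):
        for i in range(m):
            for j in range(n):
                est[i][j] += s * u[i] * v[j]
    return est
```

With every entry observed (p = 1) the sketch simply returns the rank-r truncation of M; with fewer entries the estimate is noisier and serves only as a starting point.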
A rankrevealing method with updating, downdating and applications
SIAM J. Matrix Anal. Appl.
Abstract

Cited by 24 (8 self)
A new rank-revealing method is proposed. For a given matrix and a threshold for near-zero singular values, by employing a globally convergent iterative scheme as well as a deflation technique, the method calculates approximate singular values below the threshold one by one and returns the approximate rank of the matrix along with an orthonormal basis for the approximate null space. When a row or column is inserted or deleted, algorithms for updating/downdating the approximate rank and null space are straightforward, stable and efficient. Numerical results exhibiting the advantages of our code over existing packages based on two-sided orthogonal rank-revealing decompositions are presented. Also presented are applications of the new algorithm in numerical computation of the polynomial GCD as well as identification of nonisolated zeros of polynomial systems.
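A simplified Python sketch of the threshold-plus-deflation pattern the abstract describes: find the smallest singular value; if it is below the threshold, record its singular vector as a null-space direction, deflate, and repeat. It is not the paper's globally convergent scheme. Assumptions: power iteration on c*I - A^T A with c = ||A||_F^2 (simple, but slow when singular values cluster) replaces that scheme, and deflation stacks the row tau*w^T (tau = ||A||_F) on top of the matrix, which lifts the singular value along an exact null vector w to about tau while leaving the others unchanged. Updating/downdating is not shown, and the function names are hypothetical.

```python
import random


def gram(A):
    """Return A^T A for a list-of-rows matrix A."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


def smallest_singular_pair(A, iters=5000, seed=0):
    """Approximate (sigma_min, unit v) of A by power iteration on c*I - A^T A."""
    G = gram(A)
    n = len(G)
    c = sum(G[i][i] for i in range(n))  # trace(G) = ||A||_F^2 bounds G's largest eigenvalue
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    nv = sum(x * x for x in v) ** 0.5
    v = [x / nv for x in v]
    for _ in range(iters):
        w = [c * v[i] - sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        nw = sum(x * x for x in w) ** 0.5
        if nw == 0:
            break
        v = [x / nw for x in w]
    # Rayleigh quotient v^T G v approximates sigma_min^2.
    lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
    return max(lam, 0.0) ** 0.5, v


def numerical_rank(A, theta):
    """Return (approximate rank, approximate null-space basis) for threshold theta."""
    n = len(A[0])
    # Assumption: deflation weight tau = ||A||_F (1.0 for a zero matrix).
    tau = sum(x * x for row in A for x in row) ** 0.5 or 1.0
    B = [list(row) for row in A]
    basis = []
    while len(basis) < n:
        sigma, w = smallest_singular_pair(B)
        if sigma >= theta:
            break
        basis.append(w)
        B = [[tau * x for x in w]] + B  # deflate: lifts the singular value along w to ~tau
    return n - len(basis), basis
```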
Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection
, 2008
Abstract

Cited by 11 (3 self)
Nearest neighbor graphs are widely used in data mining and machine learning. The brute-force method to compute the exact kNN graph takes Θ(dn²) time for n data points in d-dimensional Euclidean space. We propose two divide-and-conquer methods for computing an approximate kNN graph in Θ(dn^t) time for high-dimensional data (large d). The exponent t depends on an internal parameter and is larger than one. Experiments show that a high-quality graph usually requires a small t which is close to one. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide-and-conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.
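The divide-and-conquer structure can be sketched in Python as below, under several assumptions: power iteration on the centered points stands in for the Lanczos procedure, the two halves overlap by about 10% of the points (a choice made here so that neighbors across the cut can still meet), the refinement step is omitted, and a dictionary plays the role of the hash table so each pair distance is computed at most once. Function names are hypothetical.

```python
import random
from heapq import nsmallest


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_direction(points, iters=50, seed=0):
    """Mean and dominant principal direction (power iteration, not Lanczos)."""
    d = len(points[0])
    mean = [sum(p[k] for p in points) / len(points) for k in range(d)]
    X = [[p[k] - mean[k] for k in range(d)] for p in points]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        proj = [sum(x * y for x, y in zip(row, v)) for row in X]  # X v
        w = [sum(proj[i] * X[i][k] for i in range(len(X))) for k in range(d)]  # X^T X v
        nw = sum(x * x for x in w) ** 0.5
        if nw == 0:
            break
        v = [x / nw for x in w]
    return mean, v


def approx_knn_graph(data, k, leaf_size=8):
    """Approximate kNN graph via recursive spectral bisection plus brute force in leaves."""
    leaf_size = max(leaf_size, 3)  # guarantees both halves shrink, so recursion ends
    cache = {}  # hash table: each pair distance is computed at most once
    candidates = {i: set() for i in range(len(data))}

    def pair_dist(i, j):
        key = (i, j) if i < j else (j, i)
        if key not in cache:
            cache[key] = sq_dist(data[i], data[j])
        return cache[key]

    def split(idx):
        if len(idx) <= leaf_size:  # conquer: every pair in a leaf is a candidate
            for i in idx:
                candidates[i].update(j for j in idx if j != i)
            return
        mean, v = top_direction([data[i] for i in idx])
        order = sorted(idx, key=lambda i: sum((x - m) * y for x, m, y in zip(data[i], mean, v)))
        half = len(order) // 2
        overlap = max(1, len(order) // 10)  # assumption: ~10% overlap between halves
        split(order[: half + overlap])
        split(order[half - overlap:])

    split(list(range(len(data))))
    return {i: nsmallest(k, cand, key=lambda j: pair_dist(i, j)) for i, cand in candidates.items()}
```

When all points fit in one leaf the result is the exact kNN graph; larger inputs trade some accuracy for far fewer distance evaluations than the brute-force Θ(dn²) method.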
Internet Document Filtering Using Fourier Domain Scoring
 In Luc de Raedt and Arno Siebes, editors, Principles of Data Mining and Knowledge Discovery, number 2168 in Lecture Notes in Artificial Intelligence
, 2001
Abstract

Cited by 6 (5 self)
Most search engines return a lot of unwanted information. A more thorough filtering process can be performed on this information to sort out the relevant documents. A new method called Frequency Domain Scoring (FDS), which is based on the Fourier transform, is proposed.
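The abstract names the idea without details. The Python sketch below is a loose, assumption-based illustration of scoring a document in the frequency domain: each query term's occurrences are counted in equal-width bins (a "word signal"), transformed with a DFT, and each frequency component contributes its summed magnitude weighted by how well the query terms' phases agree. The bin count, the phase-agreement measure and all names are assumptions of this sketch, not the paper's exact formulation.

```python
import cmath


def word_signal(tokens, term, bins):
    """Count occurrences of `term` in each of `bins` equal-width document segments."""
    signal = [0] * bins
    for pos, tok in enumerate(tokens):
        if tok == term:
            signal[pos * bins // len(tokens)] += 1
    return signal


def dft(signal):
    """Naive discrete Fourier transform."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]


def fds_score(text, query_terms, bins=8):
    """Hypothetical frequency-domain score: magnitude times phase agreement."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    spectra = [dft(word_signal(tokens, t.lower(), bins)) for t in query_terms]
    score = 0.0
    # Real input gives a mirrored spectrum, so only non-negative frequencies are used.
    for k in range(bins // 2 + 1):
        comps = [s[k] for s in spectra if abs(s[k]) > 1e-12]
        if not comps:
            continue
        magnitude = sum(abs(c) for c in comps)
        # Assumption: phase agreement = length of the mean unit phase vector.
        phase_precision = abs(sum(c / abs(c) for c in comps)) / len(query_terms)
        score += magnitude * phase_precision
    return score
```

Under this scoring, query terms occurring close together produce aligned phases and therefore a higher score than the same terms spread across the document.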
Lanczos Vectors versus Singular Vectors for Effective Dimension Reduction
, 2008
Abstract

Cited by 1 (1 self)
This paper takes an in-depth look at a technique for computing filtered matrix-vector (matvec) products, which are required in many data analysis applications. In these applications the data matrix is multiplied by a vector, and we wish to perform this product accurately in the space spanned by a few of the major singular vectors of the matrix. We examine the use of the Lanczos algorithm for this purpose. The goal of the method is identical to that of the truncated singular value decomposition (SVD), namely to preserve the quality of the resulting matvec product in the major singular directions of the matrix. The Lanczos-based approach achieves this goal by using a small number of Lanczos vectors, but it does not explicitly compute singular values/vectors of the matrix. The main advantage of the Lanczos-based technique is its low cost compared with that of the truncated SVD. This advantage comes without sacrificing accuracy. The effectiveness of this approach is demonstrated on a few sample applications requiring dimension reduction, including information retrieval and face recognition. The proposed technique can be used as a replacement for the truncated SVD technique whenever the problem can be formulated as a filtered matvec multiplication.
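A minimal Python sketch of the idea, under stated assumptions: k Lanczos vectors of A^T A are built from a random start vector, applying A^T A as A^T(Av) without forming it and fully reorthogonalizing each new vector, and A^T x is then projected onto their span as an inexpensive stand-in for the truncated-SVD product A_k^T x = V_k V_k^T A^T x. The start vector, the reorthogonalization strategy and the function names are choices of this sketch rather than details taken from the paper.

```python
import random


def matvec(A, x):
    """A x for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos_basis(A, k, seed=0):
    """Up to k orthonormal Lanczos vectors for A^T A (full reorthogonalization)."""
    rng = random.Random(seed)
    q = [rng.gauss(0, 1) for _ in range(len(A[0]))]  # assumption: random start vector
    nq = dot(q, q) ** 0.5
    Q = [[x / nq for x in q]]
    while len(Q) < k:
        w = rmatvec(A, matvec(A, Q[-1]))  # apply A^T A as A^T (A q)
        for _ in range(2):  # orthogonalize against all previous vectors, twice for stability
            for qi in Q:
                c = dot(w, qi)
                w = [a - c * b for a, b in zip(w, qi)]
        nw = dot(w, w) ** 0.5
        if nw < 1e-10:  # invariant subspace reached: no new direction
            break
        Q.append([x / nw for x in w])
    return Q


def filtered_matvec(A, x, k):
    """Approximate A_k^T x by projecting A^T x onto the span of k Lanczos vectors."""
    y = rmatvec(A, x)
    s = [0.0] * len(y)
    for qi in lanczos_basis(A, k):
        c = dot(y, qi)
        s = [a + c * b for a, b in zip(s, qi)]
    return s
```

With k equal to the number of columns the projection is exact (it returns A^T x); smaller k keeps mainly the dominant directions, which is the filtering effect the abstract describes.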
Populating Categories using Constrained Matrix Factorization
, 2010
Abstract
Matrix factorization methods are a well-scalable means of discovering generalizable information in noisy training data with many examples and many features. We propose a method to populate a given ontology of categories and seed examples using matrix factorization with constraints, based on a large corpus of noun-phrase/context co-occurrence statistics. While our method performs reasonably well on some categories, it is outperformed by a simple nearest-neighbor-based baseline. We demonstrate, however, that dimensionality reduction applied to the baseline model improves performance considerably.
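The abstract does not describe the nearest-neighbor baseline. One plausible minimal version, assumed here purely for illustration, represents each noun phrase by its context co-occurrence counts and ranks candidate phrases by mean cosine similarity to the category's seed examples. The dimensionality reduction mentioned in the abstract is not shown, and the names and toy data are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of sparse vectors given as {context: count} dicts."""
    num = sum(c * v.get(ctx, 0) for ctx, c in u.items())
    den = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return num / den if den else 0.0


def populate_category(seeds, cooccur, top_n=5):
    """Rank non-seed noun phrases by mean cosine similarity to the seed phrases.

    Assumption: a simple nearest-neighbor style baseline, not the paper's exact one.
    """
    seed_vecs = [cooccur[s] for s in seeds if s in cooccur]
    if not seed_vecs:
        return []
    scores = {
        phrase: sum(cosine(vec, s) for s in seed_vecs) / len(seed_vecs)
        for phrase, vec in cooccur.items()
        if phrase not in seeds
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```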
Incremental Matrix Factorization for Collaborative Filtering
Abstract
An incremental and iterative matrix factorization method for very sparse matrices, based on singular value decomposition, is presented. Such matrices arise in collaborative filtering (CF) systems such as the Netflix system. This paper shows how such an incremental matrix factorization can be used to predict ratings in a CF system and thereby fill the empty fields of its rating matrix. The method presented here is also easy to implement and, if implemented correctly, offers good and reliable performance.
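A sketch of one well-known incremental scheme for sparse rating matrices, swapped in for illustration and not necessarily the paper's method: latent features are learned one at a time by stochastic gradient descent on the observed ratings only (the "Funk SVD" style popularized during the Netflix Prize), each new feature fitting what the previous ones left unexplained. The learning rate, regularization, random initialization and names are assumptions.

```python
import random


def incremental_mf(ratings, n_users, n_items, n_features=2, epochs=200,
                   lr=0.01, reg=0.02, seed=0):
    """Learn latent features one at a time from (user, item, rating) triples."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)  # global baseline
    U = [[0.0] * n_features for _ in range(n_users)]
    V = [[0.0] * n_features for _ in range(n_items)]

    def predict(u, i, upto):
        return mean + sum(U[u][f] * V[i][f] for f in range(upto))

    data = list(ratings)
    for f in range(n_features):
        # Assumption: small random initial values break symmetry between users.
        for row in U:
            row[f] = rng.gauss(0.0, 0.1)
        for row in V:
            row[f] = rng.gauss(0.0, 0.1)
        for _ in range(epochs):
            rng.shuffle(data)
            for u, i, r in data:  # only observed entries are visited
                err = r - predict(u, i, f + 1)
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return lambda u, i: predict(u, i, n_features)
```

Calling the returned predictor on an unobserved (user, item) pair fills an empty field of the rating matrix.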
Examining Committee:
Abstract
MultiNet: An interactive program for analysing and visualizing complex networks by
A Novel Clustering Method of High Dimensional Data
Abstract
In various application domains, data are often presented in very high-dimensional formats; the dimension can be in the hundreds, thousands or more, for example in text/web mining, for browsing related documents that match a user's query, and in bioinformatics, for finding genes and proteins that have similar functionality. In addition, these data sets are often sparse, and the embedded concepts are heterogeneous. Discovering the homogeneous concept groups in high-dimensional data sets and clustering the data accordingly is a contemporary challenge. Conventional clustering techniques are often based on the Euclidean metric. However, this metric is ad hoc and not intrinsic to the semantics of the documents. In this paper, we propose a novel approach in which the semantic space of high-dimensional data is structured as a simplicial complex of Euclidean space (a hypergraph, but with a different focus). Such a simplicial structure intrinsically captures the semantics of the data; for example, the coherent topics of documents appear in the same connected component. Finally, we cluster the data by the structure of concepts, which is organized by this geometry.
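The abstract gives the geometric idea but not the construction. A minimal Python sketch under explicit assumptions: each document is treated as a set of keywords, keyword pairs that co-occur in at least `min_support` documents become edges (1-simplices), and documents are grouped by the connected component that most of their linked keywords fall in. Only edges are built because the connected components of a simplicial complex are determined by its edges; the thresholding rule, tie-breaking and names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_by_components(docs, min_support=2):
    """Group document indices by connected components of a keyword complex."""
    keyword_sets = [sorted(set(d.lower().split())) for d in docs]
    pair_counts = Counter(p for ks in keyword_sets for p in combinations(ks, 2))
    parent = {w: w for ks in keyword_sets for w in ks}
    linked = set()
    for (a, b), count in pair_counts.items():
        if count >= min_support:  # assumption: frequent pair -> edge of the complex
            parent[find(parent, a)] = find(parent, b)
            linked.update((a, b))
    clusters = defaultdict(list)
    for idx, ks in enumerate(keyword_sets):
        votes = Counter(find(parent, w) for w in ks if w in linked)
        key = votes.most_common(1)[0][0] if votes else None  # None: no linked keywords
        clusters[key].append(idx)
    return list(clusters.values())
```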
Constrained Singular Value Decomposition Read the Web Project Proposal
, 2009
Abstract
Matrix factorization methods are a well-scalable means of discovering generalizable information in noisy training data with many examples and many features. We propose to populate a given ontology of categories and seed examples using matrix factorization with constraints, similar in spirit to singular