Learning with Matrix Factorization
, 2004
Cited by 20 (3 self)
Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent
A Unified View of Matrix Factorization Models
Cited by 19 (0 self)
We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods such as incorporating row and column biases, and adding or relaxing clustering constraints.
The discrete basis problem
, 2005
Cited by 11 (4 self)
We consider the Discrete Basis Problem, which can be described as follows: given a collection of Boolean vectors, find a collection of k Boolean basis vectors such that the original vectors can be represented using disjunctions of these basis vectors. We show that the decision version of this problem is NP-complete and that the optimization version cannot be approximated within any finite ratio. We also study two variations of this problem, where the Boolean basis vectors must be mutually orthogonal. We show that the other variation is closely related to the well-known Metric k-median Problem in Boolean space. To solve these problems, two algorithms will be presented. One is designed for the variations mentioned above, and it is solely based on solving the k-median problem, while the other is a heuristic intended to solve the general Discrete Basis Problem. We will also study the results of extensive experiments made with these two algorithms on both synthetic and real-world data. The results are twofold: with the synthetic data, the algorithms did rather well, but with the real-world data the results were not as good.
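To make "represented using disjunctions of these basis vectors" concrete, here is a minimal sketch of the Boolean matrix product that underlies the problem statement. The function names, the usage matrix `S`, and the use of a Hamming-style mismatch count as the error measure are illustrative assumptions, not taken from the paper, and the sketch only evaluates a given basis; it does not implement either of the paper's algorithms.

```python
def boolean_product(S, B):
    """Boolean matrix product: row i of the result is the OR (disjunction)
    of the basis vectors B[l] selected by S[i][l] == 1.

    S is an n x k 0/1 usage matrix and B is a k x d 0/1 basis matrix
    (both names are hypothetical, chosen for this illustration)."""
    return [
        [int(any(s and b for s, b in zip(row, col))) for col in zip(*B)]
        for row in S
    ]


def reconstruction_error(C, S, B):
    """Number of entries where the data C differs from the Boolean product.

    Counting mismatches is an assumed error measure for this sketch."""
    R = boolean_product(S, B)
    return sum(c != r for c_row, r_row in zip(C, R) for c, r in zip(c_row, r_row))


# Two basis vectors over four attributes.
B = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
# Each data vector uses the first, the second, or both basis vectors.
S = [[1, 0],
     [0, 1],
     [1, 1]]
# Observed data; the second row has one entry the basis cannot explain.
C = [[1, 1, 0, 0],
     [0, 1, 1, 1],
     [1, 1, 1, 1]]

print(boolean_product(S, B))
print(reconstruction_error(C, S, B))
```

Unlike an ordinary matrix product, overlapping basis vectors do not add up: an attribute covered by two selected basis vectors is still just 1.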
Transformation invariant component analysis for binary images
- In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume I
, 2006
Cited by 8 (1 self)
There are various situations where image data is binary: character recognition, the result of image segmentation, etc. As a first contribution, we compare Gaussian-based principal component analysis (PCA), which is often used to model images, and "binary PCA", which models the binary data more naturally using Bernoulli distributions. Furthermore, we address the problem of data alignment. Image data is often perturbed by some global transformations such as shifting, rotation, scaling, etc. In such cases the data needs to be transformed to some canonical aligned form. As a second contribution, we extend the binary PCA to the "transformation invariant mixture of binary PCAs", which simultaneously corrects the data for a set of global transformations and learns the binary PCA model on the aligned data.
Closed-Form Supervised Dimensionality Reduction with Generalized Linear Models (Technical Report
- IBM T.J. Watson Research Center
, 2008
Cited by 8 (1 self)
We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data- and class-appropriate generalized linear models (GLMs), and handling both classification and regression problems. Our approach uses simple closed-form update rules and is provably convergent. Promising empirical results are demonstrated on a variety of high-dimensional datasets.
Learning to Read Between the Lines: The Aspect Bernoulli Model
- Proc. SIAM Int Conf on Data Mining
, 2004
Cited by 7 (1 self)
We present a novel probabilistic multiple cause model for binary observations. In contrast to other approaches, the model is linear and it infers reasons behind both observed and unobserved attributes with the aid of an explanatory variable. We exploit this distinctive feature of the method to automatically distinguish between attributes that are 'off' by content and those that are missing. Results on artificially corrupted binary images as well as the expansion of short text documents are given by way of demonstration.
Noisy-OR Component Analysis and its Application to Link Analysis
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 4 (0 self)
We develop a new component analysis framework, the Noisy-Or Component Analyzer (NOCA), that targets high-dimensional binary data. NOCA is a probabilistic latent variable model that assumes the expression of observed high-dimensional binary data is driven by a small number of hidden binary sources combined via noisy-or units. The component analysis procedure is equivalent to learning of NOCA parameters. Since the classical EM formulation of the NOCA learning problem is intractable, we develop its variational approximation. We test the NOCA framework on two problems: (1) a synthetic image-decomposition problem and (2) a co-citation data analysis problem for thousands of CiteSeer documents. We demonstrate good performance of the new model on both problems. In addition, we contrast the model to two mixture-based latent-factor models: the probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA).
Factorisation and denoising of 0–1 data: a variational approach
- Neurocomputing, special
Cited by 3 (1 self)
Presence-absence (0-1) observations are special in that often the absence of evidence is not evidence of absence. Here we develop an independent factor model, which has the unique capability to isolate the former as an independent discrete binary noise factor. This representation then forms the basis of inferring missed presences by means of denoising. This is achieved in a probabilistic formalism, employing independent Beta latent source densities and a Bernoulli data likelihood model. Variational approximations are employed to make the inferences tractable. We relate our model to existing models of 0-1 data, demonstrating its advantages for the problem considered, and we present applications in several problem domains, including social network analysis and DNA fingerprint analysis. Key words: factor models, data denoising, 0-1 data
CROC: A New Evaluation Criterion for Recommender Systems
Cited by 2 (0 self)
Evaluation of a recommender system algorithm is a challenging task due to the many possible scenarios in which such systems may be deployed. We have designed a new performance plot called the CROC curve with an associated statistic: the area under the curve. Our CROC curve supplements the widely used ROC curve in recommender system evaluation by discovering performance characteristics that standard ROC evaluation often ignores. Empirical studies on two domains and including several recommender system algorithms demonstrate that combining ROC and CROC curves in evaluation can lead to a more informed characterization of performance than using either curve alone.
Weighted Low-Rank Approximations
- In 20th International Conference on Machine Learning
, 2003
Cited by 1 (0 self)
We study the common problem of approximating a target matrix with a matrix of lower rank. We provide a simple and efficient (EM) algorithm for solving weighted low-rank approximation problems, which, unlike their unweighted version, do not admit a closed-form solution in general. We analyze, in addition, the nature of locally optimal solutions that arise in this context, demonstrate the utility of accommodating the weights in reconstructing the underlying low-rank representation, and extend the formulation to non-Gaussian noise models such as logistic models.
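As a rough illustration of how an EM-style procedure for weighted low-rank approximation can look, the sketch below alternates between filling in the target with the current estimate according to the weights and taking an ordinary truncated SVD. This is a sketch under stated assumptions rather than the paper's exact algorithm: it assumes weights in [0, 1] and a squared-error objective, and the function names and the zero initialization are choices made here for illustration.

```python
import numpy as np


def truncated_svd(X, k):
    """Best rank-k approximation of X in Frobenius norm (unweighted case)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def weighted_error(A, W, X):
    """Weighted squared error: sum_ij W_ij * (A_ij - X_ij)^2."""
    return float(np.sum(W * (A - X) ** 2))


def weighted_low_rank_em(A, W, k, n_iter=100):
    """EM-style iteration, assuming every weight lies in [0, 1].

    E-step: blend the target A with the current estimate X using W.
    M-step: replace X with the rank-k truncated SVD of the blend."""
    X = np.zeros_like(A, dtype=float)  # assumed starting point
    for _ in range(n_iter):
        filled = W * A + (1.0 - W) * X
        X = truncated_svd(filled, k)
    return X


# Small synthetic example: a rank-2 target with some entries given zero weight.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
W = (rng.random((6, 5)) > 0.3).astype(float)

X = weighted_low_rank_em(A, W, k=2)
print(weighted_error(A, W, X))
```

With all weights equal to 1 the blend is just `A`, so the first iteration already returns the usual truncated SVD; the iteration only matters when the weights differ across entries, which is the case the abstract says has no closed-form solution in general.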

