Results 1–10 of 38
Learning with Matrix Factorization, 2004
Cited by 44 (4 self)
Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years.
A Unified View of Matrix Factorization Models
Cited by 40 (0 self)
We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, EPCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods such as incorporating row and column biases, and adding or relaxing clustering constraints.
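The alternating-projection idea summarized in this abstract can be sketched for the simplest Bregman divergence, squared loss: hold one factor fixed and solve a least-squares projection for the other. This is an illustrative sketch, not the paper's algorithm; the names `alternating_factorization`, `U`, `V`, and `rank` are our own.

```python
import numpy as np

def alternating_factorization(X, rank, n_iters=50, seed=0):
    """Factor X ~ U @ V by alternating least-squares projections."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((rank, n))
    for _ in range(n_iters):
        # Projection step for U: least-squares fit with V held fixed.
        U = X @ np.linalg.pinv(V)
        # Projection step for V: least-squares fit with U held fixed.
        V = np.linalg.pinv(U) @ X
    return U, V

# Usage: factor a noiseless rank-2 matrix and check the reconstruction error.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
U, V = alternating_factorization(X, rank=2)
err = np.linalg.norm(X - U @ V) / np.linalg.norm(X)
```

Under squared loss each projection has a closed form; for other Bregman divergences the paper replaces it with a Newton projection.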
The discrete basis problem, 2005
Cited by 27 (9 self)
We consider the Discrete Basis Problem, which can be described as follows: given a collection of Boolean vectors, find a collection of k Boolean basis vectors such that the original vectors can be represented using disjunctions of these basis vectors. We show that the decision version of this problem is NP-complete and that the optimization version cannot be approximated within any finite ratio. We also study two variations of this problem, where the Boolean basis vectors must be mutually orthogonal. We show that the latter variation is closely related to the well-known Metric k-median Problem in Boolean space. To solve these problems, two algorithms are presented. One is designed for the variations mentioned above and is based solely on solving the k-median problem, while the other is a heuristic intended to solve the general Discrete Basis Problem. We also report the results of extensive experiments with these two algorithms on both synthetic and real-world data. The results are twofold: with the synthetic data the algorithms did rather well, but with the real-world data the results were not as good.
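The reconstruction rule described here — each data vector as the disjunction (logical OR) of a chosen subset of k Boolean basis vectors — can be illustrated directly. The small basis and usage matrices below are made up for demonstration; the paper's algorithms are about searching for good ones.

```python
import numpy as np

# Two illustrative Boolean basis vectors (k = 2) over 4 attributes.
basis = np.array([[1, 1, 0, 0],    # basis vector b0
                  [0, 0, 1, 1]])   # basis vector b1

# Which basis vectors each data row uses.
usage = np.array([[1, 0],          # row 0 uses b0 only
                  [1, 1]])         # row 1 uses b0 OR b1

# Boolean matrix product: entry (i, j) is 1 iff at least one selected
# basis vector has a 1 in position j.
reconstruction = (usage @ basis > 0).astype(int)
```

Note the disjunction: in ordinary arithmetic row 1 would get a 2 wherever two basis vectors overlap, so the product is thresholded back to {0, 1}.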
Closed-Form Supervised Dimensionality Reduction with Generalized Linear Models (Technical Report)
IBM T.J. Watson Research Center, 2008
Cited by 16 (1 self)
We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data- and class-appropriate generalized linear models (GLMs), and handling both classification and regression problems. Our approach uses simple closed-form update rules and is provably convergent. Promising empirical results are demonstrated on a variety of high-dimensional datasets.
Transformation invariant component analysis for binary images
In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume I, 2006
Cited by 10 (1 self)
There are various situations where image data is binary: character recognition, results of image segmentation, etc. As a first contribution, we compare Gaussian-based principal component analysis (PCA), which is often used to model images, and "binary PCA", which models the binary data more naturally using Bernoulli distributions. Furthermore, we address the problem of data alignment. Image data is often perturbed by global transformations such as shifting, rotation, scaling, etc. In such cases the data needs to be transformed to some canonical aligned form. As a second contribution, we extend the binary PCA to the "transformation invariant mixture of binary PCAs", which simultaneously corrects the data for a set of global transformations and learns the binary PCA model on the aligned data.
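The modeling contrast in this abstract — Gaussian PCA scores a binary pixel with squared error, while binary PCA scores it with a Bernoulli log-likelihood on a logistically squashed linear predictor — can be sketched as follows. The function name and the toy logits are our own, not from the paper.

```python
import numpy as np

def bernoulli_loglik(X, logits):
    """log p(X | p) with p = sigmoid(logits), for binary X.

    Uses the stable identity log(1 + exp(z)) = logaddexp(0, z)."""
    return float(np.sum(X * logits - np.logaddexp(0.0, logits)))

# Toy binary "image" and two candidate low-dimensional reconstructions.
X = np.array([[1, 0],
              [0, 1]])
logits = np.array([[2.0, -2.0],
                   [-2.0, 2.0]])     # confident and correct everywhere

ll_good = bernoulli_loglik(X, logits)
ll_bad = bernoulli_loglik(X, -logits)  # confidently wrong everywhere
```

A reconstruction that puts probability mass on the observed bits scores higher, which is the sense in which the Bernoulli model fits 0/1 data "more naturally" than a Gaussian.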
Noisy-OR Component Analysis and its Application to Link Analysis
Journal of Machine Learning Research, 2006
Cited by 8 (0 self)
We develop a new component analysis framework, the Noisy-Or Component Analyzer (NOCA), that targets high-dimensional binary data. NOCA is a probabilistic latent variable model that assumes the expression of observed high-dimensional binary data is driven by a small number of hidden binary sources combined via noisy-or units. The component analysis procedure is equivalent to learning the NOCA parameters. Since the classical EM formulation of the NOCA learning problem is intractable, we develop its variational approximation. We test the NOCA framework on two problems: (1) a synthetic image-decomposition problem and (2) a co-citation data analysis problem for thousands of CiteSeer documents. We demonstrate good performance of the new model on both problems. In addition, we contrast the model to two mixture-based latent-factor models: probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA).
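The noisy-or unit that combines the hidden binary sources has a standard form: an observed bit turns on unless every active source independently fails. A minimal sketch of that likelihood, with made-up source activations and parameters (the paper's contribution is the variational learning of these parameters, not this formula):

```python
import numpy as np

def noisy_or(s, p, p0=0.01):
    """p(d = 1 | s) = 1 - (1 - p0) * prod_i (1 - p_i)**s_i.

    s:  binary vector of hidden source activations
    p:  per-source probabilities of turning the observation on
    p0: leak probability (observation fires with no active source)"""
    return 1.0 - (1.0 - p0) * np.prod((1.0 - p) ** s)

# Illustrative example: sources 0 and 2 are active.
s = np.array([1, 0, 1])
p = np.array([0.8, 0.9, 0.5])
prob_on = noisy_or(s, p)   # 1 - 0.99 * (0.2 * 0.5) = 0.901
```

Inactive sources contribute a factor of 1, so only active sources can suppress the failure product — adding active sources never decreases the probability that the bit is on.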
Factorisation and denoising of 0–1 data: a variational approach
 Neurocomputing, special
Cited by 7 (2 self)
Presence-absence (0–1) observations are special in that often the absence of evidence is not evidence of absence. Here we develop an independent factor model, which has the unique capability to isolate the former as an independent discrete binary noise factor. This representation then forms the basis of inferring missed presences by means of denoising. This is achieved in a probabilistic formalism, employing independent Beta latent source densities and a Bernoulli data likelihood model. Variational approximations are employed to make the inferences tractable. We relate our model to existing models of 0–1 data, demonstrating its advantages for the problem considered, and we present applications in several problem domains, including social network analysis and DNA fingerprint analysis. Key words: factor models, data denoising, 0–1 data
Learning to Read Between the Lines: The Aspect Bernoulli Model
In Proc. SIAM International Conference on Data Mining, 2004
Cited by 7 (1 self)
We present a novel probabilistic multiple-cause model for binary observations. In contrast to other approaches, the model is linear and it infers the reasons behind both observed and unobserved attributes with the aid of an explanatory variable. We exploit this distinctive feature of the method to automatically distinguish between attributes that are 'off' by content and those that are missing. Results on artificially corrupted binary images, as well as the expansion of short text documents, are given by way of demonstration.
CROC: A New Evaluation Criterion for Recommender Systems
Cited by 4 (0 self)
Evaluation of a recommender system algorithm is a challenging task due to the many possible scenarios in which such systems may be deployed. We have designed a new performance plot called the CROC curve, with an associated statistic: the area under the curve. Our CROC curve supplements the widely used ROC curve in recommender system evaluation by discovering performance characteristics that standard ROC evaluation often ignores. Empirical studies on two domains, covering several recommender system algorithms, demonstrate that combining ROC and CROC curves in evaluation can lead to a more informed characterization of performance than using either curve alone.
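Both plots are summarized by an area-under-the-curve statistic. A generic sketch of that computation, assuming the curve is given as sorted coordinate arrays (the coordinates below are illustrative, not from the paper):

```python
import numpy as np

def area_under_curve(x, y):
    """Area under a piecewise-linear curve via the trapezoidal rule.

    x must be sorted ascending; y gives the curve height at each x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

# Illustrative three-point curve from (0, 0) to (1, 1).
x = np.array([0.0, 0.5, 1.0])
y = np.array([0.0, 0.8, 1.0])
auc = area_under_curve(x, y)   # 0.2 + 0.45 = 0.65
```

For a ROC curve, x and y would be false- and true-positive rates; the same statistic applies unchanged to the CROC axes.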
Weighted Low-Rank Approximations
In 20th International Conference on Machine Learning, 2003
Cited by 3 (0 self)
We study the common problem of approximating a target matrix with a matrix of lower rank. We provide a simple and efficient (EM) algorithm for solving weighted low-rank approximation problems, which, unlike their unweighted version, do not admit a closed-form solution in general. We analyze, in addition, the nature of locally optimal solutions that arise in this context, demonstrate the utility of accommodating the weights in reconstructing the underlying low-rank representation, and extend the formulation to non-Gaussian noise models such as logistic models.
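For the special case of 0/1 weights, the EM idea can be sketched very compactly: treat zero-weight entries as missing, impute them with the current low-rank reconstruction (E-step), then solve the resulting unweighted problem by truncated SVD (M-step). All names here are illustrative; the paper handles general non-negative weights and other noise models.

```python
import numpy as np

def weighted_low_rank(X, W, rank, n_iters=200):
    """EM-style rank-k approximation of X under a 0/1 weight matrix W."""
    Xhat = np.where(W > 0, X, 0.0)                   # start by zero-filling
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Xhat, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]     # best rank-k fit to Xhat
        Xhat = np.where(W > 0, X, L)                 # re-impute unobserved entries
    return L

# Usage: recover a rank-1 matrix with roughly 20% of its entries hidden.
rng = np.random.default_rng(0)
X = np.outer(rng.standard_normal(10), rng.standard_normal(8))
W = (rng.random(X.shape) > 0.2).astype(int)
L = weighted_low_rank(X, W, rank=1)
err = np.abs((L - X) * W).max()                      # error on observed entries
```

Each iteration cannot increase the weighted squared loss, which is the monotonicity property that makes this an EM algorithm; the closed-form SVD step is exactly what the weights destroy in the general problem.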