Results 1–10 of 55
Learning with Matrix Factorization
, 2004
"... Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or highdimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning ..."
Abstract

Cited by 76 (6 self)
 Add to MetaCart
(Show Context)
Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years.
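The generic low-rank modeling idea described in this abstract can be sketched in a few lines. The following is a minimal illustration (not taken from the paper; function name, step counts, and ridge term are all illustrative), fitting X ≈ UV by alternating least squares:

```python
import numpy as np

def als_factorize(X, k, n_iters=50, ridge=1e-6):
    """Fit X ~= U @ V with rank k by alternating least squares."""
    n, m = X.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((k, m))
    for _ in range(n_iters):
        # Fix V, solve the ridge-regularized least-squares problem for U.
        U = np.linalg.solve(V @ V.T + ridge * np.eye(k), V @ X.T).T
        # Fix U, solve for V.
        V = np.linalg.solve(U.T @ U + ridge * np.eye(k), U.T @ X)
    return U, V

# Usage: a rank-2 matrix is recovered almost exactly.
X = np.outer([1., 2., 3.], [1., 0., 1., 0.]) + np.outer([0., 1., 1.], [0., 1., 0., 1.])
U, V = als_factorize(X, k=2)
err = np.linalg.norm(X - U @ V)
```

Each subproblem is an ordinary least-squares fit, which is why this alternating recipe underlies so many of the factorization models the abstract mentions.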
A Unified View of Matrix Factorization Models
"... Abstract. We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, EPCA, MMMF, pLSI, pLSIpHITS, Bregman coclustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as m ..."
Abstract

Cited by 57 (0 self)
 Add to MetaCart
(Show Context)
We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, EPCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods, such as incorporating row and column biases and adding or relaxing clustering constraints.
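As a concrete member of this family, here is a minimal sketch of NMF fitted with the classical multiplicative updates (squared-loss Bregman divergence with nonnegativity constraints on both factors). The example data, iteration count, and epsilon guard are illustrative, not the paper's:

```python
import numpy as np

def nmf(X, k, n_iters=500, eps=1e-9):
    """Nonnegative factorization X ~= W @ H via multiplicative updates."""
    rng = np.random.default_rng(1)
    W = rng.random((X.shape[0], k)) + 0.1
    H = rng.random((k, X.shape[1])) + 0.1
    for _ in range(n_iters):
        # Multiplicative updates keep W and H elementwise nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# A nonnegative rank-2 matrix: rows 0 and 1 are proportional.
X = np.array([[1., 0., 2.],
              [2., 0., 4.],
              [0., 3., 0.]])
W, H = nmf(X, k=2)
err = np.linalg.norm(X - W @ H)
```

Swapping the squared loss for another Bregman divergence changes only the update rules, which is the point of the unified view.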
The discrete basis problem
, 2005
"... We consider the Discrete Basis Problem, which can be described as follows: given a collection of Boolean vectors find a collection of k Boolean basis vectors such that the original vectors can be represented using disjunctions of these basis vectors. We show that the decision version of this problem ..."
Abstract

Cited by 38 (13 self)
 Add to MetaCart
(Show Context)
We consider the Discrete Basis Problem, which can be described as follows: given a collection of Boolean vectors, find a collection of k Boolean basis vectors such that the original vectors can be represented using disjunctions of these basis vectors. We show that the decision version of this problem is NP-complete and that the optimization version cannot be approximated within any finite ratio. We also study two variations of this problem, in which the Boolean basis vectors must be mutually orthogonal. We show that one of these variations is closely related to the well-known Metric k-median Problem in Boolean space. To solve these problems, two algorithms are presented: one is designed for the orthogonal variations and is based solely on solving the k-median problem, while the other is a heuristic intended for the general Discrete Basis Problem. We also study the results of extensive experiments made with these two algorithms on both synthetic and real-world data. The results are twofold: on the synthetic data the algorithms did rather well, but on the real-world data the results were not as good.
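The representation the abstract describes is the Boolean matrix product: each data row is the OR (disjunction) of the basis rows its selector picks. A small illustrative sketch (not the paper's algorithm; the example vectors are hypothetical):

```python
import numpy as np

def boolean_product(S, B):
    """(S o B)[i, j] = OR over l of (S[i, l] AND B[l, j])."""
    # Integer matrix product counts matches; thresholding turns it into OR.
    return (S @ B > 0).astype(int)

# Two Boolean basis vectors.
B = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
# Selector: row 0 uses basis 0, row 1 uses basis 1, row 2 uses both.
S = np.array([[1, 0],
              [0, 1],
              [1, 1]])
X = boolean_product(S, B)
# Row 2 is the disjunction [1, 1, 1, 1], not the arithmetic sum.
```

The disjunction is exactly what makes the problem hard: unlike real-valued factorization, overlapping basis vectors do not add up, so least-squares machinery does not apply.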
Closed-Form Supervised Dimensionality Reduction with Generalized Linear Models (Technical Report)
 IBM T.J. Watson Research Center
, 2008
"... We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data and classappropriate generalized linear models (GLMs), and handling both classific ..."
Abstract

Cited by 22 (1 self)
 Add to MetaCart
(Show Context)
We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data- and class-appropriate generalized linear models (GLMs), and handling both classification and regression problems. Our approach uses simple closed-form update rules and is provably convergent. Promising empirical results are demonstrated on a variety of high-dimensional datasets.
Transformation invariant component analysis for binary images
 In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume I
, 2006
"... There are various situations where image data is binary: character recognition, result of image segmentation etc. As a first contribution, we compare Gaussian based principal component analysis (PCA), which is often used to model images, and ”binary PCA ” which models the binary data more naturally ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
(Show Context)
There are various situations where image data is binary: character recognition, results of image segmentation, etc. As a first contribution, we compare Gaussian-based principal component analysis (PCA), which is often used to model images, with "binary PCA", which models the binary data more naturally using Bernoulli distributions. Furthermore, we address the problem of data alignment. Image data is often perturbed by global transformations such as shifting, rotation, scaling, etc. In such cases the data needs to be transformed to some canonical aligned form. As a second contribution, we extend binary PCA to the "transformation-invariant mixture of binary PCAs", which simultaneously corrects the data for a set of global transformations and learns the binary PCA model on the aligned data.
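The core idea behind "binary PCA" can be sketched briefly: model each binary entry as a Bernoulli variable whose log-odds matrix is low-rank, Theta = U @ V, and fit U and V by gradient ascent on the Bernoulli log-likelihood. This is an assumed minimal version, not the paper's implementation; names, step size, and data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_pca(X, k, n_iters=500, lr=0.1):
    """Fit low-rank Bernoulli log-odds Theta = U @ V to binary X."""
    n, m = X.shape
    rng = np.random.default_rng(0)
    U = 0.01 * rng.standard_normal((n, k))
    V = 0.01 * rng.standard_normal((k, m))
    for _ in range(n_iters):
        # Gradient of the Bernoulli log-likelihood w.r.t. Theta is X - sigma(Theta).
        R = X - sigmoid(U @ V)
        U, V = U + lr * (R @ V.T), V + lr * (U.T @ R)
    return U, V

# Two binary prototypes, each repeated twice.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
U, V = binary_pca(X, k=2)
probs = sigmoid(U @ V)  # fitted Bernoulli means
```

Unlike Gaussian PCA, the reconstruction `probs` always lies in [0, 1], which is what makes the Bernoulli model the more natural fit for binary images.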
Noisy-OR Component Analysis and its Application to Link Analysis
 Journal of Machine Learning Research
, 2006
"... We develop a new component analysis framework, the NoisyOr Component Analyzer (NOCA), that targets highdimensional binary data. NOCA is a probabilistic latent variable model that assumes the expression of observed highdimensional binary data is driven by a small number of hidden binary sources ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
We develop a new component analysis framework, the Noisy-Or Component Analyzer (NOCA), that targets high-dimensional binary data. NOCA is a probabilistic latent variable model that assumes the expression of observed high-dimensional binary data is driven by a small number of hidden binary sources combined via noisy-or units. The component analysis procedure is equivalent to learning the NOCA parameters. Since the classical EM formulation of the NOCA learning problem is intractable, we develop its variational approximation. We test the NOCA framework on two problems: (1) a synthetic image-decomposition problem, and (2) a co-citation data analysis problem for thousands of CiteSeer documents. We demonstrate good performance of the new model on both problems. In addition, we contrast the model with two mixture-based latent-factor models: probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA).
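The noisy-or unit itself is compact enough to sketch (parameter names here are illustrative, not NOCA's): given binary sources s, per-source failure probabilities q[i, j] = P(source i fails to activate observation j), and a leak term q0[j], the observation is on with probability 1 - q0[j] * prod_i q[i, j]**s[i]:

```python
import numpy as np

def noisy_or_prob(s, q, q0):
    """P(x_j = 1 | s) under a noisy-or unit with leak q0."""
    # Inactive sources contribute a factor of 1 (they cannot fail).
    off = q0 * np.prod(np.where(s[:, None] == 1, q, 1.0), axis=0)
    return 1.0 - off

q = np.array([[0.1, 0.9],
              [0.8, 0.2]])   # q[i, j]: failure prob of source i for obs j
q0 = np.array([0.99, 0.99])  # leak: obs rarely on with no active source
p = noisy_or_prob(np.array([1, 0]), q, q0)
# With only source 0 active, obs 0 is likely on and obs 1 likely off.
```

The product over sources is what couples the hidden variables and makes exact EM intractable, motivating the variational approximation the abstract describes.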
Factorisation and denoising of 0–1 data: a variational approach
 Neurocomputing, special
"... Presenceabsence (01) observations are special in that often the absence of evidence is not evidence of absence. Here we develop an independent factor model, which has the unique capability to isolate the former as an independent discrete binary noise factor. This representation then forms the basi ..."
Abstract

Cited by 10 (2 self)
 Add to MetaCart
(Show Context)
Presence-absence (0–1) observations are special in that, often, the absence of evidence is not evidence of absence. Here we develop an independent factor model with the unique capability to isolate the former as an independent discrete binary noise factor. This representation then forms the basis of inferring missed presences by means of denoising. This is achieved in a probabilistic formalism, employing independent Beta latent source densities and a Bernoulli data likelihood model. Variational approximations are employed to make the inferences tractable. We relate our model to existing models of 0–1 data, demonstrating its advantages for the problem considered, and we present applications in several problem domains, including social network analysis and DNA fingerprint analysis. Key words: factor models, data denoising, 0–1 data
Dynamic Cognitive Tracing: Towards Unified Discovery of Student and Cognitive Models
 In Proceedings of the Fifth International Conference on Educational Data Mining, Chania, 2012, in press.
"... This work describes a unified approach to two problems previously addressed separately in Intelligent Tutoring Systems: (i) Cognitive Modeling, which factorizes problem solving steps into the latent set of skills required to perform them [7]; and (ii) Student Modeling, which infers students ’ learni ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
(Show Context)
This work describes a unified approach to two problems previously addressed separately in Intelligent Tutoring Systems: (i) Cognitive Modeling, which factorizes problem-solving steps into the latent set of skills required to perform them [7]; and (ii) Student Modeling, which infers students' learning by observing student performance [9]. The practical importance of improving our understanding of how students learn is to build better intelligent tutors [8]. The expected advantages of our integrated approach include (i) more accurate prediction of a student's future performance, and (ii) clustering items into skills automatically, without expensive manual expert knowledge annotation. We introduce a unified model, Dynamic Cognitive Tracing, to explain student learning in terms of skill mastery over time, by learning the Cognitive Model and the Student Model jointly. We formulate our approach as a graphical model, and we validate it using sixty different synthetic datasets. Dynamic Cognitive Tracing significantly outperforms single-skill Knowledge Tracing at predicting future student performance.
Learning to Read Between the Lines: The Aspect Bernoulli Model
 In Proc. SIAM Int. Conf. on Data Mining
, 2004
"... We present a novel probabilistic multiple cause model for binary observations. In contrast to other approaches, the model is linear and it infers reasons behind both observed and unobserved attributes with the aid of an explanatory variable. We exploit this distinctive feature of the method to autom ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
(Show Context)
We present a novel probabilistic multiple-cause model for binary observations. In contrast to other approaches, the model is linear, and it infers reasons behind both observed and unobserved attributes with the aid of an explanatory variable. We exploit this distinctive feature of the method to automatically distinguish between attributes that are `off' by content and those that are missing. Results on artificially corrupted binary images, as well as the expansion of short text documents, are given by way of demonstration.
Sparse logistic principal components analysis for binary data
 The Annals of Applied Statistics
, 2010
"... ar ..."