Results 1–10 of 14
Relational Learning via Collective Matrix Factorization
2008
Cited by 60 (3 self)
Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations would then encode users' ratings of movies, movies' genres, and actors' roles in movies. A common prediction technique given one pairwise relation, for example a #users × #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; so, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a
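The shared-factor idea in this abstract can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's algorithm: it uses plain squared loss (one member of the Bregman family the paper supports) and alternating ridge-regularized least squares on two synthetic relations, a users × movies matrix and a movies × genres matrix, coupled through a shared movie factor V. All names and sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: 30 users, 20 movies, 5 genres, rank 3.
n_users, n_movies, n_genres, k = 30, 20, 5, 3

# Synthetic ground-truth factors and the two observed relations.
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_movies, k))
W_true = rng.normal(size=(n_genres, k))
X = U_true @ V_true.T          # users x movies ratings
Y = V_true @ W_true.T          # movies x genres

def cmf(X, Y, k, iters=50, lam=1e-3):
    """Collective factorization X ~ U V^T, Y ~ V W^T with shared V,
    by alternating ridge-regularized least squares (squared loss only;
    the paper's general Bregman-divergence setting is not sketched here)."""
    U = rng.normal(size=(X.shape[0], k))
    V = rng.normal(size=(X.shape[1], k))
    W = rng.normal(size=(Y.shape[1], k))
    I = lam * np.eye(k)
    for _ in range(iters):
        U = X @ V @ np.linalg.inv(V.T @ V + I)
        W = Y.T @ V @ np.linalg.inv(V.T @ V + I)
        # V participates in both relations, so its update pools them.
        V = (X.T @ U + Y @ W) @ np.linalg.inv(U.T @ U + W.T @ W + I)
    return U, V, W

U, V, W = cmf(X, Y, k)
err_X = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
err_Y = np.linalg.norm(Y - V @ W.T) / np.linalg.norm(Y)
```

Because V appears in both updates, information in the genre matrix Y regularizes the movie factors used to reconstruct the ratings matrix X, which is the mechanism the abstract describes.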
Machine learning classifiers and fMRI: A tutorial overview
NeuroImage, 2009
Cited by 41 (3 self)
Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviors and other variables of interest from fMRI data and thereby show the data contain enough information about them. In this tutorial overview we review some of the key choices faced in using this approach as well as how to derive statistically significant results, illustrating each point from a case study. Furthermore, we show how, in addition to answering the question of 'is there information about a variable of interest' (pattern discrimination), classifiers can be used to tackle other classes of question, namely 'where is the information' (pattern localization) and 'how is that information encoded' (pattern characterization).
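As a concrete illustration of the decode-then-test workflow this abstract describes, here is a hedged sketch on synthetic data: cross-validated accuracy of a simple nearest-class-mean classifier, assessed for significance with a label-permutation test. The data, classifier, and all parameter values below are invented for the example and are not from the tutorial itself.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic "voxel" data: 120 trials, 50 voxels, two conditions,
# with a condition effect confined to the first 5 voxels.
n, d = 120, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[y == 1, :5] += 1.0

def cv_accuracy(X, y, folds=6):
    """K-fold cross-validated accuracy of a nearest-class-mean classifier."""
    idx = rng.permutation(len(y))
    accs = []
    for f in np.array_split(idx, folds):
        tr = np.setdiff1d(idx, f)
        m0 = X[tr][y[tr] == 0].mean(0)
        m1 = X[tr][y[tr] == 1].mean(0)
        pred = (np.linalg.norm(X[f] - m1, axis=1)
                < np.linalg.norm(X[f] - m0, axis=1)).astype(int)
        accs.append((pred == y[f]).mean())
    return np.mean(accs)

acc = cv_accuracy(X, y)
# Permutation test: how often does a shuffled labeling do as well?
null = [cv_accuracy(X, rng.permutation(y)) for _ in range(200)]
p_value = (np.sum(np.array(null) >= acc) + 1) / (200 + 1)
```

The permutation null answers the abstract's 'is there information' question: a small p-value means the observed accuracy is unlikely under label-independent data.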
A Unified View of Matrix Factorization Models
Cited by 33 (0 self)
We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, EPCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods such as incorporating row and column biases, and adding or relaxing clustering constraints.
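Point (ii), the Newton projection, can be illustrated on a single row update for an exponential-family (Poisson-style) divergence. This is a minimal sketch with invented data, not code from the paper: with column factors V fixed and natural parameters theta = V u, the loss sum_j exp(theta_j) - x_j theta_j has gradient V^T(e^theta - x) and Hessian V^T diag(e^theta) V, so each projection reduces to a small k × k Newton solve.

```python
import numpy as np

rng = np.random.default_rng(1)

# One "projection": update a single row factor u for count-like data x,
# keeping the column factors V fixed. Loss is the Poisson-style Bregman
# divergence with natural parameters theta = V @ u:
#   L(u) = sum_j exp(theta_j) - x_j * theta_j
k, m = 3, 40
V = rng.normal(scale=0.3, size=(m, k))
u_true = rng.normal(size=k)
x = np.exp(V @ u_true)             # noiseless model means as targets

def newton_projection(x, V, steps=20):
    u = np.zeros(V.shape[1])
    for _ in range(steps):
        mu = np.exp(V @ u)                  # model means
        g = V.T @ (mu - x)                  # gradient of L
        H = V.T @ (mu[:, None] * V)         # Hessian: V^T diag(mu) V
        u = u - np.linalg.solve(H, g)       # full Newton step
    return u

u_hat = newton_projection(x, V)
```

The special structure the abstract mentions is visible here: the Hessian is only k × k regardless of the row length m, so the Newton solve stays cheap.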
Closed-Form Supervised Dimensionality Reduction with Generalized Linear Models (Technical Report)
IBM T.J. Watson Research Center, 2008
Cited by 14 (1 self)
We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data- and class-appropriate generalized linear models (GLMs), and handling both classification and regression problems. Our approach uses simple closed-form update rules and is provably convergent. Promising empirical results are demonstrated on a variety of high-dimensional datasets.
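The closed-form alternating structure can be sketched for the Gaussian special case of the GLM framework (the general GLM updates are not reproduced here; all sizes and values below are invented): jointly minimize a reconstruction term and a prediction term, with each block solved exactly in closed form.

```python
import numpy as np

rng = np.random.default_rng(2)

# Gaussian special case of supervised dimensionality reduction:
#   minimize ||X - U V||^2 + alpha * ||y - U w||^2
# over the low-dim representation U (n x k), loadings V (k x d), and
# predictor w (k,). Every update below is closed form.
n, d, k, alpha = 80, 15, 3, 1.0
U_true = rng.normal(size=(n, k))
X = U_true @ rng.normal(size=(k, d))   # reconstructable inputs
y = U_true @ rng.normal(size=k)        # predictable targets

U = rng.normal(size=(n, k))
V = rng.normal(size=(k, d))
w = rng.normal(size=k)
eye = 1e-8 * np.eye(k)
for _ in range(100):
    V = np.linalg.solve(U.T @ U + eye, U.T @ X)     # loadings block
    w = np.linalg.solve(U.T @ U + eye, U.T @ y)     # predictor block
    # U sees both the reconstruction and the supervised term.
    A = V @ V.T + alpha * np.outer(w, w) + eye
    U = (X @ V.T + alpha * np.outer(y, w)) @ np.linalg.inv(A)

rec_err = np.linalg.norm(X - U @ V) / np.linalg.norm(X)
pred_err = np.linalg.norm(y - U @ w) / np.linalg.norm(y)
```

Because each block is an exact minimizer, the objective decreases monotonically, which is the intuition behind the provable convergence the abstract claims.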
Steganalysis in high dimensions: Fusing classifiers built on random subspaces
Cited by 9 (8 self)
By working with high-dimensional representations of covers, modern steganographic methods are capable of preserving a large number of complex dependencies among individual cover elements and thus avoid detection using current best steganalyzers. Inevitably, steganalysis needs to start using high-dimensional feature sets as well. This brings two key problems – construction of good high-dimensional features and machine learning that scales well with respect to dimensionality. Depending on the classifier, high dimensionality may lead to problems with the lack of training data, infeasibly high complexity of training, degradation of generalization abilities, lack of robustness to cover source, and saturation of performance below its potential. To address these problems, collectively known as the curse of dimensionality, we propose ensemble classifiers as an alternative to the much more complex support vector machines. Based on the character of the media being analyzed, the steganalyst first puts together a high-dimensional set of diverse “pre-features” selected to capture dependencies among individual cover elements. Then, a family of weak classifiers is built on random subspaces of the pre-feature space. The final classifier is constructed by fusing the decisions of individual classifiers. The advantage of this approach is its universality, low complexity, simplicity, and improved performance when compared to classifiers trained on the entire pre-feature set. Experiments with the steganographic algorithms nsF5 and HUGO demonstrate the usefulness of this approach over the current state of the art.
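The ensemble construction described above (weak learners on random subspaces, decisions fused by voting) can be sketched as follows. The base learner here is a regularized Fisher linear discriminant, but the data and every parameter below are invented toy values, not the paper's steganalysis features or settings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-class toy data standing in for cover/stego feature vectors:
# 100 features, of which the first 20 carry a small mean shift.
n, d = 400, 100
mean_shift = np.zeros(d); mean_shift[:20] = 0.8
X0 = rng.normal(size=(n, d))                    # "cover"
X1 = rng.normal(size=(n, d)) + mean_shift       # "stego"
X = np.vstack([X0, X1]); y = np.r_[np.zeros(n), np.ones(n)]

def fit_fld(X, y, reg=1e-3):
    """Fisher linear discriminant: w = Sw^{-1} (m1 - m0), with a bias
    placing the threshold midway between the projected class means."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T) + reg * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    b = -w @ (m0 + m1) / 2
    return w, b

# Weak learners on random subspaces; fusion by majority vote.
L, dsub = 25, 15
subspaces = [rng.choice(d, size=dsub, replace=False) for _ in range(L)]
models = [fit_fld(X[:, s], y) for s in subspaces]

def predict(Xq):
    votes = np.stack([(Xq[:, s] @ w + b > 0).astype(int)
                      for s, (w, b) in zip(subspaces, models)])
    return (votes.mean(0) > 0.5).astype(int)

acc = (predict(X) == y).mean()
```

Each weak learner only inverts a small dsub × dsub scatter matrix, which is how the random-subspace construction sidesteps the training-complexity and data-scarcity problems the abstract lists.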
Beyond Brain Blobs: Machine Learning Classifiers as Instruments for Analyzing Functional Magnetic Resonance Imaging Data
1998
Cited by 5 (1 self)
The thesis put forth in this dissertation is that machine learning classifiers can be used as instruments for decoding variables of interest from functional magnetic resonance imaging (fMRI) data. There are two main goals in decoding:
• Showing that the variable of interest can be predicted from the data in a statistically reliable manner (i.e. there's enough information present).
• Shedding light on how the data encode the information needed to predict, taking into account what the classifier used can learn and any criteria by which the data are filtered (e.g. how voxels and time points used are chosen).
Chapter 2 considers the issues that arise when using traditional linear classifiers and several different voxel selection techniques to strive towards these
A Bayesian Matrix Factorization Model for Relational Data
Cited by 3 (0 self)
Relational learning can be used to augment one data source with other correlated sources of information, to improve predictive accuracy. We frame a large class of relational learning problems as matrix factorization problems, and propose a hierarchical Bayesian model. Training our Bayesian model using random-walk Metropolis-Hastings is impractically slow, and so we develop a block Metropolis-Hastings sampler which uses the gradient and Hessian of the likelihood to dynamically tune the proposal. We demonstrate that a predictive model of brain response to stimuli can be improved by augmenting it with side information about the stimuli.
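For intuition, here is a minimal random-walk Metropolis-Hastings sampler for the simplest piece of such a model: the conditional posterior of one row factor with the other factors held fixed, under a Gaussian likelihood and prior. Note this is the slow baseline the abstract argues against; the paper's block sampler additionally uses the gradient and Hessian of the likelihood to tune its proposal, which is not reproduced here. All sizes and values are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Target: posterior over one row factor u with column factors V fixed,
#   x ~ N(V u, sigma^2 I),  u ~ N(0, I).
k, m, sigma = 2, 60, 0.5
V = rng.normal(size=(m, k))
u_true = np.array([1.0, -0.5])
x = V @ u_true + sigma * rng.normal(size=m)

def log_post(u):
    """Unnormalized log posterior: Gaussian likelihood + Gaussian prior."""
    return -np.sum((x - V @ u) ** 2) / (2 * sigma ** 2) - 0.5 * np.sum(u ** 2)

u = np.zeros(k)
lp = log_post(u)
samples = []
for t in range(20000):
    prop = u + 0.05 * rng.normal(size=k)       # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # MH accept/reject
        u, lp = prop, lp_prop
    if t >= 5000:                              # discard burn-in
        samples.append(u)
post_mean = np.mean(samples, axis=0)
```

Even on this two-parameter toy, the untuned random walk needs thousands of steps to estimate a posterior mean, which suggests why the paper replaces it with a curvature-informed proposal for full-size factor matrices.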
Convex Sparse Coding, Subspace Learning, and Semi-Supervised Extensions
Cited by 2 (0 self)
Automated feature discovery is a fundamental problem in machine learning. Although classical feature discovery methods do not guarantee optimal solutions in general, it has been recently noted that certain subspace learning and sparse coding problems can be solved efficiently, provided the number of features is not restricted a priori. We provide an extended characterization of this optimality result and describe the nature of the solutions under an expanded set of practical contexts. In particular, we apply the framework to a semi-supervised learning problem, and demonstrate that feature discovery can co-occur with input reconstruction and supervised training while still admitting globally optimal solutions. A comparison to existing semi-supervised feature discovery methods shows improved generalization and efficiency.
Linear Dimensionality Reduction for Margin-Based Classification: High-Dimensional Data and Sensor Networks
Cited by 1 (1 self)
Low-dimensional statistics of measurements play an important role in detection problems, including those encountered in sensor networks. In this work, we focus on learning low-dimensional linear statistics of high-dimensional measurement data along with decision rules defined in the low-dimensional space in the case when the probability density of the measurements and class labels is not given, but a training set of samples from this distribution is given. We pose a joint optimization problem for linear dimensionality reduction and margin-based classification, and develop a coordinate descent algorithm on the Stiefel manifold for its solution. Although the coordinate descent is not guaranteed to find the globally optimal solution, crucially, its alternating structure enables us to extend it for sensor networks with a message-passing approach requiring little communication. Linear dimensionality reduction prevents overfitting when learning from finite training data. In the sensor network setting, dimensionality reduction not only prevents overfitting, but also reduces power consumption due to communication. The learned reduced-dimensional space and decision rule are shown to be consistent, and their Rademacher complexity is characterized. Experimental results are presented for a variety of datasets, including those from existing sensor networks, demonstrating the potential of our methodology in comparison with other dimensionality reduction approaches. Index Terms – Linear dimensionality reduction, sensor networks, Stiefel manifold, supervised classification.
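The alternating structure described above can be sketched as follows. This is a toy stand-in, not the paper's algorithm: logistic loss replaces the general margin-based loss, both blocks take plain gradient steps, and the Stiefel constraint A^T A = I is maintained with a QR retraction after each step on A. All data and parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data: labels depend on one linear direction of the measurements.
n, d, k = 300, 20, 2
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true)                 # labels in {-1, +1}

def retract(A):
    """QR retraction onto the Stiefel manifold, with signs fixed so the
    map is continuous (columns do not flip between iterations)."""
    Q, R = np.linalg.qr(A)
    return Q * np.sign(np.diag(R))

A = retract(rng.normal(size=(d, k)))    # random Stiefel point
w = np.zeros(k)

def grads(A, w):
    """Gradients of the mean logistic loss of the margins y * (x^T A w)."""
    z = (X @ A) @ w                     # decision values in reduced space
    gz = -y / (1.0 + np.exp(y * z)) / n # dLoss/dz
    return (X @ A).T @ gz, X.T @ np.outer(gz, w)

for _ in range(2000):
    gw, gA = grads(A, w)
    w -= 1.0 * gw                       # classifier block
    A = retract(A - 1.0 * gA)           # projection block + retraction

acc = (np.sign((X @ A) @ w) == y).mean()
```

The retraction keeps A exactly orthonormal after every step, so the reduced statistics A^T x stay a proper low-dimensional projection throughout training.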