Results 1 - 10
of
12
Relational Learning via Collective Matrix Factorization
, 2008
"... Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations would then encode ..."
Abstract
-
Cited by 34 (3 self)
- Add to MetaCart
Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations would then encode users ’ ratings of movies, movies ’ genres, and actors ’ roles in movies. A common prediction technique given one pairwise relation, for example a #users × #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; so, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a
A Unified View of Matrix Factorization Models
"... Abstract. We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as m ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
Abstract. We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods such as incorporating row and column biases, and adding or relaxing clustering constraints. 1
Machine learning classifiers and fmri: A tutorial overview
- NeuroImage
, 2009
"... Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviors and other variables of interest from fM ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviors and other variables of interest from fMRI data and thereby show the data contain enough information about them. In this tutorial overview we review some of the key choices faced in using this approach as well as how to derive statistically significant results, illustrating each point from a case study. Furthermore, we show how, in addition to answering the question of ‘is there information about a variable of interest ’ (pattern discrimination), classifiers can be used to tackle other classes of question, namely ‘where is the information ’ (pattern localization) and ‘how is that information encoded ’ (pattern characterization). 1
Closed-Form Supervised Dimensionality Reduction with Generalized Linear Models (Technical Report
- IBM T.J. Watson Research Center
, 2008
"... We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data- and class-appropriate generalized linear models (GLMs), and handling both classific ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data- and class-appropriate generalized linear models (GLMs), and handling both classification and regression problems. Our approach uses simple closed-form update rules and is provably convergent. Promising empirical results are demonstrated on a variety of high-dimensional datasets. 1.
Steganalysis in high dimensions: Fusing classifiers built on random subspaces
"... By working with high-dimensional representations of covers, modern steganographic methods are capable of preserving a large number of complex dependencies among individual cover elements and thus avoid detection using current best steganalyzers. Inevitably, steganalysis needs to start using high-dim ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
By working with high-dimensional representations of covers, modern steganographic methods are capable of preserving a large number of complex dependencies among individual cover elements and thus avoid detection using current best steganalyzers. Inevitably, steganalysis needs to start using high-dimensional feature sets as well. This brings two key problems – construction of good high-dimensional features and machine learning that scales well with respect to dimensionality. Depending on the classifier, high dimensionality may lead to problems with the lack of training data, infeasibly high complexity of training, degradation of generalization abilities, lack of robustness to cover source, and saturation of performance below its potential. To address these problems collectively known as the curse of dimensionality, we propose ensemble classifiers as an alternative to the much more complex support vector machines. Based on the character of the media being analyzed, the steganalyst first puts together a high-dimensional set of diverse “prefeatures ” selected to capture dependencies among individual cover elements. Then, a family of weak classifiers is built on random subspaces of the prefeature space. The final classifier is constructed by fusing the decisions of individual classifiers. The advantage of this approach is its universality, low complexity, simplicity, and improved performance when compared to classifiers trained on the entire prefeature set. Experiments with the steganographic algorithms nsF5 and HUGO demonstrate the usefulness of this approach over current state of the art. 1.
Beyond Brain Blobs: Machine Learning Classifiers as Instruments for Analyzing Functional Magnetic Resonance Imaging Data
, 1998
"... Vector Decomposition MachineEsta tese é dedicada aos meus pais Paula e José, avós Clementina e Sidónio e à minha irmã Mariana, por terem sempre confiado em mim de todas as formas possiveis, mesmo The thesis put forth in this dissertation is that machine learning classifiers can be used as instrument ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Vector Decomposition MachineEsta tese é dedicada aos meus pais Paula e José, avós Clementina e Sidónio e à minha irmã Mariana, por terem sempre confiado em mim de todas as formas possiveis, mesmo The thesis put forth in this dissertation is that machine learning classifiers can be used as instruments for decoding variables of interest from functional magnetic resonance imaging (fMRI) data. There are two main goals in decoding: • Showing that the variable of interest can be predicted from the data in a statistically reliable manner (i.e. there’s enough information present). • Shedding light on how the data encode the information needed to predict, taking into account what the classifier used can learn and any criteria by which the data are filtered (e.g. how voxels and time points used are chosen). Chapter 2 considers the issues that arise when using traditional linear classifiers and several different voxel selection techniques to strive towards these
A Bayesian Matrix Factorization Model for Relational Data
"... Relational learning can be used to augment one data source with other correlated sources of information, to improve predictive accuracy. We frame a large class of relational learning problems as matrix factorization problems, and propose a hierarchical Bayesian model. Training our Bayesian model usi ..."
Abstract
- Add to MetaCart
Relational learning can be used to augment one data source with other correlated sources of information, to improve predictive accuracy. We frame a large class of relational learning problems as matrix factorization problems, and propose a hierarchical Bayesian model. Training our Bayesian model using random-walk Metropolis-Hastings is impractically slow, and so we develop a block Metropolis-Hastingssamplerwhichusesthegradientand Hessian of the likelihood to dynamically tune the proposal. We demonstrate that a predictive model of brain response to stimuli can be improved by augmenting it with side information about the stimuli. 1
To appear in:
, 2009
"... projection kernels, Neurocomputing, doi:10.1016/j.neucom.2009.11.043 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and rev ..."
Abstract
- Add to MetaCart
projection kernels, Neurocomputing, doi:10.1016/j.neucom.2009.11.043 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. Supervised Learning of Local Projection Kernels
Marie-Laure Paillere-Martinot
"... Schizophrenia is a complex psychiatric disorder that has eluded a characterization in terms of local abnormalities of brain activity, and is hypothesized to affect the collective, “emergent ” working of the brain. We propose a novel data-driven approach to capture emergent features using functional ..."
Abstract
- Add to MetaCart
Schizophrenia is a complex psychiatric disorder that has eluded a characterization in terms of local abnormalities of brain activity, and is hypothesized to affect the collective, “emergent ” working of the brain. We propose a novel data-driven approach to capture emergent features using functional brain networks [4] extracted from fMRI data, and demonstrate its advantage over traditional region-of-interest (ROI) and local, task-specific linear activation analyzes. Our results suggest that schizophrenia is indeed associated with disruption of global brain properties related to its functioning as a network, which cannot be explained by alteration of local activation patterns. Moreover, further exploitation of interactions by sparse Markov Random Field classifiers shows clear gain over linear methods, such as Gaussian Naive Bayes and SVM, allowing to reach 86 % accuracy (over 50 % baseline- random guess), which is quite remarkable given that it is based on a single fMRI experiment using a simple auditory task. 1

