Results 1 - 10 of 76
A Spectral Algorithm for Latent Dirichlet Allocation
Cited by 49 (11 self)
Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space.
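The trigram statistics mentioned above can be made concrete with a short Python/NumPy sketch. It assumes each document is given as exactly three word ids and computes plain empirical co-occurrence frequencies; `empirical_moments` is a hypothetical helper, not code from the paper, and Excess Correlation Analysis itself (the two SVDs) is not shown.

```python
import numpy as np

def empirical_moments(docs, vocab_size):
    """Estimate first-, second- and third-order word moments from
    three-word documents (hypothetical helper, not the paper's code).

    docs: integer array of shape (n_docs, 3) holding word ids.
    Returns m1 (d,), m2 (d, d), m3 (d, d, d) where
      m1[i]       = empirical P(x1 = i)
      m2[i, j]    = empirical P(x1 = i, x2 = j)
      m3[i, j, k] = empirical P(x1 = i, x2 = j, x3 = k)
    """
    docs = np.asarray(docs)
    n = docs.shape[0]
    onehot = np.eye(vocab_size)
    # One-hot encode the first, second and third word of every document.
    x1, x2, x3 = onehot[docs[:, 0]], onehot[docs[:, 1]], onehot[docs[:, 2]]
    m1 = x1.mean(axis=0)
    m2 = x1.T @ x2 / n
    m3 = np.einsum("ni,nj,nk->ijk", x1, x2, x3) / n
    return m1, m2, m3
```

A dense d × d × d tensor like `m3` is only practical for a toy vocabulary; the abstract's scalability claim rests on performing the SVDs on k × k matrices instead.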
Clustering with Multiple Graphs
Cited by 43 (2 self)
Abstract—In graph-based learning models, entities are often represented as vertices in an undirected graph with weighted edges describing the relationships between entities. In many real-world applications, however, entities are associated with relations of different types and/or from different sources, which can be well captured by multiple undirected graphs over the same set of vertices. How to exploit such multiple sources of information to make better inferences on entities remains an interesting open problem. In this paper, we focus on the problem of clustering the vertices based on multiple graphs in both unsupervised and semi-supervised settings. As one of our contributions, we propose Linked Matrix Factorization (LMF) as a novel way of fusing information from multiple graph sources. In LMF, each graph is approximated by matrix factorization with a graph-specific factor and a factor common to all graphs, where the common factor provides features for all vertices. Experiments on SIAM journal data show that (1) we can improve the clustering accuracy through fusing multiple sources of information with several models, and (2) LMF yields superior or competitive results compared to other graph-based clustering methods. Keywords: clustering; multiple sources; graph; semi-supervised learning
Co-regularized Multi-view Spectral Clustering
Cited by 32 (1 self)
In many clustering problems, we have access to multiple views of the data, each of which could be individually used for clustering. Exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained using the individual views. Often these different views admit the same underlying clustering of the data, so we can approach this problem by looking for clusterings that are consistent across the views, i.e., corresponding data points in each view should have the same cluster membership. We propose a spectral clustering framework that achieves this goal by co-regularizing the clustering hypotheses, and we propose two co-regularization schemes to accomplish this. Experimental comparisons with a number of baselines on two synthetic and three real-world datasets establish the efficacy of our proposed approaches.
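A pairwise co-regularization scheme can be sketched as alternating eigen-decompositions. This minimal Python/NumPy sketch rests on assumptions the abstract does not state: two views given as symmetric kernel (similarity) matrices, normalized affinities D^{-1/2} K D^{-1/2}, and an update that sets each view's embedding to the top-k eigenvectors of its own affinity plus λ times the other view's projection U Uᵀ. All function names and the default `lam` are hypothetical.

```python
import numpy as np

def normalized_affinity(K):
    """D^{-1/2} K D^{-1/2} for a symmetric, nonnegative kernel matrix K."""
    inv_sqrt_deg = 1.0 / np.sqrt(K.sum(axis=1))
    return K * np.outer(inv_sqrt_deg, inv_sqrt_deg)

def top_eigvecs(M, k):
    """Eigenvectors of the k largest eigenvalues of a symmetric matrix M."""
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def coreg_spectral_two_views(K1, K2, k, lam=0.5, iters=10):
    """Pairwise co-regularization sketch for two views (assumed form):
    alternately set U1 = top-k eigvecs of L1 + lam * U2 U2^T and vice versa."""
    L1, L2 = normalized_affinity(K1), normalized_affinity(K2)
    U1, U2 = top_eigvecs(L1, k), top_eigvecs(L2, k)
    for _ in range(iters):
        # Each view is pulled toward the subspace spanned by the other view.
        U1 = top_eigvecs(L1 + lam * U2 @ U2.T, k)
        U2 = top_eigvecs(L2 + lam * U1 @ U1.T, k)
    # Row-normalize one view's embedding; rows would then be clustered.
    return U1 / np.linalg.norm(U1, axis=1, keepdims=True)
```

In a full pipeline the row-normalized embedding would be passed to k-means; that step is omitted to keep the sketch dependency-free.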
Predictive subspace learning for multi-view data: a large margin approach
In NIPS, 2010
Cited by 27 (8 self)
Learning from multi-view data is important in many applications, such as image classification and annotation. In this paper, we present a large-margin learning framework to discover a predictive latent subspace representation shared by multiple views. Our approach is based on an undirected latent space Markov network that fulfills a weak conditional independence assumption that multi-view observations and response variables are independent given a set of latent variables. We provide efficient inference and parameter estimation methods for the latent subspace model. Finally, we demonstrate the advantages of large-margin learning on real video and web image data for discovering predictive latent representations and improving the performance on image classification, annotation and retrieval.
Uncovering Groups via Heterogeneous Interaction Analysis
Cited by 26 (6 self)
Abstract—With the pervasive availability of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their own favorite content. Users can also connect to each other, and subscribe to or become a fan or a follower of others. These diverse individual activities result in a multi-dimensional network among actors, forming cross-dimension group structures with group members sharing certain similarities. It is challenging to effectively integrate the network information of multiple dimensions in order to discover cross-dimension group structures. In this work, we propose a two-phase strategy to identify the hidden structures shared across dimensions in multi-dimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them all to find a robust community structure among actors. Experiments on synthetic and real-world data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities at a large scale.
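The two-phase strategy can be illustrated with a small Python/NumPy sketch. The details are assumed here, since the abstract does not give them: per-dimension structural features are taken as the top-k eigenvectors of the modularity matrix B = A - d dᵀ / (2m), and the integration phase keeps the top-k left singular vectors of the concatenated features. Function names are hypothetical.

```python
import numpy as np

def modularity_features(A, k):
    """Top-k eigenvectors of the modularity matrix B = A - d d^T / (2m)
    for one network dimension with symmetric adjacency matrix A."""
    d = A.sum(axis=1)      # (weighted) degrees
    two_m = d.sum()        # 2m = total degree
    B = A - np.outer(d, d) / two_m
    vals, vecs = np.linalg.eigh(B)  # ascending eigenvalues
    return vecs[:, ::-1][:, :k]

def integrate_dimensions(adjacencies, k):
    """Phase 2 (assumed): concatenate per-dimension features and keep the
    top-k left singular vectors as a shared actor representation."""
    feats = np.hstack([modularity_features(A, k) for A in adjacencies])
    U, s, Vt = np.linalg.svd(feats, full_matrices=False)
    return U[:, :k]
```

Rows of the returned matrix would then be clustered (for example with k-means) to obtain the cross-dimension groups.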
Deep Canonical Correlation Analysis
Cited by 20 (4 self)
We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. Parameters of both transformations are jointly learned to maximize the (regularized) total correlation. It can be viewed as a nonlinear extension of the linear method canonical correlation analysis (CCA). It is an alternative to the nonparametric method kernel canonical correlation analysis (KCCA) for learning correlated nonlinear transformations. Unlike KCCA, DCCA does not require an inner product, and has the advantages of a parametric method: training time scales well with data size and the training data need not be referenced when computing the representations of unseen instances. In experiments on two real-world datasets, we find that DCCA learns representations with significantly higher correlation than those learned by CCA and KCCA. We also introduce a novel non-saturating sigmoid function based on the cube root that may be useful more generally in feedforward neural networks.
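The cube-root-based, non-saturating sigmoid can be sketched in closed form. This Python/NumPy sketch assumes the function is defined as the inverse of g(y) = y³/3 + y (an assumption, not stated in the abstract); under that definition the value comes from Cardano's formula, and the derivative is 1/(y² + 1).

```python
import numpy as np

def cube_root_sigmoid(x):
    """Solve y**3 / 3 + y = x for real y (assumed definition).

    Rewriting as y**3 + 3*y - 3*x = 0 and applying Cardano's formula gives
    the unique real root; the cubic is strictly increasing, so it has one.
    """
    x = np.asarray(x, dtype=float)
    a = 1.5 * x
    r = np.sqrt(a * a + 1.0)
    return np.cbrt(a + r) + np.cbrt(a - r)

def cube_root_sigmoid_grad(y):
    """dy/dx = 1 / (y**2 + 1), expressed through the output y."""
    return 1.0 / (np.asarray(y, dtype=float) ** 2 + 1.0)
```

Near zero the function behaves like the identity, while for large |x| it grows like (3x)^(1/3) instead of flattening out, which is what makes it non-saturating.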
Stochastic Optimization for PCA and PLS
Cited by 14 (3 self)
Abstract—We study PCA, PLS, and CCA as stochastic optimization problems, in which a population objective is optimized based on a sample. We suggest several stochastic approximation (SA) methods for PCA and PLS, and investigate their empirical performance.
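One classical stochastic approximation method for the leading principal component is an Oja-style update, sketched below in Python/NumPy. It is offered as a generic example of the SA approach, not as one of the specific methods proposed in the paper; the step size, epoch count, and function name are assumptions.

```python
import numpy as np

def stochastic_top_component(X, lr=0.001, epochs=3, seed=0):
    """Estimate the leading principal direction one sample at a time.

    Oja-style update: a stochastic gradient ascent step on w^T x x^T w
    (an unbiased estimate of w^T C w), followed by projection back onto
    the unit sphere.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)              # center the data
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):
            x = X[i]
            w += lr * (x @ w) * x       # stochastic gradient step
            w /= np.linalg.norm(w)      # project onto ||w|| = 1
    return w
```

For simplicity the sketch centers the data with the full-sample mean, which a truly streaming implementation would have to estimate online.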
Evolving Signal Processing for Brain–Computer Interfaces
2012
Cited by 12 (2 self)
This paper discusses the challenges associated with building robust and useful BCI models from accumulated biological knowledge and data, and the technical problems associated with incorporating multimodal physiological, behavioral, and contextual data that may become ubiquitous in the future.
Multi-View Clustering via Joint Nonnegative Matrix Factorization
Cited by 12 (3 self)
Many real-world datasets are composed of different representations or views, which often provide information complementary to each other. To integrate information from multiple views in the unsupervised setting, multi-view clustering algorithms have been developed to cluster multiple views simultaneously and derive a solution that uncovers the common latent structure shared by the views. In this paper, we propose a novel NMF-based multi-view clustering algorithm that searches for a factorization giving compatible clustering solutions across multiple views. The key idea is to formulate a joint matrix factorization process with a constraint that pushes the clustering solution of each view towards a common consensus instead of fixing it directly. The main challenge is how to keep clustering solutions across different views meaningful and comparable. To tackle this challenge, we design a novel and effective normalization strategy inspired by the connection between NMF and PLSA. Experimental results on synthetic and several real datasets demonstrate the effectiveness of our approach.
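The joint factorization with a consensus constraint can be sketched with multiplicative updates. This Python/NumPy sketch assumes the objective Σ_v ‖X_v - U_v V_vᵀ‖²_F + λ Σ_v ‖V_v - V*‖²_F with equal weights per view, and it deliberately omits the PLSA-inspired normalization described above, so it does not address the comparability problem the paper targets. Names and defaults are hypothetical.

```python
import numpy as np

def joint_nmf(views, k, lam=0.1, iters=200, seed=0, eps=1e-10):
    """Multi-view NMF sketch: X_v ~ U_v V_v^T for each view, plus a penalty
    lam * ||V_v - V*||_F^2 pulling every V_v toward a consensus V*.

    views: list of nonnegative arrays of shape (n_features_v, n_samples).
    No per-view normalization is applied (simplification).
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    Us = [rng.random((X.shape[0], k)) for X in views]
    Vs = [rng.random((n, k)) for _ in views]
    for _ in range(iters):
        # With the V_v fixed, the mean minimizes the consensus penalty.
        v_star = sum(Vs) / len(Vs)
        for i, X in enumerate(views):
            U, V = Us[i], Vs[i]
            # Multiplicative updates keep every entry nonnegative.
            U *= (X @ V) / (U @ (V.T @ V) + eps)
            V *= (X.T @ U + lam * v_star) / (V @ (U.T @ U) + lam * V + eps)
    v_star = sum(Vs) / len(Vs)
    return Us, Vs, v_star
```

A cluster label for sample j could then be read off as `v_star[j].argmax()`.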
Multi-view learning of acoustic features for speaker recognition
In ASRU, 2009
Cited by 11 (6 self)
Abstract—We consider learning acoustic feature transformations using an additional view of the data, in this case video of the speaker’s face. Specifically, we consider a scenario in which clean audio and video are available at training time, while at test time only noisy audio is available. We use canonical correlation analysis (CCA) to learn linear projections of the acoustic observations that have maximum correlation with the video frames. We provide an initial demonstration of the approach on a speaker recognition task using data from the VidTIMIT corpus. The projected features, in combination with baseline MFCCs, outperform the baseline recognizer in noisy conditions. The techniques we present are quite general, although here we apply them to the case of a specific speaker recognition task. This is the first work of which we are aware in which multiple views are used to learn an acoustic feature projection at training time, while using only the acoustics at test time.
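The CCA step can be illustrated with a minimal Python/NumPy sketch of linear CCA (whitening followed by an SVD). Assumptions: paired training matrices `X` (acoustic features) and `Y` (video features) with one row per frame, and a small ridge term `reg` for numerical stability. This is a generic CCA implementation, not the authors' code.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs / np.sqrt(vals)) @ vecs.T

def linear_cca(X, Y, k, reg=1e-6):
    """Return projections A (dx, k), B (dy, k) and the top-k canonical
    correlations for paired views X (n, dx) and Y (n, dy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular vectors of the whitened cross-covariance give the directions;
    # singular values are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]
```

At test time only the acoustic projection would be needed: center new acoustic frames with the training mean and multiply by `A`, matching the scenario in which video is unavailable.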