On smoothing and inference for topic models
In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009
Cited by 20 (4 self)
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Structured Sparse Principal Component Analysis
2009
Cited by 15 (6 self)
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by [1]. While classical sparse priors only deal with cardinality, the regularization we use encodes higher-order information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, face recognition and the study of the dynamics of a protein complex, demonstrate the benefits of the proposed structured approach over unstructured approaches.
Relation Regularized Matrix Factorization
Cited by 4 (2 self)
In many applications, the data, such as web pages and research papers, contain relation (link) structure among entities in addition to textual content information. Matrix factorization (MF) methods, such as latent semantic indexing (LSI), have been successfully used to map either content information or relation information into a lower-dimensional latent space for subsequent processing. However, how to model both the relation information and the content information effectively and simultaneously within an MF framework is still an open research problem. In this paper, we propose a novel MF method called relation regularized matrix factorization (RRMF) for relational data analysis. By using relation information to regularize the content MF procedure, RRMF seamlessly integrates both the relation information and the content information into a principled framework. We propose a linear-time learning algorithm with a convergence guarantee to learn the parameters of RRMF. Extensive experiments on real data sets show that RRMF can achieve state-of-the-art performance.
Large-scale Matrix Factorization with Distributed Stochastic Gradient Descent
In KDD, 2011
Cited by 3 (0 self)
We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements. Our approach rests on stochastic gradient descent (SGD), an iterative stochastic optimization algorithm. Based on a novel “stratified” variant of SGD, we obtain a new matrix-factorization algorithm, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce. DSGD can handle a wide variety of matrix factorizations and has good scalability properties.
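For orientation, the SGD building block that the abstract refers to can be sketched on a single machine. This is a minimal illustration only, not the stratified or distributed DSGD algorithm itself: the function names `sgd_factorize` and `squared_error`, the squared loss with L2 regularization, and all hyperparameter values are assumptions chosen for the example.

```python
import random


def sgd_factorize(entries, n_rows, n_cols, rank=2, lr=0.02, reg=0.01,
                  epochs=1000, seed=0):
    """Plain sequential SGD for V ~ W H^T over the observed entries only.

    entries is a list of (row, col, value) triples. All defaults are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    # Small random initialization of both factor matrices.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    order = list(entries)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observed entries in random order
        for i, j, v in order:
            err = v - sum(W[i][k] * H[j][k] for k in range(rank))
            for k in range(rank):
                w, h = W[i][k], H[j][k]
                # Gradient step on the squared error with L2 shrinkage.
                W[i][k] += lr * (err * h - reg * w)
                H[j][k] += lr * (err * w - reg * h)
    return W, H


def squared_error(entries, W, H):
    """Sum of squared residuals over the observed entries."""
    return sum((v - sum(a * b for a, b in zip(W[i], H[j]))) ** 2
               for i, j, v in entries)


# A 3x2 rank-1 matrix with the entry at (2, 1) left unobserved.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
W0, H0 = sgd_factorize(observed, 3, 2, epochs=0)  # factors before training
W, H = sgd_factorize(observed, 3, 2)
print(squared_error(observed, W0, H0), squared_error(observed, W, H))
```

Each update touches only row i of W and row j of H, so updates on entries that share no row and no column do not interfere with each other. That independence is what makes a distributed variant plausible.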
Regularized Latent Semantic Indexing
Cited by 3 (2 self)
Topic modeling can boost the performance of information retrieval, but its real-world application is limited due to scalability issues. Scaling to larger document collections via parallelization is an active area of research, but most solutions require drastic steps such as vastly reducing input vocabulary. We introduce Regularized Latent Semantic Indexing (RLSI), a new method which is designed for parallelization. It is as effective as existing topic models, and scales to larger datasets without reducing input vocabulary. RLSI formalizes topic modeling as a problem of minimizing a quadratic loss function regularized by the ℓ1 and/or ℓ2 norm. This formulation allows the learning process to be decomposed into multiple suboptimization problems which can be optimized in parallel, for example via MapReduce. In particular, we propose adopting the ℓ1 norm on topics and the ℓ2 norm on document representations, to create a model whose topics are compact and readable and which is useful for retrieval. Relevance ranking experiments on three TREC datasets show that RLSI performs better than LSI, PLSI, and LDA, and the improvements are sometimes statistically significant. Experiments on a web dataset, containing about 1.6 million documents and 7 million terms, demonstrate a similar boost in performance on a larger corpus and vocabulary than in previous studies.
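One way to write the objective described in this abstract is given below: quadratic loss, ℓ1 norm on topics, ℓ2 norm on document representations. The symbols are assumptions introduced here for illustration. They are d_n for the term vector of document n, U = [u_1, ..., u_K] for the term-topic matrix, v_n for the topic representation of document n, and λ1, λ2 for the regularization weights.

```latex
\min_{U,\,\{v_n\}} \;
  \sum_{n=1}^{N} \bigl\lVert d_n - U v_n \bigr\rVert_2^2
  \;+\; \lambda_1 \sum_{k=1}^{K} \lVert u_k \rVert_1
  \;+\; \lambda_2 \sum_{n=1}^{N} \lVert v_n \rVert_2^2
```

With U held fixed, the problem separates over documents into independent ridge-regression problems in each v_n. With the v_n held fixed, it separates over the rows of U (one per term) into independent ℓ1-regularized least-squares problems. This separation is consistent with the abstract's claim that learning decomposes into suboptimization problems that can run in parallel.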
Extracting Features from Ratings: The Role of Factor Models
Cited by 1 (1 self)
Performing effective preference-based data retrieval requires detailed and preferentially meaningful structured information about the current user as well as the items under consideration. A common problem is that representations of items often consist only of mere technical attributes, which do not resemble human perception. This is particularly true for integral items such as movies or songs. It is often claimed that meaningful item features could be extracted from collaborative rating data, which is becoming available through social networking services. However, there is only anecdotal evidence supporting this claim; if it is true, the extracted information could be very valuable for preference-based data retrieval. In this paper, we propose a methodology to systematically check this common claim. We performed a preliminary investigation on a large collection of movie ratings and present initial evidence.
Rare Category Analysis
2010
In many real-world problems, rare categories (minority classes) play an essential role despite their extreme scarcity. For example, in financial fraud detection, the vast majority of the financial transactions are legitimate, and only a small number may be fraudulent; in Medicare fraud detection, the percentage of bogus claims is small, but the total loss is significant; in network intrusion detection, malicious network activities are hidden among huge volumes of routine network traffic; in astronomy, only 0.001% of the objects in sky survey images are truly beyond the scope of current science and may lead to new discoveries; in spam image detection, the near-duplicate spam images are difficult to discover among the large number of non-spam images; in rare disease diagnosis, the rare diseases affect fewer than 1 out of 2000 people, but the consequences can be very severe. Therefore, the discovery, characterization and prediction of rare categories or rare examples may protect us from fraudulent or malicious behaviors, aid scientific discoveries, and even save lives. This thesis focuses on rare category analysis, where the majority classes have a smooth distribution, and the minority classes exhibit a compactness property. Furthermore, we focus on the challenging cases where the support regions of the majority and minority classes overlap
CMU-ML-09-102 Generalized Learning Factors Analysis: Improving Cognitive Models with Machine Learning
2008
and the National Science Foundation (PSLC) under contract no. SBE-0354420. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.
Keywords: cognitive models, intelligent tutoring systems, machine learning, educational data mining, learning factors, psychometrics, additive factor models, latent variable models, exponential principal component analysis, logistic regression, combinatorial search
Structured Sparse Principal Component Analysis
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by Jenatton et al. (2009). While classical sparse priors only deal with cardinality, the regularization we use encodes higher-order information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, the denoising of sparse structured signals and face recognition, demonstrate the benefits of the proposed structured approach over unstructured approaches.
Graphical Models and Overlay Networks for Reasoning about Large Distributed Systems
2010
This thesis examines reasoning under uncertainty in distributed systems. Unlike in centralized systems, where the observations reside in a single location, the observations in distributed systems are often scattered across the network. To reason accurately, a networked device often needs to incorporate observations from other nodes and must do so with limited computation and communication even for large problems. The reasoning is further complicated by unstable network conditions, characteristic of many real-world networks: the nodes may fail, communication links may become unreliable, and the entire network may fragment into several components that cannot communicate with each other. These aspects make distributed inference very challenging. We consider one general problem of distributed filtering for estimating the state of a dynamical system and three independent applications: simultaneous localization and tracking, where a camera network localizes itself by observing a moving object, internal localization of large-scale modular robots, where a robot determines the relative poses of its internal parts,

