Results 1–10 of 65
Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis
Journal of Machine Learning Research, 2007
"... Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in highdimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a ..."
Abstract

Cited by 124 (12 self)
 Add to MetaCart
(Show Context)
Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality-preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to nonlinear dimensionality reduction scenarios by applying the kernel trick.
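For orientation, a minimal sketch of the final computational step the abstract mentions, assuming the local between-class and within-class scatter matrices (here called S_lb and S_lw; names ours, not the paper's notation) have already been built from the affinity-weighted pairwise terms:

```python
import numpy as np
from scipy.linalg import eigh

def lfda_embedding(S_lb, S_lw, dim):
    """Solve the generalized eigenproblem S_lb v = lambda * S_lw v and
    return the top-`dim` eigenvectors as the embedding transformation.

    S_lb, S_lw: local between-/within-class scatter matrices (d x d),
    assumed precomputed; S_lw must be positive definite for eigh.
    """
    w, V = eigh(S_lb, S_lw)              # eigenvalues returned ascending
    order = np.argsort(w)[::-1]          # largest eigenvalues first
    return V[:, order[:dim]]             # d x dim transformation matrix
```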
Nonparametric Latent Feature Models for Link Prediction
"... As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks ..."
Abstract

Cited by 106 (1 self)
 Add to MetaCart
(Show Context)
As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable—latent features—using a Bayesian nonparametric approach to infer the number of features while learning which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets.
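A hedged sketch of the kind of link model described: binary latent features combined with a known covariate through a logistic link (all names here are illustrative, not the paper's notation):

```python
import numpy as np

def link_probability(z_i, z_j, W, x_ij=0.0, beta=0.0):
    """Probability of a link between entities i and j in a latent
    feature model of the kind the abstract describes.

    z_i, z_j : binary latent feature vectors (length K); K itself is
               inferred nonparametrically in the paper's setting.
    W        : K x K real matrix of feature-interaction weights.
    x_ij, beta : a known covariate for the pair and its weight
                 (hypothetical stand-ins for the covariate terms).
    """
    logit = z_i @ W @ z_j + beta * x_ij
    return 1.0 / (1.0 + np.exp(-logit))
```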
Community Evolution in Dynamic Multi-Mode Networks
KDD'08, 2008
"... A multimode network typically consists of multiple heterogeneous social actors among which various types of interactions could occur. Identifying communities in a multimode network can help understand the structural properties of the network, address the data shortage and unbalanced problems, and ..."
Abstract

Cited by 64 (14 self)
 Add to MetaCart
A multi-mode network typically consists of multiple heterogeneous social actors among which various types of interactions could occur. Identifying communities in a multi-mode network can help understand the structural properties of the network, address the data shortage and unbalanced problems, and assist tasks like targeted marketing and finding influential actors within or between groups. In general, a network and the membership of groups often evolve gradually. In a dynamic multi-mode network, both actor membership and interactions can evolve, which poses a challenging problem of identifying community evolution. In this work, we try to address this issue by employing the temporal information to analyze a multi-mode network. A spectral framework and its scalability issue are carefully studied. Experiments on both synthetic data and real-world large-scale networks demonstrate the efficacy of our algorithm and suggest its generality in solving problems with complex relationships.
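The abstract does not spell out the objective, so the following is only a generic illustration of a temporally regularized spectral step for a single mode, not the paper's framework:

```python
import numpy as np
from scipy.linalg import eigh

def temporal_spectral_embedding(A_t, A_prev, k, alpha=0.8):
    """Illustrative spectral step for one mode of a dynamic network.

    NOT the paper's exact objective: this simply smooths the current
    snapshot's affinity A_t with the previous one A_prev (weight alpha),
    then embeds nodes via the top-k eigenvectors of the normalized
    affinity, which is the generic shape of such frameworks.
    """
    A = alpha * A_t + (1.0 - alpha) * A_prev     # temporal smoothing
    d = np.maximum(A.sum(axis=1), 1e-12)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ A @ D_inv_sqrt              # normalized affinity
    w, V = eigh(M)
    return V[:, np.argsort(w)[::-1][:k]]         # k-dim node embedding
```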
Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution
In ICML, 2006
"... The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact that if a word occurs once in a document, it is likely to occur repeatedly. We derive a new family of distributions that ..."
Abstract

Cited by 53 (2 self)
 Add to MetaCart
(Show Context)
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Pólya distribution, is a model for text documents that takes into account burstiness: the fact that if a word occurs once in a document, it is likely to occur repeatedly. We derive a new family of distributions that are approximations to DCM distributions and constitute an exponential family, unlike DCM distributions. We use these so-called EDCM distributions to obtain insights into the properties of DCM distributions, and then derive an algorithm for EDCM maximum-likelihood training that is many times faster than the corresponding method for DCM distributions. Next, we investigate expectation-maximization with EDCM components and deterministic annealing as a new clustering algorithm for documents. Experiments show that the new algorithm is competitive with the best methods in the literature, and superior from the point of view of finding models with low perplexity.
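A sketch of the EDCM log-likelihood, following the published closed form as we understand it (the family arises from the approximation Gamma(x + beta) / Gamma(beta) ≈ beta * (x - 1)! for small beta and x ≥ 1):

```python
import numpy as np
from scipy.special import gammaln

def edcm_log_likelihood(x, beta):
    """Log-likelihood of a word-count vector x under the EDCM
    distribution (the exponential-family approximation to DCM):

        log q(x|beta) = log n! + log Gamma(s) - log Gamma(s + n)
                        + sum over {w: x_w >= 1} of [log beta_w - log x_w],

    with n = sum(x) and s = sum(beta).
    """
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n, s = x.sum(), beta.sum()
    nz = x >= 1
    return (gammaln(n + 1) + gammaln(s) - gammaln(s + n)
            + np.sum(np.log(beta[nz]) - np.log(x[nz])))
```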
An Infinite Latent Attribute Model for Network Data
In Proceedings of the International Conference on Machine Learning (ICML), 2012
"... Latent variable models for network data extract a summary of the relational structure underlying an observed network. The simplest possible models subdivide nodes of the network into clusters; the probability of a link between any two nodes then depends only on their cluster assignment. Currently av ..."
Abstract

Cited by 27 (7 self)
 Add to MetaCart
(Show Context)
Latent variable models for network data extract a summary of the relational structure underlying an observed network. The simplest possible models subdivide nodes of the network into clusters; the probability of a link between any two nodes then depends only on their cluster assignment. Currently available models can be classified by whether clusters are disjoint or are allowed to overlap. These models can explain a “flat” clustering structure. Hierarchical Bayesian models provide a natural approach to capture more complex dependencies. We propose a model in which objects are characterised by a latent feature vector. Each feature is itself partitioned into disjoint groups (subclusters), corresponding to a second layer of hierarchy. In experimental comparisons, the model achieves significantly improved predictive performance on social and biological link prediction tasks. The results indicate that models with a single-layer hierarchy oversimplify real networks.
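A hedged sketch of a two-layer link probability consistent with this description (feature vector plus per-feature subclusters); variable names are ours, not the paper's:

```python
import numpy as np

def two_layer_link_probability(z_i, z_j, c_i, c_j, W, bias=0.0):
    """Illustrative link probability for a two-layer latent model.

    z_i, z_j : binary latent feature vectors (length M).
    c_i, c_j : for each feature m, the subcluster the object belongs
               to within that feature (unused when z[m] == 0).
    W        : list of M weight matrices; W[m][a, b] is the affinity
               between subclusters a and b of feature m.
    """
    logit = bias
    for m in range(len(z_i)):
        if z_i[m] and z_j[m]:                 # feature active for both
            logit += W[m][c_i[m], c_j[m]]     # subcluster interaction
    return 1.0 / (1.0 + np.exp(-logit))
```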
Accounting for Burstiness in Topic Models
"... Many different topic models have been used successfully for a variety of applications. However, even stateoftheart topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once i ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
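The component being swapped in for LDA's per-topic multinomial is the standard DCM density; a sketch of its log-likelihood, which makes the burstiness mechanism visible (a word's count feeds back through Gamma(x_w + alpha_w)):

```python
import numpy as np
from scipy.special import gammaln

def dcm_log_likelihood(x, alpha):
    """Log-likelihood of count vector x under the Dirichlet compound
    multinomial (multivariate Polya) distribution:

        log p(x|alpha) = log n! - sum_w log x_w!
                         + log Gamma(s) - log Gamma(s + n)
                         + sum_w [log Gamma(x_w + alpha_w) - log Gamma(alpha_w)],

    with n = sum(x), s = sum(alpha). Because alpha is shared across a
    document's repeated draws, a word seen once becomes more likely to
    be seen again: the burstiness the abstract targets.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, s = x.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()
            + gammaln(s) - gammaln(s + n)
            + (gammaln(x + alpha) - gammaln(alpha)).sum())
```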
Semi-supervised Penalized Output Kernel Regression for Link Prediction
"... Link prediction is addressed as an output kernel learning task through semisupervised Output Kernel Regression. Working in the framework of RKHS theory with vectorvalued functions, we establish a new representer theorem devoted to semisupervised least square regression. We then apply it to get a n ..."
Abstract

Cited by 21 (2 self)
 Add to MetaCart
Link prediction is addressed as an output kernel learning task through semi-supervised Output Kernel Regression. Working in the framework of RKHS theory with vector-valued functions, we establish a new representer theorem devoted to semi-supervised least squares regression. We then apply it to get a new model (POKR: Penalized Output Kernel Regression) and show its relevance using numerical experiments on artificial networks and two real applications using a very low percentage of labeled data in a transductive setting.
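For context, the generic vector-valued representer form that such theorems extend (a standard statement, not the paper's exact semi-supervised result): with an operator-valued kernel K and labeled inputs x_1, ..., x_l plus unlabeled inputs x_{l+1}, ..., x_{l+u},

```latex
h^{*}(x) \;=\; \sum_{i=1}^{\ell+u} K(x, x_i)\, c_i,
\qquad c_i \in \mathcal{Y}.
```

Learning then reduces to estimating the coefficient vectors c_i; the semi-supervised penalty determines how the unlabeled inputs enter the objective.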
Learning a Parametric Embedding by Preserving Local Structure
2009
"... The paper presents a new unsupervised dimensionality reduction technique, called parametric tSNE, that learns a parametric mapping between the highdimensional data space and the lowdimensional latent space. Parametric tSNE learns the parametric mapping in such a way that the local structure of t ..."
Abstract

Cited by 19 (4 self)
 Add to MetaCart
The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space. Parametric t-SNE learns the parametric mapping in such a way that the local structure of the data is preserved as well as possible in the latent space. We evaluate the performance of parametric t-SNE in experiments on three datasets, in which we compare it to the performance of two other unsupervised parametric dimensionality reduction techniques. The results of the experiments illustrate the strong performance of parametric t-SNE, in particular in learning settings in which the dimensionality of the latent space is relatively low.
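A sketch of the objective such a method minimizes with respect to the mapping's parameters (e.g., the weights of a feed-forward network producing Y), assuming the input-space affinities P are precomputed as in standard t-SNE:

```python
import numpy as np

def tsne_kl_loss(P, Y):
    """KL(P || Q) objective of t-SNE, evaluated at latent points Y
    (an n x d array produced by the parametric mapping). P is the
    n x n matrix of input-space affinities; Q uses a Student-t kernel
    with one degree of freedom in the latent space.
    """
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + sq_dists)          # Student-t numerators
    np.fill_diagonal(num, 0.0)            # no self-affinity
    Q = num / num.sum()
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], 1e-12)))
```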
Low-Rank Tensors for Scoring Dependency Structures
"... Accurate scoring of syntactic structures such as headmodifier arcs in dependency parsing typically requires rich, highdimensional feature representations. A small subset of such features is often selected manually. This is problematic when features lack clear linguistic meaning as in embeddings o ..."
Abstract

Cited by 19 (5 self)
 Add to MetaCart
(Show Context)
Accurate scoring of syntactic structures such as head-modifier arcs in dependency parsing typically requires rich, high-dimensional feature representations. A small subset of such features is often selected manually. This is problematic when features lack clear linguistic meaning, as in embeddings, or when the information is blended across features. In this paper, we use tensors to map high-dimensional feature vectors into low-dimensional representations. We explicitly maintain the parameters as a low-rank tensor to obtain low-dimensional representations of words in their syntactic roles, and to leverage modularity in the tensor for easy training with online algorithms. Our parser consistently outperforms the Turbo and MST parsers across 14 different languages. We also obtain the best published UAS results on 5 languages.
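A sketch of the rank-r arc score this kind of factorization yields (U, V, W are the tensor's factor matrices; the phi names are ours; the paper may combine this with traditional arc features):

```python
import numpy as np

def low_rank_arc_score(U, V, W, phi_h, phi_m, phi_arc):
    """Score a head-modifier arc with a rank-r tensor kept in factored
    form: instead of materializing the full parameter tensor, U, V, W
    (each r x d) project the feature vectors into r dimensions, and
    the score is the inner product of the three projections.

    phi_h, phi_m : feature vectors for head and modifier words.
    phi_arc      : feature vector for the arc itself.
    """
    return np.sum((U @ phi_h) * (V @ phi_m) * (W @ phi_arc))
```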