Empirical Study of Topic Modeling in Twitter
- Proceedings of the SIGKDD Workshop on Social Media Analytics (SOMA), 2010
Cited by 5 (0 self)
Social networks such as Facebook, LinkedIn, and Twitter have been a crucial source of information for a wide spectrum of users. In Twitter, popular information that is deemed important by the community propagates through the network. Studying the characteristics of content in the messages becomes important for a number of tasks, such as breaking news detection, personalized message recommendation, friend recommendation, and sentiment analysis. While many researchers wish to use standard text mining tools to understand messages on Twitter, the restricted length of those messages prevents the tools from being employed to their full potential. We address the problem of using standard topic models in microblogging environments by studying how the models can be trained on the dataset. We propose several schemes to train a standard topic model and compare their quality and effectiveness through a set of carefully designed experiments from both qualitative and quantitative perspectives. We show that by training a topic model on aggregated messages we can obtain a higher-quality learned model, which results in significantly better performance in two real-world classification problems. We also discuss how the state-of-the-art Author-Topic model fails to model hierarchical relationships between entities in Social Media.
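The idea of training on aggregated messages can be illustrated with a minimal sketch. It assumes messages arrive as (author, text) pairs and that grouping by author is the aggregation scheme; the abstract does not specify either, and the function name is hypothetical. The resulting pseudo-documents would then be passed to any standard topic model.

```python
from collections import defaultdict

def aggregate_by_author(messages):
    """Merge short messages into one pseudo-document per author.

    Assumption: `messages` is an iterable of (author, text) pairs, and
    the author is the grouping key. Other keys could be used instead.
    """
    grouped = defaultdict(list)
    for author, text in messages:
        grouped[author].append(text)
    # Join each author's messages so a topic model sees longer documents.
    return {author: " ".join(texts) for author, texts in grouped.items()}

# Toy data, made up for illustration only.
messages = [
    ("alice", "new phone launch today"),
    ("bob", "great match last night"),
    ("alice", "phone camera review"),
]
docs = aggregate_by_author(messages)
```

Each value in `docs` is a longer text than any single message, which is the property the aggregation is meant to provide.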
HCDF: A Hybrid Community Discovery Framework
Cited by 2 (1 self)
We introduce a novel Bayesian framework for hybrid community discovery in graphs. Our framework, HCDF (short for Hybrid Community Discovery Framework), can effectively incorporate hints from a number of other community detection algorithms and produce results that outperform the constituent parts. We describe two HCDF-based approaches that are: (1) effective, in terms of link prediction performance and robustness to small perturbations in network structure; (2) consistent, in terms of effectiveness across various application domains; (3) scalable to very large graphs; and (4) nonparametric. Our extensive evaluation on a collection of diverse and large real-world graphs, with millions of links, shows that our HCDF-based approaches (a) achieve up to 0.22 improvement in link prediction performance as measured by area under the ROC curve (AUC), (b) never have an AUC that drops below 0.91 in the worst case, and (c) find communities that are robust to small perturbations of the network structure as defined by Variation of Information (an entropy-based distance metric).
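Variation of Information between two community assignments can be computed as VI = H(A|B) + H(B|A). The sketch below assumes each assignment is given as a list of labels, one per node, in the same node order; the function name and the use of natural logarithms are choices made here, not details taken from the paper.

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A|B) + H(B|A) in nats for two labelings."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Each term adds -p(a,b) * [log p(a,b)/p(a) + log p(a,b)/p(b)].
        vi -= p_ab * (math.log(p_ab / p_a) + math.log(p_ab / p_b))
    return vi

same = variation_of_information([0, 0, 1, 1], [5, 5, 7, 7])
independent = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions (up to relabeling) give a distance of zero, while the two independent two-way splits above give 2 ln 2.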
HSN-PAM: Finding Hierarchical Probabilistic Groups from Large-Scale Networks
Cited by 1 (0 self)
Real-world social networks are often hierarchical, reflecting the fact that some communities are composed of a few smaller sub-communities. This paper describes a hierarchical Bayesian model-based scheme, namely HSN-PAM (Hierarchical Social Network-Pachinko Allocation Model), for discovering probabilistic, hierarchical communities in social networks. This scheme is powered by a previously developed hierarchical Bayesian model. In this scheme, communities are classified into two categories: super-communities and regular-communities. Two different network encoding approaches are explored to evaluate this scheme on research collaboration networks, including CiteSeer and NanoSCI. The experimental results demonstrate that HSN-PAM is effective for discovering hierarchical community structures in large-scale social networks.
Information Theoretic Criteria for Community Detection
Many algorithms for finding community structure in graphs search for a partition that maximizes modularity. However, recent work has identified two important limitations of modularity as a community quality criterion: a resolution limit; and a bias towards finding equal-sized communities. Information-theoretic approaches that search for partitions that minimize description length are a recent alternative to modularity. This paper shows that two information-theoretic algorithms are themselves subject to a resolution limit, identifies the component of each approach that is responsible for the resolution limit, proposes a variant, SGE (Sparse Graph Encoding), that addresses this limitation, and demonstrates on three artificial data sets that (1) SGE does not exhibit a resolution limit on sparse graphs in which other approaches do, and that (2) modularity and the compression-based algorithms, including SGE, behave similarly on graphs not subject to the resolution limit.
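For reference, the modularity criterion mentioned above can be computed as the sum over communities of (internal edges / m) minus (total community degree / 2m) squared, where m is the number of edges. The sketch below is a plain implementation of that standard formula, assuming a simple undirected, unweighted graph without self-loops, given as an edge list plus a node-to-community mapping; it is not the SGE method and the names are illustrative.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    Assumptions: `edges` is a list of (u, v) pairs of an undirected,
    unweighted graph with no self-loops or duplicate edges, and
    `community` maps every node to a community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must have at least one edge")
    internal = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)    # sum of node degrees in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge (toy example).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
one_group = modularity(edges, {n: "a" for n in range(6)})
```

Splitting the toy graph into its two triangles scores 5/14, while putting every node in one community scores 0.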
Continuous Time Group Discovery in Dynamic Graphs
With the rise in availability and importance of graphs and networks, it has become increasingly important to have good models to describe their behavior. While much work has focused on modeling static graphs, we focus on group discovery in dynamic graphs. We adapt a dynamic extension of Latent Dirichlet Allocation to this task and demonstrate good performance on two datasets.
Extracting and Ranking Viral Communities Using Seeds and Content Similarity
We study the community extraction problem within the context of networks of blogs and forums. When starting from a small set of known seed nodes, we argue that the use of content information (beyond explicit link information) plays an essential role in the identification of the relevant community. Our approach lends itself to a new and insightful ranking scheme for members of the extracted community and an efficient algorithm for inflating/deflating the extracted community. Using a large commercial data set of blog and forum sites, we provide experimental evidence to demonstrate the utility, efficiency, and stability of our methods. Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications - Data mining.
V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors
This work proposes V-SMART-Join, a scalable MapReduce-based framework for discovering all pairs of similar entities. The V-SMART-Join framework is applicable to sets, multisets, and vectors. V-SMART-Join is motivated by the observed skew in the underlying distributions of Internet traffic, and is a family of 2-stage algorithms, where the first stage computes and joins the partial results, and the second stage computes the similarity exactly for all candidate pairs. The V-SMART-Join algorithms are very efficient and scalable in the number of entities, as well as their cardinalities. They were up to 30 times faster than the state-of-the-art algorithm, VCL, when compared on a small real dataset. We also established the scalability of the proposed algorithms by running them on a dataset of a realistic size, on which VCL never finished. Experiments were run using real datasets of IPs and cookies, where each IP is represented as a multiset of cookies, and the goal is to discover similar IPs to identify Internet proxies.
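To make the problem setting concrete, the sketch below performs a naive, single-machine all-pairs similarity join over multisets. It is not the V-SMART-Join algorithm and involves no MapReduce; the similarity measure (a multiset Jaccard: sum of minimum counts over sum of maximum counts), the function names, and the toy IP and cookie identifiers are all assumptions, since the abstract does not name the measures used.

```python
from collections import Counter
from itertools import combinations

def multiset_jaccard(a, b):
    """Sum of per-element minimum counts over sum of maximum counts."""
    intersection = sum((a & b).values())  # Counter & keeps minimum counts
    union = sum((a | b).values())         # Counter | keeps maximum counts
    return intersection / union if union else 0.0

def similar_pairs(entities, threshold):
    """Brute-force join: compare every pair and keep those at or above threshold."""
    result = []
    for x, y in combinations(sorted(entities), 2):
        score = multiset_jaccard(entities[x], entities[y])
        if score >= threshold:
            result.append((x, y, score))
    return result

# Each IP is a multiset of cookies (made-up identifiers).
ips = {
    "ip1": Counter({"c1": 2, "c2": 1}),
    "ip2": Counter({"c1": 1, "c2": 1}),
    "ip3": Counter({"c9": 3}),
}
pairs = similar_pairs(ips, threshold=0.5)
```

The brute-force loop compares every pair, which is quadratic in the number of entities; avoiding that cost at scale is the motivation for a distributed join framework.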

