## Probabilistic community discovery using hierarchical latent Gaussian mixture model (2007)

### Download Links

- [clgiles.ist.psu.edu]
- [www.aaai.org]
- [www.haizhengzhang.com]
- [agentlab.psu.edu]
### Other Repositories/Bibliography

- DBLP

Venue: | In AAAI |

Citations: | 20 - 2 self |

### BibTeX

    @INPROCEEDINGS{Zhang07probabilisticcommunity,
        author = {Haizheng Zhang and C. Lee Giles and Henry C. Foley and John Yen},
        title = {Probabilistic community discovery using hierarchical latent gaussian mixture model},
        booktitle = {In AAAI},
        year = {2007},
        pages = {663--668}
    }

### Abstract

Complex networks exist in a wide array of domains, ranging from biology to sociology and computer science. These real-world networks, while disparate in nature, often consist of a set of loose clusters (a.k.a. communities) whose members are better connected to each other than to the rest of the network. Discovering such inherent community structures can lead to a deeper understanding of the networks and has therefore attracted increasing interest among researchers from various disciplines. This paper describes the GWN-LDA (Generic Weighted Network-Latent Dirichlet Allocation) model, a hierarchical Bayesian model derived from the widely used LDA model, for discovering probabilistic community profiles in social networks. In this model, communities are modeled as latent variables and defined as distributions over the social actor space. In addition, each social actor belongs to every community with a different probability. This paper also proposes two different network encoding approaches and explores their impact on community discovery performance. The model is evaluated on two research collaboration networks: CiteSeer and NanoSCI. The experimental results demonstrate that this approach is promising for discovering community structures in large-scale networks.
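
The mixed-membership idea in the abstract can be illustrated with a small sketch. It assumes an LDA-style mixture in which each community is a distribution over actors (`phi`) and each actor holds a membership distribution over communities (`theta_i`). The function name and all numbers are hypothetical and are not taken from the paper.

```python
def neighbor_distribution(theta_i, phi):
    """Mix the community-level actor distributions by one actor's memberships.

    theta_i[k] : probability that the actor belongs to community k (assumed input)
    phi[k][j]  : probability of actor j under community k (assumed input)
    Returns the implied distribution over actors for this actor.
    """
    n_communities = len(phi)
    n_actors = len(phi[0])
    return [
        sum(theta_i[k] * phi[k][j] for k in range(n_communities))
        for j in range(n_actors)
    ]


# Two hypothetical communities over four actors; each row sums to 1.
phi = [
    [0.5, 0.4, 0.1, 0.0],
    [0.0, 0.1, 0.4, 0.5],
]
# The actor belongs to both communities, with different probabilities.
theta_i = [0.7, 0.3]

mixed = neighbor_distribution(theta_i, phi)
```

Because `theta_i` and every row of `phi` are probability distributions, `mixed` is also a distribution over the four actors, weighted toward the community the actor belongs to most strongly.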

### Citations

2596 | Latent Dirichlet allocation - Blei, Ng, et al. - 2003 |

815 | Community structure in social and biological networks - Girvan, Newman |

Citation Context: ...ve been studied in a variety of networks, including World Wide Web (Flake, Lawrence, & Giles 2000), distributed information retrieval (Zhang et al. 2004), social networks (Clauset, Newman, & Moore 2004; Girvan & Newman 2002; Newman 2004b; Palla et al. 2005; Scott 2000; Zhou et al. 2006b; Newman 2004a), and biological networks (Girvan & Newman 2002; Palla et al. 2005; Wilkinson & Huberman 2004). Most of these approaches a...

435 | Social network analysis: a handbook - Scott - 2001 |

Citation Context: ...ld Wide Web (Flake, Lawrence, & Giles 2000), distributed information retrieval (Zhang et al. 2004), social networks (Clauset, Newman, & Moore 2004; Girvan & Newman 2002; Newman 2004b; Palla et al. 2005; Scott 2000; Zhou et al. 2006b; Newman 2004a), and biological networks (Girvan & Newman 2002; Palla et al. 2005; Wilkinson & Huberman 2004). Most of these approaches are characterized by the use of distance-based...

409 | Finding community structure in very large networks - Clauset, Newman, et al. |

351 | Fast algorithm for detecting community structure in networks - Newman - 2004 |

291 | Uncovering the overlapping community structure of complex networks in nature and society - Palla, Derényi, et al. - 2005 |

Citation Context: ...orks, including World Wide Web (Flake, Lawrence, & Giles 2000), distributed information retrieval (Zhang et al. 2004), social networks (Clauset, Newman, & Moore 2004; Girvan & Newman 2002; Newman 2004b; Palla et al. 2005; Scott 2000; Zhou et al. 2006b; Newman 2004a), and biological networks (Girvan & Newman 2002; Palla et al. 2005; Wilkinson & Huberman 2004). Most of these approaches are characterized by the use of di...

245 | Efficient identification of web communities - Flake, Lawrence, et al. - 2000 |

153 | Learning hierarchical models of scenes, objects, and parts - Sudderth, Torralba, et al. |

Citation Context: ...es has attracted significant interest and it has been applied to many domains including document modeling (Blei, Ng, & Jordan 2003; ?), text classification (Blei, Ng, & Jordan 2003), image processing (Sudderth et al. 2005), contextual community discovery (Zhou et al. 2006b; 2006a). GWN-LDA, similar to a previously developed model (SSN-LDA), encodes the structural information of networks into profiles and discovers comm...

133 | Coauthorship networks and patterns of scientific collaboration - Newman - 2004 |

122 | Pachinko allocation: DAG-structured mixture models of topic correlations - Li, McCallum - 2006 |

Citation Context: ... & Jordan 2003). Its ability of modeling topics using latent variables has attracted significant interest and it has been applied to many domains including document modeling (Blei, Ng, & Jordan 2003; Li & McCallum 2006), text classification (Blei, Ng, & Jordan 2003), image processing (Sudderth et al. 2005), contextual community discovery (Zhou et al. 2006b; 2006a). GWN-LDA, similar to a previously developed model (SS...

53 | A method for finding communities of related genes - Wilkinson, Huberman |

Citation Context: ...ks (Clauset, Newman, & Moore 2004; Girvan & Newman 2002; Newman 2004b; Palla et al. 2005; Scott 2000; Zhou et al. 2006b; Newman 2004a), and biological networks (Girvan & Newman 2002; Palla et al. 2005; Wilkinson & Huberman 2004). Most of these approaches are characterized by the use of distance-based measures, including centrality indices (a.k.a. betweenness) (Freeman 1977; Girvan & Newman 2002; Wilkinson & Huberman 2004; Ruan ...

23 | An LDA-based community structure discovery approach for large-scale social networks - Zhang, Qiu, et al. - 2007 |

Citation Context: ...$\mid \vec{\alpha})$ (12), where

$$p(\iota_{i,j} = k \mid \vec{\alpha}) = \int p(\vec{\iota} \mid \Theta)\, p(\Theta \mid \vec{\alpha})\, d\Theta \tag{13}$$

$$= \prod_{m=1}^{M} \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \tag{14}$$

The Gibbs sampling process is analogous to the SSN-LDA model and the detailed algorithm is elaborated in (Zhang et al. 2007).

Table 1: Statistics for datasets CiteSeer and NanoSCI. PN denotes the number of papers; EN denotes the number of edges; AAP denotes the average author number per paper, and SLC denotes the size of t...
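
The ratio of ∆ terms numbered (14) in the context above can be evaluated numerically. The following is a minimal sketch, assuming that ∆ is the Dirichlet normalizer, ∆(a) = ∏ Γ(a_k) / Γ(Σ a_k), as in standard collapsed Gibbs derivations for LDA; the snippet itself does not define ∆, so this reading, the function names, and the example counts are all assumptions.

```python
import math


def log_delta(vec):
    """Log of the assumed Dirichlet normalizer: sum(lgamma(a_k)) - lgamma(sum(a_k))."""
    return sum(math.lgamma(v) for v in vec) - math.lgamma(sum(vec))


def log_membership_marginal(counts, alpha):
    """Log of prod_m Delta(n_m + alpha) / Delta(alpha).

    counts[m][k] : number of assignments to community k in profile m (hypothetical data)
    alpha[k]     : Dirichlet hyperparameter for community k
    """
    base = log_delta(alpha)
    return sum(
        log_delta([n + a for n, a in zip(row, alpha)]) - base
        for row in counts
    )


# One profile, two communities, uniform prior, a single assignment to community 0.
single = log_membership_marginal([[1, 0]], [1.0, 1.0])
# Same prior, two assignments to community 0.
double = log_membership_marginal([[2, 0]], [1.0, 1.0])
```

Working in log space with `math.lgamma` avoids overflow in the Gamma function when the counts are large.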

17 | Topic Evolution and Social Interactions: How Authors Effect Research. CIKM-2006 - Zhou, Ji, et al. - 2006 |

13 | A set of measures of centrality based on betweenness - Freeman - 1977 |

Citation Context: ...s (Girvan & Newman 2002; Palla et al. 2005; Wilkinson & Huberman 2004). Most of these approaches are characterized by the use of distance-based measures, including centrality indices (a.k.a. betweenness) (Freeman 1977; Girvan & Newman 2002; Wilkinson & Huberman 2004; Ruan & Zhang 2006) or minimum cut approaches (Flake, Lawrence, & Giles 2000). The common ground for these studies is the definition of community dista...

8 | Identification and evaluation of weak community structures in networks - Ruan, Zhang - 2006 |

Citation Context: ... 2004). Most of these approaches are characterized by the use of distance-based measures, including centrality indices (a.k.a. betweenness) (Freeman 1977; Girvan & Newman 2002; Wilkinson & Huberman 2004; Ruan & Zhang 2006) or minimum cut approaches (Flake, Lawrence, & Giles 2000). The common ground for these studies is the definition of community distance measures and (iterative) clustering process for minimizing such ...

8 | Adjusting mixture weights of Gaussian mixture model via regularized probabilistic latent semantic analysis - Si, Jin - 2005 |

7 | The structure and infrastructure of the global nanotechnology literature - Kostoff, Stump, et al. |

6 | A multi-agent approach for peer-to-peer based information retrieval system - Zhang, Croft, et al. - 2004 |

Citation Context: ...model. Background and Related Works. Community discovery problems have been studied in a variety of networks, including World Wide Web (Flake, Lawrence, & Giles 2000), distributed information retrieval (Zhang et al. 2004), social networks (Clauset, Newman, & Moore 2004; Girvan & Newman 2002; Newman 2004b; Palla et al. 2005; Scott 2000; Zhou et al. 2006b; Newman 2004a), and biological networks (Girvan & Newman 2002; Pal...

3 | A multi-agent approach for peer-to-peer information retrieval - Zhang, Croft, et al. - 2004 |

Citation Context: ...model. Background and Related Works. Community discovery problems have been studied in a variety of networks, including World Wide Web (Flake, Lawrence, & Giles 2000), distributed information retrieval (Zhang et al. 2004), social networks (Clauset, Newman, & Moore 2004; Girvan & Newman 2002; Newman 2004b; Palla et al. 2005; Scott 2000; Zhou et al. 2006b; Newman 2004a), and biological networks (Girvan & Newman 2002; Pal...

1 | Bayesian Data Analysis, Second Edition - Gelman, Carlin, et al. - 2004 |

Citation Context: ...ke of simplicity, we denote the hyperparameters of the prior distribution as Ψ = {µ0, κ0} and Υ = {υ, λ}, respectively. According to this prior distribution definition, the joint prior density is (Gelman et al. 2004):

$$p(\Omega) \propto |\Sigma|^{-\left(\frac{\upsilon_0 + d}{2} + 1\right)} \exp\left(-\frac{1}{2}\,\mathrm{tr}\!\left(\Lambda_0 \Sigma^{-1}\right) - \frac{\kappa_0}{2}\,(\mu - \mu_0)^{T}\, \Sigma^{-1} (\mu - \mu_0)\right)$$

The parameters υ0 and Λ0 describe the degree of freedom and the scale matrix for the inverse-Wishart distribut...
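
To make the joint prior density quoted in the context above concrete, the sketch below evaluates its logarithm in the one-dimensional special case (d = 1), where Σ reduces to a variance `sigma2` and Λ0 to a scalar `lambda0`. This restriction, the function name, and the example values are assumptions made for illustration; the paper's model is multivariate.

```python
import math


def log_niw_prior_1d(mu, sigma2, mu0, kappa0, upsilon0, lambda0):
    """Unnormalized log prior density for d = 1 (requires sigma2 > 0).

    Mirrors the three factors of the quoted density:
      |Sigma|^-((upsilon0 + d)/2 + 1), exp(-tr(Lambda0 Sigma^-1)/2),
      and exp(-kappa0/2 * (mu - mu0)^T Sigma^-1 (mu - mu0)).
    """
    d = 1
    log_det_term = -((upsilon0 + d) / 2.0 + 1.0) * math.log(sigma2)
    trace_term = -0.5 * lambda0 / sigma2
    mean_term = -0.5 * kappa0 * (mu - mu0) ** 2 / sigma2
    return log_det_term + trace_term + mean_term


# Hypothetical hyperparameters; evaluated at the prior mean and one unit away from it.
at_mean = log_niw_prior_1d(mu=0.0, sigma2=1.0, mu0=0.0, kappa0=1.0, upsilon0=3.0, lambda0=2.0)
off_mean = log_niw_prior_1d(mu=1.0, sigma2=1.0, mu0=0.0, kappa0=1.0, upsilon0=3.0, lambda0=2.0)
```

With `sigma2 = 1` the determinant term vanishes, so `at_mean` is just the trace term, and moving `mu` away from `mu0` lowers the log density through the `kappa0` term.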