## Topic and role discovery in social networks (2005)

### Download Links

- [www.cs.umass.edu]
- [people.cs.umass.edu]
- [www.cs.cmu.edu]
- [www.cs.colorado.edu]
- [ciir.cs.umass.edu]
- [maroo.cs.umass.edu]
- [ciir-publications.cs.umass.edu]
- [people.ee.duke.edu]
- [www.aaai.org]
- [www.ijcai.org]
- [ijcai.org]
- [www.jair.org]
- [jair.org]
- DBLP

### Other Repositories/Bibliography

Venue: IJCAI

Citations: 152 (14 self)

### BibTeX

@INPROCEEDINGS{Mccallum05topicand,

author = {Andrew McCallum and Andrés Corrada-Emmanuel and Xuerui Wang},

title = {Topic and role discovery in social networks},

booktitle = {IJCAI},

year = {2005},

pages = {786--791}

}

### Abstract

Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not the language content or topics on those links. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation (LDA) and the Author-Topic (AT) model, adding the key attribute that the distribution over topics is conditioned distinctly on both the sender and recipient, steering the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher’s email archive, providing evidence not only that clearly relevant topics are discovered, but that the ART model better predicts people’s roles.

1 Introduction and Related Work

Social network analysis (SNA) is the study of mathematical models for interactions among people, organizations and groups. With the recent availability of large datasets of human ...
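As a rough illustration of the conditioning described in the abstract, the sketch below generates synthetic messages in which each word's topic is drawn from a distribution indexed by the (author, recipient) pair. It is a minimal sketch, not the authors' implementation: the function name `generate_art_corpus`, its parameters, the symmetric Dirichlet priors, the fixed message length, and the uniform choice of a recipient for each word are all assumptions made for illustration.

```python
import numpy as np


def generate_art_corpus(messages, num_topics, vocab_size,
                        alpha=0.5, beta=0.5, words_per_message=20, seed=0):
    """Sample synthetic messages from an ART-style generative process.

    messages: list of (author_id, [recipient_ids]) pairs (hypothetical format).
    Returns (corpus, theta, phi), where corpus[d] is a list of
    (recipient, topic, word) triples for message d.
    """
    rng = np.random.default_rng(seed)
    # phi[t]: word distribution of topic t, drawn from a symmetric Dirichlet(beta).
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    # theta[(author, recipient)]: topic distribution for that directed pair,
    # drawn lazily from a symmetric Dirichlet(alpha) the first time it is needed.
    theta = {}
    corpus = []
    for author, recipients in messages:
        tokens = []
        for _ in range(words_per_message):
            # Assumption: one recipient is picked uniformly for each word.
            x = int(rng.choice(recipients))
            pair = (author, x)
            if pair not in theta:
                theta[pair] = rng.dirichlet(np.full(num_topics, alpha))
            # The topic depends on both sender and recipient; the word on the topic.
            z = int(rng.choice(num_topics, p=theta[pair]))
            w = int(rng.choice(vocab_size, p=phi[z]))
            tokens.append((x, z, w))
        corpus.append(tokens)
    return corpus, theta, phi


if __name__ == "__main__":
    # Person 0 writes to persons 1 and 2; person 1 replies to person 0.
    corpus, theta, phi = generate_art_corpus(
        [(0, [1, 2]), (1, [0])], num_topics=3, vocab_size=10)
    print(sorted(theta))   # directed (author, recipient) pairs that were used
    print(corpus[0][:5])   # first few (recipient, topic, word) triples
```

Because `theta` is keyed by the directed pair, the messages from person 0 to person 1 and from person 1 to person 0 get separate topic distributions, which is the direction sensitivity the abstract refers to.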

### Citations

2366 | Latent Dirichlet Allocation - Blei, Ng, et al.
Citation Context ...earch in machine learning and natural language models for clustering words in order to discover the few underlying topics that are combined to form documents in a corpus. Latent Dirichlet Allocation [Blei et al., 2003] robustly discovers multinomial word distributions of these topics. Hierarchical Dirichlet Processes [Teh et al., 2004] can determine an appropriate number of topics for a corpus. The Author-Topic Mo...

1652 | Social Network Analysis: Methods and Applications - Wasserman, Faust - 1994
Citation Context ...as been growing interest in social network analysis. Historically, research in the field has been led by social scientists and physicists [Lorrain & White, 1971; Albert & Barabási, 2002; Watts, 2003; Wasserman & Faust, 1994], and previous work has emphasized binary interaction data, with directed and/or weighted edges. There has not, however, previously been significant work by researchers with backgrounds in statistica...

1198 | Statistical Mechanics of Complex Networks - Albert, Barabási
Citation Context ...ions among the 9/11 hijackers, there has been growing interest in social network analysis. Historically, research in the field has been led by social scientists and physicists [Lorrain & White, 1971; Albert & Barabási, 2002; Watts, 2003; Wasserman & Faust, 1994], and previous work has emphasized binary interaction data, with directed and/or weighted edges. There has not, however, previously been significant work by rese...

625 | Finding scientific topics - Griffiths, Steyvers - 2004
Citation Context ...,w,α,β,a,r) 6: update $n_{a_d x_{di} z_{di}}$ and $m_{z_{di} w_{di}}$ 7: end for 8: end for 9: until the Markov chain reaches its equilibrium 10: compute the posterior estimates of θ and φ (Blei et al., 2003), Gibbs sampling (Griffiths & Steyvers, 2004; Steyvers et al., 2004; Rosen-Zvi, Griffiths, Steyvers, & Smyth, 2004), and expectation propagation (Griffiths & Steyvers, 2004; Minka & Lafferty, 2002). We choose Gibbs sampling for its ease of imple...

538 | Hierarchical Dirichlet processes - Teh, Jordan, et al. - 2006
Citation Context ...Latent Semantic Indexing (Hofmann, 2001) and Latent Dirichlet Allocation (Blei, Ng, & Jordan, 2003) robustly discover multinomial word distributions of these topics. Hierarchical Dirichlet Processes (Teh, Jordan, Beal, & Blei, 2004) can determine an appropriate number of topics for a corpus. The Author-Topic Model (Steyvers, Smyth, Rosen-Zvi, & Griffiths, 2004) learns topics conditioned on the mixture of authors that composed a...

444 | Unsupervised learning by probabilistic latent semantic analysis - Hofmann
Citation Context ...chine learning and natural language models for clustering words in order to discover the few underlying topics that are combined to form documents in a corpus. Probabilistic Latent Semantic Indexing (Hofmann, 2001) and Latent Dirichlet Allocation (Blei, Ng, & Jordan, 2003) robustly discover multinomial word distributions of these topics. Hierarchical Dirichlet Processes (Teh, Jordan, Beal, & Blei, 2004) can de...

392 | Correlated topic models - Blei, Lafferty
Citation Context ... and text are combined in the Topics over Time (TOT) model (Wang & McCallum, 2006), which finds trends in time-sensitive topics using a continuous distribution over time-stamps. Dynamic Topic Models (Blei & Lafferty, 2006b) incorporate time into topic models through transitions in a Markov process. The ART model could be easily extended to incorporate temporal information. As dis...

233 | The author-topic model for authors and documents - Rosen-Zvi, Griffiths, et al. - 2004

223 | An introduction to MCMC for machine learning - Andrieu, Freitas, et al.

206 | Six Degrees: The Science of a Connected Age - Watts - 2003
Citation Context ...kers, there has been growing interest in social network analysis. Historically, research in the field has been led by social scientists and physicists [Lorrain & White, 1971; Albert & Barabási, 2002; Watts, 2003; Wasserman & Faust, 1994], and previous work has emphasized binary interaction data, with directed and/or weighted edges. There has not, however, previously been significant work by researchers with ...

204 | Navigation in a small world - Kleinberg

138 | Topics over time: a non-Markov continuous-time model of topical trends - Wang, McCallum - 2006
Citation Context ...roves both the groups and topics discovered. Other modalities of information can be combined to discover hidden structure. For example, time and text are combined in the Topics over Time (TOT) model (Wang & McCallum, 2006), which finds trends in time-sensitive topics using a continuous distribution over time-stamps. Dynamic Topic Models (Blei & Lafferty, 2006b) incorporate time into topic models through transitions in...

137 | Learning systems of concepts with an infinite relational model - Kemp, Tenenbaum, et al. - 2006
Citation Context ... an inordinately high number of connections, or with connections to a particularly well-connected subset (group or block) of the network (Nowicki & Snijders, 2001; Kemp, Griffiths, & Tenenbaum, 2004; Kemp, Tenenbaum, Griffiths, Yamada, & Ueda, 2006; Kubica, Moore, Schneider, & Yang, 2002; Airoldi, Blei, Fienberg, & Xing, 2006; Kurihara, Kameya, & Sato, 2006). Furthermore, using these properties we can assign “roles” to certain nodes (Lorrain & ...

132 | Structural Equivalence of Individuals in Social Networks - Lorrain, White - 1971
Citation Context ...salience of the connections among the 9/11 hijackers, there has been growing interest in social network analysis. Historically, research in the field has been led by social scientists and physicists [Lorrain & White, 1971; Albert & Barabási, 2002; Watts, 2003; Wasserman & Faust, 1994], and previous work has emphasized binary interaction data, with directed and/or weighted edges. There has not, however, previously been...

129 | Multi-label text classification with a mixture model trained by EM - McCallum - 1999
Citation Context ...istribution φz. The robustness of the model is greatly enhanced by integrating out uncertainty about the per-document topic distribution θ. The Author model (also termed a Multi-label Mixture Model) [McCallum, 1999], is a Bayesian network that simultaneously models document content and its authors’ interests with a 1-1 correspondence between topics and authors. For each document d, a set of authors ad is observ...

128 | Identity and search in social networks - Watts, Dodds, et al. - 2002

123 | Integrating topics and syntax - Griffiths, Steyvers, et al. - 2005
Citation Context ...paper citations (Erosheva, Fienberg, & Lafferty, 2004), capturing correlations among topics (Blei & Lafferty, 2006a; Li & McCallum, 2006), taking advantage of both topical and syntactic dependencies (Griffiths, Steyvers, Blei, & Tenenbaum, 2004), and discovering topically-relevant phrases by Markov dependencies in word sequences (Wang, McCallum, & Wei, 2007). Many of these models could be easily combined with the ART model, and would likely...

116 | Probabilistic author-topic models for information discovery - Steyvers, Smyth, et al. - 2004
Citation Context ...ustly discovers multinomial word distributions of these topics. Hierarchical Dirichlet Processes [Teh et al., 2004] can determine an appropriate number of topics for a corpus. The Author-Topic Model [Steyvers et al., 2004] learns topics conditioned on the mixture of authors that composed a document. However, none of these models are appropriate for SNA, in which we aim to capture the directed interactions and relation...

116 | Pachinko allocation: DAG-structured mixture models of topic correlations - Li, McCallum
Citation Context ...ent years for many different tasks, including joint modeling of words and research paper citations (Erosheva, Fienberg, & Lafferty, 2004), capturing correlations among topics (Blei & Lafferty, 2006a; Li & McCallum, 2006), taking advantage of both topical and syntactic dependencies (Griffiths, Steyvers, Blei, & Tenenbaum, 2004), and discovering topically-relevant phrases by Markov dependencies in word sequences (Wang...

116 | Estimation and prediction for stochastic blockstructures - Nowicki, Snijders - 2001
Citation Context ... is heavy-tailed, we can also find those particular nodes with an inordinately high number of connections, or with connections to a particularly well-connected subset (group or block) of the network (Nowicki & Snijders, 2001; Kemp, Griffiths, & Tenenbaum, 2004; Kemp, Tenenbaum, Griffiths, Yamada, & Ueda, 2006; Kubica, Moore, Schneider, & Yang, 2002; Airoldi, Blei, Fienberg, & Xing, 2006; Kurihara, Kameya, & Sato, 2006). ...

110 | Expectation-propagation for the generative aspect model - Minka, Lafferty - 2002
Citation Context ...θ and φ (Blei et al., 2003), Gibbs sampling (Griffiths & Steyvers, 2004; Steyvers et al., 2004; Rosen-Zvi, Griffiths, Steyvers, & Smyth, 2004), and expectation propagation (Griffiths & Steyvers, 2004; Minka & Lafferty, 2002). We choose Gibbs sampling for its ease of implementation. Note that we adopt conjugate priors (Dirichlet) for the multinomial distributions, and thus we can easily integrate out θ and φ, analyticall...

54 | Stochastic link and group detection - Kubica, Moore, et al. - 2002
Citation Context ...th connections to a particularly well-connected subset (group or block) of the network (Nowicki & Snijders, 2001; Kemp, Griffiths, & Tenenbaum, 2004; Kemp, Tenenbaum, Griffiths, Yamada, & Ueda, 2006; Kubica, Moore, Schneider, & Yang, 2002; Airoldi, Blei, Fienberg, & Xing, 2006; Kurihara, Kameya, & Sato, 2006). Furthermore, using these properties we can assign “roles” to certain nodes (Lorrain & White, 1971; Wolfe & Jensen, 2004). Howe...

52 | The Enron Email Dataset Database Schema and Brief Statistical Report - Shetty, Adibi - 2004
Citation Context ...d Work Social network analysis (SNA) is the study of mathematical models for interactions among people, organizations and groups. With the recent availability of large datasets of human interactions [Shetty & Adibi, 2004; Wu et al., 2003], the popularity of services like Friendster.com and LinkedIn.com, and the salience of the connections among the 9/11 hijackers, there has been growing interest in social network ana...

46 | The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email - McCallum, Corrada-Emmanuel, et al. - 2004
Citation Context ...d the McCallum dataset consists of 23,488 messages written by 825 authors, sent or received by McCallum during Jan.-Oct., 2004. Gibbs sampling is employed to conduct all experiments (as detailed in [McCallum et al., 2004]). 3.1 Topics and Prominent Relations from ART Table 1 shows the highest probability words from six topics in an ART model trained on the 147 Enron users with 50 topics. (The quoted titles are our ow...

40 | Discovering latent classes in relational data - Kemp, Griffiths, et al. - 2004
Citation Context ...lso find those particular nodes with an inordinately high number of connections, or with connections to a particularly well-connected subset (group or block) of the network (Nowicki & Snijders, 2001; Kemp, Griffiths, & Tenenbaum, 2004; Kemp, Tenenbaum, Griffiths, Yamada, & Ueda, 2006; Kubica, Moore, Schneider, & Yang, 2002; Airoldi, Blei, Fienberg, & Xing, 2006; Kurihara, Kameya, & Sato, 2006). Furthermore, using these properties ...

37 | Topical n-grams: Phrase and topic discovery, with an application to information retrieval - Wang, McCallum, et al. - 2007
Citation Context ...2006), taking advantage of both topical and syntactic dependencies (Griffiths, Steyvers, Blei, & Tenenbaum, 2004), and discovering topically-relevant phrases by Markov dependencies in word sequences (Wang, McCallum, & Wei, 2007). Many of these models could be easily combined with the ART model, and would likely prove useful. 4. Experimental Results We present results with the Enron email corpus and the personal email of one...

33 | Expertise modeling for matching papers with reviewers - Mimno, McCallum
Citation Context ...at ART is specifically designed to capture language used in a directed network of correspondents. Another more recent model that associates topics with people is the Author-Persona-Topic (APT) model (Mimno & McCallum, 2007). APT is designed specifically to capture the expertise of a person, modeling expertise as a mixture of topical intersections, and is demonstrated on the task of matching reviewers to submitted resea...

23 | Group and topic discovery from relations and their attributes - Wang, Mohanty, et al. - 2005
Citation Context ...describe experiments with one of these variants. The importance of modeling the language associated with social network interactions has also recently been demonstrated in the Group-Topic (GT) model (Wang, Mohanty, & McCallum, 2006). Unlike ART, which discovers roles, GT discovers groups. Like ART, it uses text data to find interesting and useful patterns that would not be possible with edge relations alone. GT simultaneously c...

16 | Stochastic blockmodels: some first steps, Social Netw 5 - Holland, Laskey, et al. - 1983

16 | Learning author-topic models from text corpora - Rosen-Zvi, Chemudugunta, et al. - 2010

12 | Playing multiple roles: Discovering overlapping roles in social networks - Wolfe, Jensen - 2004
Citation Context ...er of connections, or with connections to a particularly well-connected subset of the network. Furthermore, using these properties we can assign “roles” to certain nodes, e.g. [Lorrain & White, 1971; Wolfe & Jensen, 2003]. However, it is clear that network properties are not enough to discover all the roles in a social network. Consider email messages in a corporate setting, and imagine a situation where a tightly kn...

4 | Hierarchical Dirichlet processes (Technical Report) - Teh, Jordan, et al. - 2004
Citation Context ...s that are combined to form documents in a corpus. Latent Dirichlet Allocation [Blei et al., 2003] robustly discovers multinomial word distributions of these topics. Hierarchical Dirichlet Processes [Teh et al., 2004] can determine an appropriate number of topics for a corpus. The Author-Topic Model [Steyvers et al., 2004] learns topics conditioned on the mixture of authors that composed a document. However, none...

4 | A frequency-based stochastic blockmodel - Kurihara, Kameya, et al. - 2006
Citation Context ...network (Nowicki & Snijders, 2001; Kemp, Griffiths, & Tenenbaum, 2004; Kemp, Tenenbaum, Griffiths, Yamada, & Ueda, 2006; Kubica, Moore, Schneider, & Yang, 2002; Airoldi, Blei, Fienberg, & Xing, 2006; Kurihara, Kameya, & Sato, 2006). Furthermore, using these properties we can assign “roles” to certain nodes (Lorrain & White, 1971; Wolfe & Jensen, 2004). However, it is clear that network properties are not enough to discover all...

2 | Stochastic blockmodels of mixed membership: General formulation and nested variational inference - Airoldi, Blei, et al. - 2006
Citation Context ...nnected subset (group or block) of the network (Nowicki & Snijders, 2001; Kemp, Griffiths, & Tenenbaum, 2004; Kemp, Tenenbaum, Griffiths, Yamada, & Ueda, 2006; Kubica, Moore, Schneider, & Yang, 2002; Airoldi, Blei, Fienberg, & Xing, 2006; Kurihara, Kameya, & Sato, 2006). Furthermore, using these properties we can assign “roles” to certain nodes (Lorrain & White, 1971; Wolfe & Jensen, 2004). However, it is clear that network propertie...
