## Combinational Collaborative Filtering for Personalized Community Recommendation (2008)

### Download Links

- [infolab.stanford.edu]
- [alumni.cs.ucsb.edu]
- [www.cs.ucsb.edu]
- [static.googleusercontent.com]
- DBLP

### Other Repositories/Bibliography

Venue: KDD'08

Citations: 15 (6 self)

### BibTeX

@MISC{Chen08combinationalcollaborative,
  author = {Wen-Yen Chen and Dong Zhang and Edward Y. Chang},
  title = {Combinational Collaborative Filtering for Personalized Community Recommendation},
  year = {2008}
}

### Abstract

Rapid growth in the amount of data available on social networking sites has made information retrieval increasingly challenging for users. In this paper, we propose a collaborative filtering method, Combinational Collaborative Filtering (CCF), that performs personalized community recommendation by considering multiple types of co-occurrences in social data simultaneously. CCF fuses semantic and user information, then applies a hybrid training strategy that combines Gibbs sampling with the Expectation-Maximization (EM) algorithm. To handle large-scale datasets, we parallelize model training. Through an empirical study on the Orkut dataset, we show CCF to be both effective and scalable.

### Citations

9193 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...Gibbs & EM Hybrid Training: Given the model structure, the next step is to learn the model parameters. There are some standard learning algorithms, such as Gibbs sampling [6], Expectation-Maximization (EM) [5], and gradient descent. For CCF, we propose a hybrid training strategy: we first run Gibbs sampling for a few iterations, then switch to EM. The model trained by Gibbs sampling provides the initializa...
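The hybrid schedule described in this context (a few Gibbs sweeps to seed EM) can be illustrated on a toy model. The sketch below applies the strategy to a two-component 1-D Gaussian mixture with unit variances, not to the CCF model itself; the function name and data are illustrative:

```python
import random, math

def hybrid_fit(data, n_gibbs=20, n_em=50, seed=0):
    """Gibbs-then-EM schedule on a two-component 1-D Gaussian
    mixture (unit variances, equal priors).  Phase 1 runs a few
    Gibbs sweeps over hard assignments to get a good warm start;
    phase 2 refines the means with standard EM."""
    rng = random.Random(seed)
    mu = [min(data), max(data)]          # crude initial means

    # Phase 1: Gibbs sampling -- resample hard assignments z_i
    z = [rng.randrange(2) for _ in data]
    for _ in range(n_gibbs):
        for i, x in enumerate(data):
            w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
            z[i] = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
        for k in (0, 1):
            pts = [x for x, zi in zip(data, z) if zi == k]
            if pts:
                mu[k] = sum(pts) / len(pts)

    # Phase 2: EM -- soft responsibilities from the Gibbs warm start
    for _ in range(n_em):
        r = []
        for x in data:
            w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
            r.append(w[0] / (w[0] + w[1]))
        for k, rk in ((0, r), (1, [1 - v for v in r])):
            s = sum(rk)
            mu[k] = sum(v * x for v, x in zip(rk, data)) / s
    return sorted(mu)
```

As the context argues, the Gibbs phase supplies an initialization that makes the (initialization-sensitive) EM phase more reliable, while EM avoids running the slow sampler to convergence.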

4100 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context: ...descriptions, respectively. 2.2.1 Gibbs & EM Hybrid Training: Given the model structure, the next step is to learn model parameters. There are some standard learning algorithms, such as Gibbs sampling [6], Expectation-Maximization (EM) [5], and gradient descent. For CCF, we propose a hybrid training strategy: we first run Gibbs sampling for a few iterations, then switch to EM. The model trained by Gib...

2662 | Latent dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context: ...Related Work: Several algorithms have been proposed to deal with either bags of words or bags of users. Specifically, Probabilistic Latent Semantic Analysis (PLSA) [7] and Latent Dirichlet Allocation (LDA) [3] model document-word co-occurrence, which is similar to the bags of words community view. Probabilistic Hypertext Induced Topic Selection (PHITS) [4], a variant of PLSA, models document-citation co-oc...

584 | Probabilistic Latent Semantic Analysis
- Hofmann
- 1999
Citation Context: ...its effectiveness and scalability. 1.1 Related Work: Several algorithms have been proposed to deal with either bags of words or bags of users. Specifically, Probabilistic Latent Semantic Analysis (PLSA) [7] and Latent Dirichlet Allocation (LDA) [3] model document-word co-occurrence, which is similar to the bags of words community view. Probabilistic Hypertext Induced Topic Selection (PHITS) [4], a varia...

434 | Cluster ensembles – a knowledge reuse framework for combining multiple partitions
- Strehl, Ghosh
- 2002
Citation Context: ...CAT and CLS. The entropies H(CAT) and H(CLS) are used for normalizing the mutual information to be in the range [0, 1]. In practice, we made use of the following formulation to estimate the NMI score [12]:

$$\mathrm{NMI} = \frac{\sum_{s=1}^{K}\sum_{t=1}^{K} n_{s,t}\,\log\left(\frac{n \cdot n_{s,t}}{n_s\, n_t}\right)}{\sqrt{\left(\sum_{s=1}^{K} n_s \log\frac{n_s}{n}\right)\left(\sum_{t=1}^{K} n_t \log\frac{n_t}{n}\right)}} \quad (17)$$

where $n$ is the number of communities, and $n_s$ and $n_t$ denote the numbers of communities in category $s$ and cluster ...
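This NMI score normalizes mutual information by the geometric mean of the two partition entropies, so identical partitions score 1 and independent ones score 0. A minimal sketch computed from two label lists (function name and inputs are my own, not from the paper):

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions,
    using the sqrt (geometric-mean) normalization: MI / sqrt(Ha*Hb).
    Counts n_{s,t}, n_s, n_t come from the joint and marginal
    label frequencies."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information: sum p(s,t) * log(p(s,t) / (p(s) p(t)))
    mi = sum(c / n * math.log(n * c / (ca[s] * cb[t]))
             for (s, t), c in joint.items())
    # Marginal entropies of each partition
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 0.0
    return mi / math.sqrt(ha * hb)
```

For example, `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 (same partition up to relabeling), while crossing partitions such as `[0, 0, 1, 1]` vs. `[0, 1, 0, 1]` give 0.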

144 | Using MPI-2: Advanced Features of the Message-Passing Interface
- Gropp, Lusk, et al.
- 1999
Citation Context: ...more suitable for parallelizing iterative algorithms than MapReduce. Since standard MPI implementations (MPICH2) cannot be directly ported to our system, we implemented our own system by modifying MPICH2 [13]. Parallel Gibbs sampling: We distribute the computation among machines based on community IDs. Thus, each machine i only deals with a specified subset of communities ci, and is aware of all users u an...

138 | Learning to probabilistically identify authoritative documents
- Cohn, Chang
- 2000
Citation Context: ...Analysis (PLSA) [7] and Latent Dirichlet Allocation (LDA) [3] model document-word co-occurrence, which is similar to the bags of words community view. Probabilistic Hypertext Induced Topic Selection (PHITS) [4], a variant of PLSA, models document-citation co-occurrence, which is similar to the bags of users community view. However, a system that considers just bags of users cannot take advantage of content ...

136 | Probabilistic author-topic models for information discovery
- Steyvers, Smyth, et al.
Citation Context: ...sources to alleviate the information sparsity problem of a single source. Several other algorithms have been proposed to model publication and email data. For instance, the author-topic (AT) model [11] employs two factors in characterizing a document: the document's authors and topics. Modeling both factors as variables within a Bayesian network allows the AT model to group the words used in a docu...

59 | Distributed inference for latent dirichlet allocation
- Newman, Asuncion, et al.
- 2007
Citation Context: ...each machine sends its local differences of the counts C^{UZ}_{mk} and C^{DZ}_{nk} to a specified root, then the root broadcasts the global difference (the sum of all local differences) to the other machines to update the global counts (C^{UZ}_{mk} and C^{DZ}_{nk}) [9]. This is an AllReduce operation in MPI. We summarize the process in Algorithm 1. Algorithm 1: Parallel Gibbs Sampling of CCF. Input: N × M community-user matrix; N × V community-description matrix; ...
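The context describes each machine contributing its local count changes, which are summed and broadcast back so every replica holds the same global count matrix. A minimal in-process stand-in for that AllReduce step, with NumPy arrays playing the role of the count matrices (the actual system used a modified MPICH2; names here are illustrative):

```python
import numpy as np

def allreduce_update(global_counts, local_counts_list):
    """Simulate the AllReduce step: each worker holds a local copy
    of a global count matrix plus the increments it made this
    iteration.  Summing the per-worker differences and applying the
    total to every copy keeps all replicas in sync, exactly what a
    single MPI_Allreduce over the differences achieves."""
    diffs = [local - global_counts for local in local_counts_list]
    total_diff = np.sum(diffs, axis=0)      # the "reduce" (sum)
    updated = global_counts + total_diff    # the broadcast result
    return [updated.copy() for _ in local_counts_list]
```

Each worker's post-update copy equals the old global counts plus every worker's increments, so the next sampling sweep starts from consistent state.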

53 | Evaluating Similarity Measures: A Large-Scale Study in the Orkut Social Network
- Spertus, Sahami, et al.
- 2005
Citation Context: ...the number of EM iterations (ranging from 100 to 500). The reported results are the average performance over all runs. All user data were anonymized, and user privacy is safeguarded, as performed in [10]. [Figure 5: The precision and recall as functions of the length of the recommendation list; curves shown for CCF precision, C-U precision, CCF recall, and C-U recall.] ...

50 | The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email
- McCallum, Corrada-Emmanuel, et al.
Citation Context: ...Bayesian network allows the AT model to group the words used in a document corpus into semantic topics, and to determine an author's topic associations. For emails, the author-recipient-topic (ART) model [8] considers the email recipient as an additional factor. This model can discover relevant topics from the sender-recipient structure in emails, and enjoys an improved ability to measure role-similarity bet...

48 | Variational methods for the Dirichlet process
- Blei
Citation Context: ...sensitive to initialization. A better initialization tends to allow EM to find a "better" optimum. Second, Gibbs sampling is too slow to be effective for large-scale datasets in high-dimensional problems [2]. A hybrid method can enjoy the advantages of both Gibbs and EM. Gibbs sampling: Gibbs sampling is a simple and widely applicable Markov chain Monte Carlo algorithm, which provides a simple method for obtai...
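As this context notes, Gibbs sampling is an MCMC method that alternates draws from each variable's conditional distribution. A minimal, self-contained illustration (not from the paper) targeting a standard bivariate normal with correlation rho, whose conditionals are known in closed form:

```python
import random, math

def gibbs_bivariate_normal(rho, n_samples=5000, burn_in=500, seed=1):
    """Gibbs sampler for a standard bivariate normal with
    correlation rho: alternately draw each coordinate from its
    conditional, x | y ~ N(rho*y, 1-rho^2) and
    y | x ~ N(rho*x, 1-rho^2), discarding a burn-in prefix."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)
    x = y = 0.0
    samples = []
    for t in range(n_samples + burn_in):
        x = rng.gauss(rho * y, sd)   # sample x from p(x | y)
        y = rng.gauss(rho * x, sd)   # sample y from p(y | x)
        if t >= burn_in:
            samples.append((x, y))
    return samples
```

The empirical correlation of the retained samples approaches rho; the same sweep-over-conditionals structure, applied to latent topic assignments, is what the paper's Gibbs phase performs at scale.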

6 | Generative model-based clustering of documents: a comparative study
- Zhong, Ghosh
- 2005
Citation Context: ...CAT and CLS. The entropies H(CAT) and H(CLS) are used for normalizing the mutual information to be in the range [0, 1]. In practice, we made use of the following formulation to estimate the NMI score [12, 13]:

$$\mathrm{NMI} = \frac{\sum_{s=1}^{K}\sum_{t=1}^{K} n_{s,t}\,\log\left(\frac{n \cdot n_{s,t}}{n_s\, n_t}\right)}{\sqrt{\left(\sum_{s=1}^{K} n_s \log\frac{n_s}{n}\right)\left(\sum_{t=1}^{K} n_t \log\frac{n_t}{n}\right)}} \quad (17)$$

where $n$ is the number of communities, and $n_s$ and $n_t$ denote the numbers of communities in category $s$ and cluster ...