## Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce

Citations: 7 (0 self)

### BibTeX

@MISC{Rao_rankingand,

author = {Delip Rao and David Yarowsky},

title = {Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce},

year = {}

}

### Abstract

Label Propagation, a standard algorithm for semi-supervised classification, suffers from scalability issues involving memory and computation when used with large-scale graphs from real-world datasets. In this paper we approach Label Propagation as the solution to a system of linear equations, which can be implemented as a scalable parallel algorithm using the map-reduce framework. In addition to semi-supervised classification, this approach to Label Propagation allows us to adapt the algorithm for ranking on graphs and to derive the theoretical connection between Label Propagation and PageRank. We provide empirical evidence to that effect using two natural language tasks: lexical relatedness and polarity induction. The version of the Label Propagation algorithm presented here scales linearly in the size of the data with a constant main memory requirement, in contrast to the quadratic cost of both in traditional approaches.
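The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's Algorithm 1 or 2, only an assumed simplification: each unlabeled node's score is repeatedly set to the weighted average of its neighbors' scores (a Jacobi-style iteration on the linear system), while seed nodes stay clamped. The function names `map_phase`, `reduce_phase`, and `label_propagation`, the dict-based graph layout, and the stopping tolerance are all hypothetical choices, and the map and reduce steps are simulated in a single process rather than run on a cluster.

```python
from collections import defaultdict


def map_phase(graph, scores):
    """Map step: every node sends (weight, weight * score) to each neighbor."""
    for node, edges in graph.items():
        for neighbor, weight in edges.items():
            yield neighbor, (weight, weight * scores[node])


def reduce_phase(messages, seeds):
    """Reduce step: weighted average of incoming messages per node.

    Seed (labeled) nodes are clamped back to their known values.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # node -> [weight sum, contribution sum]
    for node, (weight, contribution) in messages:
        totals[node][0] += weight
        totals[node][1] += contribution
    new_scores = {
        node: contribution / weight
        for node, (weight, contribution) in totals.items()
        if weight > 0
    }
    new_scores.update(seeds)
    return new_scores


def label_propagation(graph, seeds, iterations=100, tol=1e-9):
    """Iterate map/reduce rounds until scores change by less than tol.

    graph: dict mapping node -> {neighbor: edge weight}; every node must be a key.
    seeds: dict mapping labeled node -> fixed score.
    """
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = reduce_phase(map_phase(graph, scores), seeds)
        # Nodes that received no messages keep their previous score.
        for node in graph:
            updated.setdefault(node, scores[node])
        delta = max(abs(updated[node] - scores[node]) for node in graph)
        scores = updated
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Hypothetical four-node chain a - b - c - d with seeds at both ends.
    chain = {
        "a": {"b": 1.0},
        "b": {"a": 1.0, "c": 1.0},
        "c": {"b": 1.0, "d": 1.0},
        "d": {"c": 1.0},
    }
    print(label_propagation(chain, {"a": 1.0, "d": 0.0}))
```

Each round touches every edge once and each node only needs the messages addressed to it, which is why this style of update maps naturally onto a map-reduce job; the resulting scores can be thresholded for classification or sorted for ranking.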

### Citations

2162 | The PageRank citation ranking: Bringing order to the web
- Page, Brin, et al.
- 1999
Citation Context ...lgorithm, the mass associated with each node determines its rank. 6.1 Connection to PageRank It is interesting to note that Algorithm 1 brings out a connection between Label Propagation and PageRank (Page et al., 1998). PageRank is a random walk model that allows the random walk to “jump” to its initial state with a nonzero probability (α). Given the probability transition matrix P = [Prs], where Prs is the probab... |

1759 | MapReduce: Simplified data processing on large clusters
- Dean, Ghemawat
- 2008
Citation Context ...rther, these operations only rely on local information (from neighboring vertices of the graph). This leads to the parallel algorithm (Algorithm 2) implemented using the map-reduce model. Map-Reduce (Dean and Ghemawat, 2004) is a paradigm for implementing distributed algorithms with two user supplied functions “map” and “reduce”. The map function processes the input key/value pairs with the key being a unique identif... |

289 | Efficient noise-tolerant learning from statistical queries
- Kearns
- 1998
Citation Context ...s tremendous interest in application of distributed computing to scale up machine learning algorithms. Chu et al. (2006) describe a family of learning algorithms that fit the Statistical Query Model (Kearns, 1993). These algorithms can be written in a special summation form that is amenable to parallel speed-up. Examples of such algorithms include Naive Bayes, Logistic Regression, backpropagation in Neural Ne... |

252 | Contextual Correlates of Semantic Similarity - Miller, Charles - 1991 |

80 | Confidence Estimation for Machine Translation
- Blatz, Fitzgerald, et al.
- 2003
Citation Context ...pic of interest with applications in word sense disambiguation (Patwardhan et al., 2005), paraphrasing (Kauchak and Barzilay, 2006), question answering (Prager et al., 2001), and machine translation (Blatz et al., 2004), to name a few. Following the tradition in previous literature we evaluate on the Miller and Charles (1991) dataset. We compare our rankings with the human judgments using the Spearman rank correla... |

58 | Paraphrasing for automatic evaluation
- Kauchak, Barzilay
- 2006
Citation Context ...nking, we consider the problem of deriving lexical relatedness between terms. This has been a topic of interest with applications in word sense disambiguation (Patwardhan et al., 2005), paraphrasing (Kauchak and Barzilay, 2006), question answering (Prager et al., 2001), and machine translation (Blatz et al., 2004), to name a few. Following the tradition in previous literature we evaluate on the Miller and Charles (1991) da... |

30 | Identifying and analyzing judgment opinions
- Kim, Hovy
- 2006
Citation Context ... data. [Footnote 2: http://www.wjh.harvard.edu/∼inquirer/] [Figure 3: Scalability results: (a) Scaleup (b) Speedup] We compare our results (F-scores) with another scalable previous work by Kim and Hovy (Kim and Hovy, 2006) in Table 2 for the same seed set. Their approach starts with a few seeds of positive and negative terms and bootstraps the list by considering all synonyms of positive word as positive and antonyms ... |

27 | Fully distributed em for very large datasets - Wolfe, Haghighi, et al. - 2008 |

25 | Introduction to parallel computing, 2nd Edition - Grama, Gupta, et al. - 2003 |

24 | Use of WordNet hypernyms for answering what-is questions
- Prager, Chu-Carroll
- 2001
Citation Context ...l relatedness between terms. This has been a topic of interest with applications in word sense disambiguation (Patwardhan et al., 2005), paraphrasing (Kauchak and Barzilay, 2006), question answering (Prager et al., 2001), and machine translation (Blatz et al., 2004), to name a few. Following the tradition in previous literature we evaluate on the Miller and Charles (1991) dataset. We compare our rankings with the hu... |

11 | SenseRelate::TargetWord – A generalized framework for word sense disambiguation
- Patwardhan, Banerjee, et al.
- 2005
Citation Context ...ing vertices. 8.1 Ranking To evaluate ranking, we consider the problem of deriving lexical relatedness between terms. This has been a topic of interest with applications in word sense disambiguation (Patwardhan et al., 2005), paraphrasing (Kauchak and Barzilay, 2006), question answering (Prager et al., 2001), and machine translation (Blatz et al., 2004), to name a few. Following the tradition in previous literature we e... |

4 | Scalable Computing for Power Law Graphs: Experience with Parallel PageRank - Gleich, Zhukov - 2005 |

4 | Clustering and efficient use of unlabeled examples - Szummer, Jaakkola |