Results 1 - 4 of 4
Ranking and Semisupervised Classification on Large Scale Graphs Using MapReduce
Abstract

Cited by 8 (0 self)
Label Propagation, a standard algorithm for semi-supervised classification, suffers from scalability issues involving memory and computation when used with large-scale graphs from real-world datasets. In this paper we approach Label Propagation as the solution to a system of linear equations, which can be implemented as a scalable parallel algorithm using the MapReduce framework. In addition to semi-supervised classification, this approach to Label Propagation allows us to adapt the algorithm to make it usable for ranking on graphs and to derive the theoretical connection between Label Propagation and PageRank. We provide empirical evidence to that effect using two natural language tasks: lexical relatedness and polarity induction. The version of the Label Propagation algorithm presented here scales linearly in the size of the data with a constant main memory requirement, in contrast to the quadratic cost of both in traditional approaches.
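
The idea of treating Label Propagation as a linear system solved by repeated map and reduce passes can be illustrated with a small sketch. This is not the paper's implementation: it assumes the common harmonic formulation in which each unlabeled node takes the weighted average of its neighbors' scores while seed nodes stay clamped, and it simulates the map and reduce phases in plain Python. The names `map_step`, `reduce_step`, and `label_propagation` are hypothetical.

```python
from collections import defaultdict


def map_step(node, score, out_edges):
    """Map phase: emit one (weighted score, weight) message per outgoing edge."""
    for neighbor, weight in out_edges:
        yield neighbor, (weight * score, weight)


def reduce_step(contributions):
    """Reduce phase: weighted average of all messages received by one node."""
    total = sum(weighted_score for weighted_score, _ in contributions)
    norm = sum(weight for _, weight in contributions)
    return total / norm if norm else 0.0


def label_propagation(edges, seeds, iterations=50):
    """Jacobi-style iteration for the harmonic label propagation system.

    edges: {node: [(neighbor, weight), ...]}, undirected edges stored in
           both directions (an assumption of this sketch).
    seeds: {node: score} for labeled nodes; their scores are never updated.
    """
    scores = {node: seeds.get(node, 0.0) for node in edges}
    for _ in range(iterations):
        # Shuffle: group the emitted messages by destination node.
        inbox = defaultdict(list)
        for node, out_edges in edges.items():
            for neighbor, message in map_step(node, scores[node], out_edges):
                inbox[neighbor].append(message)
        # Each node only needs its own inbox, so per-node memory stays small.
        scores = {
            node: seeds[node] if node in seeds else reduce_step(inbox[node])
            for node in edges
        }
    return scores
```

On a four-node chain a-b-c-d with unit weights and seeds a = 1, d = 0, the iteration approaches the solution of the linear system, with b near 2/3 and c near 1/3. Each pass touches every edge once, which is the sense in which such a scheme can scale linearly with the size of the graph.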
Adaptive Graph Walk Based Similarity Measures in Entity-Relation Graphs
, 2008
Abstract

Cited by 3 (0 self)
Relational or semi-structured data is naturally represented by a graph schema, where nodes denote entities and directed typed edges represent the relations between them. Such graphs are heterogeneous in the sense that they describe different types of objects and multiple types of links. For example, email data can be described in a graph that includes messages, persons, dates and other objects; in this graph, a message may be associated with a person through different relations, such as "sent-to", "sent-from" and so on. In the past, researchers have suggested applying random graph walks in order to elicit a measure of similarity between entities that are not directly connected in a graph. In this thesis, we suggest a general framework in which different arbitrary queries (for instance, "what persons are most related to this email message?") are addressed using random walks. Naturally, there are many types of queries possible that correspond to various flavors of inter-entity similarity; several learning techniques are therefore suggested and evaluated that adapt the graph-walk
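
The random-walk similarity described in this abstract can be sketched as a walk with restart over a typed graph. This is a minimal illustration under stated assumptions, not the thesis's method: the restart probability, the per-edge-type weights, the inverse edge types, and the function name `random_walk_similarity` are all hypothetical choices made for the example.

```python
def random_walk_similarity(edges, query, restart=0.5, steps=20, type_weights=None):
    """Approximate random-walk-with-restart scores relative to `query`.

    edges: {node: [(neighbor, edge_type), ...]} for a directed typed graph.
    type_weights: optional {edge_type: weight}; unlisted types default to 1.0.
    Returns {node: probability mass}, usable as a similarity ranking.
    """
    type_weights = type_weights or {}
    dist = {query: 1.0}
    for _ in range(steps):
        # A fixed share of the total mass always restarts at the query node.
        nxt = {query: restart}
        for node, mass in dist.items():
            out_edges = edges.get(node, [])
            total = sum(type_weights.get(etype, 1.0) for _, etype in out_edges)
            if total == 0:
                # Dangling node: send its remaining mass back to the query.
                nxt[query] += (1.0 - restart) * mass
                continue
            for neighbor, etype in out_edges:
                share = (1.0 - restart) * mass * type_weights.get(etype, 1.0) / total
                nxt[neighbor] = nxt.get(neighbor, 0.0) + share
        dist = nxt
    return dist
```

Raising the weight of one edge type, for example "sent-from", shifts probability mass toward entities reached through that relation. Weights of this kind are one sort of parameter that a learning method could tune per query type.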
Inferring Regulatory Networks from Multiple Sources of Genomic Data
, 2004
Abstract
This thesis addresses the problem of modeling the gene regulatory system from multiple sources of large-scale datasets. In the first part, we develop a computational framework for building and validating simple, mechanistic models of gene regulation from multiple sources of data. These models, which we call physical network models, annotate the network of molecular interactions with several types of attributes (variables). We associate model attributes with physical interaction and knockout gene expression data according to the confidence measures of the data and the hypothesis that gene regulation is achieved via molecular interaction cascades. By applying standard model inference algorithms, we are able to obtain the configurations of model attributes that optimally fit the data. Because existing datasets do not sufficiently constrain the models, there are many optimal configurations that fit the data equally well. In the second part, we develop an information-theoretic score to measure the expected capacity of new knockout experiments to reduce the model uncertainty. We collaborate with biologists to perform suggested knockout
unknown title
Abstract
Information regularization is a principle for assigning labels to unlabeled data points in a semi-supervised setting. The broader principle is based on finding labels that minimize the information induced between examples and labels relative to a topology over the examples; any label variation within a small local region of examples ties together the identities of examples and their labels. Such variation should be minimized unless supported directly or indirectly by the available labeled examples. The principle can be cast in terms of Tikhonov-style regularization for maximizing the likelihood of labeled examples with an information-theoretic regularization penalty. We consider two ways of representing the topology over examples: either based on complete knowledge of the marginal density, or by grouping together examples whose labels should be related. We discuss the learning algorithms and sample complexity issues that result from each representation.