## Data Mining in Social Networks (2002)

Venue: National Academy of Sciences Symposium on Dynamic Social Network Modeling and Analysis

Citations: 30 (1 self-citation)

### BibTeX

@INPROCEEDINGS{Jensen02datamining,

author = {David Jensen and Jennifer Neville},

title = {Data Mining in Social Networks},

booktitle = {National Academy of Sciences Symposium on Dynamic Social Network Modeling and Analysis},

year = {2002},

}

### Abstract

Several techniques for learning statistical models have been developed recently by researchers in machine learning and data mining. All of these techniques must address a similar set of representational and algorithmic choices and must face a set of statistical challenges unique to learning from relational data.

### Citations

4937 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

Citation Context: ...e 5: An example relational probability tree (RPT). Our construction algorithm for RPTs is a recursive partitioning algorithm similar in spirit to CART (Breiman, Friedman, Olshen and Stone 1984), C4.5 (Quinlan 1993), and CHAID (Kass 1980). However, the RPT algorithm searches over the attributes of different object types in the subgraph and multiple methods of aggregating the values of those attributes and creat...

3912 | Classification and regression trees - Breiman, ed - 1993 |

1355 | Toward principles for the design of ontologies used for knowledge sharing - Gruber - 1995 |

Citation Context: ... object types as well as two possible families of schemas constructed from those object types (a full schema would also specify a set of link types). Such a hierarchy is sometimes called an ontology (Gruber 1993). *Querying and Learning.* *Figure 2: An example ontology of movie objects.* To address learning tasks of this kind, our research group is constructing PROXIMITY, a system for machine learning and data m...

1055 | Stochastic logic programs - Muggleton |

Citation Context: ...stical model can be learned directly from data, easing the job of data analysts, and greatly improving the fidelity of the resulting model. Older techniques include inductive logic programming (ILP) (Muggleton 1992; Dzeroski and Lavrac 2001) and social network analysis (Wasserman and Faust 1994). For example, we have employed relational probability trees (RPTs) to learn models that predict the box office succes...

510 | Learning probabilistic relational models - Getoor, Friedman, et al. - 2001 |

157 | An Exploratory Technique for Investigating Large Quantities of Categorical Data - Kass - 1980 |

Citation Context: ... probability tree (RPT). Our construction algorithm for RPTs is a recursive partitioning algorithm similar in spirit to CART (Breiman, Friedman, Olshen and Stone 1984), C4.5 (Quinlan 1993), and CHAID (Kass 1980). However, the RPT algorithm searches over the attributes of different object types in the subgraph and multiple methods of aggregating the values of those attributes and creating binary splits on th...
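
The RPT search described in these contexts (aggregate an attribute over related objects in several ways, then test binary splits on the aggregated values) can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions, not PROXIMITY's implementation: each movie is a dict with a hypothetical `actors` list and a numeric `prior_hits` attribute, the aggregators (count, max, mean) are illustrative, and splits are scored with a plain chi-square statistic on a 2x2 table.

```python
from statistics import mean

# Illustrative aggregators over one numeric attribute of the related objects.
AGGREGATORS = {
    "count": len,
    "max": lambda vals: max(vals) if vals else 0,
    "mean": lambda vals: mean(vals) if vals else 0.0,
}

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def best_binary_split(examples, related, attr, label):
    """Try every aggregator and threshold; return (score, aggregator, threshold)."""
    best = (0.0, None, None)
    for name, agg in AGGREGATORS.items():
        # Aggregate the attribute over each example's related objects.
        feats = [agg([obj[attr] for obj in ex[related]]) for ex in examples]
        for t in sorted(set(feats)):
            # Binary split "aggregate >= t" crossed with the class label.
            pairs = list(zip(feats, examples))
            a = sum(1 for f, ex in pairs if f >= t and ex[label])
            b = sum(1 for f, ex in pairs if f >= t and not ex[label])
            c = sum(1 for f, ex in pairs if f < t and ex[label])
            d = sum(1 for f, ex in pairs if f < t and not ex[label])
            score = chi_square_2x2(a, b, c, d)
            if score > best[0]:
                best = (score, name, t)
    return best

# Hypothetical toy data: each movie links to actors with a prior_hits count.
movies = [
    {"hit": True, "actors": [{"prior_hits": 3}, {"prior_hits": 1}]},
    {"hit": True, "actors": [{"prior_hits": 2}]},
    {"hit": False, "actors": [{"prior_hits": 0}, {"prior_hits": 0}, {"prior_hits": 1}]},
    {"hit": False, "actors": [{"prior_hits": 0}]},
]
print(best_binary_split(movies, "actors", "prior_hits", "hit"))  # -> (4.0, 'max', 2)
```

On the toy data the chosen split is `max(prior_hits) >= 2`, which separates the two hits from the two non-hits; a real learner would apply this search recursively to each partition.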

152 | Theories for mutagenicity: a study in first-order and feature-based induction - Srinivasan, Muggleton, et al. - 1996 |

108 | Relational Data Mining - Džeroski, Lavrač - 2001 |

Citation Context: ... be learned directly from data, easing the job of data analysts, and greatly improving the fidelity of the resulting model. Older techniques include inductive logic programming (ILP) (Muggleton 1992; Dzeroski and Lavrac 2001) and social network analysis (Wasserman and Faust 1994). For example, we have employed relational probability trees (RPTs) to learn models that predict the box office success of a movie based on attr...

105 | Probabilistic Classification and Clustering in Relational Data - Taskar, Segal, et al. |

104 | Learning probabilistic models of relational structure - Getoor, Friedman, et al. - 2001 |

95 | Linkage and autocorrelation cause feature selection bias in relational learning - Jensen, Neville - 2002 |

73 | Multiple comparisons in induction algorithms - Jensen, Cohen - 2000 |

Citation Context: ...reached. Our current stopping criterion uses a Bonferroni-adjusted chi-square test analogous to that used in CHAID. However, such methods face a variety of problems due to multiple comparison effects (Jensen and Cohen 2000), and we are exploring the use of randomization tests (Jensen 1992) to better adjust for such effects. This two-step approach of querying and then learning is necessary because of the semistructured ...
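
A Bonferroni-adjusted chi-square stopping rule of the kind mentioned above can be sketched as follows. This is a hedged illustration, not the RPT code: the function names and the default `alpha = 0.05` are our assumptions, and the Bonferroni multiplier is simply the number of candidate splits evaluated (CHAID derives its multipliers from the number of ways categories can be merged, so this is a simplification). For one degree of freedom, the chi-square upper-tail p-value equals `erfc(sqrt(x / 2))`.

```python
import math

def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def should_split(best_stat, num_candidates, alpha=0.05):
    """Split only if the best split stays significant after Bonferroni adjustment."""
    # Multiply the raw p-value by the number of candidates tried, capped at 1.
    adjusted = min(1.0, chi2_pvalue_df1(best_stat) * num_candidates)
    return adjusted < alpha
```

For example, a best split with statistic 4.0 has an unadjusted p-value of about 0.0455: `should_split(4.0, 1)` is true, but `should_split(4.0, 12)` is false, which is the multiple-comparison effect the context describes.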

55 | Relational clichés: Constraining constructive induction during relational learning - Silverstein, Pazzani - 1991 |

Citation Context: ...d evaluated when constructing the tree. Some techniques such as ILP offer far more extensive search of such "constructed" attributes, greatly expanding the set of possible models that can be learned (Silverstein and Pazzani 1991). Other techniques do no search whatsoever, relying on the existing attributes on objects and links. • Use of background knowledge: Data analysts often have substantial background knowledge that can...

42 | 1BC: A first-order Bayesian classifier - Flach, Lachiche - 1999 |

Citation Context: ...tional data include probabilistic relational models (PRMs) (Friedman, Getoor, Koller, and Pfeffer 1999), Bayesian logic programs (BLPs) (Kersting and de Raedt 2000), first-order Bayesian classifiers (Flach and Lachiche 1999), and relational probability trees (RPTs) (Jensen and Neville 2002). In each of these cases, both the structure and the parameters of a statistical model can be learned directly from data, easing the...

18 | Supporting relational knowledge discovery: Lessons in architecture and algorithm design - Neville, Jensen - 2002 |

Citation Context: ...set, and it can bias analysis in important ways. To enable truly effective data mining, analysts must be able to change the schema easily, and thus reconceptualize the domain (Jensen & Neville 2002b; Neville & Jensen 2002). *Comparison and Contrast.* Techniques for relational learning can be better understood by examining them in the context of a set of design choices and statistical issues. This section describes severa...

16 | A visual language for querying and updating graphs - Blau, Immerman, et al. - 2002 |

10 | Schemas and models - Jensen, Neville - 2002 |

6 | Induction with Randomization Testing - Jensen - 1992 |

Citation Context: ...test analogous to that used in CHAID. However, such methods face a variety of problems due to multiple comparison effects (Jensen and Cohen 2000), and we are exploring the use of randomization tests (Jensen 1992) to better adjust for such effects. This two-step approach of querying and then learning is necessary because of the semistructured data model that underlies Proximity. In Proximity's graph database,...
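
The randomization-test idea can be sketched as a permutation test on the *maximum* split score: shuffle the class labels, recompute the best score over all candidate features, and count how often the shuffled maximum reaches the observed one. Because the null distribution is built from maxima over every candidate, it accounts for how many candidates were searched. This is a minimal sketch under our own assumptions (precomputed binary feature columns, chi-square scoring, and hypothetical `trials` and `seed` parameters), not the procedure from Jensen (1992).

```python
import random

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def max_score(features, labels):
    """Best chi-square score over all candidate binary feature columns."""
    best = 0.0
    for col in features:
        pairs = list(zip(col, labels))
        a = sum(1 for f, y in pairs if f and y)
        b = sum(1 for f, y in pairs if f and not y)
        c = sum(1 for f, y in pairs if not f and y)
        d = sum(1 for f, y in pairs if not f and not y)
        best = max(best, chi_square_2x2(a, b, c, d))
    return best

def randomization_pvalue(features, labels, trials=1000, seed=0):
    """Estimate how often shuffled labels yield a best score >= the observed one."""
    rng = random.Random(seed)
    observed = max_score(features, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if max_score(features, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)
```

With 20 examples and one feature that matches the labels perfectly, the estimate is close to `1 / (trials + 1)`; a feature unrelated to the labels would typically produce a much larger value.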

1 | Bayesian logic programs - Kersting, De Raedt - 2000 |