### Research on Statistical Relational Learning

Abstract

This paper presents an overview of the research on learning statistical models of relational data being carried out at the University of Washington. Our work falls into five main directions: learning models of social networks; learning models of sequential relational processes; scaling up statistical relational learning to massive data sources; learning for knowledge integration; and learning programs in procedural languages. We describe some of the common themes and research issues arising from this work.

### Test Set Bounds for Relational Data that Vary with Strength of Dependence

Abstract

A large portion of the data collected in application domains such as online social networking, finance, and biomedicine is relational in nature. A subfield of machine learning, namely Statistical Relational Learning (SRL), is concerned with performing statistical inference on relational data. A defining property of relational data that separates it from independently and identically distributed (i.i.d.) data is the existence of correlations between individual datapoints. A major portion of the theory developed in machine learning assumes the data is i.i.d. In this paper we develop theory for the relational setting. In particular, we derive distribution-free bounds on the generalization error of a classifier in the relational setting, where the class of data generation models we consider is inspired by the types of joint distributions represented by relational classification models developed by the SRL community. A key aspect of the bound we derive is that its tightness is a function of the strength of dependence between related datapoints, with the bound reducing to the standard Hoeffding's or McDiarmid's inequality when there is no dependence. To the best of our knowledge, this is the first bound for relational data whose tightness varies with the strength of dependence. Moreover, the bound provides insight into the computation of effective sample size, an important notion introduced by Jensen and Neville (2002).
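The abstract ties dependence strength to effective sample size (Jensen and Neville, 2002). As a rough intuition only, and explicitly not the bound derived in this paper, the Python sketch below computes the textbook variance-based effective sample size under a hypothetical equicorrelation model: every pair of the n observations shares the same correlation rho (assumed non-negative, as with positive relational autocorrelation). With rho = 0 it returns n, mirroring how the paper's bound reduces to the i.i.d. case; as rho grows, the sample carries the information of far fewer independent points. The function name `effective_sample_size` is hypothetical.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """Variance-based effective sample size for n equicorrelated observations.

    Hypothetical simplified dependence model (not the paper's): all pairs of
    observations have common variance sigma^2 and common correlation rho.
    Then Var(sample mean) = sigma^2 * (1 + (n - 1) * rho) / n, and n_eff is
    the number of i.i.d. observations giving the same variance:
        n_eff = n / (1 + (n - 1) * rho)
    """
    # Restrict to non-negative correlation so the model is always valid.
    if n < 1 or not (0.0 <= rho <= 1.0):
        raise ValueError("need n >= 1 and 0 <= rho <= 1")
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    # Even weak pairwise dependence sharply reduces the effective sample size.
    for rho in (0.0, 0.01, 0.1, 0.5):
        print(f"rho={rho:<4}  n_eff={effective_sample_size(1000, rho):8.1f}")
```

Under this toy model n_eff approaches 1/rho as n grows, so adding more correlated datapoints eventually stops helping; the paper's contribution is a bound whose tightness reflects dependence without assuming this particular model.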

### Computational Techniques for Inferring Regulatory Networks

Abstract

In this era, when healthcare is one of the world's largest and fastest-growing industries, there is great interest in understanding what is happening within our cells and organs at the molecular level. Fortunately, innovations and improvements in technology continue to increase the quantity and variety of high-throughput biological data that can be measured (high-throughput refers to a process in which a system measures large numbers of samples at once). Additionally, abundant information from many years of detailed research is available in annotated or computationally extracted databases. These data sets, especially when combined, have great potential for novel discoveries that can lead to advances in biology and medicine. The main focus of this thesis is the investigation of machine learning techniques for inferring gene regulatory networks from a combination of high-throughput time-series gene expression array data and other data sources. A gene regulatory network is a collection ...

### Test Set Bounds for Relational Data that Vary with Strength of Dependence

Amit Dhurandhar

Abstract

A large portion of the data collected in application domains such as online social networking, finance, and biomedicine is relational in nature. A subfield of machine learning, namely Statistical Relational Learning (SRL), is concerned with performing statistical inference on relational data. A defining property of relational data that separates it from independently and identically distributed (i.i.d.) data is the existence of correlations between individual datapoints. A major portion of the theory developed in machine learning assumes the data is i.i.d. In this paper we develop theory for the relational setting. In particular, we derive distribution-free bounds on the generalization error of a classifier in the relational setting, where the class of data generation models we consider is inspired by the types of joint distributions represented by relational classification models developed by the SRL community. A key aspect of the bound we derive is that its tightness is a function of the strength of dependence between related datapoints, with the bound reducing to the standard Hoeffding's or McDiarmid's inequality when there is no dependence. To the best of our knowledge, this is the first bound for relational data whose tightness varies with the strength of dependence. Moreover, the bound provides insight into the computation of effective sample size, an important notion introduced by [Jensen and

### Distribution-free Bounds for Relational Classification

Abstract

Statistical Relational Learning (SRL) is a sub-area of machine learning that addresses the problem of performing statistical inference on data that are correlated and therefore not independently and identically distributed (i.i.d.), contrary to the usual assumption. For the traditional i.i.d. setting, distribution-free bounds such as the Hoeffding bound exist; they are used to provide confidence bounds on the generalization error of a classification algorithm given its hold-out error on a sample of size N. Bounds of this form do not currently exist for the types of interactions in the data that relational classification algorithms consider. In this paper we extend the Hoeffding bounds to the relational setting. In particular, we derive distribution-free bounds for certain classes of data generation models that do not produce i.i.d. data and that are based on the types of interactions considered by relational classification algorithms developed in SRL. We conduct empirical studies on synthetic and real data which show that these data generation models are indeed realistic and that the derived bounds are tight enough for practical use.
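For reference, the i.i.d. baseline that this paper extends is the standard two-sided Hoeffding bound: if the N hold-out losses are i.i.d. and lie in [0, 1], then P(|e - ê| ≥ ε) ≤ 2·exp(-2Nε²), where e is the true error and ê the hold-out error. Setting the right-hand side to δ gives ε = sqrt(ln(2/δ) / (2N)). The minimal Python sketch below computes that interval; it covers only the i.i.d. case and does not implement the paper's relational bounds. The function names are hypothetical.

```python
import math


def hoeffding_epsilon(n: int, delta: float) -> float:
    """Half-width eps such that, for n i.i.d. losses in [0, 1],
    P(|true_error - holdout_error| >= eps) <= delta (two-sided Hoeffding)."""
    if n <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need n > 0 and 0 < delta < 1")
    # Solve 2 * exp(-2 * n * eps^2) = delta for eps.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def holdout_confidence_interval(mistakes: int, n: int, delta: float = 0.05):
    """(1 - delta) confidence interval for the true error from a hold-out set."""
    err = mistakes / n
    eps = hoeffding_epsilon(n, delta)
    # Error rates are probabilities, so clip the interval to [0, 1].
    return max(0.0, err - eps), min(1.0, err + eps)


if __name__ == "__main__":
    lo, hi = holdout_confidence_interval(mistakes=120, n=1000, delta=0.05)
    print(f"95% interval for true error: [{lo:.3f}, {hi:.3f}]")
```

Because ε shrinks as 1/sqrt(N), the interval is only valid when the N test points are genuinely independent; with correlated relational data, treating N as the sample size overstates the confidence, which is the gap the relational bounds address.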