## Two Algorithms for Transfer Learning

Citations: 7 (0 self)

### BibTeX

```bibtex
@misc{Marx_twoalgorithms,
  author = {Zvika Marx and Michael T. Rosenstein and Thomas G. Dietterich and Leslie Pack Kaelbling},
  title  = {Two Algorithms for Transfer Learning},
  year   = {}
}
```

### Abstract

Transfer learning aims at improving the performance on a target task given some degree of learning on one or more source tasks. This chapter introduces two transfer learning algorithms that can be employed when the source and target domains share the same feature space and class labels. The first algorithm is a hierarchical Bayesian extension of naive Bayes; the second is a version of logistic regression in which the prior distribution over the weight values is learned from an ensemble of source tasks. The methods are tested on a real-world task of predicting whether a person will accept or decline a meeting invitation. The results demonstrate consistent successful transfer of learning when there is an ensemble of source tasks.

### Citations

1226 | Additive logistic regression: a statistical view of boosting
- Friedman, Hastie, et al.
- 2000
Citation Context: ...for the intercept weight, w0, is typically set to be relatively large to avoid excessive penalties for deviations from µ0. The model can be fit via iteratively reweighted least squares [9], boosting [8], or improved iterative scaling [2]. Following Chelba and Acero [5], we adjust this scheme for transfer learning as follows. Let K be the number of source A tasks and n be the number of features in th... |

883 | The Elements of Statistical Learning
- Hastie, Tibshirani, et al.
- 2009
Citation Context: ...The variance for the intercept weight, w0, is typically set to be relatively large to avoid excessive penalties for deviations from µ0. The model can be fit via iteratively reweighted least squares [9], boosting [8], or improved iterative scaling [2]. Following Chelba and Acero [5], we adjust this scheme for transfer learning as follows. Let K be the number of source A tasks and n be the number of ... |

605 | On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
- Domingos, Pazzani
- 1997
Citation Context: ...logistic regression. 2.1 Hierarchical Naive Bayes. The standard naive Bayes algorithm—which we call here flat naive Bayes—has proven to be effective for learning classifiers in non-transfer settings [6]. The flat naive Bayes algorithm constructs a separate probabilistic model for each output class, under the “naive” assumption that each feature has an independent impact on the probability of the cla... |
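The “flat” naive Bayes construction this context describes, one probabilistic model per output class with independent per-feature likelihoods, can be sketched in a few lines. This is a minimal Python illustration over binary (set-valued) features with Laplace smoothing; the function names and the smoothing constant `alpha` are hypothetical choices for this sketch, not details from the chapter.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(examples, labels, alpha=1.0):
    """Fit one per-class model of active binary features (Laplace-smoothed).

    `examples` is a list of sets of active features; `alpha` is a
    hypothetical smoothing constant, not a value from the chapter.
    """
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)      # class -> feature -> count
    vocab = set()
    for x, y in zip(examples, labels):
        vocab |= x
        for f in x:
            feat_counts[y][f] += 1
    model = {}
    for c, n_c in class_counts.items():
        log_prior = math.log(n_c / len(labels))
        log_likes = {f: math.log((feat_counts[c][f] + alpha) / (n_c + 2 * alpha))
                     for f in vocab}
        log_unseen = math.log(alpha / (n_c + 2 * alpha))
        model[c] = (log_prior, log_likes, log_unseen)
    return model

def classify(model, x):
    """Return the class maximizing log prior + sum of feature log-likelihoods."""
    def score(c):
        log_prior, log_likes, log_unseen = model[c]
        return log_prior + sum(log_likes.get(f, log_unseen) for f in x)
    return max(model, key=score)
```

Working in log space avoids underflow when many independent per-feature probabilities are multiplied together.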

471 | Multitask learning
- Caruana
- 1997
Citation Context: ...s work has demonstrated that learning for some target B task can be effectively influenced by inductive bias learned from one or more source tasks, e.g., [1, 3, 13, 15]. Even for the restricted class of problems addressed by supervised learning, transfer can be realized in many different ways. For instance, Caruana [3] trained a neural network on several tasks simul... |

170 | A comparison of prediction accuracy, complexity, and training time for thirty-three old and new classification algorithms
- Lim, Loh, et al.
- 2000
Citation Context: ...robability distribution. This joint distribution is then applied as a prior distribution in the target task. Logistic regression is one of the best known and most effective methods for classification [10]. The logistic regression model has the following form: $P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp\left(-(w_0 + \sum_{j=1}^{n} w_j x_j)\right)}$, (1) where y is the class label, x is a vector of n features, the $w_j$ are real-valued weights, a... |
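Eq. (1) in this context is the standard logistic (sigmoid) model and can be evaluated directly. The sketch below is a minimal illustration; the name `p_accept` alludes to the chapter's meeting-invitation task but is otherwise a hypothetical choice.

```python
import math

def p_accept(x, w, w0):
    """Eq. (1): P(y = 1 | x) = 1 / (1 + exp(-(w0 + sum_j w_j * x_j)))."""
    z = w0 + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```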

155 | Slice sampling
- Neal
- 2003
Citation Context: ...o other parameters are very different by increasing the variance of the hyperprior. To compute the posterior distributions, we developed an extension of the “slice sampling” method introduced by Neal [11]. This method is easily extended to handle multiple source tasks simply by asserting that corresponding parameter values for each naive Bayes classifier are all drawn from a common hyperprior distribu... |
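For illustration, the “slice sampling” method of Neal [11] mentioned here can be sketched as a univariate sampler with the stepping-out and shrinkage procedures. The bracket width `w`, the step cap, and all names are assumptions of this sketch, not values from the chapter, and the chapter's actual sampler is an extension of this idea to hierarchical models.

```python
import math
import random

def slice_sample(log_f, x0, n, w=1.0, max_steps=50):
    """Draw n samples from an unnormalized density exp(log_f) by slice sampling."""
    samples, x = [], x0
    for _ in range(n):
        # Auxiliary height: uniform on (0, f(x)), stored in log space.
        log_y = log_f(x) + math.log(random.random())
        # Step out until the bracket [left, right] covers the slice.
        left = x - w * random.random()
        right = left + w
        steps = max_steps
        while steps > 0 and log_f(left) > log_y:
            left -= w
            steps -= 1
        steps = max_steps
        while steps > 0 and log_f(right) > log_y:
            right += w
            steps -= 1
        # Shrink the bracket until a point inside the slice is drawn.
        while True:
            x1 = left + random.random() * (right - left)
            if log_f(x1) >= log_y:
                x = x1
                break
            if x1 < x:
                left = x1
            else:
                right = x1
        samples.append(x)
    return samples
```

A convenient property, relevant to setting hyperprior variances, is that slice sampling needs no hand-tuned proposal distribution: the bracket adapts to the local scale of the density.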

145 | A model of inductive bias learning
- Baxter
- 2000
Citation Context: ...s work has demonstrated that learning for some target B task can be effectively influenced by inductive bias learned from one or more source tasks, e.g., [1, 3, 13, 15]. Even for the restricted class of problems addressed by supervised learning, transfer can be realized in many different ways. For instance, Caruana [3] trained a neural network on several tasks simul... |

140 | Is learning the n-th thing any easier than learning the first?
- Thrun
- 1996

91 | Discovering structure in multiple learning tasks: The TC algorithm
- Thrun, O’Sullivan
- 1996
Citation Context: ...s work has demonstrated that learning for some target B task can be effectively influenced by inductive bias learned from one or more source tasks, e.g., [1, 3, 13, 15]. Even for the restricted class of problems addressed by supervised learning, transfer can be realized in many different ways. For instance, Caruana [3] trained a neural network on several tasks simul... |

86 | A Survey of Smoothing Techniques for ME Models
- Chen, Rosenfeld

71 | Adaptation of maximum entropy capitalizer: Little data can help a lot
- Chelba, Acero
- 2004
Citation Context: ...s the training examples and j = 0, . . . , n indexes the features. Typically, the values µj = 0 and σj = σ are employed, with σ (a constant, positive value) set by holdout or cross-validation methods [5]. The variance for the intercept weight, w0, is typically set to be relatively large to avoid excessive penalties for deviations from µ0. The model can be fit via iteratively reweighted least squares ... |
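The Gaussian-prior scheme this context describes, penalizing each weight's deviation from a mean µ_j with variance σ_j², can be illustrated with a toy gradient-descent fit. In the transfer setting the means would be learned from source-task models rather than set to zero. Everything here is a sketch under assumptions: a single shared `sigma`, and `lr` and `epochs` chosen for illustration, none of them values from the chapter.

```python
import math

def fit_transfer_logreg(data, mu, sigma=1.0, lr=0.1, epochs=200):
    """Logistic regression penalized toward a prior weight mean `mu`.

    Minimizes log-loss + sum_j (w_j - mu_j)^2 / (2 * sigma^2) by plain
    gradient descent. `data` is a list of (x_vector, y) with y in {0, 1};
    mu[0] is the prior mean of the intercept w0.
    """
    n = len(mu) - 1
    w = list(mu)                        # start at the prior mean
    for _ in range(epochs):
        # Gradient of the Gaussian penalty term.
        grad = [(w[j] - mu[j]) / sigma ** 2 for j in range(n + 1)]
        # Plus the gradient of the logistic log-loss.
        for x, y in data:
            z = w[0] + sum(w[j + 1] * x[j] for j in range(n))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            grad[0] += err
            for j in range(n):
                grad[j + 1] += err * x[j]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w
```

With no target data the fit simply returns the prior mean; as target examples accumulate, the data term increasingly overrides the source-derived prior, which is the intended transfer behavior.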

62 | Improving SVM accuracy by training on auxiliary data sources
- Wu, Dietterich

42 |
The improved iterative scaling algorithm: A gentle introduction
- Berger
- 1997
Citation Context: ...ypically set to be relatively large to avoid excessive penalties for deviations from µ0. The model can be fit via iteratively reweighted least squares [9], boosting [8], or improved iterative scaling [2]. Following Chelba and Acero [5], we adjust this scheme for transfer learning as follows. Let K be the number of source A tasks and n be the number of features in the (common) feature space. First, we... |

22 | Composition of conditional random fields for transfer learning
- Sutton, McCallum
- 2005
Citation Context: ...get task. Wu and Dietterich [15] transferred source training examples either as support vectors or as constraints (or both) and demonstrated improved image classification by SVMs. Sutton and McCallum [12] demonstrated effective transfer by “cascading” a class of graphical models, with the predictions from one classifier serving as features for the next one in the cascade. The rest of this chapter is o... |