## Boosting for transfer learning (2007)

### Download Links

- [imls.engr.oregonstate.edu]
- [www.cs.ust.hk]
- [www.machinelearning.org]
- [www.cse.ust.hk]
- DBLP

### Other Repositories/Bibliography

Venue: ICML 2007

Citations: 100 (11 self)

### BibTeX

@INPROCEEDINGS{Dai07boostingfor,
  author    = {Wenyuan Dai and Qiang Yang and Gui-Rong Xue and Yong Yu},
  title     = {Boosting for transfer learning},
  booktitle = {Proceedings of the 24th International Conference on Machine Learning (ICML)},
  year      = {2007}
}

### Abstract

Traditional machine learning makes a basic assumption: the training and test data should be drawn from the same distribution. However, in many cases, this identical-distribution assumption does not hold. It may be violated when a task arrives from a new domain and only labeled data from a similar old domain are available. Labeling the new data can be costly, and it would also be a waste to throw away all the old data. In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). TrAdaBoost allows users to utilize a small amount of newly labeled data together with the old data to construct a high-quality classification model for the new data. We show that this method can learn an accurate model using only a tiny amount of new data and a large amount of old data, even when the new data alone are not sufficient to train a model. We show that TrAdaBoost allows knowledge to be effectively transferred from the old data to the new. The effectiveness of our algorithm is analyzed both theoretically and empirically, showing that the iterative algorithm converges to an accurate model.
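The mechanism the abstract describes can be sketched in code: source-domain ("old") instances are down-weighted when misclassified, while target-domain ("new") instances are up-weighted AdaBoost-style, so the model gradually concentrates on old data that still agrees with the new distribution. The sketch below is illustrative only, assuming binary 0/1 labels and a decision-stump base learner (the paper itself uses SVMs as base learners); all function names here are our own, not the authors' code.

```python
import numpy as np

def weighted_stump(X, y, w):
    """Fit a one-feature threshold classifier minimizing weighted 0/1 error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = ((sign * (X[:, j] - thr)) >= 0).astype(int)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda Xq: ((sign * (Xq[:, j] - thr)) >= 0).astype(int)

def tradaboost(Xs, ys, Xt, yt, n_iters=20):
    """Minimal sketch of the TrAdaBoost reweighting scheme (Dai et al., 2007).
    Xs/ys: source ("old-domain") data; Xt/yt: target ("new-domain") data."""
    n, m = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.ones(n + m)
    # fixed multiplier for source instances, as in the WMA-style update
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_iters))
    hyps, betas = [], []
    for _ in range(n_iters):
        p = w / w.sum()
        h = weighted_stump(X, y, p)
        pred = h(X)
        # error is measured on the target (same-distribution) portion only
        err = np.sum(w[n:] * (pred[n:] != yt)) / np.sum(w[n:])
        err = min(max(err, 1e-10), 0.499)  # keep beta_t well defined (a hedge)
        beta_t = err / (1.0 - err)
        # source: down-weight misclassified old-domain points;
        # target: up-weight misclassified new-domain points (AdaBoost-style)
        w[:n] *= beta_src ** (pred[:n] != ys).astype(float)
        w[n:] *= beta_t ** (-(pred[n:] != yt).astype(float))
        hyps.append(h)
        betas.append(beta_t)
    half = n_iters // 2  # final vote uses the later half of the hypotheses
    def predict(Xq):
        score = sum(-np.log(betas[t]) * hyps[t](Xq) for t in range(half, n_iters))
        thresh = sum(-0.5 * np.log(betas[t]) for t in range(half, n_iters))
        return (score >= thresh).astype(int)
    return predict
```

For example, with a 1-D target concept `x > 0` and a shifted source concept `x > 0.5`, the returned predictor learns the target boundary even from very few target labels.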

### Citations

2538 | A decision-theoretic generalization of on-line learning and an application to boosting
- Freund, Schapire
- 1995
Citation Context: ... costly and it would also be a waste to throw away all the old data. In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). TrAdaBoost allows users to utilize a small amount of newly labeled data to leverage the old data to construct a high-quality classification model for the new data. We show that this method can allo...

2115 | Sample Selection Bias as a Specification Error - Heckman - 1979

1449 | On information and sufficiency
- Kullback, Leibler
- 1951
Citation Context: ... the category rec, while negative ones from talk. The other data sets are named in the same way. The diff-distribution and same-distribution data sets are split based on subcategories. KL-divergence (Kullback & Leibler, 1951) on the feature space between each corresponding diff-distribution and same-distribution sets is presented in this table. It can be seen that the KL-divergences for all the data sets are much larger t...
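The KL-divergence used in this context to quantify the gap between the diff-distribution and same-distribution sets can be estimated from empirical feature frequencies. A minimal sketch follows; the function name and the `eps` smoothing constant are our assumptions (the paper does not specify its estimator):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as count or probability
    vectors over the same feature vocabulary. eps avoids log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Applied to term-frequency vectors of two corpora, identical distributions give a value near zero, while shifted distributions give a strictly positive value, matching the table's contrast between same- and diff-distribution splits.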

1442 | A training algorithm for optimal margin classifiers
- Boser
- 1992
Citation Context: ...es for all the data sets are much larger than the same-distribution case in which the KL-divergence should be close to zero. 5.2. Comparison Methods In the experiments, we use Support Vector Machines (Boser et al., 1992; Joachims, 1999) as the basic learners in TrAdaBoost. SVM light (Joachims, 2002) with linear kernel is applied in the experiments to implement the SVM and TSVM classifiers. Furthermore, we also added...

777 | Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics
- Schapire, Freund, et al.
- 1998
Citation Context: ...terations on the people vs places data set. Although the curves are not quite smooth, they converge well, which accords with the theoretical analysis in Section 4 and the past results about AdaBoost (Schapire et al., 1997). But, this is accompanied with a low rate of convergence. In Figure 3, TrAdaBoost does not converge well until at least 50 iterations. Finally, we test how the difference in the distribution between...

730 | Transductive inference for text classification using support vector machines
- Joachims
- 1999
Citation Context: ...sets are much larger than the same-distribution case in which the KL-divergence should be close to zero. 5.2. Comparison Methods In the experiments, we use Support Vector Machines (Boser et al., 1992; Joachims, 1999) as the basic learners in TrAdaBoost. SVM light (Joachims, 2002) with linear kernel is applied in the experiments to implement the SVM and TSVM classifiers. Furthermore, we also added some constraint...

632 | A brief introduction to boosting
- Schapire
- 1999
Citation Context: ...s, m is the size of the same-distribution training data Ts, and ɛ is the error on the same-distribution training data from hf. The conclusion given by Theorem 4 is the same as what has been shown in (Schapire, 1999). Theorem 4 presents an upper-bound of the generalization error on the same-distribution data. The upper-bound in Equation (4) shows that the generalization error depends on the same-distribution tra...
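For reference, the AdaBoost-style generalization bound this context paraphrases has the standard textbook form below (as in Schapire, 1999). Here ɛ is the training error of the final hypothesis h_f on the same-distribution data T_s, m = |T_s|, N the number of boosting iterations, and d the VC-dimension of the base learner class; this is the generic statement, not a quotation of the paper's Theorem 4.

```latex
\Pr_{(x,y) \sim \mathcal{D}}\big[\, h_f(x) \neq y \,\big]
\;\le\;
\epsilon \;+\; \tilde{O}\!\left(\sqrt{\frac{N\,d}{m}}\right)
```

The bound makes the context's point precise: the generalization error on same-distribution data is controlled by the same-distribution training error plus a complexity term that shrinks as m grows.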

501 | Multitask learning
- Caruana
- 1997
Citation Context: ...ransfer learning. Early transfer learning works raised some important issues, such as learning how to learn (Schmidhuber, 1994), learning one more thing (Thrun & Mitchell, 1995), multi-task learning (Caruana, 1997). A related topic is multi-task learning whose objective is to discover the common knowledge in multiple tasks. This common knowledge belongs to almost all the tasks, and is helpful for solving a new...

433 | Learning to Classify Text Using Support Vector Machines
- Joachims
- 2001
Citation Context: ... KL-divergence should be close to zero. 5.2. Comparison Methods In the experiments, we use Support Vector Machines (Boser et al., 1992; Joachims, 1999) as the basic learners in TrAdaBoost. SVM light (Joachims, 2002) with linear kernel is applied in the experiments to implement the SVM and TSVM classifiers. Furthermore, we also added some constraints to the basic learners to avoid the case of training weights be...

156 | Domain adaptation for statistical classifiers - Daumé, Marcu

142 | Correcting sample selection bias by unlabeled data - Huang, Smola, et al. - 2006

129 | Improving predictive inference under covariate shift by weighting the log-likelihood function
- Shimodaira
- 2000
Citation Context: ...y data, and discussed when transfer learning would improve the performance and when decrease. Another closely related task is learning under sample selection bias (Zadrozny, 2004) or covariate shift (Shimodaira, 2000), which deals with the case when all the same-distribution data are unlabeled. In the Nobel-prize work, Heckman (1979) investigated correcting sample selection bias in econometrics. Bickel and Scheffe...

93 | Exploiting task relatedness for multiple task learning - Ben-David, Schuller - 2003

65 | Improving SVM accuracy by training on auxiliary data sources
- Wu, Dietterich
- 2004
Citation Context: ...s the baselines, we also compare TrAdaBoost with the method developed for learning with auxiliary data proposed by Wu and Dietterich (2004), which is denoted as AUX. The parameter Cp/Ca (as used in (Wu & Dietterich, 2004)) is set to 4 after tuning. Our framework TrAdaBoost with SVM and TSVM as the basic learners has been performed in the experiments. We denote them as TrAdaBoost(SVM) and ... Table 2. The descriptions of ...

62 | Learning one more thing
- Thrun, Mitchell
- 1995
Citation Context: ...osed new approaches to solve the problems of transfer learning. Early transfer learning works raised some important issues, such as learning how to learn (Schmidhuber, 1994), learning one more thing (Thrun & Mitchell, 1995), multi-task learning (Caruana, 1997). A related topic is multi-task learning whose objective is to discover the common knowledge in multiple tasks. This common knowledge belongs to almost all the ta...

38 | Dirichlet-enhanced spam filtering based on biased samples - Bickel, Scheffer - 2007

30 | Logistic regression with an auxiliary data source - Liao, Ya, et al. - 2004

27 | Correcting sample selection bias in maximum entropy density estimation
- Dudík, Schapire, et al.
- 2006
Citation Context: ...ction bias in econometrics. Bickel and Scheffer (2007) studied the sample selection bias problem in the spam filtering domain. Other researches addressing on correcting sample selection bias include (Dudík et al., 2006; Huang et al., 2007) etc. 3. Transfer Learning through TrAdaBoost To enable transfer learning, we use part of the labeled training data that have the same distribution as the test data to play a role...

2 | On learning how to learn learning strategies
- Schmidhuber
- 1994
Citation Context: ...rning research. Several researchers have proposed new approaches to solve the problems of transfer learning. Early transfer learning works raised some important issues, such as learning how to learn (Schmidhuber, 1994), learning one more thing (Thrun & Mitchell, 1995), multi-task learning (Caruana, 1997). A related topic is multi-task learning whose objective is to discover the common knowledge in multiple tasks. ...