## Kernelizing the output of tree-based methods (2006)

Venue: Proceedings of the 23rd International Conference on Machine Learning, edited by W. Cohen and A. Moore. ACM.

Citations: 10 (5 self)

### BibTeX

@INPROCEEDINGS{Wehenkel06kernelizingthe,

author = {Louis Wehenkel},

title = {Kernelizing the output of tree-based methods},

booktitle = {Proceedings of the 23rd International Conference on Machine Learning},

editor = {W. Cohen and A. Moore},

year = {2006},

pages = {345--352},

publisher = {ACM}

}

### Abstract

We extend tree-based methods to the prediction of structured outputs using a kernelization of the algorithm that allows one to grow trees as soon as a kernel can be defined on the output space. The resulting algorithm, called output kernel trees (OK3), generalizes classification and regression trees as well as tree-based ensemble methods in a principled way. It inherits several features of these methods such as interpretability, robustness to irrelevant variables, and input scalability. When only the Gram matrix over the outputs of the learning sample is given, it learns the output kernel as a function of inputs. We show that the proposed algorithm works well on an image reconstruction task and on a biological network inference problem.

### Citations

3969 | Classification and regression trees
- Breiman, Friedman, et al.
- 1984

Citation Context ...ction $f : \mathcal{X} \to \mathcal{Y}$ that minimizes the expectation of some loss function over the joint distribution of input/output pairs: $E_{x,y}\{\ell(f(x), y)\}$ (1). 2.1. Standard regression trees. Standard regression trees (Breiman et al., 1984) propose a solution to this problem when the output space is the real axis, $\mathcal{Y} = \mathbb{R}$, and the loss function $\ell$ is the square error, $\ell(f(x), y) = (f(x) - y)^2$. The general idea of regression trees is t...

2529 | Bagging predictors
- Breiman
- 1996

Citation Context ...ents, we will compare OK3 with single trees and OK3 ensembles grown with the randomization of bagging and extra-trees. Bagging grows each tree from a bootstrap sample of the original learning sample (Breiman, 1996) while locally maximizing scores to choose splits. The extra-trees, on the other hand, are grown from the complete learning sample while randomizing the choice of the split at each node. We refer the...
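
The two randomization schemes contrasted in this context can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `bootstrap_sample` and `random_cut_point` are hypothetical, and drawing the cut-point uniformly between the observed minimum and maximum is an assumption about how the split is randomized.

```python
import random


def bootstrap_sample(learning_sample, rng):
    """Bagging: draw N items with replacement from a learning sample of size N."""
    n = len(learning_sample)
    return [learning_sample[rng.randrange(n)] for _ in range(n)]


def random_cut_point(values, rng):
    """Extra-trees style split (assumed form): a cut-point drawn uniformly
    between the smallest and largest observed value of one input variable."""
    return rng.uniform(min(values), max(values))
```

With bagging, each tree sees a different resampled learning sample and still optimizes its splits; with the randomized split, every tree sees the full learning sample and the diversity comes from the cut-points.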

2054 | Learning with Kernels
- Schölkopf, Smola
- 2002

Citation Context ...8) leads to the prediction (4). The pre-image problem is common to all methods learning from kernelized outputs (see Section 2.8). Several techniques have been proposed to approximate (8) (see, e.g., Schölkopf & Smola, 2002; Ralaivola & d’Alché-Buc, 2003). The approximation we propose is inspired from the simple approximation used in (Weston et al., 2002) which replaces the $\arg\min_{y' \in \mathcal{Y}}$ over $\mathcal{Y}$ by an $\arg\min_{y' \in LS}$ over...
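
The restriction of the pre-image search to the learning sample can be sketched as below. This is a hedged illustration only: it assumes the prediction in feature space is a weighted sum $\sum_i w_i \phi(y_i)$ of learning-sample outputs, that the output Gram matrix is given as a nested list `gram`, and the name `approximate_pre_image` is hypothetical. It uses the identity $\|\phi(y') - \sum_i w_i \phi(y_i)\|^2 = k(y', y') - 2\sum_i w_i k(y_i, y') + \text{const}$, so only kernel evaluations are needed.

```python
def approximate_pre_image(gram, weights):
    """Return the index j of the learning-sample output whose feature-space
    image is closest to sum_i weights[i] * phi(y_i).

    gram[i][j] holds k(y_i, y_j); the constant term of the squared
    distance does not depend on j and is dropped.
    """
    best_j, best_val = None, float("inf")
    for j in range(len(gram)):
        cross = sum(w * gram[i][j] for i, w in enumerate(weights))
        val = gram[j][j] - 2.0 * cross
        if val < best_val:
            best_j, best_val = j, val
    return best_j
```

The search costs one pass over the learning sample per prediction, which is the price of replacing the minimization over the whole output space by a minimization over known outputs.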

382 | Large margin methods for structured and interdependent output variables
- Tsochantaridis, Joachims, et al.

Citation Context ...on of a kernel. Already successful for structured inputs, original kernel-based methods have recently been proposed to address the structured output problem (Weston et al., 2002; Cortes et al., 2005; Tsochantaridis et al., 2005; Taskar et al., 2005; Weston et al., 2005). Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the author(s)/owner(s). In this...

187 | Diffusion kernels on graphs and other discrete input spaces
- Kondor, Lafferty
- 2002

Citation Context ...lar locations; (3) Phylogenetic profile: 145 boolean variables coding the presence/absence of an orthologous protein across 145 organisms. To represent the graph structure we use a diffusion kernel (Kondor & Lafferty, 2002), yielding a Gram matrix $K = \exp(-\beta L)$ (where $L = D - A$ is the Laplacian, $D$ the diagonal matrix of node connectivities, and $A$ the adjacency matrix). We evaluate our algorithm by 10-fold cross-validat...
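
The Gram matrix $K = \exp(-\beta L)$ described in this context can be computed for a small graph as in the sketch below. It is an illustration under stated assumptions: the graph is given as a symmetric 0/1 adjacency matrix in nested lists, the helper names (`mat_mul`, `mat_exp`, `diffusion_kernel`) are hypothetical, and the matrix exponential is approximated with a truncated Taylor series plus scaling and squaring rather than a library routine.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=20):
    """Approximate exp(m): Taylor series of m / 2**squarings, then repeated squaring."""
    n = len(m)
    scale = 2.0 ** squarings
    scaled = [[v / scale for v in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled**k / k!
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def diffusion_kernel(adjacency, beta):
    """Gram matrix K = exp(-beta * L) with L = D - A the graph Laplacian."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    neg_beta_l = [[-beta * ((degrees[i] if i == j else 0.0) - adjacency[i][j])
                   for j in range(n)] for i in range(n)]
    return mat_exp(neg_beta_l)
```

For a graph with two nodes joined by one edge and $\beta = 1$, the diagonal entries equal $(1 + e^{-2})/2$ and the off-diagonal entries $(1 - e^{-2})/2$, so adjacent nodes receive a large kernel value that shrinks as $\beta$ decreases.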

166 | Learning structured prediction models: a large margin approach
- Taskar, Chatalbashev, et al.
- 2005

Citation Context ...ssful for structured inputs, original kernel-based methods have recently been proposed to address the structured output problem (Weston et al., 2002; Cortes et al., 2005; Tsochantaridis et al., 2005; Taskar et al., 2005; Weston et al., 2005). In this paper, we start from...

134 | Extremely randomized trees
- Geurts, Ernst, et al.

Citation Context ...nly for pairs of inputs which reach at least in one of the trees the same leaf, and it is actually a (positive semi-definite) kernel over the input space inferred in a supervised way by tree growing (Geurts et al., 2006). An ensemble prediction can thus be obtained by an approximate pre-image in the following way: $\hat{y}_T(x) = \arg\min_{y' \in \mathcal{Y}} \| \phi(y') - \sum_{i=1}^{N_{LS}} k_T(x_i, x)\, \phi(y_i) \| \ldots = \arg\min_{y' \in \mathcal{Y}} k(y', y') - 2 \sum_{i=1}^{N_{LS}} \ldots$

124 | Kernel k-means: spectral clustering and normalized cuts
- Dhillon, Guan, et al.

Citation Context ...roblems. The price to pay however is an increase of computing time, partially due to the requirement of pre-image computations during the training stage. Our method is also related to kernel k-means (Dhillon et al., 2004). Both methods try to minimize the same loss function and are based on the same kernel trick, i.e., the computation of the average distance to the center of mass from kernel evaluations only. The mai...
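
The kernel trick named in this context, computing the average squared distance to the center of mass from kernel evaluations only, can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the output Gram matrix is a nested list `gram`, the names `kernel_variance` and `split_score` are hypothetical, and the score shown is the usual regression-tree variance reduction transposed to feature space, which may differ in detail from the criterion used by OK3.

```python
def kernel_variance(gram, idx):
    """Mean squared feature-space distance of the outputs in idx to their
    center of mass: (1/N) sum_i k(y_i, y_i) - (1/N^2) sum_ij k(y_i, y_j)."""
    n = len(idx)
    if n == 0:
        return 0.0
    diag = sum(gram[i][i] for i in idx) / n
    cross = sum(gram[i][j] for i in idx for j in idx) / (n * n)
    return diag - cross


def split_score(gram, left, right):
    """Variance reduction obtained by splitting a node into left and right."""
    parent = left + right
    n = len(parent)
    return (kernel_variance(gram, parent)
            - len(left) / n * kernel_variance(gram, left)
            - len(right) / n * kernel_variance(gram, right))
```

With a linear kernel on scalar outputs this reduces to the ordinary variance, so a split that separates outputs `[0, 0]` from `[2, 2]` removes all of the parent variance, while a split that mixes them removes none.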

104 | Top-down induction of clustering trees
- Blockeel, De Raedt, et al.
- 1998

85 | Protein network inference from multiple genomic data: a supervised approach
- Yamanishi, Vert, Kanehisa

Citation Context ... data only with the one in (Yamanishi et al., 2005), we observe here a much higher AUC, which suggests that OK3 is well suited for handling discrete attributes. Discussion. The algorithm proposed in (Yamanishi et al., 2004) and (Vert & Yamanishi, 2004) determines a mapping of inputs $x(v)$ into a vector $f(x(v))$ of $\mathbb{R}^L$, such that vertices $v, v'$ of $G'$ known to be adjacent are mapped to nearby vectors $f(x(v)), f(x(v'))$...

63 | Kernel dependency estimation
- Weston, Chapelle, et al.
- 2002

Citation Context ...e of a space of interest into the definition of a kernel. Already successful for structured inputs, original kernel-based methods have recently been proposed to address the structured output problem (Weston et al., 2002; Cortes et al., 2005; Tsochantaridis et al., 2005; Taskar et al., 2005; Weston et al., 2005)...

41 | The em algorithm for Kernel matrix completion with auxiliary data
- Tsuda, Akaho, et al.
- 2004

Citation Context ...input space. Our second experiment below will show that the problem of supervised graph inference may be formulated like this. This provides also a way to solve the kernel completion task defined in (Tsuda et al., 2003). With respect to the transductive approach of (Tsuda et al., 2003) however, we build a model in the form of a function $\hat{k}_T$ and do not exploit inputs of the test samples. Although this will not be i...

31 | Supervised graph inference
- Vert, Yamanishi
- 2005

Citation Context ...zation of tree-based methods and its properties. Section 3 describes and comments numerical experiments on two problems: a pattern completion task (Weston et al., 2002) and a graph inference problem (Vert & Yamanishi, 2004). Section 4 provides some perspectives. 2. Output kernel trees. The general problem of supervised learning may be formulated as follows: from a learning sample $LS = \{(x_i, y_i) \mid i = 1, \ldots, N_{LS}\}$ with ...

30 | A general regression technique for learning transductions
- Cortes, Mohri, et al.
- 2005

Citation Context ...est into the definition of a kernel. Already successful for structured inputs, original kernel-based methods have recently been proposed to address the structured output problem (Weston et al., 2002; Cortes et al., 2005; Tsochantaridis et al., 2005; Taskar et al., 2005; Weston et al., 2005)...

24 | Tree structured methods for longitudinal data
- Segal
- 1992

Citation Context ...r below its worst case value of $O(N_{LS}^2)$. 2.8. Related algorithms. OK3 is related to a couple of works in the tree world. To the best of our knowledge, multiple output regression trees date back to (Segal, 1992) which has proposed to replace variance (2) by an average Mahalanobis distance to the center of mass. This is strictly equivalent in using OK3 with a kernel $k(y_i, y_j) = y_i V^{-1} y_j^T$...

22 | Hierarchical multi-classification
- Blockeel, Bruynooghe, et al.
- 2002

Citation Context ... Our work is also closely related to the predictive clustering trees (PCT) proposed by Blockeel et al. (1998) and applied, e.g., for hierarchical classification (Todorovski et al., 2002) and ranking (Blockeel et al., 2002). PCT generalize classification and regression trees by replacing the notion of variance by the general form: $\frac{1}{N} \sum_{i=1}^{N} d(y_i, p)^2$ where $d$ is some arbitrary distance metric and $p$, called the protot...

18 | Dynamical modeling with kernels for nonlinear time series prediction
- Ralaivola, d’Alché-Buc
- 2003

15 | Joint kernel maps
- Weston, Schoelkopf, et al.

Citation Context ...inputs, original kernel-based methods have recently been proposed to address the structured output problem (Weston et al., 2002; Cortes et al., 2005; Tsochantaridis et al., 2005; Taskar et al., 2005; Weston et al., 2005). In this paper, we start from a different family o...

8 | Relational ranking with predictive clustering trees
- Džeroski, Todorovski, et al.