
## Cost-Sensitive Tree of Classifiers


Citations: 13 (6 self)

### Citations

5792 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984
Citation Context: ...ization. In particular, we first train gradient boosted regression trees with a squared-loss penalty (Friedman, 2001), H'(x_i) = ∑_{t=1}^T h_t(x_i), where each function h_t(·) is a limited-depth CART tree (Breiman, 1984). We then apply the mapping x_i → φ(x_i) to all inputs, where φ(x_i) = [h_1(x_i), ..., h_T(x_i)]^⊤. To avoid confusion between CART trees and the CSTC tree, we refer to CART trees h_t(·) as weak learner...
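The quoted passage describes boosting weak learners with a squared loss and then mapping each input to the vector of its weak-learner responses. A minimal pure-Python sketch of that construction, with 1-D regression stumps standing in for the paper's limited-depth CART trees (the stumps, toy data, and learning rate are illustrative assumptions, not the paper's setup):

```python
# Sketch: train T weak learners by gradient boosting with squared loss,
# then map each input x to phi(x) = [h_1(x), ..., h_T(x)].
# Regression stumps stand in for limited-depth CART trees (assumption).

def fit_stump(xs, residuals):
    """Find the 1-D threshold split minimizing squared error on residuals."""
    best = None
    for thr in xs:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def gradient_boost(xs, ys, T=5, lr=0.5):
    """Squared-loss boosting: each h_t fits the current residuals."""
    learners, pred = [], [0.0] * len(xs)
    for _ in range(T):
        residuals = [y - p for y, p in zip(ys, pred)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        pred = [p + lr * h(x) for p, x in zip(pred, xs)]
    return learners

def phi(x, learners):
    """The mapping x -> [h_1(x), ..., h_T(x)] from the quoted passage."""
    return [h(x) for h in learners]

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
hs = gradient_boost(xs, ys, T=5)
print(phi(2.0, hs), phi(3.0, hs))  # T-dimensional weak-learner responses
```

Each input is thus represented by its T per-tree responses rather than by its raw features, which is what downstream CSTC classifiers consume.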

1826 | Robust real-time face detection - Viola, Jones - 2004

1292 | Least angle regression - Efron, Hastie, et al. - 2004
Citation Context: ...refully constructed to reduce the average test-time complexity of machine learning algorithms, while maximizing their accuracy. Different from prior work, which reduces the total cost for every input (Efron et al., 2004) or which stages the feature extraction into linear cascades (Viola & Jones, 2004; Lefakis & Fleuret, 2010; Saberian & Vasconcelos, 2010; Pujara et al., 2011; Chen et al., 2012), a CSTC tree incorpor...

1006 | Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond - Schölkopf, Smola - 2002
Citation Context: ...Note that for a single element this relaxation relaxes the l0 norm to the l1 norm, ‖a_ij‖_0 → √(a_ij)² = |a_ij|, and recovers the commonly used approximation to encourage sparsity (Efron et al., 2004; Schölkopf & Smola, 2001). We plug the cost-term (2) into the loss in (1) and apply the relaxation (3) to all l0 norms to obtain (∑_i ℓ_i + ρ|β|) [regularized loss] + λ (∑_t e^t|β^t| + ∑ √∑(F α^t β^t)²) [ev. cost penal...
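The context quotes the standard l0-to-l1 relaxation used to encourage sparsity. A small sketch of why the |a| penalty zeroes out small coefficients: the one-dimensional l1-penalized problem has the closed-form soft-thresholding solution (the coefficient values below are toy illustrations, not from the paper):

```python
# The l1 penalty |a| (the relaxation quoted above) drives small
# coefficients exactly to zero: the 1-D problem
#   min_a 0.5 * (a - v)**2 + lam * |a|
# is solved in closed form by the soft-thresholding operator.

def soft_threshold(v, lam):
    """Closed-form minimizer of 0.5*(a - v)^2 + lam*|a|."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

coeffs = [3.0, 0.4, -0.2, -2.5, 0.05]
sparse = [soft_threshold(v, lam=0.5) for v in coeffs]
print(sparse)  # -> [2.5, 0.0, 0.0, -2.0, 0.0]
```

Entries with magnitude below the regularization weight become exactly zero, which is the sparsity effect the quoted passage relies on.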

943 | Greedy function approximation: A gradient boosting machine - Friedman - 2001
Citation Context: ...construction of the weak learners. It should be noted that, in spite of the high accuracy achieved by these techniques, the algorithms are based heavily on stage-wise regression (gradient boosting) (Friedman, 2001), and are less likely to work with more general weak learners. Gao & Koller (2011) use locally weighted regression during test time to predict the information gain of unknown features. Different from...

873 | Hierarchical mixtures of experts and the EM algorithm - Jordan, Jacobs - 1994
Citation Context: ...average precision over time in an object detection setting. In this case, the dataset has multi-labeled inputs and thus warrants a different approach than ours. Hierarchical Mixture of Experts (HME) (Jordan & Jacobs, 1994) also builds tree-structured classifiers. However, in contrast to CSTC, this work is not motivated by reductions in test-time cost and results in fundamentally different models. In CSTC, each classif...

654 | Cumulated Gain-based Evaluation of IR Techniques - Järvelin

132 | Feature hashing for large scale multitask learning - Weinberger, Dasgupta, et al.
Citation Context: ...ies of the current state-of-the-art at a small fraction of the computational cost. 1. Introduction. Machine learning algorithms are widely used in many real-world applications, ranging from email spam (Weinberger et al., 2009) and adult content filtering (Fleck et al., 1996), to web-search engines (Zheng et al., 2008). As machine learning transitions into these industry fields, managing the CPU cost at test-time becomes in...

96 | Convex Optimization - Boyd, Vandenberghe - Cambridge University Press, 2004
Citation Context: ...mma 1. Given any g(x) > 0, the following holds: √g(x) = min_{z>0} (1/2)[g(x)/z + z]. (7) The lemma can be proved as z = √g(x) minimizes the function on the right-hand side. Further, it is shown in (Boyd & Vandenberghe, 2004) that the right-hand side is jointly convex in x and z, so long as g(x) is convex. For each square-root or l1 term we introduce an auxiliary variable (i.e., z above) and alternate between minimizing...
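The quoted Lemma 1, the variational form √g(x) = min_{z>0} (1/2)[g(x)/z + z], is easy to check numerically; a quick sketch at an arbitrary positive value of g (the value 7.3 is just an illustrative choice):

```python
import math

# Numerical check of the quoted Lemma 1:
#   sqrt(g) = min_{z>0} 0.5 * (g / z + z),
# with the minimum attained at z = sqrt(g)  (this is AM-GM).

def bound(g, z):
    """Right-hand side of Lemma 1 for a fixed auxiliary variable z."""
    return 0.5 * (g / z + z)

g = 7.3  # any positive value (arbitrary choice)
z_star = math.sqrt(g)

# The bound equals sqrt(g) at z = sqrt(g)...
assert abs(bound(g, z_star) - math.sqrt(g)) < 1e-12
# ...and is no smaller anywhere else.
for z in [0.5, 1.0, 2.0, 5.0, 10.0]:
    assert bound(g, z) >= math.sqrt(g) - 1e-12
print("lemma verified at g =", g)
```

This is what makes the alternating scheme described in the context work: with z fixed, the square-root terms become smooth quadratic-over-linear terms, and the optimal z has the closed form z = √g(x).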

84 | A general boosting method and its application to learning ranking functions for web search - Zheng, Zha, et al.
Citation Context: ...Machine learning algorithms are widely used in many real-world applications, ranging from email spam (Weinberger et al., 2009) and adult content filtering (Fleck et al., 1996), to web-search engines (Zheng et al., 2008). As machine learning transitions into these industry fields, managing the CPU cost at test-time becomes increasingly important. In applications of such large scale, computation must be budgeted and a...

80 | Label embedding trees for large multi-class tasks - Bengio, Weston, et al. - 2010
Citation Context: ...the final prediction, incurring the same cost for all test inputs. Recent tree-structured classifiers include the work of Deng et al. (2011), who speed up the training and evaluation of label trees (Bengio et al., 2010) by avoiding many binary one-vs-all classifier evaluations. Differently, we focus on problems in which feature extraction time dominates the test-time cost, which motivates different algorithmic setu...

66 | Yahoo! learning to rank challenge overview - Chapelle, Chang - 2011
Citation Context: ...nd testing data both approach 0. 5.2. Yahoo! Learning to Rank. To evaluate the performance of CSTC on real-world tasks, we test our algorithm on the public Yahoo! Learning to Rank Challenge data set (Chapelle & Chang, 2011). The set contains 19,944 queries and 473,134 documents. Each query-document pair x_i consists of 519 features. An extraction cost, which takes on a value in the set {1, 5, 20, 50, 100, 150, 200}, is...

49 | Sparse regression using mixed norms - Kowalski - 2009

38 | Fast and balanced: Efficient label tree learning for large scale object recognition - Deng, Satheesh, et al. |

30 | Early exit optimizations for additive machine learned ranking systems - Cambazoglu, Zaragoza, et al. - 2010 |

26 | Learning fast classifiers for image spam - Dredze, Gevaryahu, et al. - 2007 |

24 | The greedy miser: Learning under test-time budgets - Xu, Weinberger, et al. - 2012 |

23 | Active classification based on value of classifier - Gao, Koller - 2011 |

19 | Boosted multi-task learning - Chapelle, Shivaswamy, et al. - 2010 |

19 | Classifier cascade for minimizing feature evaluation cost - Chen, Xu, et al. - 2012
Citation Context: ...cost for every input (Efron et al., 2004) or which stages the feature extraction into linear cascades (Viola & Jones, 2004; Lefakis & Fleuret, 2010; Saberian & Vasconcelos, 2010; Pujara et al., 2011; Chen et al., 2012), a CSTC tree incorporates input-dependent feature selection into training and dynamically allocates higher feature budgets for infrequently traveled tree-paths. By introducing a probabilistic tree-t...

19 | Speedboost: Anytime prediction with uniform nearoptimality - Grubb, Bagnell - 2012 |

17 | Boosting classifier cascades - Saberian, Vasconcelos
Citation Context: ...Different from prior work, which reduces the total cost for every input (Efron et al., 2004) or which stages the feature extraction into linear cascades (Viola & Jones, 2004; Lefakis & Fleuret, 2010; Saberian & Vasconcelos, 2010; Pujara et al., 2011; Chen et al., 2012), a CSTC tree incorporates input-dependent feature selection into training and dynamically allocates higher feature budgets for infrequently traveled tree-path...

12 | Timely object recognition - Karayev, Baumgartner, et al. - 2012 |

12 | Joint cascade optimization using a product of boosted classifiers - Lefakis, Fleuret - 2010
Citation Context: ...ximizing their accuracy. Different from prior work, which reduces the total cost for every input (Efron et al., 2004) or which stages the feature extraction into linear cascades (Viola & Jones, 2004; Lefakis & Fleuret, 2010; Saberian & Vasconcelos, 2010; Pujara et al., 2011; Chen et al., 2012), a CSTC tree incorporates input-dependent feature selection into training and dynamically allocates higher feature budgets for i...

11 | Local supervised learning through space partitioning - Wang, Saligrama - 2012
Citation Context: ...n tree or the risk. Possibly most similar to our work are (Busa-Fekete et al., 2012), who learn a directed acyclic graph via a Markov decision process to select features for different instances, and (Wang & Saligrama, 2012), who adaptively partition the feature space and learn local region-specific classifiers. Although each work is similar in motivation, the algorithmic frameworks are very different and can be regarde...

10 | Using classifier cascades for scalable e-mail classification - Pujara, Daumé, et al. - 2011
Citation Context: ...ch reduces the total cost for every input (Efron et al., 2004) or which stages the feature extraction into linear cascades (Viola & Jones, 2004; Lefakis & Fleuret, 2010; Saberian & Vasconcelos, 2010; Pujara et al., 2011; Chen et al., 2012), a CSTC tree incorporates input-dependent feature selection into training and dynamically allocates higher feature budgets for infrequently traveled tree-paths. By introducing a p...

10 | Fast classification using sparse decision DAGs - Benbouzid, Busa-Fekete, et al. - 2012
Citation Context: ...ree that reduces the feature extraction cost. Different from this work, they do not directly minimize the total test-time cost of the decision tree or the risk. Possibly most similar to our work are (Busa-Fekete et al., 2012), who learn a directed acyclic graph via a Markov decision process to select features for different instances, and (Wang & Saligrama, 2012), who adaptively partition the feature space and learn local...

4 | Finding naked people. ECCV - Fleck, Forsyth, et al. - 1996
Citation Context: ...n of the computational cost. 1. Introduction. Machine learning algorithms are widely used in many real-world applications, ranging from email spam (Weinberger et al., 2009) and adult content filtering (Fleck et al., 1996), to web-search engines (Zheng et al., 2008). As machine learning transitions into these industry fields, managing the CPU cost at test-time becomes increasingly important. In applications of such la...

1 | Cost-Sensitive Tree of Classifiers - Xu, Kusner, et al.