Page-level template detection via isotonic smoothing
- In Proc. of the Conference on World Wide Web, 2007
Cited by 10 (1 self)
Abstract
We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is the automatic generation of training data for a classifier that, given a page, assigns a templateness score to every DOM node of the page. The second is the global smoothing of these per-node classifier scores by solving a regularized isotonic regression problem; the latter follows from a simple yet powerful abstraction of templateness on a page. Our extensive experiments on human-labeled test data show that our approach detects templates effectively.
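The abstract does not state the regularized objective, so the following is only a minimal sketch of the smoothing idea under simplifying assumptions. It fits plain (unregularized) weighted L2 isotonic regression to per-node scores over a tree, using Brunk's minimum-lower-sets method with brute-force enumeration. That enumeration is exponential, so the sketch only suits toy trees. The helper name `isotonic_l2_dag`, the node names and scores, and the constraint direction (a node's smoothed score may not exceed its parent's) are hypothetical choices for illustration, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def isotonic_l2_dag(values, edges, weights=None):
    """Weighted L2 isotonic regression on a small DAG (hypothetical helper).

    values:  dict node -> raw score
    edges:   list of (u, v) pairs meaning the fit must satisfy fit[u] <= fit[v]
    weights: optional dict node -> positive weight (default 1)

    Brunk's minimum-lower-sets method: repeatedly take the largest lower set
    with the smallest weighted mean, assign it that mean, and remove it.
    Lower sets are enumerated by brute force, so this is exponential in n.
    """
    nodes = list(values)
    f = {v: Fraction(values[v]) for v in nodes}  # exact arithmetic avoids tie errors
    w = {v: Fraction(1 if weights is None else weights[v]) for v in nodes}
    remaining = set(nodes)
    fit = {}
    while remaining:
        rem = [v for v in nodes if v in remaining]
        best_mean, best_set = None, set()
        for k in range(1, len(rem) + 1):
            for subset in combinations(rem, k):
                s = set(subset)
                # Lower set: every remaining predecessor of a member is a member.
                if any(v in s and u in remaining and u not in s for u, v in edges):
                    continue
                mean = sum(w[x] * f[x] for x in s) / sum(w[x] for x in s)
                if best_mean is None or mean < best_mean:
                    best_mean, best_set = mean, s
                elif mean == best_mean:
                    best_set = best_set | s  # union of minimizers is a minimizer
        for x in best_set:
            fit[x] = float(best_mean)
        remaining -= best_set
    return fit


# Hypothetical DOM tree with made-up classifier scores; edges are
# (child, parent), i.e. a child's smoothed score may not exceed its parent's.
scores = {"body": 0.9, "header": 0.8, "nav": 0.95, "main": 0.2, "article": 0.4}
edges = [("header", "body"), ("nav", "header"), ("main", "body"), ("article", "main")]
fit = isotonic_l2_dag(scores, edges)
print(fit)
```

In this toy input the raw scores violate the assumed constraint at `nav` and `article`; each is pooled with its parent at their mean (0.875 and 0.3), while `body` keeps its score of 0.9.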
Strict L∞ Isotonic Regression
Cited by 1 (1 self)
Abstract
Given a real-valued function f with weights w on a finite DAG G = (V, E), an isotonic regression of (f, w) is an order-preserving real-valued function on V that minimizes the regression error among all such functions. When the regression error is defined via the L∞ norm, there is typically not a unique isotonic regression, unlike the behavior for the Lp norms, 1 < p < ∞. Here a partial ordering is imposed on isotonic regressions, one that refines the notion of minimizing the largest regression errors. This order results in a unique minimal L∞ isotonic regression, called the strict L∞ isotonic regression. Further, strict L∞ isotonic regression is the limit, as p goes to infinity, of Lp isotonic regression. Algorithms are given showing that if G has n vertices, then for linear or tree orderings pool adjacent violators (PAV) yields the strict isotonic regression in or Θ(n log n) time, and for arbitrary DAGs it can be determined in time proportional to the time required to generate the transitive closure. Several algorithms for generating non-strict L∞ isotonic regressions have previously appeared in the literature. We examine their behavior as mappings from weighted functions over G to isotonic functions over G, showing that the fastest algorithms are not monotonic mappings and that no previously studied algorithm preserves level set trimming. In contrast, the strict L∞ isotonic regression and Lp regression for all 1 < p < ∞ are monotonic and preserve level set trimming.
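To make the non-uniqueness concrete, the sketch below computes two L∞-optimal isotonic fits of the same unweighted data on a linear order. `pav_linf` is PAV in which each pooled block takes its midrange, the value that minimizes that block's maximum error; we assume this is the unweighted case of the PAV variant the abstract mentions. `prefix_suffix_linf` averages the running maximum from the left with the running minimum from the right, a standard optimal but generally non-strict L∞ fit. Both function names are hypothetical.

```python
from itertools import accumulate


def pav_linf(f):
    """Unweighted L-infinity PAV on a linear order: each pooled block takes
    its midrange (max + min) / 2, which minimizes the max error on the block."""
    blocks = []  # each block: [max value, min value, length]
    for x in f:
        blocks.append([x, x, 1])
        # Merge while the last two blocks violate monotonicity
        # (comparing max + min avoids dividing by 2).
        while len(blocks) > 1 and sum(blocks[-2][:2]) > sum(blocks[-1][:2]):
            hi2, lo2, n2 = blocks.pop()
            hi1, lo1, n1 = blocks.pop()
            blocks.append([max(hi1, hi2), min(lo1, lo2), n1 + n2])
    out = []
    for hi, lo, n in blocks:
        out.extend([(hi + lo) / 2] * n)
    return out


def prefix_suffix_linf(f):
    """Another L-infinity-optimal fit: average of the running max from the
    left and the running min from the right (generally not the strict one)."""
    pre = list(accumulate(f, max))
    suf = list(accumulate(reversed(f), min))[::-1]
    return [(p + s) / 2 for p, s in zip(pre, suf)]


def max_error(f, g):
    return max(abs(a - b) for a, b in zip(f, g))


f = [2, 0, 1]
a, b = pav_linf(f), prefix_suffix_linf(f)
print(a, max_error(f, a))  # [1.0, 1.0, 1.0] 1.0
print(b, max_error(f, b))  # [1.0, 1.0, 1.5] 1.0
```

Both fits reach the optimal maximum error of 1, but their error vectors are (1, 1, 0) and (1, 1, 0.5). An ordering that compares the sorted error vectors lexicographically, which is how we read the refinement described above, prefers the PAV fit here.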
An Approach to Computing Multidimensional Isotonic Regressions
Abstract
This paper gives an approach for determining isotonic regressions for data at points in multidimensional space, with the ordering given by domination. Recent algorithmic advances for 2-dimensional isotonic regressions have made them useful for significantly larger data sets, and here we provide an advance for dimensions 3 and larger. Given a set V of n d-dimensional points, it is shown that an isotonic regression on V can be determined in Θ̃(n²), Θ̃(n³), and Θ̃(n) time for the L1, L2, and L∞ metrics, respectively. This improves upon previous results by a factor of Θ̃(n). The core of the approach is in extending the regression to a set of points V′ ⊃ V where the domination ordering on V′ can be represented with relatively few edges.
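For intuition about the domination ordering, here is a naive baseline, not the paper's algorithm. It performs unweighted L∞ isotonic regression by examining all n² pairs of points, i.e. the dense representation of the ordering that the extension to V′ is designed to avoid. Each fitted value at p is the midpoint between the largest value at points dominated by p and the smallest value at points dominating p (both sets include p itself), a standard L∞-optimal construction for unweighted data. Function names and data are hypothetical.

```python
def dominated_by(a, b):
    """True if point a <= point b in every coordinate (the domination order)."""
    return all(x <= y for x, y in zip(a, b))


def linf_isotonic_naive(points, values):
    """Unweighted L-infinity isotonic regression under domination, O(n^2 d).

    fit(p) = (max value at points dominated by p
              + min value at points dominating p) / 2
    """
    fit = []
    for p in points:
        below = max(v for q, v in zip(points, values) if dominated_by(q, p))
        above = min(v for q, v in zip(points, values) if dominated_by(p, q))
        fit.append((below + above) / 2)
    return fit


pts = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0)]
vals = [3, 1, 2, 0]
fit = linf_isotonic_naive(pts, vals)
print(fit)  # [1.5, 2.0, 2.5, 1.5]
```

Here point (0, 0, 0) carries value 3 yet lies below (0, 1, 0) with value 0, so no isotonic fit can have maximum error below 1.5, and this fit attains it.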
Enhanced Hierarchical . . . Smoothing
- 2008
Abstract
Hierarchical topic taxonomies have proliferated on the World Wide Web [5, 18], and exploiting the output space decompositions they induce in automated classification systems is an active area of research. In many domains, classifiers learned on a hierarchy of classes have been shown to outperform those learned on a flat set of classes. In this paper we argue that the hierarchical arrangement of classes leads to intuitive relationships between the corresponding classifiers' output scores, and that enforcing these relationships as a post-processing step after classification can improve its accuracy. We formulate the task of smoothing classifier outputs as a regularized isotonic tree regression problem, and present a dynamic-programming-based method that solves it optimally. This new problem generalizes the classic isotonic tree regression problem, and both the new formulation and the algorithm may be of independent interest. In our empirical analysis of two real-world text classification scenarios, we show that our approach to smoothing classifier outputs results in improved classification accuracy.
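The abstract does not give the regularized objective or its dynamic program. As a simpler stand-in, the sketch below solves unregularized, weighted L1 isotonic tree regression by dynamic programming, requiring each child's smoothed score to be at most its parent's, a natural but assumed direction for class hierarchies. It relies on the standard fact that some L1-optimal isotonic fit uses only values present in the data, so each node's table has one entry per distinct score. The helper name, the hierarchy, and the integer scores (percentages) are hypothetical.

```python
def l1_isotonic_tree(scores, parent, weights=None):
    """Weighted L1 isotonic regression on a rooted tree (hypothetical helper).

    scores: dict node -> raw score; parent: dict node -> parent (root -> None).
    Minimizes sum(w[v] * |scores[v] - fit[v]|) subject to fit[child] <= fit[parent].
    """
    nodes = list(scores)
    w = {v: 1 if weights is None else weights[v] for v in nodes}
    cand = sorted(set(scores.values()))  # candidate fitted values
    children = {v: [] for v in nodes}
    root = None
    for v in nodes:
        if parent[v] is None:
            root = v
        else:
            children[parent[v]].append(v)

    # Pre-order traversal, reversed so every child precedes its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    order.reverse()

    # best[v][i] = (min cost of v's subtree with fit[v] <= cand[i], argmin index)
    best = {}
    for v in order:
        running = (float("inf"), -1)
        best[v] = []
        for i, c in enumerate(cand):
            cost = w[v] * abs(scores[v] - c)
            # Each child may take any candidate <= cand[i].
            cost += sum(best[u][i][0] for u in children[v])
            if cost < running[0]:
                running = (cost, i)
            best[v].append(running)

    # Top-down: root takes its overall argmin; each child its best value <= parent's.
    fit, idx, stack = {}, {root: best[root][-1][1]}, [root]
    while stack:
        v = stack.pop()
        fit[v] = cand[idx[v]]
        for u in children[v]:
            idx[u] = best[u][idx[v]][1]
            stack.append(u)
    return fit


# Hypothetical class hierarchy with made-up classifier scores (percent).
scores = {"All": 90, "Sports": 60, "Soccer": 80, "Tennis": 30, "Arts": 20}
parent = {"All": None, "Sports": "All", "Soccer": "Sports", "Tennis": "Sports", "Arts": "All"}
fit = l1_isotonic_tree(scores, parent)
print(fit)
```

Ties are broken toward the smaller candidate, so the violating pair Sports/Soccer is pooled at 60; pooling both at 80 has the same L1 cost of 20.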

