Results 1 – 7 of 7
Page-level template detection via isotonic smoothing
In Proc. International Conference on World Wide Web, 2007
Abstract

Cited by 21 (3 self)
We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is the automatic generation of training data for a classifier that, given a page, assigns a templateness score to every DOM node of the page. The second is the global smoothing of these per-node classifier scores by solving a regularized isotonic regression problem; the latter follows from a simple yet powerful abstraction of templateness on a page. Our extensive experiments on human-labeled test data show that our approach detects templates effectively.
Enhanced hierarchical classification via isotonic smoothing
In Proc. International Conference on World Wide Web, 2008
Abstract

Cited by 11 (0 self)
Hierarchical topic taxonomies have proliferated on the World Wide Web [5, 18], and exploiting the output space decompositions they induce in automated classification systems is an active area of research. In many domains, classifiers learned on a hierarchy of classes have been shown to outperform those learned on a flat set of classes. In this paper we argue that the hierarchical arrangement of classes leads to intuitive relationships between the corresponding classifiers' output scores, and that enforcing these relationships as a post-processing step after classification can improve its accuracy. We formulate the task of smoothing classifier outputs as a regularized isotonic tree regression problem, and present a dynamic-programming-based method that solves it optimally. This new problem generalizes the classic isotonic tree regression problem, and both the new formulation and the algorithm may be of independent interest. In our empirical analysis of two real-world text classification scenarios, we show that our approach to smoothing classifier outputs results in improved classification accuracy.
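The parent–child score relationship the abstract describes can be illustrated with a toy post-processing pass. This is a minimal sketch, not the paper's optimal regularized isotonic tree regression: it simply propagates the maximum child score upward so every parent's score is at least that of each of its children. The `Science`/`Physics`/`Biology` taxonomy is a hypothetical example.

```python
def enforce_parent_ge_child(tree, scores):
    """tree: dict node -> list of children; scores: dict node -> float.
    Returns new scores in which every parent >= max of its children.
    Naive max-propagation only -- NOT the paper's optimal regression."""
    out = dict(scores)

    def visit(node):
        for child in tree.get(node, []):
            visit(child)
            out[node] = max(out[node], out[child])

    # roots are nodes that never appear as a child
    roots = set(tree) - {c for kids in tree.values() for c in kids}
    for r in roots:
        visit(r)
    return out

# Hypothetical taxonomy: Science -> {Physics, Biology}
tree = {"Science": ["Physics", "Biology"]}
scores = {"Science": 0.3, "Physics": 0.8, "Biology": 0.1}
smoothed = enforce_parent_ge_child(tree, scores)
# Science is raised to 0.8 so the parent dominates its children
```

A real system would trade off fidelity to the raw scores against the constraint, which is exactly what the regularized isotonic tree regression formulation captures.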
Strict L∞ Isotonic Regression
Abstract

Cited by 3 (3 self)
Given a real-valued function f with weights w on a finite DAG G = (V, E), an isotonic regression of (f, w) is an order-preserving real-valued function on V which minimizes the regression error among all such functions. When the regression error is defined via the L∞ norm, typically there is not a unique isotonic regression, unlike the behavior for the Lp norms, 1 < p < ∞. Here a partial ordering is imposed on isotonic regressions, one that refines the notion of minimizing the largest regression errors. This order results in a unique minimal L∞ isotonic regression, called the strict L∞ isotonic regression. Further, strict L∞ isotonic regression is the limit, as p goes to infinity, of Lp isotonic regression. Algorithms are given showing that if G has n vertices, then for linear or tree orderings pool adjacent violators (PAV) yields the strict isotonic regression in Θ(n log n) time, and for arbitrary DAGs it can be determined in time proportional to the time required to generate the transitive closure. Several algorithms for generating non-strict L∞ isotonic regressions have previously appeared in the literature. We examine their behavior as mappings from weighted functions over G to isotonic functions over G, showing that the fastest algorithms are not monotonic mappings, and no previously studied algorithm preserves level set trimming. In contrast, the strict L∞ isotonic regression, and Lp regression for all 1 < p < ∞, are monotonic and preserve level set trimming.
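For a linear ordering, the pool adjacent violators (PAV) step mentioned above can be sketched for the simpler weighted L2 case: scan left to right, and whenever a new value is smaller than the mean of the preceding block, merge the blocks into one weighted mean. This is a standard L2 PAV sketch, not the paper's strict L∞ variant.

```python
def pav(y, w=None):
    """Pool Adjacent Violators for weighted L2 isotonic regression on a
    linear order: returns a nondecreasing fit minimizing sum w_i*(s_i - y_i)^2."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    blocks = []  # each block: (weighted mean, total weight, vertex count)
    for i in range(n):
        mean, wt, cnt = y[i], w[i], 1
        # merge backwards while the monotonicity constraint is violated
        while blocks and blocks[-1][0] > mean:
            m2, w2, c2 = blocks.pop()
            mean = (mean * wt + m2 * w2) / (wt + w2)
            wt += w2
            cnt += c2
        blocks.append((mean, wt, cnt))
    fit = []
    for mean, _, cnt in blocks:
        fit.extend([mean] * cnt)
    return fit

print(pav([1, 3, 2, 4]))  # -> [1, 2.5, 2.5, 4]
```

Each merged block ends up at its weighted mean, which is what makes this optimal for L2; the strict L∞ regression replaces the mean with a weighted-midpoint criterion.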
Lipschitz unimodal and isotonic regression on paths and trees
, 2008
Abstract

Cited by 2 (1 self)
Let M = (V, A) be a planar graph, let γ ≥ 0 be a real parameter, and let t: V → R be a height function. A γ-Lipschitz unimodal regression (γ-LUR) of t is a function s: V → R such that s has a unique local minimum, |s(u) − s(v)| ≤ γ for each {u, v} ∈ A, and ‖s − t‖₂² = Σ_{v∈V} (s(v) − t(v))² is minimized. Here, a local minimum of s is a vertex v such that s(u) > s(v) for any neighbor u of v. For a directed planar graph, s: V → R is the γ-Lipschitz isotonic regression (γ-LIR) of t if s(u) ≤ s(v) ≤ s(u) + γ for each directed edge (u, v) and ‖s − t‖₂² is minimized. These problems arise, for example, in topological simplification of a height function. We present near-linear-time algorithms for the LUR and LIR problems for two special cases where M is a path or a tree.
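The γ-LIR order constraint itself is easy to state in code; a minimal feasibility check (not the regression algorithm) over the directed edges might look like:

```python
def is_gamma_lir_feasible(edges, s, gamma):
    """Check the gamma-LIR constraint s(u) <= s(v) <= s(u) + gamma
    on every directed edge (u, v); s is indexed by vertex id."""
    return all(s[u] <= s[v] <= s[u] + gamma for u, v in edges)

# a directed path 0 -> 1 -> 2 with gamma = 1
edges = [(0, 1), (1, 2)]
print(is_gamma_lir_feasible(edges, [0.0, 0.5, 1.2], 1.0))  # True
print(is_gamma_lir_feasible(edges, [0.0, 1.5, 1.6], 1.0))  # False: jump 1.5 > gamma
```

The regression problem is then to find the feasible s minimizing ‖s − t‖₂², which the paper solves in near-linear time on paths and trees.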
An Approach to Computing Multidimensional Isotonic Regressions
Abstract
This paper gives an approach for determining isotonic regressions for data at points in multidimensional space, with the ordering given by domination. Recent algorithmic advances for 2-dimensional isotonic regressions have made them useful for significantly larger data sets, and here we provide an advance for dimensions 3 and larger. Given a set V of n d-dimensional points, it is shown that an isotonic regression on V can be determined in Θ̃(n²), Θ̃(n³), and Θ̃(n) time for the L1, L2, and L∞ metrics, respectively. This improves upon previous results by a factor of Θ̃(n). The core of the approach is in extending the regression to a set of points V′ ⊃ V where the domination ordering on V′ can be represented with relatively few edges.
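The domination ordering used above is componentwise comparison; a brute-force O(n²d) check that a candidate regression respects it can serve as a sanity test (the fast algorithms in the paper avoid this quadratic enumeration):

```python
def dominates(p, q):
    """p dominates q if p is componentwise >= q."""
    return all(a >= b for a, b in zip(p, q))

def is_isotonic(points, s):
    """Brute-force check that the values s respect the domination
    order on points: whenever points[j] dominates points[i],
    require s[i] <= s[j]."""
    n = len(points)
    return all(s[i] <= s[j]
               for i in range(n) for j in range(n)
               if i != j and dominates(points[j], points[i]))

pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(is_isotonic(pts, [0.0, 0.5, 0.3, 1.0]))  # True
print(is_isotonic(pts, [0.0, 0.5, 0.3, 0.2]))  # False: (1, 1) dominates all
```

Note that (1, 0) and (0, 1) are incomparable under domination, so their regression values are unconstrained relative to each other.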
Enhanced Hierarchical Classification via Isotonic Smoothing
, 2008
Abstract
Hierarchical topic taxonomies have proliferated on the World Wide Web [5, 18], and exploiting the output space decompositions they induce in automated classification systems is an active area of research. In many domains, classifiers learned on a hierarchy of classes have been shown to outperform those learned on a flat set of classes. In this paper we argue that the hierarchical arrangement of classes leads to intuitive relationships between the corresponding classifiers' output scores, and that enforcing these relationships as a post-processing step after classification can improve its accuracy. We formulate the task of smoothing classifier outputs as a regularized isotonic tree regression problem, and present a dynamic-programming-based method that solves it optimally. This new problem generalizes the classic isotonic tree regression problem, and both the new formulation and the algorithm may be of independent interest. In our empirical analysis of two real-world text classification scenarios, we show that our approach to smoothing classifier outputs results in improved classification accuracy.
Weighted L∞ Isotonic Regression
Abstract
Algorithms are given for determining weighted L∞ isotonic regressions satisfying order constraints given by a directed acyclic graph (DAG) with n vertices and m edges. An algorithm is given taking Θ(m log n) time for the general case. However, it relies on parametric search, so a practical approach is introduced, based on calculating prefix solutions. While not as fast in the general case, for linear and tree orderings prefix algorithms are used to determine isotonic and unimodal regressions in Θ(n log n) time. Algorithms are also given for determining isotonic regressions when the values are constrained to a specified set of values, such as the integers, and for situations where there are significantly fewer different weights, or fewer different values, than vertices. L∞ isotonic regressions are not unique, so we examine properties of the regressions an algorithm produces, in addition to the time it takes. In this aspect the prefix approach is superior to the parametric search approach.
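For intuition, the unweighted L∞ case on a linear order has a well-known closed form: the fit at position i is the average of the prefix maximum and the suffix minimum. This sketch covers only that unweighted special case; the paper's algorithms handle general weights and DAG orders.

```python
def linf_isotonic_unweighted(y):
    """Unweighted L-infinity isotonic regression on a linear order:
    s_i = (prefix_max_i + suffix_min_i) / 2. Closed form for the
    unweighted case only."""
    n = len(y)
    pre = list(y)                       # running prefix maxima
    for i in range(1, n):
        pre[i] = max(pre[i - 1], pre[i])
    suf = list(y)                       # running suffix minima
    for i in range(n - 2, -1, -1):
        suf[i] = min(suf[i + 1], suf[i])
    return [(a + b) / 2 for a, b in zip(pre, suf)]

print(linf_isotonic_unweighted([3, 1, 2]))  # -> [2.0, 2.0, 2.5]
```

Both the prefix-maximum and suffix-minimum sequences are nondecreasing, so their average is too, and the maximum error equals half the largest order violation in the data, which is a lower bound for any isotonic fit.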