PROBABILITY INEQUALITIES FOR SUMS OF BOUNDED RANDOM VARIABLES
1962
Cited by 1128 (2 self)
Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr(S - ES > nt) depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U-statistics and the sum of a random sample without replacement from a finite population.
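As an illustration only, and not quoted from the listed paper, the bound of this type most often cited for independent summands with ranges [a_i, b_i] is Pr(S - ES >= nt) <= exp(-2 n^2 t^2 / sum_i (b_i - a_i)^2). The Python sketch below evaluates that bound and compares it with a Monte Carlo estimate for fair coin flips. The function names `hoeffding_bound` and `empirical_tail` and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def hoeffding_bound(ranges, t):
    """Bound on Pr(S - ES >= n*t) for independent X_i with a_i <= X_i <= b_i.

    `ranges` is a list of (a_i, b_i) pairs; n is its length.
    """
    n = len(ranges)
    width_sq = sum((b - a) ** 2 for a, b in ranges)
    return math.exp(-2.0 * (n * t) ** 2 / width_sq)


def empirical_tail(n, t, trials, seed=0):
    """Monte Carlo estimate of Pr(S - ES >= n*t) for n fair coin flips in {0, 1}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.randint(0, 1) for _ in range(n))
        if s - 0.5 * n >= n * t:  # ES = n/2 for fair coins
            hits += 1
    return hits / trials


# Illustrative parameters (assumed, not from the source): n = 100, t = 0.1.
bound = hoeffding_bound([(0.0, 1.0)] * 100, 0.1)
estimate = empirical_tail(100, 0.1, 20000)
print(f"bound = {bound:.4f}, simulated tail = {estimate:.4f}")
```

With unit-width ranges the bound reduces to exp(-2 n t^2), so it depends on the summands only through their range endpoints, as the abstract describes.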
Universal Limit Laws for Depths in Random Trees
- SIAM Journal on Computing, 1998
Cited by 41 (7 self)
Random binary search trees, b-ary search trees, median-of-(2k+1) trees, quadtrees, simplex trees, tries, and digital search trees are special cases of random split trees. For these trees, we offer a universal law of large numbers and a limit law for the depth of the last inserted point, as well as a law of large numbers for the height.
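To make the "depth of the last inserted point" concrete, here is a small simulation for the simplest special case named above, the random binary search tree. It is a sketch under stated assumptions, not the paper's method: it relies on the classical fact that the expected depth of the n-th inserted key is 2(H_n - 1), where H_n is the n-th harmonic number and the root has depth 0. The names `last_insert_depth` and `expected_depth` are hypothetical.

```python
import random


def last_insert_depth(n, rng):
    """Insert a uniformly random permutation of n keys into a binary search
    tree and return the depth of the last inserted key (root depth is 0)."""
    keys = list(range(n))
    rng.shuffle(keys)
    left, right = {}, {}  # child pointers, keyed by parent key
    root = keys[0]
    depth = 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            child = left if key < node else right
            depth += 1
            if node not in child:
                child[node] = key  # attach the new key as a leaf
                break
            node = child[node]
    return depth


def expected_depth(n):
    """Classical expectation 2 * (H_n - 1) for the random binary search tree."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return 2.0 * (harmonic - 1.0)


# Illustrative run with assumed parameters.
demo_rng = random.Random(0)
demo_n, demo_trials = 200, 200
demo_mean = sum(last_insert_depth(demo_n, demo_rng) for _ in range(demo_trials)) / demo_trials
print(f"simulated mean depth = {demo_mean:.2f}, 2(H_n - 1) = {expected_depth(demo_n):.2f}")
```

Since H_n grows like ln n, the simulated depth divided by ln n should drift toward 2 as n grows, which is the law-of-large-numbers behaviour in this special case.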
Minimax-optimal classification with dyadic decision trees
- IEEE Transactions on Information Theory, 2006
Cited by 21 (4 self)
Decision trees are among the most popular types of classifiers, with interpretability and ease of implementation being among their chief attributes. Despite the widespread use of decision trees, theoretical analysis of their performance has only begun to emerge in recent years. In this paper it is shown that a new family of decision trees, dyadic decision trees (DDTs), attain nearly optimal (in a minimax sense) rates of convergence for a broad range of classification problems. Furthermore, DDTs are surprisingly adaptive in three important respects: they automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; and (3) reject irrelevant features. DDTs are constructed by penalized empirical risk minimization using a new data-dependent penalty and may be computed exactly with computational complexity that is nearly linear in the training sample size. DDTs are the first classifier known to achieve nearly optimal rates for the diverse class of distributions studied here while also being practical and implementable. This is also the first study (of which we are aware) to consider rates for adaptation to intrinsic data dimension and relevant features.
Concentration inequalities
- Advanced Lectures in Machine Learning, 2004
Cited by 20 (1 self)
Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and made it possible to derive new efficient algorithms. This text attempts to summarize some of the basic tools.
Probabilistic bounds on the coefficients of polynomials with only real zeros
- J. Combin. Theory Ser. A, 1997
Cited by 17 (0 self)
The work of Harper and subsequent authors has shown that finite sequences $(a_0, \ldots, a_n)$ arising from combinatorial problems are often such that the polynomial $A(z) := \sum_{k=0}^{n} a_k z^k$ has only real zeros. Basic examples include rows from the arrays of binomial coefficients, Stirling numbers of the first and second kinds, and Eulerian numbers. Assuming the $a_k$ are non-negative, $A(1) > 0$ and that $A(z)$ is not constant, it is known that $A(z)$ has only real zeros iff the normalized sequence $(a_0/A(1), \ldots, a_n/A(1))$ is the probability distribution of the number of successes in $n$ independent trials for some sequence of success probabilities. Such sequences $(a_0, \ldots, a_n)$ are also known to be characterized by total positivity of the infinite matrix $(a_{i,j})$ indexed by non-negative integers $i$ and $j$. This paper reviews inequalities and approximations for such sequences, called Pólya frequency sequences, which follow from their probabilistic representation. In combinatorial examples these inequalities yield a number of improvements of known estimates. (Research supported in part by N.S.F. Grant MCS9404345.)
Pattern classification and learning theory
Cited by 16 (7 self)
1.1 A binary classification problem. Pattern recognition (or classification or discrimination) is about guessing or predicting the unknown class of an observation. An observation is a collection of numerical measurements, represented by a $d$-dimensional vector $x$. The unknown nature of the observation is called a class. It is denoted by $y$ and takes values in the set $\{0, 1\}$. (For simplicity, we restrict our attention to binary classification.) In pattern recognition, one creates a function $g : \mathbb{R}^d \to \{0, 1\}$ which represents one's guess of $y$ given $x$. The mapping $g$ is called a classifier. A classifier errs on $x$ if $g(x) \neq y$. To model the learning problem, we introduce a probabilistic setting, and let $(X, Y)$ be an $\mathbb{R}^d \times \{0, 1\}$-valued random pair. The random pair $(X, Y)$ may be described in a variety of ways: for example, it is defined by the pair $(\mu, \eta)$, where $\mu$ is the probability measure for $X$ and $\eta$ is the regression of $Y$ on $X$. More precisely, for a Borel-measurable set $A \subset \mathbb{R}^d$
Expected time analysis for Delaunay point location
- Comput. Geom. Theory Appl., 2004
Cited by 11 (1 self)
We consider point location in Delaunay triangulations with the aid of simple data structures. In particular, we analyze methods in which a simple data structure is used to first locate a point close to the query point. For points uniformly distributed on the unit square, we show that the expected point location complexities are $\Theta(\sqrt{n})$ for the Green-Sibson rectilinear search, $\Theta(n^{1/3})$ for Jump and Walk, $\Theta(n^{1/4})$ for BinSearch and Walk (which uses a 1-dimensional search tree), $\Theta(n^{0.056\ldots})$ for search based on a random 2-d tree, and $\Theta(\log n)$ for search aided by a 2-d median tree.
Distances and Finger Search in Random Binary Search Trees
- SIAM Journal on Computing, 2004
Cited by 9 (1 self)
For the random binary search tree with $n$ nodes inserted, the number of ancestors of the elements with ranks $k$ and $l$, $1 \le k < l \le n$, as well as the path distance between these elements in the tree are considered. For both quantities, central limit theorems for appropriately rescaled versions are derived. For the path distance, the condition $l - k \to \infty$ as $n \to \infty$ is required. We obtain tail bounds and the order of higher moments for the path distance. The path distance measures the complexity of finger search in the tree.
Strong consistency of MLE for finite uniform mixtures when the scale parameters are exponentially small
2003
Cited by 7 (1 self)
We consider maximum likelihood estimation of finite mixtures of uniform distributions. We prove that the maximum likelihood estimator is strongly consistent if the scale parameters of the component uniform distributions are restricted from below by $\exp(-n^d)$, $0 < d < 1$, where $n$ is the sample size.

