Concentration Of Measure And Isoperimetric Inequalities In Product Spaces
, 1995
Cited by 271 (3 self)
The concentration of measure phenomenon in product spaces roughly states that, if a set A in a product Ω^N of probability spaces has measure at least one half, "most" of the points of Ω^N are "close" to A. We proceed to a systematic exploration of this phenomenon. The meaning of the word "most" is made rigorous by isoperimetric-type inequalities that bound the measure of the exceptional sets. The meaning of the word "close" is defined in three main ways, each of them giving rise to related, but different inequalities. The inequalities are all proved through a common scheme of proof. Remarkably, this simple approach not only yields qualitatively optimal results, but, in many cases, captures near optimal numerical constants. A large number of applications are given, in particular to Percolation, Geometric Probability, Probability in Banach Spaces, to demonstrate in concrete situations the extremely wide range of application of the abstract tools. AMS Classification numbers: Primary 60E15, 28A35, 60G99; Secondary 60G15, 68C15.
Mining time-changing data streams
 IN PROC. OF THE 2001 ACM SIGKDD INTL. CONF. ON KNOWLEDGE DISCOVERY AND DATA MINING
, 2001
Cited by 258 (5 self)
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.
The Social Cost of Cheap Pseudonyms
 Journal of Economics and Management Strategy
, 2000
Cited by 226 (9 self)
We consider the problems of societal norms for cooperation and reputation when it is possible to obtain "cheap pseudonyms", something which is becoming quite common in a wide variety of interactions on the Internet. This introduces opportunities to misbehave without paying reputational consequences. A large degree of cooperation can still emerge, through a convention in which newcomers "pay their dues" by accepting poor treatment from players who have established positive reputations. One might hope for an open society where newcomers are treated well, but there is an inherent social cost in making the spread of reputations optional. We prove that no equilibrium can sustain significantly more cooperation than the dues-paying equilibrium in a repeated random matching game with a large number of players in which players have finite lives and the ability to change their identities, and there is a small but nonvanishing probability of mistakes. Although one could remove the ineffici...
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
, 2003
Cited by 186 (4 self)
Convex programming involves a convex set F ⊆ R^n and a convex function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
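The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the online protocol it describes, using projected online gradient descent as a stand-in. The Euclidean-ball feasible set, the 1/sqrt(t) step size, and the names `project_to_ball` and `online_gradient_descent` are assumptions made for illustration, not details taken from the paper.

```python
import math

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius (assumed feasible set F)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]

def online_gradient_descent(gradients, dim, radius=1.0):
    """Play one point per round, then update using that round's gradient.

    `gradients` is a sequence of functions; gradients[t](x) returns the
    gradient at x of the cost revealed in round t (hypothetical interface).
    """
    x = [0.0] * dim
    played = []
    for t, grad in enumerate(gradients, start=1):
        played.append(list(x))          # commit to x before the cost is seen
        g = grad(x)                     # cost for this round is now revealed
        eta = 1.0 / math.sqrt(t)        # assumed decaying step size
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return played

# Example: the same quadratic cost (x - 0.5)^2 is revealed in every round.
costs = [lambda x: [2.0 * (x[0] - 0.5)] for _ in range(50)]
played = online_gradient_descent(costs, dim=1)
```

The point of the loop ordering is that `played` records the point chosen before each gradient is evaluated, which mirrors the requirement that a point in F be selected before the cost function for that step is seen.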
Sparse Greedy Matrix Approximation for Machine Learning
, 2000
Cited by 183 (11 self)
In kernel based methods such as Regularization Networks large datasets pose significant problems since the number of basis functions required for an optimal solution equals the number of samples. We present a sparse greedy approximation technique to construct a compressed representation of the design matrix. Experimental results are given and connections to Kernel-PCA, Sparse Kernel Feature Analysis, and Matching Pursuit are pointed out.
1. Introduction
Many recent advances in machine learning such as Support Vector Machines [Vapnik, 1995], Regularization Networks [Girosi et al., 1995], or Gaussian Processes [Williams, 1998] are based on kernel methods. Given an m-sample $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ of patterns $x_i \in X$ and target values $y_i \in Y$ these algorithms minimize the regularized risk functional
$$\min_{f \in H} R_{\mathrm{reg}}[f] = \frac{1}{m} \sum_{i=1}^{m} c(x_i, y_i, f(x_i)) + \frac{\lambda}{2} \|f\|_H^2. \qquad (1)$$
Here H denotes a reproducing kernel Hilbert space (RKHS) [Aronszajn, 1950],...
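As a rough illustration of the regularized risk functional in the entry above, the sketch below evaluates the average loss plus (λ/2)·‖f‖² in H for a kernel expansion f = Σ_j α_j k(x_j, ·), for which ‖f‖² in H equals αᵀKα. The squared loss, the RBF kernel, the λ/2 coefficient, and the function names are assumptions chosen for the example; the paper's sparse greedy approximation itself is not shown.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel (an assumed choice of kernel)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def regularized_risk(alpha, xs, ys, lam, kernel=rbf_kernel):
    """Average squared loss plus (lam / 2) * ||f||_H^2 for f = sum_j alpha_j k(x_j, .)."""
    m = len(xs)
    # Full m-by-m kernel (design) matrix: this is the object that grows with the sample.
    K = [[kernel(xi, xj) for xj in xs] for xi in xs]
    preds = [sum(alpha[j] * K[i][j] for j in range(m)) for i in range(m)]
    empirical = sum((p - y) ** 2 for p, y in zip(preds, ys)) / m
    norm_sq = sum(alpha[i] * K[i][j] * alpha[j] for i in range(m) for j in range(m))
    return empirical + 0.5 * lam * norm_sq

# With all coefficients zero, only the data-fit term remains.
risk_zero = regularized_risk([0.0, 0.0], [[0.0], [1.0]], [1.0, -1.0], lam=0.1)
# With one sample fitted exactly, only the regularizer remains.
risk_fit = regularized_risk([1.0], [[0.0]], [1.0], lam=2.0)
```

The kernel matrix here has one row and column per sample, which is the scaling problem the abstract says motivates a compressed representation.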
The degree sequence of a scale-free random graph process
 Random Structures and Algorithms
, 2001
Cited by 160 (3 self)
Recently, Barabási and Albert [2] suggested modeling complex real-world networks such as the world-wide web as follows: consider a random graph process in which vertices are added to the graph one at a time and joined to a fixed number of earlier vertices, selected with probabilities proportional to their degrees. In [2] and, with Jeong, in [3], Barabási and Albert suggested that after many steps the proportion P(d) of vertices with degree d should obey a power law P(d) ∝ d^(−γ). They obtained γ = 2.9 ± 0.1 by experiment and gave a simple heuristic argument suggesting that γ = 3. Here we obtain P(d) asymptotically for all d ≤ n^(1/15), where n is the number of vertices, proving as a consequence that γ = 3.
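The growth rule described in this abstract can be simulated in a few lines. The sketch below is a simplified version under stated assumptions: each new vertex attaches to exactly one earlier vertex, no self-loops are allowed, and the process starts from a single edge. It is not the precise model analyzed in the paper, and `preferential_attachment` is a name chosen for this example.

```python
import random
from collections import Counter

def preferential_attachment(n, seed=0):
    """Grow a graph on n vertices; each new vertex links to one earlier vertex
    chosen with probability proportional to its current degree."""
    rng = random.Random(seed)
    # Every vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over vertices.
    endpoints = [0, 1]                  # assumed start: one edge between 0 and 1
    degree = Counter({0: 1, 1: 1})
    for v in range(2, n):
        target = rng.choice(endpoints)
        endpoints.extend([v, target])
        degree[v] += 1
        degree[target] += 1
    return degree

degree = preferential_attachment(10_000)
# Number of vertices having each degree d; dividing by n gives an empirical P(d).
histogram = Counter(degree.values())
```

Plotting `histogram` on log-log axes is one way to eyeball the power-law shape, although a finite simulation only suggests the exponent and proves nothing about it.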
Resilient Multicast using Overlays
 In Proc. of ACM Sigmetrics
, 2003
Cited by 111 (9 self)
(PRM): a multicast data recovery scheme that improves data delivery ratios while maintaining low end-to-end latencies. PRM has both a proactive and a reactive component; in this paper we describe how PRM can be used to improve the performance of application-layer multicast protocols, especially when there are high packet losses and host failures. Through detailed analysis in this paper, we show that this loss recovery technique has efficient scaling properties: the overheads at each overlay node asymptotically decrease to zero with increasing group sizes. As a detailed case study, we show how PRM can be applied to the NICE application-layer multicast protocol. We present detailed simulations of the PRM-enhanced NICE protocol for 10 000 node Internet-like topologies. Simulations show that PRM achieves a high delivery ratio (> 97%) with a low latency bound (600 ms) for environments with high end-to-end network losses (1%–5%) and high topology change rates (5 changes per second) while incurring very low overheads (< 5%). Index Terms: Multicast, networks, overlays, probabilistic forwarding, protocols, resilience.
Using Confidence Bounds for Exploitation-Exploration Trade-offs
 Journal of Machine Learning Research
, 2002
Cited by 110 (2 self)
We show how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process.
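One well-known instance of the confidence-bound idea is a UCB1-style index for a multi-armed bandit: play the arm whose empirical mean plus confidence width is largest. The sketch below illustrates that general principle only and is not necessarily one of the algorithms analyzed in the paper; the Bernoulli rewards, the `arm_means` parameter, and the width sqrt(2 ln t / n_i) are assumptions of the example.

```python
import math
import random

def ucb1(arm_means, rounds, seed=0):
    """Simulate a UCB1-style policy on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1                 # pull every arm once to initialize
        else:
            # Optimism: empirical mean plus a width that shrinks with more pulls.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.8], rounds=2000)
```

An arm that has been pulled rarely keeps a wide confidence interval and therefore gets explored, while a well-sampled arm is chosen only if its estimated mean is high, which is how a single index handles both exploitation and exploration.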
An Elementary Introduction to Modern Convex Geometry
 in Flavors of Geometry
, 1997
Cited by 101 (2 self)
Keith Ball. Contents: Preface; Lecture 1. Basic Notions; Lecture 2. Spherical Sections of the Cube; Lecture 3. Fritz John's Theorem; Lecture 4. Volume Ratios and Spherical Sections of the Octahedron; Lecture 5. The Brunn-Minkowski Inequality and Its Extensions; Lecture 6. Convolutions and Volume Ratios: The Reverse Isoperimetric Problem; Lecture 7. The Central Limit Theorem and Large Deviation Inequalities; Lecture 8. Concentration of Measure in Geometry; Lecture 9. Dvoretzky's Theorem; Acknowledgements; References; Index.
Preface: These notes are based, somewhat loosely, on three series of lectures given by myself, J. Lindenstrauss and G. Schechtman, during the Introductory Workshop in Convex Geometry held at the Mathematical Sciences Research Institute in Berkeley, early in 1996. A fourth series was given by B. Bollobás, on rapid mixing and random volume algorithms; they are found els
Estimating the Generalization Performance of an SVM Efficiently
, 2000
Cited by 98 (1 self)
This paper proposes and analyzes an approach to estimating the generalization performance of a support vector machine (SVM) for text classification. Without any computation-intensive resampling, the new estimators are computationally much more efficient than cross-validation or bootstrap, since they can be computed immediately from the form of the hypothesis returned by the SVM. Moreover, the estimators developed here address the special performance measures needed for text classification. While they can be used to estimate error rate, one can also estimate the recall, the precision, and the F1. A theoretical analysis and experiments on three text classification collections show that the new method can effectively estimate the performance of SVM text classifiers in a very efficient way.