Results 1-10 of 74
The Nature of Statistical Learning Theory (1995)
Cited by 8950 (28 self)
Abstract—Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
The capacity of wireless networks (IEEE Transactions on Information Theory, 2000)
Cited by 2220 (32 self)
When $n$ identical randomly located nodes, each capable of transmitting at $W$ bits per second and using a fixed range, form a wireless network, the throughput $\lambda(n)$ obtainable by each node for a randomly chosen destination is $\Theta(W/\sqrt{n \log n})$ bits per second under a noninterference protocol. If the nodes are optimally placed in a disk of unit area, traffic patterns are optimally assigned, and each transmission’s range is optimally chosen, the bit-distance product that can be transported by the network per second is $\Theta(W\sqrt{An})$ bit-meters per second. Thus even under optimal circumstances, the throughput is only $\Theta(W/\sqrt{n})$ bits per second for each node for a destination nonvanishingly far away. Similar results also hold under an alternate physical model where a required signal-to-interference ratio is specified for successful receptions. Fundamentally, it is the need for every node all over the domain to share whatever portion of the channel it is utilizing with nodes in its local neighborhood that is the reason for the constriction in capacity. Splitting the channel into several subchannels does not change any of the results. Some implications may be worth considering by designers. Since the throughput furnished to each user diminishes to zero as the number of users is increased, perhaps networks connecting smaller numbers of users, or featuring connections mostly with nearby neighbors, may be more likely to find acceptance.
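To make the scaling concrete, the following sketch evaluates the two per-node throughput orders, W/sqrt(n log n) for randomly placed nodes and W/sqrt(n) for optimally placed nodes, as the number of nodes grows. The hidden constants of the Theta bounds are assumed to be 1 purely for illustration, and the function names are hypothetical.

```python
import math

def random_network_throughput(n, w=1.0):
    """Order of per-node throughput, W / sqrt(n log n), for randomly placed nodes.

    The Theta-constant is assumed to be 1 (illustration only).
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    return w / math.sqrt(n * math.log(n))

def optimal_network_throughput(n, w=1.0):
    """Order of per-node throughput, W / sqrt(n), for optimally placed nodes.

    The Theta-constant is assumed to be 1 (illustration only).
    """
    if n < 1:
        raise ValueError("need at least one node")
    return w / math.sqrt(n)

# Per-node throughput shrinks toward zero as the network grows.
for n in (10, 100, 1000, 10000):
    print(n, random_network_throughput(n), optimal_network_throughput(n))
```

Both columns decrease with n, which is the point made in the abstract: adding users lowers the rate available to each one.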
Empirical margin distributions and bounding the generalization error of combined classifiers (Ann. Statist., 2002)
Cited by 112 (8 self)
Dedicated to A.V. Skorohod on his seventieth birthday. We prove new probabilistic upper bounds on the generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities) and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1-norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of the empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.
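As a rough illustration of the quantity these bounds are stated in, the sketch below computes the empirical margin distribution of a weighted-vote classifier on a toy sample. The data, the voting weights and all function names are made up for the example; the paper's bounds themselves are not reproduced here.

```python
def combined_output(weights, base_votes):
    """Normalised weighted vote of base classifiers, each voting -1 or +1."""
    return sum(w * h for w, h in zip(weights, base_votes)) / sum(weights)

def margins(weights, votes, labels):
    """Margin y * f(x) of every training example; negative means misclassified."""
    return [y * combined_output(weights, v) for v, y in zip(votes, labels)]

def empirical_margin_cdf(margin_values, delta):
    """Fraction of training examples whose margin is at most delta."""
    return sum(m <= delta for m in margin_values) / len(margin_values)

# Hypothetical toy sample: three base classifiers voting on four examples.
weights = [5, 3, 2]
votes = [[+1, +1, +1], [+1, -1, +1], [-1, -1, +1], [+1, +1, -1]]
labels = [+1, +1, -1, -1]

sample_margins = margins(weights, votes, labels)
for delta in (0.0, 0.5, 2.0):
    print(delta, empirical_margin_cdf(sample_margins, delta))
```

At delta = 0 the value is the training error of the combined classifier; larger delta also counts examples that are classified correctly but with little confidence.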
A few notes on Statistical Learning Theory (2003)
Cited by 52 (10 self)
this article is on the theoretical side and not on the applicative one; hence, we shall not present examples which may be interesting from the practical point of view but have little theoretical significance. This survey is far from being complete and it focuses on problems the author finds interesting (an opinion which is not necessarily shared by the majority of the learning community). Relevant books which present a more evenly balanced approach are, for example, [1, 4, 35, 36]. The starting point of our discussion is the formulation of the learning problem. Consider a class $G$, consisting of real-valued functions defined on a space $\Omega$, and assume that each $g \in G$ maps $\Omega$ into $[0, 1]$. Let $T$ be an unknown function, $T : \Omega \to [0, 1]$, and set $\mu$ to be an unknown probability measure on $\Omega$.
The scenario approach to robust control design (IEEE Trans. Autom. Control, 2006)
Cited by 48 (6 self)
Abstract: This paper proposes a new probabilistic solution framework for robust control analysis and synthesis problems that can be expressed in the form of minimization of a linear objective subject to convex constraints parameterized by uncertainty terms. This includes the wide class of NP-hard control problems representable by means of parameter-dependent linear matrix inequalities (LMIs). It is shown in this paper that by appropriate sampling of the constraints one obtains a standard convex optimization problem (the scenario problem) whose solution is approximately feasible for the original (usually infinite) set of constraints, i.e., the measure of the set of original constraints that are violated by the scenario solution rapidly decreases to zero as the number of samples is increased. We provide an explicit and efficient bound on the number of samples required to attain a priori specified levels of probabilistic guarantee of robustness. A rich family of control problems which are in general hard to solve in a deterministically robust sense is therefore amenable to polynomial-time solution, if robustness is intended in the proposed risk-adjusted sense. Index Terms: Probabilistic robustness, randomized algorithms, robust control, robust convex optimization, uncertainty.
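The sampling idea can be illustrated on a deliberately tiny problem: minimise x subject to x >= delta for every delta in [0, 1]. This one-dimensional setup, the uniform distribution on delta and the function names are assumptions chosen so that the violation probability has a closed form; the sketch does not implement the paper's sample-size bound.

```python
import random

def solve_scenario_problem(samples):
    """Minimise x subject to x >= delta for each sampled delta.

    For this toy one-dimensional problem the optimum is the largest sample.
    """
    return max(samples)

def violation_probability(x):
    """Probability that a fresh delta ~ Uniform(0, 1) violates x >= delta."""
    return max(0.0, 1.0 - x)

rng = random.Random(0)
all_samples = [rng.random() for _ in range(10000)]

# Nested sample sets: each scenario problem reuses the earlier constraints.
for n in (10, 100, 1000, 10000):
    x_n = solve_scenario_problem(all_samples[:n])
    print(n, x_n, violation_probability(x_n))
```

The fully robust solution is x = 1. The scenario solution sits slightly below it, and the share of constraints it violates shrinks as more constraints are sampled.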
Rademacher Processes And Bounding The Risk Of Function Learning (High Dimensional Probability II, 1999)
Cited by 39 (6 self)
We construct data dependent upper bounds on the risk in function learning problems. The bounds are based on the local norms of the Rademacher process indexed by the underlying function class and they do not require prior knowledge about the distribution of training examples or any specific properties of the function class. Using Talagrand's type concentration inequalities for empirical and Rademacher processes, we show that the bounds hold with high probability that decreases exponentially fast when the sample size grows. In typical situations that are frequently encountered in the theory of function learning, the bounds give nearly optimal rate of convergence of the risk to zero.
1. Local Rademacher norms and bounds on the risk: main results
Let $(S, \mathcal{A})$ be a measurable space and let $\mathcal{F}$ be a class of $\mathcal{A}$-measurable functions from $S$ into $[0, 1]$. Denote by $\mathcal{P}(S)$ the set of all probability measures on $(S, \mathcal{A})$. Let $f_0 \in \mathcal{F}$ be an unknown target function. Given a probability measure $P \in \mathcal{P}(S)$ (also unknown), let $(X_1, \dots, X_n)$ be an i.i.d. sample in $(S, \mathcal{A})$ with common distribution $P$ (defined on a probability space $(\Omega, \Sigma, \mathbb{P})$). In computer learning theory, the problem of estimating $f_0$, based on the labeled sample $(X_1, Y_1), \dots, (X_n, Y_n)$, where $Y_j := f_0(X_j)$, $j = 1, \dots, n$, is referred to as the function learning problem. The so-called concept learning is a special case of function learning. In this case, $\mathcal{F} := \{I_C : C \in \mathcal{C}\}$, where $\mathcal{C} \subset \mathcal{A}$ is called a class of concepts (see Vapnik (1998), Vidyasagar (1996), Devroye, Györfi and Lugosi (1996) for the account on statistical learning theory). The goal of function learning is to find an estimate
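For readers unfamiliar with Rademacher processes, here is a minimal Monte Carlo sketch of the sup-norm of a Rademacher process over a small finite class of [0, 1]-valued functions. It uses the global sup-norm rather than the local norms the paper works with, and the threshold class, the sample and the function names are assumptions made for illustration.

```python
import random

def rademacher_sup_norm(function_values, signs):
    """sup over the class of |(1/n) * sum_i sign_i * f(X_i)|.

    function_values[k][i] holds f_k(X_i) for the k-th function in the class.
    """
    n = len(signs)
    return max(abs(sum(s * v for s, v in zip(signs, values))) / n
               for values in function_values)

def estimate_rademacher_complexity(function_values, n_draws=2000, seed=0):
    """Average the sup-norm over random sign vectors (Monte Carlo estimate)."""
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(n_draws):
        signs = [rng.choice((-1, 1)) for _ in range(n)]
        total += rademacher_sup_norm(function_values, signs)
    return total / n_draws

# Hypothetical class: indicator functions of [0, t] evaluated on a random sample.
rng = random.Random(1)
sample = [rng.random() for _ in range(50)]
thresholds = [k / 10 for k in range(11)]
values = [[1.0 if x <= t else 0.0 for x in sample] for t in thresholds]

print(estimate_rademacher_complexity(values))
```

The estimate depends only on the observed sample, which is what makes bounds of this kind data dependent.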
A PAC-Bayesian Margin Bound for Linear Classifiers (2002)
Cited by 30 (3 self)
We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training sample. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement of the so far tightest margin bound, which was developed in the luckiness framework, and scales logarithmically in the inverse margin. Even in the case of fewer training examples than input dimensions, sufficiently large margins lead to nontrivial bound values and, for maximum margins, to a vanishing complexity term. In contrast to previous results, however, the new bound does depend on the dimensionality of feature space. The analysis shows that the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the fraction of hypothesis space consistent with the training sample. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal with respect to the new bound only if the feature vectors in the training sample are all of the same length. As a consequence we recommend using SVMs on normalised feature vectors only. Numerical simulations support this recommendation and demonstrate that the new error bound can be used for the purpose of model selection.
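The normalisation recommended in this abstract amounts to rescaling every feature vector to unit Euclidean length before training. A minimal sketch follows, together with a helper that reports the geometric margin of a fixed linear classifier; the toy data, the weight vector and the function names are hypothetical, and no SVM training is performed.

```python
import math

def normalise(vector):
    """Rescale a feature vector to unit Euclidean length.

    Zero vectors are returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    return list(vector) if norm == 0.0 else [v / norm for v in vector]

def geometric_margin(w, features, labels):
    """Smallest value of y * <w, x> / ||w|| over the training sample."""
    w_norm = math.sqrt(sum(v * v for v in w))
    return min(y * sum(a * b for a, b in zip(w, x)) / w_norm
               for x, y in zip(features, labels))

# Hypothetical toy sample with feature vectors of very different lengths.
features = [[3.0, 4.0], [0.0, 10.0], [-6.0, -8.0], [-1.0, 0.0]]
labels = [+1, +1, -1, -1]
unit_features = [normalise(x) for x in features]

w = [1.0, 1.0]  # an arbitrary separating direction, not a trained SVM
print(geometric_margin(w, features, labels))
print(geometric_margin(w, unit_features, labels))
```

After normalisation all training points lie on the unit sphere, which is the setting in which the abstract states the SVM is optimal with respect to the new bound.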
Randomized algorithms for probabilistic robustness with real and complex structured uncertainty (IEEE Trans. Autom. Control, 2000)
Cited by 29 (3 self)
Abstract: In recent years, there has been a growing interest in developing randomized algorithms for probabilistic robustness of uncertain control systems. Unlike classical worst case methods, these algorithms provide probabilistic estimates assessing, for instance, if a certain design specification is met with a given probability. One of the advantages of this approach is that the robustness margins can often be increased by a considerable amount, at the expense of a small risk. In this sense, randomized algorithms may be used by the control engineer together with standard worst case methods to obtain additional useful information. The applicability of these probabilistic methods to robust control is presently limited by the fact that the sample generation is feasible only in very special cases which include systems affected by real parametric uncertainty bounded in rectangles or spheres. Sampling in more general uncertainty sets is generally performed through overbounding, at the expense of an exponential rejection rate. In this paper, randomized algorithms for stability and performance of linear time invariant uncertain systems described by a general $M$-$\Delta$ configuration are studied. In particular, efficient polynomial-time algorithms for uncertainty structures $\Delta$ consisting of an arbitrary number of full complex blocks and uncertain parameters are developed. Index Terms: Random matrices, randomized algorithms, robust control, uncertainty.
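A probabilistic robustness estimate of the kind described here can be sketched with plain Monte Carlo sampling. The example below uses a scalar system dx/dt = (a0 + q) x with a single real uncertain parameter q drawn uniformly from [-1, 1], which is far simpler than the structured complex uncertainty treated in the paper. The sample size comes from the standard additive Chernoff (Hoeffding) bound, which is an assumed choice here and not necessarily the one used by the authors; all names are hypothetical.

```python
import math
import random

def chernoff_sample_size(epsilon, delta):
    """Samples needed so the empirical probability is within epsilon of the
    true one with confidence at least 1 - delta (additive Chernoff bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def is_stable(q, a0=-0.8):
    """The scalar system dx/dt = (a0 + q) x is stable iff a0 + q < 0."""
    return a0 + q < 0.0

def estimate_probability_of_stability(n_samples, seed=0):
    """Fraction of sampled uncertainties for which the system is stable."""
    rng = random.Random(seed)
    hits = sum(is_stable(rng.uniform(-1.0, 1.0)) for _ in range(n_samples))
    return hits / n_samples

n = chernoff_sample_size(epsilon=0.05, delta=0.01)
print(n, estimate_probability_of_stability(n))
```

For this toy system the true probability of stability is 0.9, since the system is stable exactly when q < 0.8. A worst case analysis would simply report the design as not robustly stable.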
Nonparametric time series prediction through adaptive model selection (Machine Learning, 2000)
Cited by 28 (0 self)
Abstract. We consider the problem of one-step-ahead prediction for time series generated by an underlying stationary stochastic process obeying the condition of absolute regularity, describing the mixing nature of the process. We make use of recent results from the theory of empirical processes, and adapt the uniform convergence framework of Vapnik and Chervonenkis to the problem of time series prediction, obtaining finite sample bounds. Furthermore, by allowing both the model complexity and memory size to be adaptively determined by the data, we derive nonparametric rates of convergence through an extension of the method of structural risk minimization suggested by Vapnik. All our results are derived for general $L_p$ error measures, and apply to both exponentially and algebraically mixing processes.