## Learning Changing Concepts by Exploiting the Structure of Change (1996)

Citations: 21 (0 self)

### BibTeX

@MISC{Bartlett96learningchanging,
    author = {Peter L. Bartlett and Shai Ben-David and Sanjeev R. Kulkarni},
    title = {Learning Changing Concepts by Exploiting the Structure of Change},
    year = {1996}
}

### Abstract

This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must provide an accurate estimate of the target function sequence. We consider a variety of restrictions on how the target function is allowed to change, including infrequent but arbitrary changes, sequences that correspond to slow walks on a graph whose nodes are functions, and changes that are small on average, as measured by the probability of disagreements between consecutive functions. We first study estimation, in which the learner sees a batch of examples and is then required to give an accurate estimate of the function sequence. Our results provide bounds on the sample complexity and allowable drift rate for these problems. We also study prediction, in which the learner must produce online a hypothesis after each labelled example and the average misclassification probability over this hypothes...
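To make the "small on average" restriction concrete, here is a minimal sketch. It is not taken from the paper: the concept class (threshold functions on [0, 1] under the uniform distribution) and all function names are assumptions chosen for illustration. For two thresholds, the probability that the functions disagree on a uniform random point equals the distance between the thresholds, so the drift between consecutive functions can be computed exactly.

```python
import random


def make_drifting_thresholds(n, step, seed=0):
    """Return n thresholds in [0, 1]; consecutive ones differ by at most `step`.

    Hypothetical helper: models a slowly drifting target as a clipped random walk.
    """
    rng = random.Random(seed)
    theta = 0.5
    thetas = []
    for _ in range(n):
        thetas.append(theta)
        # Clipping to [0, 1] never increases the distance moved.
        theta = min(1.0, max(0.0, theta + rng.uniform(-step, step)))
    return thetas


def disagreement(theta_a, theta_b):
    """Pr over x ~ Uniform[0, 1] that 1[x >= theta_a] != 1[x >= theta_b]."""
    return abs(theta_a - theta_b)


def average_drift(thetas):
    """Mean disagreement probability between consecutive functions."""
    pairs = zip(thetas, thetas[1:])
    return sum(disagreement(a, b) for a, b in pairs) / (len(thetas) - 1)


def labelled_sample(thetas, seed=1):
    """Draw one uniform example per time step, labelled by that step's function."""
    rng = random.Random(seed)
    return [(x, int(x >= t)) for x, t in zip((rng.random() for _ in thetas), thetas)]


if __name__ == "__main__":
    thetas = make_drifting_thresholds(n=200, step=0.01)
    print("average drift:", average_drift(thetas))
    print("first examples:", labelled_sample(thetas)[:3])
```

A learner in this toy setting sees only the output of `labelled_sample`, while `average_drift` is the quantity that the "small on average" restriction would bound.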

### Citations

803 | Estimation of Dependencies Based on Empirical Data
- Vapnik
- 1979
Citation Context: ...eger $q$, and that $n$ is such that $k = \Delta n$ is an integer. Clearly, $|F_{n|x}| \le \sum \prod_{j=0}^{k} F(i_j)$, where the sum is over all $1 \le i_j \le n$ satisfying $\sum_{j=0}^{k} i_j = n$. Sauer's lemma (see, for example, [8]) implies $F(i) \le 2 i^d$. Now, for all legal choices of the $i_j$, $\prod_{j=0}^{k} 2 i_j^d \le \prod_{j=0}^{k} 2 (n/k)^d$; that is, the choice of $i_j$ that maximizes this product is $i_j = n/k$. To see this, suppose that we have ... |

624 | Learnability and the Vapnik-Chervonenkis dimension
- Blumer, Ehrenfeucht, et al.
- 1989
Citation Context: ...d samples). Our main result here is the derivation of a sufficient condition that guarantees estimability of a family of sequences of functions. This result may be viewed as an extension of the basic [2] sufficiency theorem for PAC learnability of classes of (single) functions. We go on and apply this result to provide sample size upper bounds for the estimation of several naturally arising families ... |

372 | Decision theoretic generalizations of the PAC model for neural net and other learning applications
- Haussler
- 1992
Citation Context: ...annot be sure that there will be a function sequence in the class $F_n$ that is consistent with an arbitrary target sequence. However, we can use a similar argument (together with techniques of Haussler [5]) to prove the following more general uniform convergence result, which is useful for learning when the target sequence $\bar f$ is arbitrary. Theorem 3 For $a, b \ge 0$, define $d_\gamma(a, b) = |a - b| / (|a| + |b|$... |

67 | Tracking drifting concepts by minimizing disagreements
- Helmbold, Long
- 1994
Citation Context: ...ly one target concept per learning session (it remains fixed throughout the learning process). The problem of predicting labels for a changing concept has been considered elsewhere. Helmbold and Long [6] consider prediction when the concept is allowed to drift slowly between trials. That is, any two consecutive functions $f_i$ and $f_{i+1}$ must have $\Pr(f_i \ne f_{i+1})$ small. This is a natural measure of co... |

28 | Universal schemes for sequential decision from individual sequences
- Merhav, Feder
- 1993
Citation Context: ...to the complete last-k-steps information (and then looking for the best Markovian strategy, or the best one in some computationally restricted family of strategies, as in the work of Merhav and Feder [7]), we assume that our predictor can only approximate the past sequence, $(f_{t-k}, \ldots, f_{t-1})$. We conclude the paper by gluing together our estimation and prediction results to obtain sam... |

18 | Learning switching concepts
- Blum, Chalasani
- 1992
Citation Context: ...ed to their setting, shows that with this weaker constraint, the allowable drift rate decreases by no more than a log factor, $\epsilon^2/(d \log(1/\epsilon))$ versus $\epsilon^2/(d \log^2(d/\epsilon))$. Blum and Chalasani [1] consider learning switching concepts. The target concept is allowed to switch between concepts in the class, but with some constraint on the total number of concepts visited, or on the frequency of s... |

14 | Learning with a slowly changing distribution - Bartlett - 1992 |

14 | The complexity of learning according to two models of a drifting environment - Long - 1999 |

11 | Learning under persistent drift - Freund, Mansour - 1997 |

10 | On the complexity of learning from drifting distributions - Barve, Long - 1997 |

4 | A Guided Tour of Chernoff Bounds
- Hagerup, Rüb
- 1989
Citation Context: ...n s.t. $g_i(x_i) = f_i(x_i)$ for all $i$, and $\bar d_P(\bar f, \bar g) \ge \epsilon\} \le 2^{-n\epsilon/2+1}\, \mathbf{E}\,|F_{n|x}|^2$, where the expectation is over $x$ in $X^n$. Proof: Using Chernoff bounds (see, for example, [4]), we have $P^n\{x : \exists \bar g \in F_n \text{ s.t. } \bar g(x) = \bar f(x),\ \bar d_P(\bar f, \bar g) \ge \epsilon\} \le \frac{1}{1 - e^{-n\epsilon/8}}\, P^{2n}\{(x, y) \in X^{2n} : \exists \bar g \in F_n \text{ s.t. } \bar g(x) = \bar f(x),\ \bar d_P(\bar f, \bar g) \ge \epsilon, \text{ and } \bar d_y(\bar f, \bar g) \ge \epsilon/2\}$. Let $U$ be... |

1 | Temperature modelling and prediction of steel strip in a HSM
- Connolly, Chicharo, et al.
- 1992
Citation Context: ... changes. Another rather practical example arises in a steel rolling mill, where the efficiency of the mill's operation depends on how accurately the behavior of the rolling surfaces can be predicted [3]. As in many industrial processes, there is an accurate physical model of the target function (relating the measured variables to the desired quantity), but there are several unknown parameters, and t... |

1 | Learning changing problems - Bartlett, Helmbold - 1996 |