Results 1 - 6 of 6
Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain
Machine Learning, 1995
Cited by 114 (4 self)
This paper describes experimental results on using Winnow and Weighted-Majority based algorithms on a real-world calendar scheduling domain. These two algorithms have been heavily studied in the theoretical machine learning literature. We show here that these algorithms can be quite competitive in practice, outperforming the decision-tree approach currently in use in the Calendar Apprentice system in terms of both accuracy and speed. One of the contributions of this paper is a new variant of the Winnow algorithm (used in the experiments) that is especially suited to conditions with string-valued classifications, and we give a theoretical analysis of its performance. In addition, we show how Winnow can be applied to achieve a good accuracy/coverage tradeoff, and we explore issues that arise, such as concept drift. We also provide an analysis of a policy for discarding predictors in Weighted-Majority that allows it to speed up as it learns. Keywords: Winnow, Weighted-Majority, Multiplicative alg...
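For readers unfamiliar with Winnow, the following is a minimal sketch of the basic binary, multiplicative-update form of the algorithm. It is not the string-valued variant the paper introduces. The function names, the promotion factor `alpha=2.0`, the default threshold `n / 2`, and the toy disjunction target are all illustrative assumptions.

```python
from itertools import product


def winnow_predict(weights, x, threshold):
    """Predict 1 if the total weight of the active features reaches the threshold."""
    active_weight = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if active_weight >= threshold else 0


def winnow_train(examples, n, alpha=2.0, threshold=None, epochs=20):
    """Basic Winnow: multiplicative promotion/demotion on mistakes only.

    alpha, threshold and epochs are assumed illustrative defaults.
    """
    threshold = n / 2 if threshold is None else threshold
    weights = [1.0] * n
    mistakes = 0
    for _ in range(epochs):
        errors_this_pass = 0
        for x, y in examples:
            if winnow_predict(weights, x, threshold) == y:
                continue
            errors_this_pass += 1
            # Promote active features on a missed positive, demote on a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
        mistakes += errors_this_pass
        if errors_this_pass == 0:
            break
    return weights, mistakes


# Toy target (assumed for illustration): the disjunction x0 OR x2 over 4 Boolean features.
examples = [(x, 1 if (x[0] or x[2]) else 0) for x in product([0, 1], repeat=4)]
weights, mistakes = winnow_train(examples, n=4)
```

Only the weights of features active in a misclassified example change, which is why the method copes well when many features are irrelevant.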
Efficient Learning of Typical Finite Automata from Random Walks
1997
Cited by 44 (9 self)
This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an average-case setting to model the "typical" labeling of a finite automaton, while retaining a worst-case model for the underlying graph of the automaton, along with (2) a learning model in which the learner is not provided with the means to experiment with the machine, but rather must learn solely by observing the automaton's output behavior on a random input sequence. The main contribution of this paper is in presenting the first efficient algorithms for learning nontrivial classes of automata in an entirely passive learning model. We adopt an on-line learning model in which the learner is asked to predict the output of the next state, given the next symbol of the random input sequence; the goal of the learner is to make as few prediction mistakes as possible. Assuming the learner has a means of resetting the target machine to a fixed start state, we first present an efficient algorithm that
Approximating Hyper-Rectangles: Learning and Pseudo-random Sets
Journal of Computer and System Sciences, 1997
Cited by 36 (3 self)
The PAC learning of rectangles has been studied because they have been found experimentally to yield excellent hypotheses for several applied learning problems. Also, pseudorandom sets for rectangles have been actively studied recently because (i) they are a subproblem common to the derandomization of depth-2 (DNF) circuits and derandomizing Randomized Logspace, and (ii) they approximate the distribution of n independent multivalued random variables. We present improved upper bounds for a class of such problems of "approximating" high-dimensional rectangles that arise in PAC learning and pseudorandomness.
Key words and phrases: Rectangles, machine learning, PAC learning, derandomization, pseudorandomness, multiple-instance learning, explicit constructions, Ramsey graphs, random graphs, sample complexity, approximations of distributions.
1 Introduction. A basic common theme of a large part of PAC learning and derandomization/computational pseudorandomness is to "approximate" a stru...
A simple population protocol for fast robust approximate majority
Distributed Computing, 21st International Symposium, DISC 2007, 2008
Cited by 3 (2 self)
We describe and analyze a 3-state one-way population protocol to compute approximate majority in the model in which pairs of agents are drawn uniformly at random to interact. Given an initial configuration of x's, y's and blanks that contains at least one non-blank, the goal is for the agents to reach consensus on one of the values x or y. Additionally, the value chosen should be the majority non-blank initial value, provided it exceeds the minority by a sufficient margin. We prove that with high probability n agents reach consensus in O(n log n) interactions and the value chosen is the majority provided that its initial margin is at least ω(√n log n). This protocol has the additional property of tolerating Byzantine behavior in o(√n) of the agents, making it the first known population protocol that tolerates Byzantine agents.
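The abstract does not spell out the transition rule, so the simulation below is only a sketch under an assumed rule often associated with this 3-state protocol: a non-blank initiator blanks a responder holding the opposite value and recruits a blank responder, while a blank initiator does nothing. The function name, the state labels, the seed, and the population sizes are illustrative assumptions.

```python
import random


def approximate_majority(num_x, num_y, num_blank=0, seed=0, max_steps=1_000_000):
    """Simulate one-way pairwise interactions until all agents hold x or all hold y.

    Returns (winning value or None, number of interactions used).
    The transition rule here is an assumption, not quoted from the abstract.
    """
    rng = random.Random(seed)
    agents = ["x"] * num_x + ["y"] * num_y + ["b"] * num_blank
    n = len(agents)
    counts = {"x": num_x, "y": num_y, "b": num_blank}
    for step in range(max_steps):
        if counts["x"] == n:
            return "x", step
        if counts["y"] == n:
            return "y", step
        # Draw an ordered pair (initiator, responder) of distinct agents uniformly at random.
        i, j = rng.sample(range(n), 2)
        initiator, responder = agents[i], agents[j]
        if initiator == "b":
            continue  # a blank initiator has no effect
        if responder == "b":
            new_state = initiator  # recruit the blank responder
        elif responder != initiator:
            new_state = "b"  # opposite values: responder becomes blank
        else:
            continue  # same value: nothing changes
        counts[responder] -= 1
        counts[new_state] += 1
        agents[j] = new_state  # one-way: only the responder updates
    return None, max_steps


winner, interactions = approximate_majority(num_x=160, num_y=40, seed=1)
```

Only the responder changes state in each interaction, which is what makes the protocol one-way; the step cap merely guards the loop and is not part of the protocol.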
Convergence Of Moments In A Markov-Chain Central Limit Theorem
2001
Cited by 2 (1 self)
Let $(X_i)_{i=0}^{\infty}$ be a $V$-uniformly ergodic Markov chain on a general state space, and let $\pi$ be its stationary distribution. For $g : \mathcal{X} \to \mathbb{R}$, define $W_k(g) := k^{-1/2} \sum_{i=0}^{k-1} \bigl( g(X_i) - \pi(g) \bigr)$. It is shown that if $|g| \le V^{1/n}$ for a positive integer $n$, then $E_x W_k(g)^n$ converges to the $n$-th moment of a normal random variable with expectation 0 and variance $\gamma_g^2 := \pi(g^2) - \pi(g)^2 + \sum_{j=1}^{\infty} \Bigl( \int g(x) E_x g(X_j) - \pi(g^2) \Bigr)$. This extends the existing Markov-chain central limit theorems, according to which expectations of bounded functionals of $W_k(g)$ converge. We also derive nonasymptotic bounds for the error in approximating the moments of $W_k(g)$ by the normal moments. These yield easy bounds of all feasible polynomial orders, and exponential bounds as well under some circumstances, for the probabilities of large deviations by the empirical measure along the Markov chain path $X_i$.
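To make the normalized sum W_k(g) from the abstract above concrete, here is a small sketch that evaluates it along a simulated path of a two-state chain. The two-state chain, its transition probabilities `p` and `q`, the choice of g as the identity, and all function names are assumptions made for illustration; the paper treats general state spaces.

```python
import math
import random


def stationary_two_state(p, q):
    """Stationary distribution of a chain on {0, 1} with P(0->1)=p and P(1->0)=q."""
    total = p + q
    return (q / total, p / total)


def simulate_path(p, q, k, start=0, seed=0):
    """Sample k states X_0, ..., X_{k-1} of the two-state chain."""
    rng = random.Random(seed)
    path, state = [], start
    for _ in range(k):
        path.append(state)
        flip_prob = p if state == 0 else q
        if rng.random() < flip_prob:
            state = 1 - state
    return path


def normalized_centered_sum(path, g, pi_g):
    """W_k(g) = k^{-1/2} * sum_i (g(X_i) - pi(g)) over the given path."""
    k = len(path)
    return sum(g(x) - pi_g for x in path) / math.sqrt(k)


p, q = 0.2, 0.3                      # assumed transition probabilities
pi = stationary_two_state(p, q)      # (pi(0), pi(1))
pi_g = pi[1]                         # pi(g) for g(x) = x
path = simulate_path(p, q, k=10_000, seed=42)
w_k = normalized_centered_sum(path, lambda x: x, pi_g)
```

Repeating the simulation over many seeds and averaging powers of `w_k` would give an empirical view of the moments the abstract discusses.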
On the Sample Complexity of Weakly Learning
Information and Computation, 1992
Cited by 1 (0 self)
In this paper, we study the sample complexity of weak learning. That is, we ask how much data must be collected from an unknown distribution in order to extract a small but significant advantage in prediction. We show that it is important to distinguish between those learning algorithms that output deterministic hypotheses and those that output randomized hypotheses. We prove that in the weak learning model, any algorithm using deterministic hypotheses to weakly learn a class of Vapnik-Chervonenkis dimension $d(n)$ requires $\Omega(\sqrt{d(n)})$ examples. In contrast, when randomized hypotheses are allowed, we show that $\Theta(1)$ examples suffice in some cases. We then show that there exists an efficient algorithm using deterministic hypotheses that weakly learns against any distribution on a set of size $d(n)$ with only $O(d(n)^{2/3})$ examples. Thus for the class of symmetric Boolean functions over n variables, where the strong learning sample complexity is $\Theta(n)$, the sample complexi...

