Efficient Distribution-free Learning of Probabilistic Concepts
Journal of Computer and System Sciences, 1993
Cited by 182 (8 self)
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior; thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts.
1 Introduction
Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
On the Computational Complexity of Approximating Distributions by Probabilistic Automata
Machine Learning, 1990
Cited by 77 (0 self)
We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by a PA. We investigate the following question about this important, well-studied problem: Does there exist an efficient training algorithm such that the trained PAs provably converge to a model close to an optimum one with high confidence, after only a feasibly small set of training data? We model this problem in the framework of computational learning theory and analyze the sample as well as computational complexity. We show that the number of examples required for training PAs is moderate -- essentially linear in the number of transition probabilities to be trained and a low-degree polynomial in the example l...
Robust Trainability of Single Neurons
1995
Cited by 75 (0 self)
It is well known that (McCulloch-Pitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linear-programming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex concepts than just halfspaces are allowed. We show that the problem of learning a probably almost optimal weight vector for a neuron is so difficult that the minimum error cannot even be approximated to within a constant factor in polynomial time (unless RP = NP); we obtain the same hardness result for several variants of this problem. We considerably strengthen these negative results for neurons with binary weights 0 or 1. We also show that neither heuristical learning nor learning by sigmoidal neurons with a constant reject rate is efficiently possible (unless RP = NP).
Learning Factor Graphs in Polynomial Time and Sample Complexity
JMLR, 2006
Cited by 32 (0 self)
We study the computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded degree can be learned in polynomial time and from a polynomial number of training examples, assuming that the data is generated by a network in this class. This result covers both parameter estimation for a known network structure and structure learning. It implies as a corollary that we can learn factor graphs for both Bayesian networks and Markov networks of bounded degree, in polynomial time and sample complexity. Importantly, unlike standard maximum likelihood estimation algorithms, our method does not require inference in the underlying network, and so applies to networks where inference is intractable. We also show that the error of our learned model degrades gracefully when the generating distribution is not a member of the target class of networks. In addition to our main result, we show that the sample complexity of parameter learning in graphical models has an O(1) dependence on the number of variables in the model when using the KL-divergence normalized by the number of variables as the performance criterion.
Evolutionary Trees can be Learned in Polynomial Time in the Two-State General Markov Model
SIAM Journal on Computing, 1998
Cited by 28 (2 self)
The j-State General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j. In particular, the Two-State General Markov Model of evolution generalises the well-known Cavender-Farris-Neyman model of evolution by removing the symmetry restriction (which requires that the probability that a '0' turns into a '1' along an edge is the same as the probability that a '1' turns into a '0' along the edge). Farach and Kannan showed how to PAC-learn Markov Evolutionary Trees in the Cavender-Farris-Neyman model provided that the target tree satisfies the additional restriction that all pairs of leaves have a sufficiently high probability of being the same. We show how to remove both restrictions and thereby obtain the first polynomial-time PAC-learning algorithm (in the sense of Kearns et al.) for the general class of Two-State Markov Evolutionary Trees. Research Report RR347, Department of Computer Science, University of Wa...
A Geometric Approach to Leveraging Weak Learners
Computational Learning Theory: 4th European Conference (EuroCOLT '99), 1998
Cited by 20 (4 self)
AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration the distribution over the sample given to the weak learner is the direction of steepest descent. We introduce a new leveraging algorithm based on a natural potential function. For this potential function, the direction of steepest descent can have negative components. Therefore we provide two transformations for obtaining suitable distributions from these directions of steepest descent. The resulting algorithms have bounds that are incomparable to AdaBoost's, and their empirical performance is similar to AdaBoost's.
1 Introduction
Algorithms like AdaBoost [7] that are able to improve the hypotheses generated by weak learning methods have great potential and practical benefits. We call any such algorithm a leverag...
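The gradient-descent view mentioned in this abstract can be illustrated with a minimal sketch. It assumes the exponential potential commonly associated with AdaBoost, sum_i exp(-m_i) over the margins m_i = y_i F(x_i); it does not reproduce the new potential function that the paper introduces. The function names and the sample margins are hypothetical, chosen only for illustration.

```python
import math


def exp_potential(margins):
    """Exponential potential: sum over examples of exp(-margin)."""
    return sum(math.exp(-m) for m in margins)


def steepest_descent_distribution(margins):
    """Normalize the negative gradient of the exponential potential.

    The partial derivative with respect to margin m_i is -exp(-m_i),
    so the negative gradient exp(-m_i) is strictly positive and can be
    rescaled into a probability distribution over the sample.
    """
    neg_grad = [math.exp(-m) for m in margins]
    total = sum(neg_grad)
    return [g / total for g in neg_grad]


# Hypothetical margins: one well-classified, one borderline, one misclassified example.
margins = [2.0, 0.0, -1.0]
dist = steepest_descent_distribution(margins)
```

Under this assumed potential every gradient component has the same sign, so plain normalization suffices and the misclassified example receives the most weight. A potential whose steepest-descent direction can have negative components would need an extra transformation before it could serve as a distribution, which is the situation the abstract describes.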
The sample complexity of learning fixed-structure Bayesian networks
Machine Learning, 1997
Cited by 19 (0 self)
We consider the problem of PAC learning probabilistic networks in the case where the structure of the net is specified beforehand. We allow the conditional probabilities to be represented in any manner (as tables or specialized functions) and obtain sample complexity bounds for learning nets with and without hidden nodes.
Parameter Learning in Object Oriented Bayesian Networks
2001
Cited by 12 (5 self)
This paper describes a method for parameter learning in Object-Oriented Bayesian Networks (OOBNs). We propose a methodology for learning parameters in OOBNs, and prove that maintaining the object orientation imposed by the prior model will increase the learning speed in object-oriented domains. We also propose a method to efficiently estimate the probability parameters in domains that are not strictly object oriented. Finally, we attack type uncertainty, a special case of model uncertainty typical of object-oriented domains.
Collective Mining of Bayesian Networks from Distributed Heterogeneous Data
2002
Cited by 12 (4 self)
We present a collective approach to learning a Bayesian network from distributed heterogeneous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and non-local variables and transmits a subset of these observations to a central site. Another Bayesian network is learnt at the central site using the data transmitted from the local sites. The local and central Bayesian networks are combined to obtain a collective Bayesian network that models the entire data. Experimental results and theoretical justification that demonstrate the feasibility of our approach are presented.
Learning and Approximation Algorithms for problems motivated by Evolutionary Trees
1999
Cited by 1 (0 self)
Chapter 1 Introduction
1.1 Overview
1.2 Biological Background
1.2.1 Data
1.2.2 Models and Methods
1.3 Learning in the General Markov Model
1.3.1 The Model
1.3.2 Learning Problems for Evolutionary Trees
1.4 Layout of the thesis
Chapter 2 Learning Two-State Markov Evolutionary Trees
2.1 Previous research
2.1.1 The General Idea
2.1.2 Previous work on learning the distribution
2.1.3 Previous work on finding the topology
2.1.4 Re...

