Results 1  10
of
13
Efficient Distributionfree Learning of Probabilistic Concepts
 Journal of Computer and System Sciences
, 1993
"... In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behaviorthus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic c ..."
Abstract

Cited by 197 (8 self)
 Add to MetaCart
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behaviorthus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or pconcepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of pconcepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of pconcepts, we study and develop in detail an underlying theory of learning pconcepts. 1 Introduction Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
On the Computational Complexity of Approximating Distributions by Probabilistic Automata
 Machine Learning
, 1990
"... We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of ap ..."
Abstract

Cited by 86 (0 self)
 Add to MetaCart
We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by a PA. We investigate the following question about this important, wellstudied problem: Does there exist an efficient training algorithm such that the trained PAs provably converge to a model close to an optimum one with high confidence, after only a feasibly small set of training data? We model this problem in the framework of computational learning theory and analyze the sample as well as computational complexity. We show that the number of examples required for training PAs is moderate  essentially linear in the number of transition probabilities to be trained and a lowdegree polynomial in the example l...
Robust Trainability of Single Neurons
, 1995
"... It is well known that (McCullochPitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linearprogramming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex conce ..."
Abstract

Cited by 84 (0 self)
 Add to MetaCart
It is well known that (McCullochPitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linearprogramming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex concepts than just halfspaces are allowed. We show that the problem of learning a probably almost optimal weight vector for a neuron is so difficult that the minimum error cannot even be approximated to within a constant factor in polynomial time (unless RP = NP); we obtain the same hardness result for several variants of this problem. We considerably strengthen these negative results for neurons with binary weights 0 or 1. We also show that neither heuristical learning nor learning by sigmoidal neurons with a constant reject rate is efficiently possible (unless RP = NP).
Learning factor graphs in polynomial time and sample complexity. JMLR
, 2006
"... We study the computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded degree can be learned in polynomial time and from a polynomial number of training examples, assuming that the data is generated ..."
Abstract

Cited by 47 (0 self)
 Add to MetaCart
We study the computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded degree can be learned in polynomial time and from a polynomial number of training examples, assuming that the data is generated by a network in this class. This result covers both parameter estimation for a known network structure and structure learning. It implies as a corollary that we can learn factor graphs for both Bayesian networks and Markov networks of bounded degree, in polynomial time and sample complexity. Importantly, unlike standard maximum likelihood estimation algorithms, our method does not require inference in the underlying network, and so applies to networks where inference is intractable. We also show that the error of our learned model degrades gracefully when the generating distribution is not a member of the target class of networks. In addition to our main result, we show that the sample complexity of parameter learning in graphical models has an O(1) dependence on the number of variables in the model when using the KLdivergence normalized by the number of variables as the performance criterion. 1
Evolutionary Trees can be Learned in Polynomial Time in the TwoState General Markov Model
 SIAM Journal on Computing
, 1998
"... The jState General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j . In particular, the TwoState General Markov Model of evolution generalises the wellknown CavenderFarrisNeyman model of evolution by removing the sy ..."
Abstract

Cited by 32 (2 self)
 Add to MetaCart
The jState General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j . In particular, the TwoState General Markov Model of evolution generalises the wellknown CavenderFarrisNeyman model of evolution by removing the symmetry restriction (which requires that the probability that a `0' turns into a `1' along an edge is the same as the probability that a `1' turns into a `0' along the edge). Farach and Kannan showed how to PAClearn Markov Evolutionary Trees in the CavenderFarrisNeyman model provided that the target tree satisfies the additional restriction that all pairs of leaves have a sufficiently high probability of being the same. We show how to remove both restrictions and thereby obtain the first polynomialtime PAClearning algorithm (in the sense of Kearns et al.) for the general class of TwoState Markov Evolutionary Trees. Research Report RR347, Department of Computer Science, University of Wa...
A Geometric Approach to Leveraging Weak Learners
 Computational Learning Theory: 4th European Conference (EuroCOLT '99
, 1998
"... . AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration the distribution over t ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
. AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration the distribution over the sample given to the weak learner is the direction of steepest descent. We introduce a new leveraging algorithm based on a natural potential function. For this potential function, the direction of steepest descent can have negative components. Therefore we provide two transformations for obtaining suitable distributions from these directions of steepest descent. The resulting algorithms have bounds that are incomparable to AdaBoost's, and their empirical performance is similar to AdaBoost's. 1 Introduction Algorithms like AdaBoost [7] that are able to improve the hypotheses generated by weak learning methods have great potential and practical benefits. We call any such algorithm a leverag...
The sample complexity of learning fixedstructure Bayesian networks
 Machine Learning
, 1997
"... Abstract. We consider the problem of PAC learning probabilistic networks in the case where the structure of the net is specified beforehand. We allow the conditional probabilities to be represented in any manner (as tables or specialized functions) and obtain sample complexity bounds for learning ne ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
Abstract. We consider the problem of PAC learning probabilistic networks in the case where the structure of the net is specified beforehand. We allow the conditional probabilities to be represented in any manner (as tables or specialized functions) and obtain sample complexity bounds for learning nets with and without hidden nodes.
Collective Mining of Bayesian Networks from Distributed Heterogeneous Data
, 2002
"... We present a collective approach to learning a Bayesian network from distributed heterogenous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local an ..."
Abstract

Cited by 15 (6 self)
 Add to MetaCart
We present a collective approach to learning a Bayesian network from distributed heterogenous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and nonlocal variables and transmits a subset of these observations to a central site. Another Bayesian network is learnt at the central site using the data transmitted from the local site. The local and central Bayesian networks are combined to obtain a collective Bayesian network, that models the entire data. Experimental results and theoretical justification that demonstrate the feasibility of our approach are presented.
Parameter Learning in Object Oriented Bayesian Networks
, 2001
"... This paper describes a method for parameter learning in ObjectOriented Bayesian Networks (OOBNs). We propose a methodology for learning parameters in OOBNs, and prove that maintaining the object orientation imposed by the prior model will increase the learning speed in objectoriented domains. We a ..."
Abstract

Cited by 13 (5 self)
 Add to MetaCart
This paper describes a method for parameter learning in ObjectOriented Bayesian Networks (OOBNs). We propose a methodology for learning parameters in OOBNs, and prove that maintaining the object orientation imposed by the prior model will increase the learning speed in objectoriented domains. We also propose a method to efficiently estimate the probability parameters in domains that are not strictly object oriented. Finally, we attack type uncertainty, a special case of model uncertainty typical to objectoriented domains
Empirical risk minimization with approximations of probabilistic grammars
 In NIPS
, 2010
"... Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of the parameters of a fixed probabilistic grammar using the logloss. We derive ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of the parameters of a fixed probabilistic grammar using the logloss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. 1