Results 1 – 10 of 113
Property Testing and its connection to Learning and Approximation
Abstract

Cited by 498 (68 self)
We study the question of determining whether an unknown function has a particular property or is ε-far from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the function on instances of its choice. First, we establish some connections between property testing and problems in learning theory. Next, we focus on testing graph properties, and devise algorithms to test whether a graph has properties such as being k-colorable or having a ρ-clique (clique of density ρ w.r.t. the vertex set). Our graph property testing algorithms are probabilistic and make assertions which are correct with high probability, utilizing only poly(1/ε) edge-queries into the graph, where ε is the distance parameter. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph which corre...
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
 Machine Learning
, 1996
Abstract

Cited by 226 (18 self)
We propose and analyze a distribution learning algorithm for variable-memory-length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
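The variable-memory idea can be sketched with a simple suffix-based next-symbol predictor: count next-symbol occurrences after every short context and predict from the longest context seen before. This is only a flavor of the prediction-suffix-tree view behind PSAs, not the paper's learning algorithm; corpus and parameters are made up.

```python
from collections import defaultdict

def train_counts(text, max_order=3):
    """Count next-symbol occurrences after every context of length <= max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text)):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][text[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    """Predict the next symbol using the longest context with observed counts,
    falling back to shorter and shorter suffixes (variable memory length)."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in counts:
            nxt = counts[ctx]
            return max(nxt, key=nxt.get)
    return None

counts = train_counts("abracadabra")
print(predict(counts, "abr"))  # 'a' -- the corpus always has 'a' after "abr"
```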
A Spectral Algorithm for Learning Hidden Markov Models
Abstract

Cited by 120 (8 self)
Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard; practitioners typically resort to search heuristics (such as the Baum-Welch / EM algorithm) which suffer from the usual local optima issues. We prove that under a natural separation condition (roughly analogous to those considered for learning mixture models), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations—it implicitly depends on this number through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as those in natural language processing where the space of observations is sometimes the words in a language. The algorithm is also simple: it employs only a singular value decomposition and matrix multiplications.
On the Learnability and Usage of Acyclic Probabilistic Finite Automata
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1995
Abstract

Cited by 74 (3 self)
We propose and analyze a distribution learning algorithm for a subclass of Acyclic Probabilistic Finite Automata (APFA). This subclass is characterized by a certain distinguishability property of the automata's states. Though hardness results are known for learning distributions generated by general APFAs, we prove that our algorithm can efficiently learn distributions generated by the subclass of APFAs we consider. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made arbitrarily small with high confidence in polynomial time. We present two applications of our algorithm. In the first, we show how to model cursively written letters. The resulting models are part of a complete cursive handwriting recognition system. In the second application we demonstrate how APFAs can be used to build multiple-pronunciation models for spoken words. We evaluate the APFA-based pronunciation models...
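The quality measure used here, the KL-divergence between target and hypothesis distributions, is easy to state concretely. A minimal sketch for finite distributions given as aligned probability lists (the helper name is ours, not from the paper):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as aligned probability
    lists. Conventionally 0 * log(0/q) = 0; the divergence is infinite if
    q assigns zero probability where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Because KL-divergence is asymmetric and unbounded, "arbitrarily small with high confidence" is a strong guarantee: the hypothesis must assign nearly correct probability to every string the target generates.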
Streaming and sublinear approximation of entropy and information distances
 In ACM-SIAM Symposium on Discrete Algorithms
, 2006
Abstract

Cited by 69 (13 self)
In most algorithmic applications which compare two distributions, information-theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear-time property testing algorithms for entropy and various information-theoretic distances. Batu et al. posed the problem of property testing with respect to the Jensen-Shannon distance. We present optimal algorithms for estimating bounded, symmetric f-divergences (including the Jensen-Shannon divergence and the Hellinger distance) between distributions in various property testing frameworks. Along the way, we close a (log n)/H gap between the upper and lower bounds for estimating entropy H, yielding an optimal algorithm over all values of the entropy. In a data stream setting (sublinear space), we give the first algorithm for estimating the entropy of a distribution. Our algorithm runs in polylogarithmic space and yields an asymptotic constant-factor approximation scheme. An integral part of the algorithm is an interesting use of an F0 (number of distinct elements in a set) estimation algorithm; we also provide other results along the space/time/approximation trade-off curve. Our results have interesting structural implications that connect sublinear-time and space-constrained algorithms. The mediating model is the random-order streaming model, which assumes the input is a random permutation of a multiset and was first considered by Munro and Paterson in 1980. We show that any property testing algorithm in the combined oracle model for computing a permutation-invariant function can be simulated in the random-order model in a single pass. This addresses a question raised by Feigenbaum et al. regarding the relationship between property testing and stream algorithms. Further, we give a polylog-space PTAS for estimating the entropy of a one-pass random-order stream. This bound cannot be achieved in the combined oracle (generalized property testing) model.
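F0 estimation in small space, which the abstract singles out, can be illustrated with a standard one-pass sketch. The snippet below is a k-minimum-values (KMV) estimator, one common F0 technique; the abstract does not say which F0 algorithm the paper uses, and the parameter choices here are ours.

```python
import hashlib
import heapq

def kmv_distinct(stream, k=256):
    """K-minimum-values sketch: estimate the number of distinct elements (F0)
    in one pass using O(k) space. If at least k distinct hashes are seen,
    estimate F0 as (k - 1) / (k-th smallest hash value in [0, 1))."""
    def h(x):  # deterministic hash into [0, 1)
        d = hashlib.sha256(str(x).encode()).digest()
        return int.from_bytes(d[:8], "big") / 2**64
    heap = []      # max-heap (via negation) holding the k smallest hashes
    kept = set()   # hash values currently in the heap, to skip duplicates
    for item in stream:
        v = h(item)
        if v in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            kept.add(v)
        elif v < -heap[0]:
            removed = -heapq.heappushpop(heap, -v)
            kept.discard(removed)
            kept.add(v)
    if len(heap) < k:          # fewer than k distinct elements: exact count
        return len(heap)
    return int((k - 1) / -heap[0])

stream = [i % 1000 for i in range(100000)]   # 1000 distinct elements
print(kmv_distinct(stream))                  # roughly 1000 (~1/sqrt(k) error)
```

The space is O(k) hash values regardless of stream length, which is the kind of trade-off that makes F0 sketches usable as subroutines in polylogarithmic-space entropy estimators.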
Efficient Learning of Typical Finite Automata from Random Walks
, 1997
Abstract

Cited by 50 (9 self)
This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an average-case setting to model the "typical" labeling of a finite automaton, while retaining a worst-case model for the underlying graph of the automaton, along with (2) a learning model in which the learner is not provided with the means to experiment with the machine, but rather must learn solely by observing the automaton's output behavior on a random input sequence. The main contribution of this paper is in presenting the first efficient algorithms for learning nontrivial classes of automata in an entirely passive learning model. We adopt an online learning model in which the learner is asked to predict the output of the next state, given the next symbol of the random input sequence; the goal of the learner is to make as few prediction mistakes as possible. Assuming the learner has a means of resetting the target machine to a fixed start state, we first present an efficient algorithm that ...
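The passive prediction game described here is easy to simulate. The sketch below builds a random DFA (worst-case graph, random "typical" output labels) and pits a crude baseline learner against it: it memorizes the majority output bit observed after each recent input suffix. This learner and all parameters are our own illustration of the game's setup; it is not the paper's algorithm.

```python
import random

rng = random.Random(1)

# A random DFA over alphabet {0, 1}: the graph plays the worst-case role,
# the random output labels the "typical" part.
n_states = 20
delta = [[rng.randrange(n_states) for _ in (0, 1)] for _ in range(n_states)]
label = [rng.randrange(2) for _ in range(n_states)]   # output bit per state

class SuffixMemorizer:
    """Crude passive predictor (NOT the paper's algorithm): majority output
    bit previously observed after each recent input suffix, longest match
    first."""
    def __init__(self, L=8):
        self.L, self.table = L, {}
    def _keys(self, history, sym):
        h = tuple(history[-self.L:]) + (sym,)
        return [h[i:] for i in range(len(h))]   # longest suffix first
    def predict(self, history, sym):
        for k in self._keys(history, sym):
            if k in self.table:
                c = self.table[k]
                return int(c[1] >= c[0])
        return 0
    def observe(self, history, sym, out):
        for k in self._keys(history, sym):
            c = self.table.setdefault(k, [0, 0])
            c[out] += 1

def run(learner, steps=5000):
    """Online game: predict the next state's output bit from the next random
    input symbol, then observe the truth; count prediction mistakes."""
    state, mistakes, history = 0, 0, []
    for _ in range(steps):
        sym = rng.randrange(2)
        guess = learner.predict(history, sym)
        state = delta[state][sym]
        mistakes += int(guess != label[state])
        learner.observe(history, sym, label[state])
        history.append(sym)
    return mistakes

m = run(SuffixMemorizer())
print(m, "mistakes in 5000 steps")
```

The point of the paper is that, unlike this heuristic, there are algorithms with provable mistake bounds in exactly this observation-only setting.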
The complexity of approximating the entropy
 SIAM JOURNAL ON COMPUTING
, 2005
Abstract

Cited by 49 (8 self)
We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a γ-multiplicative approximation to the entropy can be obtained in O(n^((1+η)/γ²) log n) time for distributions with entropy Ω(γ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^(1/(2γ²))). We next consider a combined oracle model in which the algorithm has access to both the ...
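The baseline these sublinear bounds improve on is the plug-in estimator in the generation oracle model: substitute empirical frequencies into the entropy formula. A minimal sketch (entropy in bits; symbol set and sample size are illustrative):

```python
import math
import random
from collections import Counter

def empirical_entropy(samples):
    """Plug-in estimate of Shannon entropy (in bits) from i.i.d. samples:
    substitute empirical frequencies into H = -sum p * log2(p)."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

rng = random.Random(0)
samples = [rng.randrange(8) for _ in range(100000)]  # uniform over 8 symbols
print(empirical_entropy(samples))   # close to log2(8) = 3 bits
```

The catch, which motivates the paper's analysis, is that the plug-in estimator needs roughly as many samples as the support size to be accurate in general, whereas the bounds above show what multiplicative approximation is achievable with far fewer.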
QUARK: Empirical assessment of automatonbased specification miners
 In WCRE
, 2006
Abstract

Cited by 46 (15 self)
Software is often built without a specification. Tools to automatically extract specifications from software are needed, and many techniques have been proposed. One type of these specifications – temporal API specifications – is often expressed in the form of an automaton. There has been much work on reverse engineering or mining software temporal specifications using dynamic analysis techniques, i.e., analysis of software program traces. Unfortunately, the issues of scalability, robustness and accuracy of these techniques have not been comprehensively addressed. In this paper, we describe QUARK (QUality Assurance framewoRK), which enables assessment of the performance of a specification miner in generating a temporal specification of software from traces recorded from its API interaction. QUARK requires the temporal specification produced by the miner to be expressed as an automaton. It accepts a user-defined simulator automaton and a specification miner, and produces quality assurance measures on the specification generated by the miner. Extensive experiments on three specification miners have been performed to demonstrate the usefulness of our proposed framework.
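The kind of automaton-shaped artifact being assessed can be illustrated with the simplest possible trace miner: a prefix-tree acceptor whose states are trace prefixes and whose transitions are observed API calls. This is a trivial baseline of our own, not one of the miners evaluated in the paper.

```python
def mine_prefix_automaton(traces):
    """Build a prefix-tree acceptor from API-call traces: each distinct trace
    prefix becomes a state; each observed call becomes a transition."""
    states = {(): 0}     # prefix -> state id (root = empty prefix)
    transitions = {}     # (state id, call) -> state id
    for trace in traces:
        prefix = ()
        for call in trace:
            nxt = prefix + (call,)
            if nxt not in states:
                states[nxt] = len(states)
            transitions[(states[prefix], call)] = states[nxt]
            prefix = nxt
    return states, transitions

traces = [["open", "read", "close"], ["open", "write", "close"]]
states, trans = mine_prefix_automaton(traces)
print(len(states))   # 6: the root plus 5 distinct prefixes
```

Real miners then generalize such a tree by merging states; a framework like the one described here would compare the merged automaton's language against a reference simulator automaton.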
Learning mixtures of product distributions over discrete domains
 SIAM J. Comput
Abstract

Cited by 43 (6 self)
Abstract. We consider the problem of learning mixtures of product distributions over discrete domains in the distribution learning framework introduced by Kearns et al. [Proceedings of the 26th Annual Symposium on Theory of Computing (STOC), Montréal, QC, 1994, ACM, New York, pp. 273–282]. We give a poly(n/ɛ)-time algorithm for learning a mixture of k arbitrary product distributions over the n-dimensional Boolean cube {0,1}^n to accuracy ɛ, for any constant k. Previous polynomial-time algorithms could achieve this only for k = 2 product distributions; our result answers an open question stated independently in [M. Cryan, Learning and Approximation Algorithms for Problems Motivated by Evolutionary Trees, Ph.D. thesis, University of Warwick ...
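For contrast with such provable guarantees, the common heuristic for this model is EM on a mixture of Bernoulli product distributions over {0,1}^n. The sketch below is that standard baseline, with the usual local-optima caveat; it is emphatically not the paper's algorithm, and the synthetic data and parameters are our own.

```python
import numpy as np

def em_bernoulli_mixture(X, k, iters=200, seed=0):
    """EM for a mixture of k product (Bernoulli) distributions over {0,1}^n.
    Standard heuristic baseline -- no global-optimum guarantee."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.full(k, 1.0 / k)                     # mixing weights
    p = rng.uniform(0.25, 0.75, size=(k, n))    # per-component coordinate means
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | example i)
        logp = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(w)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reestimate weights and coordinate means
        w = r.mean(axis=0)
        p = np.clip((r.T @ X) / r.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return w, p

# Synthetic data from two well-separated product distributions.
rng = np.random.default_rng(1)
true_p = np.array([[0.9] * 5 + [0.1] * 5, [0.1] * 5 + [0.9] * 5])
z = rng.integers(0, 2, size=2000)
X = (rng.random((2000, 10)) < true_p[z]).astype(float)
w, p = em_bernoulli_mixture(X, k=2)
```

On well-separated data like this EM typically recovers the component means; the paper's contribution is an algorithm that provably works for arbitrary (not just separated) mixtures with constant k.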