Results 1 - 10 of 35
Probabilistic Finite-State Machines - Part I
"... Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translatio ..."
Cited by 27 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translation are some of them. In part I of this paper we survey these generative objects and study their definitions and properties. In part II, we will study the relation of probabilistic finite-state automata to other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent the current state of the art for these objects.
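For orientation, one common formalization of such a generative object (the notation here is illustrative and may differ in detail from the paper's): a probabilistic finite-state automaton is a tuple whose initial, transition and final probabilities are normalized, and the probability it assigns to a string sums over all accepting paths:
\[
\mathcal{A} = \langle Q, \Sigma, \delta, I, F, P \rangle,
\qquad \sum_{q \in Q} I(q) = 1,
\qquad F(q) + \sum_{(q,a,q') \in \delta} P(q,a,q') = 1 \;\; \forall q \in Q,
\]
\[
\Pr_{\mathcal{A}}(a_1 \cdots a_n) \;=\; \sum_{q_0, \ldots, q_n \in Q} I(q_0)\,\Bigl(\prod_{i=1}^{n} P(q_{i-1}, a_i, q_i)\Bigr)\, F(q_n).
\]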
Covariance in Unsupervised Learning of Probabilistic Grammars.
- In Proc. Journees Francophones de Programmation en Logique avec Contraintes.
, 2010
"... Abstract Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, generalpur ..."
Cited by 15 (6 self)
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar's parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data.
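The key object here is a prior over multinomial rule probabilities that, unlike the Dirichlet, carries an explicit covariance matrix. A minimal sketch of drawing one set of rule probabilities from a logistic normal distribution (all names and values below are illustrative, not taken from the paper):

import numpy as np

# Logistic normal draw for the K rules sharing one nonterminal (toy values).
rng = np.random.default_rng(0)
K = 4
mu = np.zeros(K)                        # mean of the underlying Gaussian
Sigma = 0.5 * np.eye(K)
Sigma[0, 1] = Sigma[1, 0] = 0.4         # positively correlate rules 0 and 1
eta = rng.multivariate_normal(mu, Sigma)
theta = np.exp(eta) / np.exp(eta).sum() # softmax maps the Gaussian draw onto the simplex
print(theta)                            # rule probabilities; rules 0 and 1 tend to move together

A Dirichlet cannot express the off-diagonal term above, which is exactly the kind of covariance the paper exploits for soft parameter tying.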
PAC-learnability of Probabilistic Deterministic Finite State Automata in terms of Variation Distance
- In Proceedings of ALT 05, LNAI 3734
, 2005
"... We consider the problem of PAC-learning distributions over strings, represented by probabilistic deterministic finite automata (PDFAs). PDFAs are a probabilistic model for the generation of strings of symbols, that have been used in the context of speech and handwriting recognition, and bioinformati ..."
Cited by 13 (2 self)
We consider the problem of PAC-learning distributions over strings, represented by probabilistic deterministic finite automata (PDFAs). PDFAs are a probabilistic model for the generation of strings of symbols that has been used in the context of speech and handwriting recognition, and in bioinformatics. Recent work on learning PDFAs from random examples has used the KL-divergence as the error measure; here we use the variation distance. We build on recent work by Clark and Thollard, and show that the use of the variation distance allows simplifications to be made to the algorithms, and also a strengthening of the results; in particular, using the variation distance we obtain polynomial sample size bounds that are independent of the expected length of strings. Key words: computational complexity, machine learning
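For reference, the two error measures contrasted here, written for distributions D and D' over strings (this sketch uses the L1 form of the variation distance; the paper's exact normalization may differ):
\[
d_{\mathrm{KL}}(D \,\|\, D') = \sum_{x \in \Sigma^{*}} D(x) \log \frac{D(x)}{D'(x)},
\qquad
d_{\mathrm{var}}(D, D') = \sum_{x \in \Sigma^{*}} \bigl| D(x) - D'(x) \bigr|.
\]
By Pinsker's inequality the variation distance is bounded above by a function of the KL-divergence, so it is the weaker, and here easier to control, notion of error.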
Analyzing the Errors of Unsupervised Learning
"... We identify four types of errors that unsupervised induction systems make and study each one in turn. Our contributions include (1) using a meta-model to analyze the incorrect biases of a model in a systematic way, (2) providing an efficient and robust method of measuring distance between two parame ..."
Cited by 12 (2 self)
We identify four types of errors that unsupervised induction systems make and study each one in turn. Our contributions include (1) using a meta-model to analyze the incorrect biases of a model in a systematic way, (2) providing an efficient and robust method of measuring the distance between two parameter settings of a model, and (3) showing that the local optima issues which typically plague EM can be somewhat alleviated by increasing the number of training examples. We conduct our analyses on three models: the HMM, the PCFG, and a simple dependency model.
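As a small self-contained illustration of the local-optima behavior mentioned above (a toy two-coin mixture with fixed equal mixing weights, not one of the three models the paper analyzes), EM started from different points can converge to different solutions:

import numpy as np

rng = np.random.default_rng(1)
true_p = np.array([0.2, 0.8])            # true heads probabilities of the two coins (toy values)
z = rng.integers(0, 2, size=200)         # latent coin choice for each sequence of flips
data = rng.binomial(n=20, p=true_p[z])   # observed heads counts out of 20 flips

def em(p_init, iters=100, n=20):
    p = np.array(p_init, dtype=float)
    for _ in range(iters):
        # E-step: posterior responsibility of coin 1 for each heads count
        ll0 = data * np.log(p[0]) + (n - data) * np.log(1 - p[0])
        ll1 = data * np.log(p[1]) + (n - data) * np.log(1 - p[1])
        r1 = 1.0 / (1.0 + np.exp(ll0 - ll1))
        # M-step: responsibility-weighted re-estimate of each coin's bias
        p[1] = (r1 * data).sum() / (r1 * n).sum()
        p[0] = ((1 - r1) * data).sum() / ((1 - r1) * n).sum()
    return p

for init in ([0.4, 0.6], [0.6, 0.4], [0.5, 0.51]):
    print(init, '->', em(init).round(3))  # different starts can land on mirrored or poorer optima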
Probabilistic Finite-State Machines - Part II
"... Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked. In part I of this paper, we surveyed these objects and studied their properties. In this part II, we study the relations between probabilistic finit ..."
Cited by 12 (2 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked. In part I of this paper, we surveyed these objects and studied their properties. In this part II, we study the relations between probabilistic finite-state automata and other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent the current state of the art for these objects.
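One of the relations studied in this part is that n-gram models are a special case of such automata; a minimal sketch of the idea (toy data, all names illustrative) treats each history as a state and an end marker as the final transition:

from collections import Counter, defaultdict

corpus = [["a", "b", "a"], ["a", "a", "b"]]       # toy training strings
counts = defaultdict(Counter)
for sent in corpus:
    prev = "<s>"                                  # state = previous symbol (the bigram history)
    for sym in sent + ["</s>"]:
        counts[prev][sym] += 1
        prev = sym

def prob(string):
    # Probability of a string as the single path it takes through the bigram automaton.
    p, prev = 1.0, "<s>"
    for sym in string + ["</s>"]:
        total = sum(counts[prev].values())
        p *= counts[prev][sym] / total if total else 0.0
        prev = sym
    return p

print(prob(["a", "b"]))                            # 1.0 * 2/4 * 1/2 = 0.25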
Learnability of Probabilistic Automata via Oracles
"... Abstract. Efficient learnability using the state merging algorithm is known for a subclass of probabilistic automata termed µ-distinguishable. In this paper, we prove that state merging algorithms can be extended to efficiently learn a larger class of automata. In particular, we show learnability of ..."
Cited by 11 (0 self)
Efficient learnability using the state merging algorithm is known for a subclass of probabilistic automata termed µ-distinguishable. In this paper, we prove that state merging algorithms can be extended to efficiently learn a larger class of automata. In particular, we show learnability of a subclass which we call µ2-distinguishable. Using an analog of the Myhill-Nerode theorem for probabilistic automata, we analyze µ-distinguishability and generalize it to µp-distinguishability. By combining new results from property testing with the state merging algorithm, we obtain KL-PAC learnability of the new automata class.
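One common way to state the distinguishability condition (the paper's exact formulation may differ): writing D_q for the distribution over strings generated from state q, every pair of distinct states q, q' must satisfy
\[
\max_{x \in \Sigma^{*}} \bigl| D_q(x) - D_{q'}(x) \bigr| \;\ge\; \mu
\]
for µ-distinguishability, and the µp variant replaces the supremum norm by an Lp norm,
\[
\Bigl( \sum_{x \in \Sigma^{*}} \bigl| D_q(x) - D_{q'}(x) \bigr|^{p} \Bigr)^{1/p} \;\ge\; \mu .
\]
The larger class arises because the Lp condition can hold even when no single string separates the two states by µ.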
Towards Feasible PAC-Learning of Probabilistic Deterministic Finite Automata
, 2008
"... We present an improvement of an algorithm due to Clark and ..."
Cited by 10 (5 self)
We present an improvement of an algorithm due to Clark and ...
Empirical risk minimization for probabilistic grammars: Sample complexity and hardness of learning
- Computational Linguistics
, 2012
"... Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammar ..."
Cited by 9 (7 self)
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated ...
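Schematically, the two objectives discussed here (notation is ours, not necessarily the paper's): with sentences x_i, derivations z, and grammar parameters θ, empirical risk minimization under the log-loss is
\[
\hat{\theta}_{\mathrm{sup}} = \arg\min_{\theta}\; \frac{1}{n} \sum_{i=1}^{n} -\log p_{\theta}(x_i, z_i)
\quad\text{(supervised)},
\qquad
\hat{\theta}_{\mathrm{unsup}} = \arg\min_{\theta}\; \frac{1}{n} \sum_{i=1}^{n} -\log \sum_{z} p_{\theta}(x_i, z)
\quad\text{(unsupervised)},
\]
and it is the second, derivation-marginalizing objective that the paper shows is NP-hard to minimize, motivating the EM-like approximation.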
Learning and testing the bounded retransmission protocol
- University of Maryland, College Park, USA
"... Abstract Using a well-known industrial case study from the verification literature, the bounded retransmission protocol, we show how active learning can be used to establish the correctness of protocol implementation I relative to a given reference implementation R. Using active learning, we learn ..."
Cited by 6 (4 self)
Using a well-known industrial case study from the verification literature, the bounded retransmission protocol, we show how active learning can be used to establish the correctness of a protocol implementation I relative to a given reference implementation R. Using active learning, we learn a model M_R of reference implementation R, which serves as input for a model-based testing tool that checks conformance of implementation I to M_R. In addition, we also explore an alternative approach in which we learn a model M_I of implementation I, which is compared to model M_R using an equivalence checker. Our work uses a unique combination of software tools for model construction (Uppaal), active learning (LearnLib, Tomte), model-based testing (JTorX, TorXakis) and verification (CADP, MRMC). We show how these tools can be used for learning these models, analyzing the obtained results, and improving the learning performance.
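Once the two models have been learned, the alternative approach reduces to checking whether M_I and M_R are behaviorally equivalent. A minimal sketch of that step (a plain product-machine search over hand-written toy Mealy machines, not the CADP/MRMC tool chain used in the paper):

from collections import deque

def equivalent(m_r, m_i, start_r, start_i, inputs):
    # Breadth-first search over the product machine; each model maps
    # (state, input) -> (output, next state). Returns a distinguishing
    # input sequence, or None if the machines agree on every sequence.
    seen = {(start_r, start_i)}
    queue = deque([(start_r, start_i, [])])
    while queue:
        s_r, s_i, trace = queue.popleft()
        for a in inputs:
            out_r, nxt_r = m_r[(s_r, a)]
            out_i, nxt_i = m_i[(s_i, a)]
            if out_r != out_i:
                return trace + [a]            # counterexample: outputs differ
            if (nxt_r, nxt_i) not in seen:
                seen.add((nxt_r, nxt_i))
                queue.append((nxt_r, nxt_i, trace + [a]))
    return None

# Tiny hypothetical models over the inputs {'req', 'ack'}:
M_R = {(0, 'req'): ('data', 1), (0, 'ack'): ('nil', 0),
       (1, 'req'): ('data', 1), (1, 'ack'): ('nil', 0)}
M_I = {(0, 'req'): ('data', 1), (0, 'ack'): ('nil', 0),
       (1, 'req'): ('data', 1), (1, 'ack'): ('err', 0)}
print(equivalent(M_R, M_I, 0, 0, ['req', 'ack']))  # -> ['req', 'ack']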