Parameter Estimation for Probabilistic Finite-State Transducers
- Proc. of the Annual Meeting of the Association for Computational Linguistics, 2002
Cited by 58 (4 self)
Weighted finite-state transducers suffer from the lack of a training algorithm. Training is even harder for transducers that have been assembled via finite-state operations such as composition, minimization, union, concatenation, and closure, as this yields tricky parameter tying. We formulate a "parameterized FST" paradigm and give training algorithms for it, including a general bookkeeping trick ("expectation semirings") that cleanly and efficiently computes expectations and gradients.
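As a rough illustration of the "expectation semiring" bookkeeping mentioned in this abstract, the sketch below pairs each probability p with a probability-weighted value r and combines pairs with the semiring's plus and times operations. The names `ExpWeight`, `arc`, and `expected_value` are assumptions made for this example, and the paths are enumerated explicitly only to keep the sketch short; a real implementation would presumably run the same operations inside a dynamic program over the transducer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpWeight:
    """Element (p, r) of the expectation semiring (illustrative class name)."""
    p: float  # probability mass
    r: float  # probability-weighted value

    def __add__(self, other):
        # Semiring "plus": pool the mass of alternative paths.
        return ExpWeight(self.p + other.p, self.r + other.r)

    def __mul__(self, other):
        # Semiring "times": extend a path by one more arc.
        return ExpWeight(self.p * other.p,
                         self.p * other.r + other.p * self.r)


ZERO = ExpWeight(0.0, 0.0)  # identity for plus
ONE = ExpWeight(1.0, 0.0)   # identity for times


def arc(prob, value):
    """Lift an arc with probability `prob` and additive value `value`."""
    return ExpWeight(prob, prob * value)


def expected_value(paths):
    """Expected total value over explicitly listed paths (lists of arcs)."""
    total = ZERO
    for arcs in paths:
        weight = ONE
        for a in arcs:
            weight = weight * a
        total = total + weight
    return total.r / total.p


# Made-up example: one path uses a counted arc twice (probability 0.25),
# the other never uses it (probability 0.75).
paths = [
    [arc(0.5, 1.0), arc(0.5, 1.0)],
    [arc(0.75, 0.0)],
]
print(expected_value(paths))  # 0.5
```

The second component accumulates probability times value, so dividing by the total probability at the end gives the expectation without tracking individual paths.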
Finite State Transducers with Predicates and Identities
- Grammars, 2001
Cited by 33 (0 self)
An extension to finite state transducers is presented, in which atomic symbols are replaced by arbitrary predicates over symbols. The extension is motivated by applications in natural language processing (but may be more widely applicable) as well as by the observation that transducers with predicates generally have fewer states and fewer transitions. Although the extension is fairly trivial for finite state acceptors, the introduction of predicates is more interesting for transducers. It is shown how various operations on transducers (e.g. composition) can be implemented, as well as how the transducer determinization algorithm can be generalized for predicate-augmented finite state transducers.
Symbolic string verification: An automata-based approach
- In Proc. of SPIN, 2008
Cited by 32 (11 self)
We present an automata-based approach for the verification of string operations in PHP programs based on symbolic string analysis. String analysis is a static analysis technique that determines the values that a string expression can take during program execution at a given program point. This information can be used to verify that string values are sanitized properly and to detect programming errors and security vulnerabilities. In our string analysis approach, we encode the set of string values that string variables can take as automata. We implement all string functions using a symbolic automata representation (MBDD representation from the MONA automata package) and leverage efficient manipulations on MBDDs, e.g., determinization and minimization. Particularly, we propose a novel algorithm for language-based replacement. Our replacement function takes three DFAs as arguments and outputs a DFA. Finally, we apply a widening operator defined on automata to approximate fixpoint computations. If this conservative approximation does not include any bad patterns (specified as regular expressions), we conclude that the program does not contain any errors or vulnerabilities. Our experimental results demonstrate that our approach works quite well in checking the correctness of sanitization operations in real-world PHP applications.
Approximation and exactness in finite state optimality theory
- In Coling Workshop Finite State Phonology, 2000
Cited by 12 (3 self)
Previous work (Frank and Satta, 1998; Karttunen, 1998) has shown that Optimality Theory with gradient constraints generally is not finite state. A new finite-state treatment of gradient constraints is presented which improves upon the approximation of Karttunen (1998). The method turns out to be exact, and very compact, for the syllabification analysis of Prince and Smolensky (1993).
Expectation Semirings: Flexible EM for Learning Finite-State Transducers
- 2001
Cited by 5 (0 self)
This paper offers a clean way to combine the two traditions: an Expectation-Maximization (EM) [4] algorithm for training arbitrary FSTs. First the human expert uses domain knowledge to specify the topology and parameterization of the transducer in any convenient way. Then the EM algorithm automatically chooses parameter values that (locally) maximize the joint likelihood of fully or partly observed data.
Finite State Methods for Hyphenation
- Natural Language Engineering, 2002
Cited by 5 (1 self)
Hyphenation is the task of identifying potential hyphenation points in words. In this paper, three finite-state hyphenation methods for Dutch are presented and compared in terms of accuracy and size of the resulting automata.
Using HFST for Creating Computational Linguistic Applications
Cited by 4 (1 self)
HFST – Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spellcheckers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications. In this article, we focus on aspects of HFST that are new to the end user, i.e. new tools, new features in existing tools, or new language applications, in addition to some revised algorithms that increase performance.
A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion
- 2000
Cited by 4 (0 self)
A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes, and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved by using transformation-based learning. The phoneme accuracy of the best system (using a large rule and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99%.
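The leftmost longest-match idea behind the grapheme segmentation step can be sketched in plain Python, without any finite-state toolkit. The grapheme inventory below is a small hypothetical sample chosen for illustration, not the rule set used in the paper, and `segment` is an assumed helper name.

```python
# Hypothetical, incomplete inventory of multi-letter Dutch graphemes.
GRAPHEMES = {"sch", "ch", "ng", "oe", "ij", "aa", "ee", "oo", "uu",
             "ie", "eu", "ui", "ou", "au", "ei"}
MAX_LEN = max(len(g) for g in GRAPHEMES)


def segment(word):
    """Split `word` into graphemes, scanning left to right and always
    taking the longest inventory entry that matches at the current
    position; a single letter is the fallback."""
    segments = []
    i = 0
    while i < len(word):
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            candidate = word[i:i + size]
            if size == 1 or candidate in GRAPHEMES:
                segments.append(candidate)
                i += size
                break
    return segments


print(segment("schoen"))  # ['sch', 'oe', 'n']
print(segment("lijk"))    # ['l', 'ij', 'k']
```

Preferring the longest match at the leftmost position is what keeps "sch" from being split into "s" plus "ch".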
Stochastic Contextual Edit Distance and Probabilistic FSTs
Cited by 4 (2 self)
String similarity is most often measured by weighted or unweighted edit distance d(x, y). Ristad and Yianilos (1998) defined stochastic edit distance, a probability distribution p(y | x) whose parameters can be trained from data. We generalize this so that the probability of choosing each edit operation can depend on contextual features. We show how to construct and train a probabilistic finite-state transducer that computes our stochastic contextual edit distance. To illustrate the improvement from conditioning on context, we model typos found in social media text.
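To make the notion of a distribution p(y | x) over edit sequences concrete, here is a minimal sketch of the context-free special case only: every edit probability is a fixed number rather than a function of contextual features, which is the part the paper generalizes. The function name, the parameterization (`p_ins`, `p_del`, `p_copy`, `p_sub`), and the numbers in the usage lines are assumptions for illustration. The forward recursion sums over all edit sequences that turn x into y.

```python
def conditional_prob(x, y, alphabet, p_ins, p_del, p_copy, p_sub):
    """p(y | x) under a context-free stochastic edit model (a sketch).

    Before each input character the model inserts some character
    (total probability p_ins), deletes the input character (p_del),
    copies it (p_copy), or substitutes a different character (p_sub).
    After the last input character it inserts (p_ins) or stops.
    Assumes every character of x and y is in `alphabet`.
    """
    k = len(alphabet)
    assert k >= 2 and abs(p_ins + p_del + p_copy + p_sub - 1.0) < 1e-9
    ins = p_ins / k        # probability of inserting one specific character
    sub = p_sub / (k - 1)  # probability of one specific substitution
    n, m = len(x), len(y)
    # alpha[i][j]: total probability of edit sequences that consume x[:i]
    # and emit y[:j].
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                alpha[i][j] += alpha[i - 1][j] * p_del
            if j > 0:
                alpha[i][j] += alpha[i][j - 1] * ins
            if i > 0 and j > 0:
                edit = p_copy if x[i - 1] == y[j - 1] else sub
                alpha[i][j] += alpha[i - 1][j - 1] * edit
    return alpha[n][m] * (1.0 - p_ins)  # final factor: choose to stop


p = conditional_prob("a", "a", "ab", p_ins=0.1, p_del=0.1, p_copy=0.7, p_sub=0.1)
print(round(p, 6))  # 0.639
```

Because the choices at each step sum to one, the values p(y | x) sum to one over all output strings y, which is the property that separates stochastic edit distance from an arbitrary weighted edit score.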
Disjunctive rule ordering in finite state morphology
- Paper presented at the 41st Meeting of the Chicago Linguistics Society, 2005
Cited by 2 (0 self)
Paradigm Function Morphology (PFM; Stump, 2001) is an elaborated realization-based theory of inflectional morphology which is notable for its empirical scope and formal precision. As Karttunen (2003) shows, most of the apparatus of PFM can be straightforwardly mapped onto regular expressions or finite state machines (FSMs). However, Karttunen’s implementation simplifies Stump’s theory slightly by assuming that at most one rule per block may be compatible with any given form. This allows rule blocks to be compiled into FSMs simply by composing the FSMs which implement the individual realization rules. However, this precludes the case where more than one potentially applicable rule competes to apply to a particular form. In what Stump argues is a crucial feature of PFM, such rule competition should be resolved according to Pāṇini’s principle: within each rule block, only the applicable rule with the narrowest domain is applied. In this talk, we will describe an alternative implementation of PFM as FSMs using van Noord and Gerdemann’s (2001) FSA Utilities. This implementation, while otherwise similar in many respects to Karttunen’s, uses Pāṇini’s principle to resolve rule competition and so is more faithful to Stump’s version of PFM.
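The rule-competition behaviour described in this abstract can be illustrated with a toy rule block in plain Python. This is not the FSA Utilities implementation the talk describes: the rule format (a set of required features paired with a suffix), the feature names, and the `apply_block` helper are all hypothetical. The sketch uses the number of required features as a stand-in for "narrowest domain", which is only valid when the competing rules' conditions are nested.

```python
def apply_block(rules, stem, features):
    """Apply one rule block to `stem`: of the rules whose required
    features are all present, use only the most specific one."""
    applicable = [(cond, suffix) for cond, suffix in rules if cond <= features]
    if not applicable:
        return stem
    # More required features means fewer matching cells, i.e. a narrower domain.
    _, suffix = max(applicable, key=lambda rule: len(rule[0]))
    return stem + suffix


# Hypothetical rule block: a default rule, a general plural rule, and a
# narrower plural rule for one inflection class.
BLOCK = [
    (frozenset(), ""),
    (frozenset({"pl"}), "s"),
    (frozenset({"pl", "class_n"}), "en"),
]

print(apply_block(BLOCK, "ox", frozenset({"pl", "class_n"})))  # oxen
print(apply_block(BLOCK, "cat", frozenset({"pl"})))            # cats
```

All three rules are compatible with the first form, so simply composing them would apply more than one; picking the narrowest applicable rule yields the single intended output.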