Parameter Estimation for Probabilistic Finite-State Transducers
Proc. of the Annual Meeting of the Association for Computational Linguistics, 2002
Cited by 58 (4 self)
Weighted finite-state transducers suffer from the lack of a training algorithm. Training is even harder for transducers that have been assembled via finite-state operations such as composition, minimization, union, concatenation, and closure, as this yields tricky parameter tying. We formulate a "parameterized FST" paradigm and give training algorithms for it, including a general bookkeeping trick ("expectation semirings") that cleanly and efficiently computes expectations and gradients.
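As a rough illustration of the bookkeeping trick, the Python sketch below implements pairs ⟨p, r⟩ in which p is a probability and r is a probability-weighted sum. It assumes the usual definitions: plus adds componentwise, and times is ⟨p1·p2, p1·r2 + p2·r1⟩. The class `ExpWeight`, the helpers `arc` and `total`, and the two-path example are hypothetical, and the code enumerates paths explicitly instead of running a path-sum algorithm over an FST, so it is not the paper's implementation.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ExpWeight:
    """A pair <p, r> in an expectation semiring (hypothetical class name)."""
    p: float  # probability
    r: float  # probability-weighted sum of an additive quantity (here: a count)

    def __add__(self, other: "ExpWeight") -> "ExpWeight":
        # Semiring plus: combine alternative paths componentwise.
        return ExpWeight(self.p + other.p, self.r + other.r)

    def __mul__(self, other: "ExpWeight") -> "ExpWeight":
        # Semiring times: extend a path; r follows the product rule.
        return ExpWeight(self.p * other.p, self.p * other.r + other.p * self.r)


ZERO = ExpWeight(0.0, 0.0)  # identity for plus
ONE = ExpWeight(1.0, 0.0)   # identity for times


def arc(prob: float, value: float) -> ExpWeight:
    # An arc taken with probability `prob` that contributes `value`
    # (e.g. 1 if it uses the parameter of interest) gets weight <prob, prob * value>.
    return ExpWeight(prob, prob * value)


def total(paths):
    # Semiring sum over paths of the semiring product of arc weights on each path.
    return reduce(lambda acc, path: acc + reduce(lambda a, b: a * b, path, ONE), paths, ZERO)


# Two hypothetical accepting paths; `value` counts uses of a tied parameter theta.
paths = [
    [arc(0.5, 1), arc(0.4, 0)],  # probability 0.2, uses theta once
    [arc(0.5, 1), arc(0.6, 1)],  # probability 0.3, uses theta twice
]
w = total(paths)
expected_count = w.r / w.p  # expected uses of theta, given that some path was taken
```

Here `w.r / w.p` recovers the expected number of uses of the tied parameter, (0.2·1 + 0.3·2) / 0.5 = 1.6, which is the kind of quantity an E step needs.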
Finite State Transducers with Predicates and Identities
Grammars, 2001
Cited by 33 (0 self)
An extension to finite state transducers is presented, in which atomic symbols are replaced by arbitrary predicates over symbols. The extension is motivated by applications in natural language processing (but may be more widely applicable) as well as by the observation that transducers with predicates generally have fewer states and fewer transitions. Although the extension is fairly trivial for finite state acceptors, the introduction of predicates is more interesting for transducers. It is shown how various operations on transducers (e.g., composition) can be implemented, as well as how the transducer determinization algorithm can be generalized for predicate-augmented finite state transducers.
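The sketch below shows, under simplifying assumptions, what replacing atomic symbols by predicates can look like: each arc carries a Python predicate over input symbols, and an output of `None` marks an identity arc that copies its input. The names `PArc` and `transduce` are hypothetical, the machine is assumed deterministic, and none of the paper's algorithms (composition, determinization) are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PArc:
    # A transition labeled with a predicate over input symbols, not a single symbol.
    pred: Callable[[str], bool]
    out: Optional[str]  # None marks an identity transition: emit the input symbol
    target: int


def transduce(arcs: Dict[int, List[PArc]], start: int, finals: set, word: str) -> Optional[str]:
    """Run a deterministic predicate-augmented transducer (hypothetical helper).

    Assumes at most one outgoing arc per state accepts any given symbol.
    Returns None if the input is rejected.
    """
    state, output = start, []
    for sym in word:
        for a in arcs.get(state, []):
            if a.pred(sym):
                output.append(sym if a.out is None else a.out)
                state = a.target
                break
        else:
            return None  # no arc's predicate accepts this symbol
    return "".join(output) if state in finals else None


# One state, two arcs: vowels map to "V"; every other letter is copied by identity.
VOWELS = set("aeiou")
arcs = {0: [PArc(lambda c: c in VOWELS, "V", 0),
            PArc(lambda c: c.isalpha() and c not in VOWELS, None, 0)]}
result = transduce(arcs, 0, {0}, "banana")
```

With two predicate arcs the machine covers every letter, whereas a symbol-based transducer would need one arc per letter; `result` is `"bVnVnV"`.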
Symbolic string verification: An automata-based approach
In Proc. of SPIN, 2008
Cited by 32 (11 self)
We present an automata-based approach for the verification of string operations in PHP programs based on symbolic string analysis. String analysis is a static analysis technique that determines the values that a string expression can take during program execution at a given program point. This information can be used to verify that string values are sanitized properly and to detect programming errors and security vulnerabilities. In our string analysis approach, we encode the set of string values that string variables can take as automata. We implement all string functions using a symbolic automata representation (MBDD representation from the MONA automata package) and leverage efficient manipulations on MBDDs, e.g., determinization and minimization. In particular, we propose a novel algorithm for language-based replacement. Our replacement function takes three DFAs as arguments and outputs a DFA. Finally, we apply a widening operator defined on automata to approximate fixpoint computations. If this conservative approximation does not include any bad patterns (specified as regular expressions), we conclude that the program does not contain any errors or vulnerabilities. Our experimental results demonstrate that our approach works quite well in checking the correctness of sanitization operations in real-world PHP applications.
Approximation and exactness in finite state optimality theory
In Coling Workshop Finite State Phonology, 2000
Cited by 12 (3 self)
Previous work (Frank and Satta, 1998; Karttunen, 1998) has shown that Optimality Theory with gradient constraints generally is not finite state. A new finite-state treatment of gradient constraints is presented which improves upon the approximation of Karttunen (1998). The method turns out to be exact, and very compact, for the syllabification analysis of Prince and Smolensky (1993).
Expectation Semirings: Flexible EM for Learning Finite-State Transducers
2001
Cited by 5 (0 self)
This paper offers a clean way to combine the two traditions: an Expectation-Maximization (EM) [4] algorithm for training arbitrary FSTs. First, the human expert uses domain knowledge to specify the topology and parameterization of the transducer in any convenient way. Then the EM algorithm automatically chooses parameter values that (locally) maximize the joint likelihood of fully or partly observed data.
Finite State Methods for Hyphenation
Natural Language Engineering, 2002
Cited by 5 (1 self)
Hyphenation is the task of identifying potential hyphenation points in words. In this paper, three finite-state hyphenation methods for Dutch are presented and compared in terms of accuracy and size of the resulting automata.
Using HFST for Creating Computational Linguistic Applications
Cited by 4 (1 self)
HFST – Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spell-checkers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications. In this article, we focus on aspects of HFST that are new to the end user, i.e., new tools, new features in existing tools, or new language applications, in addition to some revised algorithms that increase performance.
A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion
2000
Cited by 4 (0 self)
A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes and for converting graphemes into phonemes. A small set of handcrafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved by using transformation-based learning. The phoneme accuracy of the best system (using a large rule and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99%.
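Leftmost longest-match segmentation can be illustrated without finite-state machinery: scan left to right and, at each position, take the longest grapheme that matches. The Python sketch below does this directly on strings rather than compiling a replacement transducer as the paper does; the function `segment` and the small Dutch grapheme inventory are hypothetical and chosen only for illustration. The grapheme-to-phoneme conversion step is not shown.

```python
def segment(word: str, graphemes: set[str]) -> list[str]:
    """Split `word` left to right, always taking the longest matching grapheme.

    Letters not covered by a multi-letter grapheme become one-letter segments.
    """
    max_len = max(map(len, graphemes), default=1)
    segments, i = [], 0
    while i < len(word):
        # Try the longest candidate first; a single character always matches.
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in graphemes:
                segments.append(piece)
                i += n
                break
    return segments


# Hypothetical, deliberately small inventory of multi-letter Dutch graphemes.
DUTCH_GRAPHEMES = {"sch", "ch", "ij", "oe", "ui", "aa", "ee", "oo", "ie", "ng"}
```

For example, `segment("schaap", DUTCH_GRAPHEMES)` yields `["sch", "aa", "p"]`: at position 0 the three-letter `sch` beats `s`, and later `aa` beats `a`.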
Stochastic Contextual Edit Distance and Probabilistic FSTs
Cited by 4 (2 self)
String similarity is most often measured by weighted or unweighted edit distance d(x, y). Ristad and Yianilos (1998) defined stochastic edit distance, a probability distribution p(y | x) whose parameters can be trained from data. We generalize this so that the probability of choosing each edit operation can depend on contextual features. We show how to construct and train a probabilistic finite-state transducer that computes our stochastic contextual edit distance. To illustrate the improvement from conditioning on context, we model typos found in social media text.
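For orientation, here is a sketch of a context-free baseline of the kind the paper generalizes: a forward algorithm that sums, over all edit sequences turning x into y, the product of edit-operation probabilities, in the style of Ristad and Yianilos. It computes a joint probability p(x, y) with a stop probability rather than the conditional p(y | x), omits the contextual features, and uses hypothetical parameters (`DELTA`, `STOP`) over a two-letter alphabet that sum to one.

```python
# Hypothetical edit-operation probabilities over the alphabet {a, b}.
# EPS marks the empty side of a deletion or insertion; together with STOP they sum to 1.
EPS = ""
DELTA = {
    ("a", "a"): 0.2, ("b", "b"): 0.2,    # matches
    ("a", "b"): 0.05, ("b", "a"): 0.05,  # substitutions
    ("a", EPS): 0.1, ("b", EPS): 0.1,    # deletions
    (EPS, "a"): 0.1, (EPS, "b"): 0.1,    # insertions
}
STOP = 0.1


def joint_prob(x: str, y: str) -> float:
    """Forward algorithm for p(x, y): sum over all edit sequences turning x into y."""
    # alpha[i][j] = total probability of having generated x[:i] and y[:j]
    alpha = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]
    alpha[0][0] = 1.0
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i > 0:  # delete x[i-1]
                alpha[i][j] += alpha[i - 1][j] * DELTA[(x[i - 1], EPS)]
            if j > 0:  # insert y[j-1]
                alpha[i][j] += alpha[i][j - 1] * DELTA[(EPS, y[j - 1])]
            if i > 0 and j > 0:  # substitute or match x[i-1] with y[j-1]
                alpha[i][j] += alpha[i - 1][j - 1] * DELTA[(x[i - 1], y[j - 1])]
    return alpha[len(x)][len(y)] * STOP
```

Training would re-estimate `DELTA` and `STOP` from expected edit counts (for example, with a matching backward pass); that step is omitted here.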
Disjunctive rule ordering in finite state morphology
Paper presented at the 41st Meeting of the Chicago Linguistic Society, 2005
Cited by 2 (0 self)
Paradigm Function Morphology (PFM; Stump, 2001) is an elaborated realization-based theory of inflectional morphology which is notable for its empirical scope and formal precision. As Karttunen (2003) shows, most of the apparatus of PFM can be straightforwardly mapped onto regular expressions or finite state machines (FSMs). However, Karttunen’s implementation simplifies Stump’s theory slightly by assuming that at most one rule per block may be compatible with any given form. This allows rule blocks to be compiled into FSMs simply by composing the FSMs which implement the individual realization rules. But it precludes the case where more than one potentially applicable rule competes to apply to a particular form. In what Stump argues is a crucial feature of PFM, such rule competition should be resolved according to Pāṇini’s principle: within each rule block, only the applicable rule with the narrowest domain is applied. In this talk, we will describe an alternative implementation of PFM as FSMs using van Noord and Gerdemann’s (2001) FSA Utilities. This implementation, while similar to Karttunen’s in many other respects, uses Pāṇini’s principle to resolve rule competition and so is more faithful to Stump’s version of PFM.
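The selection step of Pāṇini’s principle can be sketched outside any FSM toolkit: collect the rules of a block whose domain fits the form, then apply the one whose domain is narrowest. In the Python sketch below, domains are modeled as feature sets and "narrowest" is read as "includes every other applicable rule's features". That reading, the names `Rule` and `apply_block`, the pass-through when no rule applies, and the toy plural block are all assumptions; this is not the FSA Utilities implementation described in the abstract.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    features: FrozenSet[str]  # domain: properties a form must carry for the rule to apply
    suffix: str               # realization: the exponent the rule attaches


def apply_block(stem: str, props: FrozenSet[str], block: List[Rule]) -> str:
    """Apply at most one rule from `block`, choosing by Panini's principle.

    Among applicable rules, the one whose domain is narrowest (its feature set
    includes every other applicable rule's) wins. If nothing applies, the stem
    passes through unchanged (an assumption of this sketch).
    """
    applicable = [r for r in block if r.features <= props]
    if not applicable:
        return stem
    narrowest = [r for r in applicable if all(o.features <= r.features for o in applicable)]
    if len(narrowest) != 1:
        raise ValueError("no unique narrowest rule; the block is ill-formed")
    return stem + narrowest[0].suffix


# Hypothetical plural block: a general rule and a narrower rule for an "en" class.
PLURAL = [Rule(frozenset({"pl"}), "s"), Rule(frozenset({"pl", "en-class"}), "en")]
```

Under Karttunen’s simplification at most one rule per block could match, so composing the rule FSMs sufficed; here both plural rules match `apply_block("ox", frozenset({"pl", "en-class"}), PLURAL)`, and the narrower one wins, giving `"oxen"`.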