New techniques for regular expression searching
 Algorithmica
, 2005
Abstract

Cited by 10 (0 self)
We present two new techniques for regular expression searching and use them to derive faster practical algorithms. Based on the specific properties of Glushkov’s nondeterministic finite automaton construction algorithm, we show how to encode a deterministic finite automaton (DFA) using $O(m 2^m)$ bits, where $m$ is the number of characters, excluding operator symbols, in the regular expression. This compares favorably against the worst case of $O(m 2^m |\Sigma|)$ bits needed by a classical DFA representation (where Σ is the alphabet) and $O(m 2^{2m})$ bits needed by the Wu and Manber approach implemented in Agrep. We also present a new way to search for regular expressions that is able to skip text characters. The idea is to determine the minimum length ℓ of a string matching the regular expression, manipulate the original automaton so that it recognizes all the reverse prefixes of length up to ℓ of the strings originally accepted, and use it to skip text characters as done for exact string matching in previous work. We combine these techniques into two algorithms, one able and one unable to skip text characters. The algorithms are simple to implement, and our experiments show that they permit fast searching for regular expressions, normally faster than any existing algorithm.
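The DFA-encoding idea can be illustrated with a minimal Python sketch. It assumes a hand-built Glushkov automaton for the example regex `a(b|c)*d` (no regex parser is included) and relies on the property the abstract mentions: in a Glushkov automaton every transition entering position j carries the same character, so one simulation step is a table lookup `T[D]` (the union of the Follow sets of the active states) followed by an AND with a character mask `B[c]`. The names `FOLLOW`, `LABEL`, `build_tables`, `search` and `min_length` are hypothetical, and the sketch does not reproduce the paper's table layout or its character-skipping algorithm; `min_length` only computes the minimum match length ℓ that the skipping technique starts from.

```python
from collections import deque

# Hypothetical hand-built Glushkov automaton for the regex a(b|c)*d.
# Positions: 1:'a', 2:'b', 3:'c', 4:'d'; state 0 is the initial state.
M = 4
FOLLOW = {0: {1}, 1: {2, 3, 4}, 2: {2, 3, 4}, 3: {2, 3, 4}, 4: set()}
LABEL = {1: "a", 2: "b", 3: "c", 4: "d"}
FINAL_MASK = 1 << 4  # position 4 ('d') is the only final state


def build_tables():
    """Return (T, B): T[D] = union of Follow(i) for i in D; B[c] = positions labeled c."""
    follow_mask = [sum(1 << j for j in FOLLOW[i]) for i in range(M + 1)]
    b = {}
    for pos, ch in LABEL.items():
        b[ch] = b.get(ch, 0) | (1 << pos)
    # 2^(m+1) entries of m+1 bits each, i.e. O(m 2^m) bits overall.
    t = [0] * (1 << (M + 1))
    for d in range(1, len(t)):
        low = d & -d  # lowest active state in D
        t[d] = t[d ^ low] | follow_mask[low.bit_length() - 1]
    return t, b


def search(text):
    """Return the end positions of all matches of a(b|c)*d in text."""
    t, b = build_tables()
    d = 1  # only the initial state is active
    ends = []
    for i, ch in enumerate(text):
        # Glushkov property: all arcs into a position share its label,
        # so one lookup plus one AND computes the next state set.
        # OR-ing in bit 0 lets a match start at any text position.
        d = (t[d] & b.get(ch, 0)) | 1
        if d & FINAL_MASK:
            ends.append(i)
    return ends


def min_length():
    """Minimum length ℓ of a matching string (BFS; each arc reads one character)."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        q = queue.popleft()
        if (FINAL_MASK >> q) & 1:
            return dist[q]
        for r in FOLLOW[q]:
            if r not in dist:
                dist[r] = dist[q] + 1
                queue.append(r)
    return None


print(search("xabcbdyad"))  # [5, 8]
print(min_length())         # 2
```

The table is indexed by whole state sets, which is what makes the DFA deterministic: each text character costs one lookup regardless of how many NFA states are active.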
A General Weighted Grammar Library
 In Proceedings of the Ninth International Conference on Implementation and Application of Automata (CIAA 2004)
, 2004
Abstract

Cited by 8 (4 self)
We present a general weighted grammar software library, the GRM Library, that can be used in a variety of applications in text, speech, and biosequence processing. The underlying algorithms were designed to support a wide variety of semirings and the representation and use of very large grammars and automata of several hundred million rules or transitions. We describe several algorithms and utilities of this library and point out in each case their application to several text and speech processing tasks.
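To make "support a wide variety of semirings" concrete, here is a minimal Python sketch of one generic algorithm, the total weight of all accepting paths in an acyclic weighted automaton, written once and parameterized by a semiring. It is not the GRM Library's actual interface; the `Semiring` class, `total_weight`, and the assumption that states are numbered in topological order (every arc goes from a lower to a higher state) are illustrative choices.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # extends a path by one arc
    zero: float                             # identity of plus
    one: float                              # identity of times


TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def total_weight(num_states, arcs, initial, finals, sr):
    """Semiring sum over all accepting paths of an acyclic automaton.

    Assumes every arc (src, dst, weight) has src < dst, so sorting arcs by
    source visits states in topological order.
    """
    dist = [sr.zero] * num_states
    dist[initial] = sr.one
    for src, dst, w in sorted(arcs):
        dist[dst] = sr.plus(dist[dst], sr.times(dist[src], w))
    result = sr.zero
    for q, final_weight in finals.items():
        result = sr.plus(result, sr.times(dist[q], final_weight))
    return result


arcs = [(0, 1, 0.25), (0, 2, 0.5), (1, 2, 0.5)]
print(total_weight(3, arcs, 0, {2: TROPICAL.one}, TROPICAL))        # 0.5
print(total_weight(3, arcs, 0, {2: PROBABILITY.one}, PROBABILITY))  # 0.625
```

The same traversal yields a shortest-path cost under the tropical semiring and a total path probability under the probability semiring; only the `Semiring` value changes.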
Lattice Kernels for Spoken Dialog Classification
 In Proceedings ICASSP'03, Hong Kong
, 2003
Abstract

Cited by 6 (1 self)
Classification is a key task in spoken-dialog systems. The response of a spoken-dialog system is often guided by the category assigned to the speaker’s utterance. Unfortunately, classifiers based on the one-best transcription of the speech utterances are not satisfactory because of the high word error rate of conversational speech recognition systems. Since the correct transcription may not be the highest-ranking one but will often be represented in the word lattices output by the recognizer, the classification accuracy can be much higher if the full lattice is exploited both during training and classification. In this paper we present the first principled approach for classification based on full lattices. For this purpose, we use the Support Vector Machine (SVM) framework with kernels for lattices. The lattice kernel we define belongs to the general class of rational kernels. We give efficient algorithms for computing kernels for arbitrary lattices and report experiments using the algorithm in a difficult call-classification task with […] categories. Our experiments with a trigram lattice kernel show a […] reduction in error rate at a […] rejection level.
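One common form of n-gram lattice kernel sums, over all n-grams, the product of their expected counts in the two lattices. The minimal Python sketch below illustrates that quantity under a strong simplifying assumption: each lattice has already been expanded into a short list of (word sequence, posterior probability) paths. The paper instead computes kernels directly on lattices with weighted-transducer algorithms, which avoids enumerating paths; `expected_ngram_counts` and `ngram_kernel` are hypothetical names.

```python
from collections import Counter


def expected_ngram_counts(paths, n):
    """Sum of path probability times occurrence count for every n-gram."""
    counts = Counter()
    for words, prob in paths:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += prob
    return counts


def ngram_kernel(paths_a, paths_b, n=3):
    """K(A, B) = sum over n-grams x of c_A(x) * c_B(x)."""
    ca = expected_ngram_counts(paths_a, n)
    cb = expected_ngram_counts(paths_b, n)
    return sum(ca[g] * cb[g] for g in ca if g in cb)


lattice_a = [("i want to pay".split(), 0.75), ("i want two pay".split(), 0.25)]
lattice_b = [("i want to pay my bill".split(), 1.0)]
print(ngram_kernel(lattice_a, lattice_b))  # 1.5
```

Here the misrecognized path "i want two pay" still contributes to its lattice's counts but shares no trigram with the second lattice, so the kernel value comes from the higher-probability path only.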
The Design Principles and Algorithms of a General Weighted Grammar Library
Abstract
We present the software design principles, algorithms, and utilities of a general weighted grammar library, the GRM Library, that can be used in a variety of applications in text, speech, and biosequence processing. Several of the algorithms and utilities of this library are described, including in some cases their pseudocode and pointers to their use in applications. The algorithms and the utilities were designed to support a wide variety of semirings and the representation and use of large grammars and automata of several hundred million rules or transitions.
An Efficient Double Complementation Algorithm for Superposition-Based Finite-State Morphology
Abstract
anssi.ylijyra helsinki.fi
This paper presents an efficient compilation algorithm that is several orders of magnitude faster than a standard method for context restriction rules. The new algorithm can combine even hundreds of thousands of rules in parallel when the alphabet is large but the resulting automaton is sparse. The method opens new possibilities for the representation of context-dependent lexical entries and the related processes. This is demonstrated by encoding complete HunSpell dictionaries as a single context restriction rule whose center placeholder in contexts is replaced with a new operation, called the underline operation. The approach gives rise to new superposition-based context-dependent lexicon formalisms and new methods for on-demand compilation and composition of two-level morphology.
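For orientation, a common textbook way to compile a single context restriction rule with a one-symbol center $x$, one left context $l$ and one right context $r$ (regular languages over an alphabet $\Sigma$, with complement $\overline{\,\cdot\,}$ taken relative to $\Sigma^*$) uses complementation twice: the inner complements describe the bad left and right contexts, and the outer complement excludes every string containing an occurrence of $x$ in a bad context. This is only the single-context case, shown as background; it is not the paper's new algorithm.

$$
x \Rightarrow l \,\_\, r \;=\; \overline{\,\overline{\Sigma^{*} l}\; x\; \Sigma^{*} \;\cup\; \Sigma^{*}\, x\; \overline{r\, \Sigma^{*}}\,}
$$

With several contexts $l_1 \,\_\, r_1, \dots, l_n \,\_\, r_n$, an occurrence of $x$ only needs to satisfy one of them, so the left and right conditions can no longer be checked independently as in the formula above.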
Rhetorical Systems 4 Crichton’s Close, Edinburgh
, 2004
Abstract
This paper describes a novel method of compiling ranked tagging rules into a deterministic finite-state device called a bimachine. The rules are formulated in the framework of regular rewrite operations and allow unrestricted regular expressions in both left and right rule contexts. The compiler is illustrated by an application within a speech synthesis system.
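A bimachine processes its input deterministically in two passes: a right automaton reads the input from right to left, a left automaton reads it from left to right, and an output function of (left state, symbol, right state) produces the output at each position. The minimal Python sketch below runs a hand-built bimachine for a hypothetical rule that rewrites `a` as `A` when it is preceded by `b` and followed by `c`. It does not compile ranked rewrite rules as the paper's compiler does, and `run_bimachine`, `left_delta`, `right_delta` and `output` are illustrative names.

```python
def run_bimachine(text, left_delta, left_start, right_delta, right_start, output):
    """Run a bimachine: right DFA right-to-left, then left DFA left-to-right."""
    n = len(text)
    # right[i] = right-automaton state after reading text[i:] from right to left
    right = [right_start] * (n + 1)
    for i in range(n - 1, -1, -1):
        right[i] = right_delta(right[i + 1], text[i])
    out = []
    state = left_start  # left-automaton state after reading text[:i]
    for i, ch in enumerate(text):
        out.append(output(state, ch, right[i + 1]))
        state = left_delta(state, ch)
    return "".join(out)


# Hypothetical rule: rewrite "a" as "A" when preceded by "b" and followed by "c".
def left_delta(_state, ch):
    return ch == "b"  # state: was the most recent symbol read (left to right) a "b"?


def right_delta(_state, ch):
    return ch == "c"  # state: was the most recent symbol read (right to left) a "c"?


def output(left, ch, right):
    return "A" if ch == "a" and left and right else ch


print(run_bimachine("bacaac", left_delta, False, right_delta, False, output))  # bAcaac
```

Because both passes are deterministic, the cost is linear in the input length, and the right context is known at each position without backtracking.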
Random Generation of Deterministic Acyclic Automata Using Markov Chains
, 2011
Abstract
In this article we propose an algorithm, based on Markov chain techniques, to generate random automata that are deterministic, accessible and acyclic. The distribution of the output approaches the uniform distribution on n-state such automata. We then show how to adapt this algorithm in order to generate minimal acyclic automata with n states almost uniformly.
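The general Markov-chain idea can be illustrated with a minimal Python sketch of a generic random walk; it is not the chain defined in the article. Each step either toggles whether one state is final or redirects/deletes one transition, and any proposal that breaks accessibility or acyclicity is rejected, leaving the walk in place (determinism is kept by storing transitions in a dict keyed by (state, letter)). Since each proposal is exactly as likely as its reverse, the uniform distribution on the automata the walk can reach is stationary; the sketch does not show that every such automaton is reachable, and it says nothing about minimality or mixing time. `is_valid`, `step` and `sample` are hypothetical names.

```python
import random


def is_valid(n, delta):
    """True if every state is reachable from state 0 and the automaton is acyclic."""
    succ = [[] for _ in range(n)]
    for (p, _letter), r in delta.items():
        succ[p].append(r)
    seen, stack = {0}, [0]
    while stack:
        for r in succ[stack.pop()]:
            if r not in seen:
                seen.add(r)
                stack.append(r)
    if len(seen) != n:
        return False
    indeg = [0] * n  # Kahn's algorithm: acyclic iff every state can be removed
    for p in range(n):
        for r in succ[p]:
            indeg[r] += 1
    ready = [q for q in range(n) if indeg[q] == 0]
    removed = 0
    while ready:
        q = ready.pop()
        removed += 1
        for r in succ[q]:
            indeg[r] -= 1
            if indeg[r] == 0:
                ready.append(r)
    return removed == n


def step(n, alphabet, delta, finals, rng):
    """One move: toggle a final state, or redirect/delete one transition."""
    if rng.random() < 0.5:
        return delta, finals ^ {rng.randrange(n)}
    p, a = rng.randrange(n), rng.choice(alphabet)
    target = rng.choice([None] + list(range(n)))  # None = delete the transition
    proposal = dict(delta)
    if target is None:
        proposal.pop((p, a), None)
    else:
        proposal[(p, a)] = target  # one target per (state, letter): stays deterministic
    if is_valid(n, proposal):
        return proposal, finals
    return delta, finals  # rejected moves leave the chain where it is


def sample(n, alphabet, steps, seed=0):
    rng = random.Random(seed)
    delta = {(i, alphabet[0]): i + 1 for i in range(n - 1)}  # path 0 -> 1 -> ... -> n-1
    finals = frozenset({n - 1})
    for _ in range(steps):
        delta, finals = step(n, alphabet, delta, finals, rng)
    return delta, finals


delta, finals = sample(5, "ab", steps=2000)
print(is_valid(5, delta), sorted(finals))
```

Rejection keeps every visited automaton inside the target class, which is why the walk never needs to repair an invalid automaton.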