Adaptive Algorithms for Cache-efficient Trie Search
In ACM and SIAM Workshop on Algorithm Engineering and Experimentation, 1999
A Faster Scrabble Move Generation Algorithm
Softw. Pract. Exp., 1994
Cited by 6 (0 self)
This paper presents a faster algorithm that uses a GADDAG, a finite automaton that avoids the non-deterministic prefix generation of the DAWG algorithm by encoding a bidirectional path starting from each letter of each word in the lexicon. For a typical lexicon, the GADDAG is nearly five times larger than the DAWG, but generates moves more than twice as fast. This time/space trade-off is justified not only by the decreasing cost of computer memory, but also by the extensive use of move generation in the analysis of board positions used by Gordon in the probabilistic search for the most appropriate play in a given position within realistic time constraints.
Text Augmentation: Inserting XML tags into natural language text with PPM Models and Viterbi-like search
2003
Cited by 2 (0 self)
This thesis develops work on using Hidden Markov Models to insert tags into natural language text. A taxonomy of tags is developed, unifying the fields of text segmentation tagging, part-of-speech tagging, proper noun extraction and hierarchical entity extraction. The search spaces for inserting tags are examined from both a theoretical and an experimental point of view across the taxonomy and on four corpora. An analysis of different correctness measures for different types of tag-insertion problem is undertaken, and a technique is presented for determining whether tag-insertion errors are the result of a modelling failure or a searching failure.
Application of Finite Automata in Debugging Natural Language Vocabularies
Journal of the Brazilian Computer Society, 1993
Cited by 1 (1 self)
Finite acyclic automata can be used as a versatile tool in many applications involving natural language vocabularies. This work describes some experiments in semi-automatically "debugging" such vocabularies, i.e. suggesting non-existent and missing words. Partial statistics are shown for Portuguese, Italian and English vocabularies.
Finite Automata and Efficient Lexicon Implementation
1988
Cited by 1 (0 self)
We describe a general technique for the encoding of lexical functions (such as lexical classification, gender and number marking, inflections and conjugations) using minimized acyclic finite-state automata. This technique has been used to store a Portuguese lexicon with over 2 million entries in about 1 megabyte. Unlike general file compression schemes, this representation allows random access to the stored data. Moreover, it allows the lexical functions and their inverses to be computed at negligible cost. The technique can be easily adapted to practically any language or lexical classification scheme, and this task does not require any knowledge of the programs or data structures.
1 Introduction
A minimized acyclic finite automaton provides an efficient technique for storing and retrieving a finite set of strings over a finite alphabet. The most obvious usage, within the domain of natural language processing, is the representation of the vocabulary of a language, without any ad...
Morphological annotation of Korean with Directly Maintainable Resources
Cited by 1 (1 self)
This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The granularity of the tagset is 3 to 5 times higher than that of usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes and transducers of generation of allomorphs. All can be easily updated, which allows users to control the evolution of the performance of the system. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words and without applying morphological rules at annotation time, which speeds up annotation to 1,210 words/s. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process.
Regional Versus Global Finite-State Error Repair
We focus on the domain of a regional least-cost strategy in order to illustrate the viability of non-global repair models over finite-state architectures. Our interest is justified by the difficulty, shared by all repair proposals, of determining how far to validate. A short validation may fail to gather sufficient information, and in a long one most of the effort can be wasted. The goal is to prove that our approach can provide, in practice, performance and quality comparable to those attained by global criteria, with a significant saving in time and space. To the best of our knowledge, this is the first discussion of its kind.
How to Squeeze a Lexicon
Minimal acyclic deterministic finite automata (ADFAs) can be used as a compact representation of finite string sets with fast access time. Creating them with traditional algorithms of DFA minimization is a resource hog when a large collection of strings is involved. This paper aims to popularize an efficient but little known algorithm for creating minimal ADFAs recognizing a finite language, invented independently by several authors. The algorithm is presented for three variants of ADFAs, its minor improvements are discussed, and minimal ADFAs are compared to competitive data structures.
A Prolog morphological analyzer for Portuguese
1994
This paper describes a morphological analyzer for Portuguese written in Prolog. It handles the standard declensions for nouns, adjectives, and regular verbs, as well as prefixes and suffixes. The system is able not only to recognize a word if its root is in the dictionary, but also to infer (or rather guess) the lexical classes of words whose roots are not in the dictionary, using its knowledge of declensions and suffixes.

