Results 1 - 8 of 8
Linear Contexts and the Sharing Functor: Techniques for Symbolic Computation
- in Thirty Five Years of Automating Mathematics
, 2003
Cited by 12 (3 self)
We present in this paper two design issues concerning fundamental representation structures for symbolic and logic computations. The first one concerns structured editing, or more generally the possibly destructive update of tree-like data-structures of inductive types. Instead of the standard implementation of mutable data structures containing references, we advocate the zipper technology, fully applicative. This may be considered a disciplined use of pointer reversal techniques. We argue that zippers, i.e. unary contexts generalizing stacks, are concrete representations of linear functions on algebraic data types. The second method is a uniform sharing functor, which is a variation on the traditional technique of hashing, but controlling the indexing function on the client side rather than on the server side, which allows the fine-tuning of bucket balancing, taking into account specific statistical properties of the application data. Such techniques are of general interest for symbolic computation applications such as structure editors, proof assistants, algebraic computation systems, and computational linguistics platforms.
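The zipper idea mentioned in this abstract can be illustrated with a minimal sketch for binary trees. The paper's own setting is a typed functional language; the Python class names below (`Node`, `Frame`, `Zipper`) and the choice of integer-labelled binary trees are assumptions made only for illustration. The focus is a subtree, and the context is a stack of frames recording what was left behind on the way down, so an edit rebuilds only the path to the root and shares everything else.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    left: Optional["Node"]
    value: int
    right: Optional["Node"]


@dataclass(frozen=True)
class Frame:
    """One step of context: which way we descended and what we left behind."""
    went_left: bool
    value: int
    sibling: Optional[Node]


@dataclass(frozen=True)
class Zipper:
    focus: Optional[Node]
    # The context is a stack of frames (top of stack = last element).
    # A tuple keeps the sketch short; a linked list would give O(1) pushes.
    path: Tuple[Frame, ...] = ()

    def down_left(self) -> "Zipper":
        n = self.focus
        return Zipper(n.left, self.path + (Frame(True, n.value, n.right),))

    def down_right(self) -> "Zipper":
        n = self.focus
        return Zipper(n.right, self.path + (Frame(False, n.value, n.left),))

    def up(self) -> "Zipper":
        *rest, f = self.path
        if f.went_left:
            parent = Node(self.focus, f.value, f.sibling)
        else:
            parent = Node(f.sibling, f.value, self.focus)
        return Zipper(parent, tuple(rest))

    def replace(self, subtree: Optional[Node]) -> "Zipper":
        # Purely applicative update: the old zipper and old tree are untouched.
        return Zipper(subtree, self.path)

    def root(self) -> Optional[Node]:
        z = self
        while z.path:
            z = z.up()
        return z.focus


tree = Node(Node(None, 1, None), 2, Node(None, 3, None))
edited = Zipper(tree).down_right().replace(Node(None, 30, None)).root()
```

After the edit, `tree` is unchanged, and `edited` shares its untouched left subtree with `tree`; only the nodes on the path from the focus to the root are rebuilt.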
Antisequential Suffix Sorting For BWT-Based Data Compression
- IEEE Transactions on Computers
, 2005
Cited by 4 (0 self)
Suffix sorting requires ordering all suffixes of all symbols in an input sequence and has applications in running queries on large texts and in universal lossless data compression based on the Burrows Wheeler transform (BWT). We propose a new suffix lists data structure that leads to three fast, antisequential, and memory-efficient algorithms for suffix sorting. For a length-$N$ input over a size-$|X|$ alphabet, the worst-case complexities of these algorithms are $\Theta(N^2)$, $O(|X| N \log(N/|X|))$, and $O(N \sqrt{|X| \log(N/|X|)})$, respectively. Furthermore, simulation results indicate performance that is competitive with other suffix sorting methods. In contrast, the suffix sorting methods that are fastest on standard test corpora have poor worst-case performance. Therefore, in comparison with other suffix sorting methods, suffix lists offer a useful trade off between practical performance and worst-case behavior. Another distinguishing feature of suffix lists is that these algorithms are simple; some of them can be implemented in VLSI. This could accelerate suffix sorting by at least an order of magnitude and enable high-speed BWT-based compression systems.
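To make the link between suffix sorting and the BWT concrete, here is a minimal sketch that uses a naive comparison sort of suffixes. It is not the suffix lists algorithm proposed in the paper; the function names and the sentinel convention (a unique end marker that sorts before every other symbol) are assumptions for illustration only.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive approach: each comparison may scan O(N) characters, so this is
    far from the worst-case bounds discussed in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via suffix sorting.

    Assumes `sentinel` does not occur in `text` and compares lower than
    every symbol of `text`.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # The BWT symbol for a suffix starting at i is the symbol just before it;
    # for i == 0, Python's s[-1] wraps around to the sentinel.
    return "".join(s[i - 1] for i in suffix_array(s))


print(suffix_array("banana$"))  # [6, 5, 3, 1, 0, 4, 2]
print(bwt("banana"))            # annb$aa
```

The sort is the only expensive step, which is why faster suffix sorting translates directly into faster BWT-based compression.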
A Dictionary-based Multi-Corpora Text Compression System
- Proceedings of the IEEE Data Compression Conference 2003, Snowbird
Cited by 1 (0 self)
In this paper we introduce StarZip, a multi-corpora lossless text compression utility which incorporates StarNT, our newly proposed transform algorithm. StarNT is a fast, dictionary-based lossless text transform algorithm which uses a ternary search tree to expedite transform encoding. For large files, viz. 400 Kbytes or more, our experiments show that the compression time is no worse than that obtained by bzip2 and gzip, and much faster than PPMD. However, if the file size is small, our algorithm is 28.1% and 50.4% slower than bzip2-9 and gzip-9, respectively, and 21.1% faster than PPMD. We also achieve a better compression ratio than almost all the other recent efforts based on BWT and PPM. StarNT is especially suitable for domain-specific lossless text compression used for archival storage and retrieval. Using domain-specific dictionaries, StarZip achieves an average improvement (in terms of BPC) of 13% over bzip2-9, 19% over gzip-9, and 10% over PPMD.
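The ternary search tree named in this abstract can be sketched as follows. This is a generic textbook-style structure, not StarNT's implementation; the class names and the short codewords stored as values are hypothetical, chosen only to show a dictionary lookup from word to code.

```python
from typing import Optional


class TSTNode:
    __slots__ = ("ch", "lo", "eq", "hi", "value")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["TSTNode"] = None   # words whose current char is smaller
        self.eq: Optional["TSTNode"] = None   # words that continue past this char
        self.hi: Optional["TSTNode"] = None   # words whose current char is larger
        self.value: Optional[str] = None      # set only where a word ends


class TernarySearchTree:
    def __init__(self) -> None:
        self.root: Optional[TSTNode] = None

    def insert(self, word: str, value: str) -> None:
        if not word:
            raise ValueError("empty words are not supported in this sketch")
        self.root = self._insert(self.root, word, 0, value)

    def _insert(self, node: Optional[TSTNode], word: str, i: int, value: str) -> TSTNode:
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, value)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, value)
        else:
            node.value = value
        return node

    def lookup(self, word: str) -> Optional[str]:
        node, i = self.root, 0
        while node is not None and i < len(word):
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.value
        return None


# Hypothetical dictionary: frequent words mapped to short codewords.
codes = TernarySearchTree()
codes.insert("the", "a")
codes.insert("of", "b")
codes.insert("then", "c")
```

Each lookup does one three-way character comparison per visited node, which avoids hashing the whole word and keeps the memory cost below that of a trie with one child slot per alphabet symbol.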
TreePredict - Improving text entry on PDAs
, 2001
Cited by 1 (0 self)
In this paper we describe how improved word prediction implemented on a PDA can make it easier for users to enter text. The predictions are derived from trigrams using POS (part-of-speech) tags. The first two parts of the trigrams are POS-tagged, and the last part is extended into a ternary tree, using information from the trigrams to narrow the search. With an improved prediction system, users are more likely to trust the system and to find that it improves their ability to enter text with fewer keystrokes. It is also likely that they will use the prediction feature more actively when they perceive that it is useful to them. Keywords: word prediction, ternary trees, handheld device, keystrokes.
An Adaptive Algorithm for Splitting Large Sets of Strings and Its Application to Efficient External Sorting
In this paper, we study the problem of sorting a large collection of strings in external memory. Based on adaptive construction of a summary data structure, called an adaptive synopsis trie, we present a practical string sorting algorithm, DistStrSort, which is suitable for sorting large string collections in external memory, and also suitable for more complex string processing problems in text and semistructured databases such as counting, aggregation, and statistics. Case analyses of the algorithm and experiments on real datasets show the efficiency of our algorithm in realistic settings.
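The general shape of a distribution-based string sort can be sketched as below. This is not DistStrSort: the adaptive synopsis trie is replaced here by a fixed-length prefix used as the bucket key, and the buckets are in-memory lists, where an external-memory implementation would write each bucket to its own file. The function name and the `prefix_len` parameter are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def distribute_and_sort(strings: Iterable[str], prefix_len: int = 1) -> List[str]:
    """Split strings into buckets by a fixed-length prefix, sort each bucket
    independently, and concatenate the buckets in prefix order."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for s in strings:
        # Strings shorter than prefix_len use the whole string as their key.
        buckets[s[:prefix_len]].append(s)

    out: List[str] = []
    for prefix in sorted(buckets):
        # Each bucket is assumed small enough to sort in main memory.
        out.extend(sorted(buckets[prefix]))
    return out


words = ["banana", "apple", "a", "bandana", "cherry", "apricot", "b"]
result = distribute_and_sort(words, prefix_len=2)
```

A fixed prefix length can produce very uneven buckets on skewed data, which is the kind of problem an adaptively built splitting structure is meant to address.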
Worldwide Offices
For contact information, please visit www.vni.com/contact/worldwideoffices.php © 1970-2010 Visual Numerics, Inc. All rights reserved. Visual Numerics, IMSL and PV-WAVE are registered trademarks of Visual Numerics, Inc. in the U.S. and other countries. JMSL, JWAVE, TS-WAVE, PyIMSL and Knowledge in Motion are trademarks of Visual Numerics, Inc. All other company, product or brand names are the property of their respective owners. IMPORTANT NOTICE: Information contained in this documentation is subject to change without notice. Use of this document is subject to the terms and conditions of a Visual Numerics Software License Agreement, including, without limitation, the Limited Warranty and Limitation of Liability. If you do not accept the terms of the license agreement, you may not use this documentation and should promptly return the product for a full refund. This documentation may not be copied or distributed in any form without the express written consent of Visual Numerics. Embeddable mathematical and statistical algorithms available for C, C#/.NET, Java™,
Web site:
For contact information, please visit www.vni.com/contact © 1970-2007 Visual Numerics, IMSL and PV-WAVE are registered trademarks of Visual Numerics, Inc. in the U.S. and other countries. JMSL, JWAVE, TS-WAVE and Knowledge in Motion are trademarks of Visual Numerics, Inc. All other company, product or brand names are the property of their respective owners. IMPORTANT NOTICE: Information contained in this documentation is subject to change without notice. Use of this document is subject to the terms and conditions of a Visual Numerics Software License Agreement, including, without limitation, the Limited Warranty and Limitation of Liability. If you do not accept the terms of the license agreement, you may not use this documentation and should promptly return the product for a full refund. This documentation may not be copied or distributed in any form
Efficient Algorithms on Sequence Binary Decision Diagrams for Manipulating Sets of Strings
, 2011
We consider sequence binary decision diagrams (sequence BDDs or SDDs, for short), which are a compact representation for manipulating sets of strings, proposed by Loekito et al. (Knowl. Inf. Syst., 24(2), 235-268, 2009). An SDD resembles an acyclic DFA in binary form, with reduction rules different from those for DFAs. In this paper, we study the power of SDDs for storing and manipulating sets of strings on shared and reduced SDDs. In particular, we first characterize minimal SDDs as reduced SDDs. Then, we present simple and efficient algorithms for various problems related to reduced and shared SDDs: on-the-fly and off-line minimization, dynamic string set construction, and factor SDD construction. Finally, we run experiments on real data sets that show the efficiency and usefulness of SDDs in large-scale string processing.

