## Implementing the Context Tree Weighting Method for Text Compression (2000)

Venue: Data Compression Conference

Citations: 14 (2 self)

### BibTeX

@INPROCEEDINGS{Sadakane00implementingthe,
  author = {Kunihiko Sadakane and Takumi Okazaki and Hiroshi Imai},
  title = {Implementing the Context Tree Weighting Method for Text Compression},
  booktitle = {Data Compression Conference},
  year = {2000},
  pages = {123--132}
}

### Abstract

The context tree weighting (CTW) method is a universal compression algorithm for FSMX sources. Although it is expected to achieve a good compression ratio in practice, it is difficult to implement, and in many cases an implementation serves only to estimate the compression ratio. Willems and Tjalkens showed a practical implementation that uses conditional probabilities instead of block probabilities, but it applies only to binary alphabet sequences. We extend the method to multi-alphabet sequences and show a simple implementation using PPM techniques. We also propose a method to optimize a parameter of context tree weighting for the binary alphabet case. Experimental results on texts and DNA sequences show that the performance of PPM can be improved by combining it with context tree weighting, and that DNA sequences can be compressed to less than 2.0 bpc.

### Citations

664 | Arithmetic coding for data compression
- Witten, Neal, et al.
- 1987
Citation Context ...form arithmetic encoding. They used conditional probabilities to calculate block probabilities of symbols, while we directly encode conditional probabilities by using the arithmetic code of Witten et al. [18]. 3.1 Calculation of conditional probabilities If a symbol $x_t$ has a context $0s$, the probability of $x_t = c$ in a context $s$ can be calculated as follows: $P_w^s(x_t = c) = \frac{P_w^s(x_1^t)}{P_w^s(x_1^{t-1})}$...

330 | Data compression using adaptive coding and partial string matching
- Cleary, Witten
- 1984
Citation Context ...ires multi-precision floating-point arithmetic operations. This is a drawback in speed and memory requirements. Concerning compression ratio, one of the best compression algorithms in practice is PPM [5] and its variants. These algorithms compress a sequence of symbols one by one by predicting symbol probabilities from past symbols. The prediction is made from preceding several symbols called conte...

128 | The performance of universal encoding
- Krichevsky, Trofimov
- 1981
Citation Context ...ated probability $P_e^s(a_s, b_s)$ for each context $s$, which stands for the probability that a symbol 0 occurs $a_s$ times and a symbol 1 occurs $b_s$ times at the context $s$. The Krichevsky-Trofimov (KT) estimator [9] is commonly used and it is defined by $P_e^s(a_s, b_s) = \frac{(a_s - \frac{1}{2}) \cdots \frac{3}{2} \cdot \frac{1}{2} \cdot (b_s - \frac{1}{2}) \cdots \frac{3}{2} \cdot \frac{1}{2}}{(a_s + b_s)!}$. Next we calculate weighted probabilities $P_w^s$ from values of $P_e^s$. This is done b...
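
As an illustration of the two quantities named in this excerpt, the following minimal Python sketch computes the KT estimate and a weighted probability for a short binary sequence. It is not the authors' implementation. The excerpt is cut off before stating the weighting rule, so the sketch assumes the standard binary CTW recursion $P_w^s = \frac{1}{2} P_e^s + \frac{1}{2} P_w^{0s} P_w^{1s}$ (with $P_w^s = P_e^s$ at maximum depth). The function names, the depth of 2, and the all-zero past before the sequence are hypothetical choices made for the example, and exact fractions stand in for multi-precision arithmetic.

```python
from fractions import Fraction

DEPTH = 2  # hypothetical maximum context length, chosen for the example


def kt_block(a, b):
    """KT block probability of a sequence containing a zeros and b ones."""
    p = Fraction(1)
    for i in range(a):
        p *= Fraction(2 * i + 1, 2)  # factors 1/2, 3/2, ..., a - 1/2
    for j in range(b):
        p *= Fraction(2 * j + 1, 2)  # factors 1/2, 3/2, ..., b - 1/2
    for n in range(1, a + b + 1):
        p /= n  # divide by (a + b)!
    return p


def context_counts(bits, depth):
    """Count zeros and ones observed after every context of length 0..depth.

    A context is a tuple of preceding symbols, most recent first.
    Assumption: the past before the sequence is all zeros.
    """
    padded = [0] * depth + list(bits)
    counts = {}
    for t in range(depth, len(padded)):
        for d in range(depth + 1):
            ctx = tuple(padded[t - 1 - k] for k in range(d))
            a, b = counts.get(ctx, (0, 0))
            if padded[t] == 0:
                counts[ctx] = (a + 1, b)
            else:
                counts[ctx] = (a, b + 1)
    return counts


def weighted(counts, ctx, depth):
    """Weighted probability of context ctx (standard binary CTW recursion)."""
    a, b = counts.get(ctx, (0, 0))
    pe = kt_block(a, b)
    if len(ctx) == depth:
        return pe  # deepest contexts use the KT estimate directly
    # Child contexts extend ctx by one older symbol (0 or 1).
    children = weighted(counts, ctx + (0,), depth) * weighted(counts, ctx + (1,), depth)
    return (pe + children) / 2


def block_probability(bits, depth=DEPTH):
    """Weighted block probability of the whole sequence at the root context."""
    return weighted(context_counts(bits, depth), (), depth)


def conditional(bits, c, depth=DEPTH):
    """Probability that the next symbol is c, as a ratio of block probabilities."""
    return block_probability(list(bits) + [c], depth) / block_probability(bits, depth)
```

Under these assumptions, `conditional(bits, 0) + conditional(bits, 1)` equals 1 for any input, which is the property an arithmetic coder needs from the per-symbol probabilities. Recomputing the block probability from scratch for every symbol is done here only for clarity.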

79 | The context tree weighting method: Basic properties
- Willems, Shtarkov, et al.
- 1995
Citation Context ...how that the performance of PPM can be improved by combining the context tree weighting and that DNA sequences can be compressed in less than 2.0 bpc. 1 Introduction The context tree weighting method [16], or CTW, is a lossless compression algorithm for FSMX sources. It has theoretically good compression ratio for binary alphabet sequences. It was extended for multi-alphabet sources [20], for easiness...

68 | A new challenge for compression algorithms: Genetic sequences
- Grumbach, Tahi
- 1994
Citation Context ...thm for texts. Compression ratio can be improved by assuming that the size of the alphabet is four. However, the CTW outperforms this result. In the table, Bio2, CDNA, and GTAC show the results of Biocompress-2 [6], CDNA compress [10], and GTAC [8]. These are compression algorithms specialized for DNA sequences. Compression ratio is improved by using approximate matching or considering palindromes. Note that th...

36 | The entropy of English using PPM-based models
- Teahan, Cleary
- 1996
Citation Context ...s difficult to improve performance. On the other hand, if we can apply the CTW to multi-alphabet sequences easily, we can improve the performance of the CTW by using some techniques developed for PPMs [7, 12, 3, 2]. Therefore it is important to show an easy implementation of the CTW for multi-alphabet sequences in order to use these techniques. We propose an efficient implementation of the CTW. Though it is based on Willems an...

35 | Significantly lower entropy estimates for natural DNA sequences
- Loewenstern, Yianilos
- 1999
Citation Context ...ression ratio can be improved by assuming that the size of the alphabet is four. However, the CTW outperforms this result. In the table, Bio2, CDNA, and GTAC show the results of Biocompress-2 [6], CDNA compress [10], and GTAC [8]. These are compression algorithms specialized for DNA sequences. Compression ratio is improved by using approximate matching or considering palindromes. Note that the CDNA and the GTAC ...

24 | Switching between two universal source coding algorithms
- Volf, Willems
- 1998
Citation Context ...ement [17, 14, 19], for compressing text [13], and for improving PPMs [1]. Though the CTW will have a good compression ratio in practice, its implementation is difficult and there are few implementations [13, 15]. One reason is that the original CTW is for binary sequences and it cannot be directly applied to multi-alphabet sequences. The other reason is that it uses block probabilities of many subsequences w...

16 | Multialphabet coding with separate alphabet description
- Åberg, Shtarkov, Smeets
- 1997
Citation Context ...s difficult to improve performance. On the other hand, if we can apply the CTW to multi-alphabet sequences easily, we can improve the performance of the CTW by using some techniques developed for PPMs [7, 12, 3, 2]. Therefore it is important to show an easy implementation of the CTW for multi-alphabet sequences in order to use these techniques. We propose an efficient implementation of the CTW. Though it is based on Willems an...

11 | The Design and Analysis of Efficient Lossless Data Compression Systems
- Howard
- 1993
Citation Context ...1 by using multi-precision operations. However, they calculate only the width of the range encoded by the arithmetic code, and they did not implement the decoder. They showed that the CTW combined with the PPMD [7] has superior performance. Willems and Tjalkens [17] reduced the space complexity of the CTW. They use not weighted block probabilities but conditional weighted probabilities. They store not $P_e^s(x_1^t$ ...

7 | A context-tree weighting method for text generating sources
- Tjalkens, Volf, et al.
- 1997
Citation Context ... FSMX sources. It has a theoretically good compression ratio for binary alphabet sequences. It was extended for multi-alphabet sources [20], for ease of implementation [17, 14, 19], for compressing text [13], and for improving PPMs [1]. Though the CTW will have a good compression ratio in practice, its implementation is difficult and there are few implementations [13, 15]. One reason is that the original CTW...

6 | Text compression by context tree weighting
- Åberg, Shtarkov
- 1997
Citation Context ...ically good compression ratio for binary alphabet sequences. It was extended for multi-alphabet sources [20], for ease of implementation [17, 14, 19], for compressing text [13], and for improving PPMs [1]. Though the CTW will have a good compression ratio in practice, its implementation is difficult and there are few implementations [13, 15]. One reason is that the original CTW is for binary sequences and...

4 | Complexity reduction of the context-tree weighting method
- Willems, Tjalkens
- 1997
Citation Context ... lossless compression algorithm for FSMX sources. It has a theoretically good compression ratio for binary alphabet sequences. It was extended for multi-alphabet sources [20], for ease of implementation [17, 14, 19], for compressing text [13], and for improving PPMs [1]. Though the CTW will have a good compression ratio in practice, its implementation is difficult and there are few implementations [13, 15]. One reas...

3 | Implementing the context-tree weighting method: Arithmetic coding
- Tjalkens, Willems
- 1997
Citation Context ... lossless compression algorithm for FSMX sources. It has a theoretically good compression ratio for binary alphabet sequences. It was extended for multi-alphabet sources [20], for ease of implementation [17, 14, 19], for compressing text [13], and for improving PPMs [1]. Though the CTW will have a good compression ratio in practice, its implementation is difficult and there are few implementations [13, 15]. One reas...

2 | A Simple Implementation of Context Tree Weighting Method and its Verification
- Yokoo, Kawabata
- 1994
Citation Context ... lossless compression algorithm for FSMX sources. It has a theoretically good compression ratio for binary alphabet sequences. It was extended for multi-alphabet sources [20], for ease of implementation [17, 14, 19], for compressing text [13], and for improving PPMs [1]. Though the CTW will have a good compression ratio in practice, its implementation is difficult and there are few implementations [13, 15]. One reas...

1 | Estimating DNA Sequence Entropy. Preprint
- Lanctot, Li, et al.
Citation Context ...an be improved by assuming that the size of the alphabet is four. However, the CTW outperforms this result. In the table, Bio2, CDNA, and GTAC show the results of Biocompress-2 [6], CDNA compress [10], and GTAC [8]. These are compression algorithms specialized for DNA sequences. Compression ratio is improved by using approximate matching or considering palindromes. Note that the CDNA and the GTAC only estimate ...