Results 1 - 4 of 4
Combining PPM models using a text mining approach
- In Storer and Cohn [128
, 2001
Cited by 3 (0 self)
This paper introduces a novel switching method which can be used to combine two or more PPM models. The work derives from our earlier work on modelling English and text mining, and the approach takes advantage of both to help improve the compression performance significantly. The performance of the combination of models is at least as good as (and in many cases significantly better than) the best-performing individual model.
1 Introduction
The PPM data compression scheme has consistently set the standard in lossless compression of text since it was originally described by Cleary & Witten back in 1984. Moffat's (1990) implementation, PPMC, set the benchmark for over a decade, and currently, an implementation of the PPMD algorithm (Howard, 1993) has the distinction of being the best "all-round" compression scheme (ACT, 2000). Other variations on a very productive research theme include improved blending algorithms (Bunton, 1996), improved escape estimation for the finely tun...
Access Support Tree & TextArray: Data Structures for XML Document Storage
, 2001
Cited by 2 (1 self)
The characteristics of XML documents require new ways of storing and querying such documents. Queries on both textual content and structural aspects must be supported efficiently. For this reason, we examined existing work on both document storage approaches and models for querying documents, deriving requirements that are essential for the storage of XML documents. An important result of this study is the design of the Access Support Tree and TextArray (AST/TA) data structures. The basic idea of the AST/TA data structures is the separation of the (logical) structure of a document from its "visible" text content, which is represented as a single contiguous string. We introduce the AST/TA data structures formally through their abstraction, namely the AST/TA model, and relate the model to the well-known XML Information Set. Moreover, we address specific issues that must not be ignored in the context of XML and that strongly influence the design of the AST/TA structures. We also compare the requirements of the AST/TA approach with those found in current work. Finally, we describe those operations that take advantage of the design principles of the AST/TA data structures.
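The separation described in this abstract (document structure on one side, visible text as one contiguous string on the other) can be illustrated with a minimal sketch. This is not the authors' AST/TA design: the names `Node`, `split_document` and `text_of` are hypothetical, and the sketch only assumes that each structure node records the character range it covers in the shared string.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical structure node: a tag plus the text range it spans."""
    tag: str
    start: int
    end: int = 0
    children: list[Node] = field(default_factory=list)


def split_document(xml_source: str) -> tuple[Node, str]:
    """Split an XML document into a structure tree and one contiguous string."""
    chunks: list[str] = []
    pos = 0  # current offset into the contiguous text

    def visit(elem: ET.Element) -> Node:
        nonlocal pos
        node = Node(elem.tag, start=pos)
        if elem.text:  # text before the first child element
            chunks.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            node.children.append(visit(child))
            if child.tail:  # text following the child, still inside elem
                chunks.append(child.tail)
                pos += len(child.tail)
        node.end = pos
        return node

    root = visit(ET.fromstring(xml_source))
    return root, "".join(chunks)


def text_of(node: Node, text: str) -> str:
    """Return the visible text covered by a structure node."""
    return text[node.start:node.end]
```

With this layout, a textual query can scan the single string without touching the tree, while a structural query can walk the tree and slice out text only when needed, for example `text_of(root.children[1], text)` for the second child of the root.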
Text Augmentation: Inserting XML tags into natural language text with PPM Models and Viterbi-like search
, 2003
Cited by 2 (0 self)
This thesis develops work on using Hidden Markov Models to insert tags into natural language text. A taxonomy of tags is developed, unifying the fields of text segmentation tagging, part-of-speech tagging, proper noun extraction and hierarchical entity extraction. The search spaces for inserting tags are examined from both a theoretical and an experimental point of view across the taxonomy and on four corpora. An analysis of different correctness measures for different types of tag insertion problem is undertaken, and a technique to determine whether tag-insertion errors are the result of a modelling failure or a searching failure is discovered.
Text Mining Using HMM and PPM
, 2001
Cited by 1 (0 self)
Text mining involves the use of statistical and machine learning techniques to learn structural elements of text in order to search for useful information in previously unseen text. The need for these techniques has emerged out of the rapidly growing information era. Token identification is an important component of any text mining tool. The accomplishment of this task enhances the function of diverse applications involving searching for patterns in textual data. Several different identification methods have been reported in the literature. HMMs and PPM models have been successfully used in language processing tasks. They have also been applied separately to learning-based token identification. Most of the existing systems are domain- and language-dependent. In this thesis, we implement a system that bridges the two well-known methods through words new to the identification model. The system is fully domain- and language-independent. No changes of code are necessary when applying it to other domains or languages. The only thing required is an annotated corpus. The system has been tested on two corpora and achieved an overall F-measure of 76.59% for TCC and 69.02% for BIB. This is not as good as would be expected from a system which includes language-dependent components. However, our system is more generalized. The identification of dates gives the best results, with 73% and 92% of tokens correctly identified for TCC and BIB respectively. The system also performs reasonably well on people's names, with 68% of tokens correctly identified for TCC and 76% for BIB.
Acknowledgements
During the time of my MPhil study, I have been so lucky to have had a huge amount of academic, financial and personal help from a number of people. First and foremost, I would like to thank my chief supervisor, Ian Witte...

