Semantic Processing using the Hidden Vector State Model
Computer Speech and Language, 2005
Cited by 73 (26 self)
This paper discusses semantic processing using the Hidden Vector State (HVS) model. The HVS model extends the basic discrete Markov model by encoding context in each state as a vector. State transitions are then factored into a stack shift operation similar to those of a pushdown automaton followed by a push of a new preterminal semantic category label. The key feature of the model is that it can capture hierarchical structure without the use of treebank data for training. Experiments have been conducted in the travel domain using the relatively simple ATIS corpus and the more complex DARPA Communicator Task. The results show that the HVS model can be robustly trained from only minimally annotated corpus data. Furthermore, when measured by its ability to extract attribute-value pairs from natural language queries in the travel domain, the HVS model outperforms a conventional finite-state semantic tagger by 4.1% in F-measure for ATIS and by 6.6% in F-measure for Communicator, suggesting that the benefit of the HVS model’s ability to encode context increases as the task becomes more complex.
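The factored transition described in this abstract (a stack shift followed by a push of one new preterminal label) can be pictured with a minimal sketch. The function name `hvs_transition`, the example labels, and the `max_depth` bound are all assumptions made for illustration; nothing here is taken from the paper's implementation, and the probabilities that would govern the choice of `pops` and `new_label` are left out.

```python
from typing import Tuple


def hvs_transition(stack: Tuple[str, ...], pops: int, new_label: str,
                   max_depth: int = 4) -> Tuple[str, ...]:
    """Apply one factored transition: pop `pops` labels, then push `new_label`.

    The vector state is modelled as a tuple whose last element is the top of
    the stack. `max_depth` is an assumed bound on the stack depth.
    """
    if not 0 <= pops <= len(stack):
        raise ValueError("cannot pop more labels than the stack holds")
    shifted = stack[: len(stack) - pops]   # stack shift (pop `pops` labels)
    new_stack = shifted + (new_label,)     # push the new preterminal label
    if len(new_stack) > max_depth:
        raise ValueError("stack would exceed the assumed maximum depth")
    return new_stack


# Hypothetical semantic labels, chosen only to make the example readable.
state = ("SS", "RETURN", "TOLOC")
next_state = hvs_transition(state, pops=1, new_label="ON")
```

Here one label is popped and one is pushed, so the depth stays the same while the context below the top of the stack is preserved.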
The Design and Analysis of Efficient Lossless Data Compression Systems
1993
Cited by 59 (0 self)
Our thesis is that high compression efficiency for text and images can be obtained by using sophisticated statistical compression techniques, and that greatly increased speed can be achieved at only a small cost in compression efficiency. Our emphasis is on elegant design and mathematical as well as empirical analysis. We analyze arithmetic coding as it is commonly implemented and show rigorously that almost no compression is lost in the implementation. We show that high-efficiency lossless compression of both text and grayscale images can be obtained by using appropriate models in conjunction with arithmetic coding. We introduce a four-component paradigm for lossless image compression and present two methods that give state-of-the-art compression efficiency. In the text compression area, we give a small improvement on the preferred method in the literature. We show that we can often obtain significantly improved throughput at the cost of slightly reduced compression. The extra speed c...
Analysis of Arithmetic Coding for Data Compression
INFORMATION PROCESSING AND MANAGEMENT, 1992
Cited by 43 (6 self)
Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of weighted entropy and using it to characterize in an elegant way the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibiting locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.
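The recency effect of periodic scaling can be seen in a small experiment. The sketch below is not the article's weighted-entropy analysis; it assumes a simple adaptive frequency-count model (every count starts at 1), measures the ideal code length as the sum of -log2(p) over the symbols, and halves all counts (rounding up) whenever their total exceeds `limit`. The function name, the halving rule, and the two test strings are illustrative assumptions.

```python
import math
from typing import Iterable, Optional


def adaptive_code_length(data: Iterable[str], alphabet: str,
                         limit: Optional[int] = None) -> float:
    """Ideal code length in bits under an adaptive count model.

    If `limit` is given, all counts are halved (rounding up, so no count
    drops to zero) whenever the total count exceeds `limit`.
    """
    counts = {s: 1 for s in alphabet}
    total = len(counts)
    bits = 0.0
    for s in data:
        bits += -math.log2(counts[s] / total)  # cost of coding s right now
        counts[s] += 1
        total += 1
        if limit is not None and total > limit:
            counts = {k: (c + 1) // 2 for k, c in counts.items()}
            total = sum(counts.values())
    return bits


local = "a" * 200 + "b" * 200   # strong locality of reference
uniform = "ab" * 200            # homogeneous symbol distribution

local_plain = adaptive_code_length(local, "ab")
local_scaled = adaptive_code_length(local, "ab", limit=32)
uniform_plain = adaptive_code_length(uniform, "ab")
uniform_scaled = adaptive_code_length(uniform, "ab", limit=32)
```

Under these assumptions, scaling shortens the code for the input with locality (the model forgets the stale `a` statistics quickly) and lengthens it slightly for the homogeneous input (smaller counts give coarser probability estimates), which matches the direction of the two effects the abstract describes.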
Practical Implementations of Arithmetic Coding
IN IMAGE AND TEXT, 1992
Cited by 41 (6 self)
We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmetic coder with only minimal loss of compression efficiency. Our coder is based on the replacement of arithmetic by table lookups coupled with a new deterministic probability estimation scheme.
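As a reminder of the basic idea behind arithmetic coding, here is a minimal sketch of interval narrowing with exact rational arithmetic. It is a textbook-style illustration with a fixed model, not the table-lookup coder this paper proposes; the function names and the three-symbol model are assumptions, and a practical coder would use bounded-precision integers and emit bits incrementally.

```python
from fractions import Fraction
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[Fraction, Fraction]]


def build_ranges(probs: Dict[str, Fraction]) -> Ranges:
    """Assign each symbol a sub-interval of [0, 1) as wide as its probability."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def encode(message: str, probs: Dict[str, Fraction]) -> Tuple[Fraction, Fraction]:
    """Return (low, width) of the final interval that represents `message`."""
    ranges = build_ranges(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = ranges[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, width


def decode(value: Fraction, length: int, probs: Dict[str, Fraction]) -> str:
    """Recover `length` symbols from any value inside the final interval."""
    ranges = build_ranges(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return "".join(out)


model = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, width = encode("abac", model)
```

The final width equals the product of the symbol probabilities, so -log2(width) is the ideal code length for the message under the model; this is the sense in which the method is nearly optimal.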
Probabilistic Finite-State Machines - Part I
Cited by 26 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translation are some of them. In part I of this paper we survey these generative objects and study their definitions and properties. In part II, we will study the relation of probabilistic finite-state automata with other well known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent a current state of the art of these objects.
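To make "generative object" concrete, the following sketch computes the probability that a small probabilistic finite-state automaton generates a given string, by summing over all accepting paths with a forward pass. The dictionary-based representation, the function name, and the two-state example automaton are assumptions for illustration and do not follow the paper's notation.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Transitions = Dict[Tuple[int, str], List[Tuple[int, Fraction]]]


def string_probability(string: str,
                       initial: Dict[int, Fraction],
                       final: Dict[int, Fraction],
                       transitions: Transitions) -> Fraction:
    """Forward pass: alpha[q] is the probability of having generated the
    prefix read so far and being in state q."""
    alpha = dict(initial)
    for sym in string:
        nxt: Dict[int, Fraction] = {}
        for q, p in alpha.items():
            for q_next, t in transitions.get((q, sym), []):
                nxt[q_next] = nxt.get(q_next, Fraction(0)) + p * t
        alpha = nxt
    # Stop in the current state with its final (halting) probability.
    return sum((p * final.get(q, Fraction(0)) for q, p in alpha.items()),
               Fraction(0))


# Hypothetical automaton: for each state, the outgoing transition
# probabilities plus the final probability sum to 1.
initial = {0: Fraction(1)}
final = {0: Fraction(1, 4), 1: Fraction(1, 2)}
transitions: Transitions = {
    (0, "a"): [(0, Fraction(1, 4)), (1, Fraction(1, 4))],
    (0, "b"): [(1, Fraction(1, 4))],
    (1, "b"): [(1, Fraction(1, 2))],
}

p_ab = string_probability("ab", initial, final, transitions)
```

The string "ab" can be produced along two paths (0, 0, 1 and 0, 1, 1), and the forward pass adds their contributions before applying the final probability of state 1.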
Design and Analysis of Fast Text Compression Based on Quasi-Arithmetic Coding
IN PROC. DATA COMPRESSION CONFERENCE, 1994
Cited by 24 (5 self)
We give a detailed algorithm for fast text compression. Our algorithm, related to the PPM method, simplifies the modeling phase by eliminating the escape mechanism and speeds up coding by using a combination of quasi-arithmetic coding and Rice coding. We provide details of the use of quasi-arithmetic code tables, and analyze their compression performance. Our Fast PPM method is shown experimentally to be almost twice as fast as the PPMC method, while giving comparable compression.
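Rice coding, one of the two coding components named in this abstract, is simple enough to sketch. The code below shows the general technique only, not how the Fast PPM method uses it: a non-negative integer n is split by a parameter k into a quotient (written in unary) and a k-bit remainder. The bit-string representation and the unary convention (ones terminated by a zero) are assumptions; other conventions exist.

```python
def rice_encode(n: int, k: int) -> str:
    """Rice code of n >= 0 with parameter k >= 0, as a string of '0'/'1'."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k                      # quotient, written in unary
    r = n & ((1 << k) - 1)          # remainder, written in k binary digits
    remainder_bits = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder_bits


def rice_decode(bits: str, k: int) -> int:
    """Inverse of rice_encode for a single codeword."""
    q = bits.index("0")             # number of leading ones
    r = int(bits[q + 1: q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r


codeword = rice_encode(9, 2)        # quotient 2, remainder 1
```

Small values get short codewords when k is chosen close to the base-2 logarithm of the typical value, which is why the code is cheap to compute and suits fast coders.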
Text Compression for Dynamic Document Databases
IEEE Transactions on Knowledge and Data Engineering, 1994
Cited by 22 (7 self)
For compression of text databases, semi-static word-based methods provide good performance in terms of both speed and disk space, but two problems arise. First, the memory requirements for the compression model during decoding can be unacceptably high. Second, the need to handle document insertions means that the collection must be periodically recompressed, if compression efficiency is to be maintained on dynamic collections. Here we show that with careful management the impact of both of these drawbacks can be kept small. Experiments with a word-based model and 500 Mb of text show that excellent compression rates can be retained even in the presence of severe memory limitations on the decoder, and after significant expansion in the amount of stored text.
Probabilistic Finite-State Machines - Part II
Cited by 12 (2 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked. In part I of this paper, we surveyed these objects and studied their properties. In this part II, we study the relations between probabilistic finite-state automata and other well known devices that generate strings like hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent a current state of the art of these objects.