The role of domain information in word sense disambiguation
Natural Language Engineering, 2002
Cited by 20 (11 self)
Abstract
This paper explores the role of domain information in word sense disambiguation. The underlying hypothesis is that domain labels, such as Medicine, Architecture and Sport, provide a useful way to establish semantic relations among word senses, which can be profitably used during the disambiguation process. Results obtained at the Senseval-2 initiative confirm that, for a significant subset of words, domain information can be used to disambiguate with a very high level of precision.
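The domain hypothesis can be illustrated with a toy heuristic: count the domain labels carried by the senses of the surrounding words, then pick the sense of the target word whose domain dominates the context. This is a minimal sketch under stated assumptions, not the method evaluated in the paper. The sense inventory, the sense identifiers such as `bank%finance`, and the helper names `context_domains` and `disambiguate` are all hypothetical; a real system would take its labels from a domain-annotated lexical resource.

```python
from collections import Counter

# Hypothetical toy sense inventory: word -> {sense id: domain label}.
# Illustrative only; not taken from any real lexical resource.
SENSES = {
    "bank": {"bank%finance": "Economy", "bank%river": "Geography"},
    "interest": {"interest%money": "Economy", "interest%curiosity": "Psychology"},
    "loan": {"loan%money": "Economy"},
    "river": {"river%stream": "Geography"},
}


def context_domains(context_words):
    """Count domain labels over every sense of every known context word."""
    counts = Counter()
    for word in context_words:
        for domain in SENSES.get(word, {}).values():
            counts[domain] += 1
    return counts


def disambiguate(target, context_words):
    """Pick the sense of `target` whose domain is most frequent in the context.

    Returns None when none of the target's domains occur in the context,
    so the word is left undecided instead of guessed.
    """
    counts = context_domains(w for w in context_words if w != target)
    best_sense, best_score = None, 0
    for sense, domain in SENSES.get(target, {}).items():
        if counts[domain] > best_score:
            best_sense, best_score = sense, counts[domain]
    return best_sense


print(disambiguate("bank", ["loan", "interest"]))  # bank%finance
print(disambiguate("bank", ["river"]))             # bank%river
print(disambiguate("bank", ["cat"]))               # None
```

Returning `None` for unlabelled contexts is a deliberate design choice in this sketch: it trades coverage for precision, echoing the abstract's point that domain information is decisive for a subset of words rather than all of them.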
Text Classification and Segmentation Using Minimum Cross-Entropy
2000
Cited by 18 (0 self)
Abstract
Several methods for classifying and segmenting text are described. These are based on ranking text sequences by their cross-entropy, calculated using a fixed-order character-based Markov model adapted from the PPM text compression algorithm. Experimental results show that the methods are a significant improvement over previously used methods in a number of areas. For example, text can be classified with a very high degree of accuracy by authorship, language, dialect and genre. Highly accurate text segmentation is also possible: the PPM-based Chinese word segmenter achieves close to 99% accuracy on Chinese news text, and a PPM-based method of segmenting text by language achieves over 99% accuracy.
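The minimum cross-entropy idea can be sketched as follows: train one character model per class, then assign a text to the class whose model encodes it in the fewest bits per character. This sketch swaps PPM for a simpler fixed-order character Markov model with add-one smoothing; PPM's escape mechanism and its blending across context orders are not implemented. The class names, training strings, and the names `CharMarkovModel` and `classify` are illustrative assumptions.

```python
import math
from collections import defaultdict


class CharMarkovModel:
    """Fixed-order character Markov model with add-one smoothing.

    A simplified stand-in for PPM: no escape mechanism and no
    blending of lower-order contexts.
    """

    def __init__(self, order=2):
        self.order = order
        # context string -> next character -> count
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def _events(self, text):
        # Pad with NUL characters so the first characters also have a context.
        padded = "\0" * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def train(self, text):
        for context, char in self._events(text):
            self.counts[context][char] += 1
            self.alphabet.add(char)
        return self

    def cross_entropy(self, text):
        """Average number of bits per character needed to encode `text`."""
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        total_bits = 0.0
        for context, char in self._events(text):
            seen = self.counts.get(context, {})  # .get avoids inserting new keys
            prob = (seen.get(char, 0) + 1) / (sum(seen.values()) + vocab)
            total_bits -= math.log2(prob)
        return total_bits / max(len(text), 1)


def classify(text, models):
    """Return the label whose model gives `text` the lowest cross-entropy."""
    return min(models, key=lambda label: models[label].cross_entropy(text))


# Toy training data (illustrative only).
models = {
    "en": CharMarkovModel().train("the cat sat on the mat the dog sat on the log " * 5),
    "fr": CharMarkovModel().train("le chat est sur le tapis le chien est sur la table " * 5),
}
print(classify("the dog sat on the mat", models))     # en
print(classify("le chien est sur la table", models))  # fr
```

The same `classify` call covers language, authorship, dialect or genre; only the training texts behind each model change. Lower cross-entropy means the model compresses the text better, which is the ranking criterion the abstract describes.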
Evaluating a Hidden Markov Model of Syntax in a Text Recognition System
Abstract
Recognition of text by whole word shapes generates a set of candidate words for each printed word. A Hidden Markov Model (HMM) of syntax may be used to find the most probable sequence of syntactic tags for a sentence given the sequence of candidate sets. Candidate sets are then reduced by removing all words that are not associated with the chosen tag. We show that the tagging performance of the HMM does not deteriorate despite an increasing proportion of misclassified words. We also show that using the model significantly reduces the number of candidates.
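The tag-then-filter pipeline can be sketched with a Viterbi search whose observations are candidate sets rather than single words. The sketch assumes that the emission score of a candidate set under a tag is the summed probability of its member words; the abstract does not say how the paper scores sets. All tags, probabilities, word lists, and function names here are hypothetical toy values.

```python
import math

# Hypothetical toy HMM (illustrative numbers, not from the paper).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
# P(word | tag); a word missing under a tag has probability 0.
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"cat": 0.4, "cot": 0.3, "runs": 0.1, "rims": 0.2},
    "VERB": {"runs": 1.0},
}


def set_emission(tag, candidates):
    """Assumed set emission: total probability of any candidate under `tag`."""
    return sum(EMIT[tag].get(w, 0.0) for w in candidates)


def viterbi(candidate_sets):
    """Most probable tag sequence for a sequence of candidate-word sets."""
    # best[tag] = (log-probability, path) of the best path ending in `tag`.
    best = {}
    for t in TAGS:
        e = set_emission(t, candidate_sets[0])
        if START[t] > 0 and e > 0:
            best[t] = (math.log(START[t]) + math.log(e), [t])
    for cands in candidate_sets[1:]:
        new_best = {}
        for t in TAGS:
            e = set_emission(t, cands)
            if e == 0:
                continue
            options = [(lp + math.log(TRANS[prev][t]) + math.log(e), path + [t])
                       for prev, (lp, path) in best.items() if TRANS[prev][t] > 0]
            if options:
                new_best[t] = max(options, key=lambda x: x[0])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1] if best else []


def reduce_candidates(candidate_sets, tags):
    """Keep only the candidates that the chosen tag can emit."""
    return [[w for w in cands if EMIT[t].get(w, 0.0) > 0]
            for cands, t in zip(candidate_sets, tags)]


sets = [["the", "tho"], ["cat", "cot"], ["runs", "rims"]]
tags = viterbi(sets)
print(tags)                           # ['DET', 'NOUN', 'VERB']
print(reduce_candidates(sets, tags))  # [['the'], ['cat', 'cot'], ['runs']]
```

In this toy run the six candidates shrink to four: `tho` and `rims` are dropped because the chosen tags (DET and VERB) cannot emit them, while the ambiguous `cat`/`cot` pair survives because both are nouns.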

