Statistical Models for Text Segmentation
Machine Learning, 1999
Abstract
Cited by 1 (0 self)
This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based on a technique that incrementally builds an exponential model to extract features that are correlated with the presence of boundaries in labeled training text. The models use two classes of features: topicality features, which use adaptive language models in a novel way to detect broad changes of topic, and cue-word features, which detect occurrences of specific words (possibly domain-specific) that tend to be used near segment boundaries. Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains: Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative ...
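The abstract does not spell out its error metric. The sketch below assumes the window-based formulation widely known as P_k, which this paper is usually credited with introducing: slide a window of width k over the text and count how often the reference and the hypothesis disagree about whether the two units at the window's ends fall in the same segment. The function names, the 0/1 boundary encoding (1 means a boundary follows that unit), and the default k (half the mean reference segment length, a common convention) are illustrative assumptions, not the paper's code.

```python
from typing import List, Optional, Sequence


def segment_ids(boundaries: Sequence[int]) -> List[int]:
    """Map a 0/1 boundary list (1 = a segment ends after this unit) to per-unit segment ids."""
    ids, current = [], 0
    for b in boundaries:
        ids.append(current)
        if b:
            current += 1
    return ids


def p_k(reference: Sequence[int], hypothesis: Sequence[int], k: Optional[int] = None) -> float:
    """Probability that two units k apart are inconsistently judged same/different segment."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same units")
    n = len(reference)
    ref_ids = segment_ids(reference)
    hyp_ids = segment_ids(hypothesis)
    if k is None:
        # Assumed convention: k is half the mean reference segment length.
        num_segments = ref_ids[-1] + 1
        k = max(1, round(n / num_segments / 2))
    pairs = n - k
    if pairs <= 0:
        raise ValueError("k must be smaller than the number of units")
    errors = 0
    for i in range(pairs):
        same_in_ref = ref_ids[i] == ref_ids[i + k]
        same_in_hyp = hyp_ids[i] == hyp_ids[i + k]
        # An error is a miss (hypothesis merges) or a false alarm (hypothesis splits).
        errors += same_in_ref != same_in_hyp
    return errors / pairs
```

For a nine-sentence reference with boundaries after sentences 3 and 6, a hypothesis with no boundaries at all scores 4/7 at k = 2, while a perfect hypothesis scores 0. Because misses and false alarms are counted in one probability, the metric does not need a separate precision/recall trade-off.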
Maximum Entropy Methods for Biological Sequence Modeling
Abstract
Many of the same modeling methods used for natural language, specifically Markov models and HMMs, have also been applied to biological sequence analysis. In recent years, natural language models have been improved by maximum entropy methods, which allow information from the entire history of a sequence to be considered. This contrasts with Markov models, the standard for most biological sequence models, whose predictions are generally based on a fixed number of previous emissions. To test the utility of maximum entropy modeling for biological sequence analysis, we used these methods to model amino acid sequences. Our results show that there is significant long-distance information in amino acid sequences and suggest that maximum entropy techniques may be beneficial for a range of biological sequence analysis problems.
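To make the contrast between a fixed-order Markov context and a whole-history context concrete, here is a minimal conditional maximum entropy (log-linear) sketch for predicting the next amino acid. The abstract does not describe the paper's feature set or training procedure, so everything here is a hypothetical illustration: two assumed feature families, a Markov-style "previous residue" predicate and a long-distance "residue seen anywhere earlier" predicate, trained by batch gradient ascent on an L2-regularized conditional log-likelihood. Classic maximum entropy implementations often used iterative scaling instead; gradient ascent is swapped in here for brevity.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed alphabet: the 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A feature is (predicate kind, context residue, predicted residue).
Feature = Tuple[str, str, str]


def active_contexts(history: str) -> List[Tuple[str, str]]:
    """Context predicates (hypothetical) that fire for a given history."""
    contexts = []
    if history:
        # Markov-style predicate: only the immediately preceding residue.
        contexts.append(("prev", history[-1]))
    # Long-distance predicates: any residue seen anywhere earlier in the sequence.
    for residue in set(history):
        contexts.append(("seen", residue))
    return contexts


def predict(weights: Dict[Feature, float], history: str) -> Dict[str, float]:
    """p(y | history) proportional to exp(sum of the weights of active features)."""
    contexts = active_contexts(history)
    scores = {y: sum(weights.get((kind, c, y), 0.0) for kind, c in contexts)
              for y in AMINO_ACIDS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}


def train(sequences: List[str], epochs: int = 50, lr: float = 0.5,
          l2: float = 0.01) -> Dict[Feature, float]:
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    weights: Dict[Feature, float] = defaultdict(float)
    events = [(seq[:i], seq[i]) for seq in sequences for i in range(1, len(seq))]
    if not events:
        return {}
    for _ in range(epochs):
        grad: Dict[Feature, float] = defaultdict(float)
        for history, actual in events:
            probs = predict(weights, history)
            for kind, c in active_contexts(history):
                grad[(kind, c, actual)] += 1.0  # observed feature count
                for y, p in probs.items():
                    grad[(kind, c, y)] -= p  # expected feature count under the model
        for f in set(grad) | set(weights):
            weights[f] += lr * (grad[f] / len(events) - l2 * weights[f])
    return dict(weights)
```

A Markov model of order 1 would only ever see the "prev" predicate; the "seen" predicates are what let a log-linear model condition on the entire history without the parameter count growing with history length.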
Joint Learning for Named Entity Recognition and Capitalization Generation
Abstract
This study attempts to assess the usefulness of Joint Learning for the tasks of Named Entity Recognition (NER) and Capitalization Generation, and to shed more light on Joint Learning models for Natural Language Processing in general. The study also looks for feature sets that help or do not help the joint task. This is achieved by using Dynamic Conditional Random Fields (DCRFs) as models for experiments with the two tasks. The joint model is compared both with simple systems for each task that do not use the other task and with traditional pipeline systems that perform the two tasks sequentially. Various feature sets are explored, and their results are compared using significance tests. The results were inconclusive about the usefulness of Joint Learning for Named Entity Recognition: the improvements observed were not statistically significant. However, true capitalization was found to significantly help NER performance. The Capitalization Generation task, on the other hand, was not helped at all by named entity information (even when learning jointly). The conclusion reached ...
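The abstract says systems were compared with significance tests but does not name the test. One common choice for comparing F1 scores of two taggers on the same data is a paired approximate randomization test, sketched below under that assumption. The per-sentence (true positive, false positive, false negative) representation and the function names are hypothetical; the idea is that, under the null hypothesis of no difference, swapping the two systems' outputs on any sentence should not change the F1 gap.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-sentence entity counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def f1(items: Sequence[Counts]) -> float:
    """Corpus-level F1 from summed counts: 2TP / (2TP + FP + FN)."""
    tp = sum(c[0] for c in items)
    fp = sum(c[1] for c in items)
    fn = sum(c[2] for c in items)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def approximate_randomization(system_a: Sequence[Counts], system_b: Sequence[Counts],
                              metric: Callable[[Sequence[Counts]], float] = f1,
                              trials: int = 10000, seed: int = 0) -> float:
    """Two-sided p-value for the metric difference between two systems on the same sentences."""
    if len(system_a) != len(system_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    observed = abs(metric(system_a) - metric(system_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a: List[Counts] = []
        shuffled_b: List[Counts] = []
        for a, b in zip(system_a, system_b):
            # Under the null hypothesis the two outputs are exchangeable per sentence.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(metric(shuffled_a) - metric(shuffled_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Recomputing corpus-level F1 on each shuffled pair, rather than averaging per-sentence scores, matters because F1 does not decompose additively over sentences.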

