Probabilistic detection of context-sensitive spelling errors
- In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-04), volume Five, 2004
Cited by 9 (1 self)
This article focuses on the evaluation of a novel algorithm for the detection of context-sensitive spelling errors. We present a fully automatic evaluation procedure with no requirements of manual work or resources annotated with spelling errors. The evaluation method is applicable to any language and tag set, and is easily adaptable to other NLP systems such as taggers and parsers.
Automatic evaluation of robustness and degradation in tagging and parsing
- In Proc. RANLP 2003, 2003
Cited by 7 (5 self)
We address the topic of automatic evaluation of robustness and performance degradation in parsing systems. We focus on one aspect of robustness, namely ill-formed sentences and the impact of spelling errors on the different components of a parsing system. We propose an automated framework to evaluate robustness, where ill-formed and noisy data is introduced using an automatic tool and fed to the parsing system. With increasing levels of noise, the performance of a system will inevitably degrade, and the question is to what extent. The experiments show a graceful degradation in performance for both state-of-the-art taggers used and a Swedish shallow parser. The automated nature of the evaluation allows easy and reproducible evaluation of the individual components of a parsing system.
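The framework described in this abstract (inject noise automatically, then measure performance at increasing noise levels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the adjacent-letter swap in `add_typos`, the lexicon-lookup `toy_tag` function, and the `degradation_curve` helper are hypothetical stand-ins, not the error-introduction tool or the taggers and parser used in the paper.

```python
import random

# Hypothetical toy lexicon standing in for a real tagger (assumption).
LEXICON = {"the": "DET", "parser": "NOUN", "reads": "VERB",
           "noisy": "ADJ", "text": "NOUN"}

def add_typos(tokens, rate, rng):
    """Swap two adjacent letters in roughly `rate` of the tokens."""
    noisy = []
    for tok in tokens:
        if len(tok) >= 2 and rng.random() < rate:
            i = rng.randrange(len(tok) - 1)
            # The token stays unchanged if the two swapped letters are equal.
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        noisy.append(tok)
    return noisy

def toy_tag(tokens):
    """Tag by lexicon lookup; unknown (e.g. misspelled) words get 'UNK'."""
    return [LEXICON.get(tok, "UNK") for tok in tokens]

def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def degradation_curve(tokens, gold, rates, seed=0):
    """Map each noise rate to tagging accuracy on the corrupted tokens."""
    rng = random.Random(seed)
    return {rate: accuracy(toy_tag(add_typos(tokens, rate, rng)), gold)
            for rate in rates}

tokens = ["the", "parser", "reads", "the", "noisy", "text"]
gold = toy_tag(tokens)  # the clean output serves as the reference here
curve = degradation_curve(tokens, gold, [0.0, 0.5, 1.0])
```

With a real tagger in place of `toy_tag`, plotting `curve` against the noise rate would show how gracefully (or abruptly) accuracy falls as more tokens are corrupted.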
Unsupervised Evaluation of Parser Robustness
- Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing, 2005
Cited by 5 (0 self)
This article describes an automatic evaluation procedure for NLP system robustness under the strain of noisy and ill-formed input. The procedure requires no manual work or annotated resources. It is language and annotation scheme independent and produces reliable estimates on the robustness of NLP systems. The only requirement is an estimate on the NLP system accuracy. The procedure was applied to five parsers and one part-of-speech tagger on Swedish text. To establish the reliability of the procedure, a comparative evaluation involving annotated resources was carried out on the tagger and three of the parsers.
Grammar checking for Swedish second language learners
- In CALL for the Nordic Languages, Copenhagen Studies in Language, pages 33–47. Copenhagen Business School, Samfundslitteratur, 2005
Cited by 4 (1 self)
Grammar errors and context-sensitive spelling errors in texts written by second language learners are hard to detect automatically. We have used three different approaches for grammar checking: manually constructed error detection rules, statistical differences between correct and incorrect texts, and machine learning of specific error types. The three approaches have been evaluated using a corpus of second language learner Swedish. We found that the three methods detect different errors and therefore complement each other.
Swedish summary (translated): Grammar errors and context-sensitive spelling errors (misspellings that form real words) in texts written by second language learners are hard to detect automatically. We have used three different approaches for the checking: manually constructed error detection rules, statistical differences between correct and incorrect text, and machine learning of specific error types. We have evaluated the three methods on a corpus consisting of Swedish essays by second language learners. We found that the methods detect different errors and therefore complement each other well.
A Study of Applying BTM Model on the Chinese Chunk Bracketing
- In LINC-2005, IJCNLP-2005
Cited by 1 (0 self)
The purpose of this paper is to automatically generate Chinese chunk bracketing by a bottom-to-top mapping (BTM) model with a BTM dataset. The BTM model is designed as a supporting model with parsers. We define a word-layer matrix to generate the BTM dataset from Chinese Treebank. Our model matches auto-learned patterns and templates against segmented and POS-tagged Chinese sentences. A sentence that can be matched with some patterns or templates is called a matching sentence. The experimental results have shown that the chunk bracketing of the BTM model on the matching sentences is high and stable. By applying the BTM model to the matching sentences and the N-gram model to the non-matching sentences, the experimental results show that the F-measure of an N-gram model can be improved.
AutoEval and Missplel: Two Generic Tools for Automatic Evaluation
- Updated on March 9th, 2006. http://www.csc.kth.se/tcs/humanlang/tools.html. (Accessed April
, 2007
Cited by 1 (0 self)
We describe two freeware programs for automatic evaluation. The first, AutoEval, greatly simplifies the data gathering, processing and counting often involved in an evaluation. To this end, AutoEval includes a simple and powerful script language to describe the evaluation task to be carried out. The second program is called Missplel. It introduces human-like spelling and grammar errors into text. A typical application is evaluation of NLP system performance on noisy input, in order to establish the robustness of the system. An evaluation of the quality of the produced errors has also been conducted.
Global Evaluation of Random Indexing through Swedish Word Clustering Compared to the People’s Dictionary of Synonyms
Cited by 1 (1 self)
Evaluation of word space models is usually local in the sense that it only considers words that are deemed very similar by the model. We propose a global evaluation scheme based on clustering of the words. A clustering of high quality in an external evaluation against a semantic resource, such as a dictionary of synonyms, indicates a word space model of high quality. We use Random Indexing to create several different models and compare them by clustering evaluation against the People’s Dictionary of Synonyms, a list of Swedish synonyms that are graded by the public. Most notably, we get better results for models based on syntagmatic information (words that appear together) than for models based on paradigmatic information (words that appear in similar contexts). This is quite contrary to previous results that have been presented for local evaluation. Clustering into ten clusters results in a recall of 83% for a syntagmatic model, compared to 34% for a comparable paradigmatic model and 10% for a random partition.
Automatic Evaluation of Parser Robustness: Eliminating Manual Labor and Annotated Resources
This article describes an automatic evaluation procedure for NLP system robustness under the strain of noisy and ill-formed input. The procedure requires no manual work or annotated resources. It is language and annotation scheme independent and produces reliable estimates on the robustness and accuracy of NLP systems. The procedure was applied to five parsers and one part-of-speech tagger on Swedish text. To establish the reliability of the procedure, a comparative evaluation involving annotated resources was carried out on the tagger and three of the parsers.
Influence of Module Order on Rule-Based De-identification of Personal Names in Electronic Patient Records Written in Swedish
Electronic patient records (EPRs) are a valuable resource for research, but for confidentiality reasons they cannot be used freely. In order to make EPRs available to a wider group of researchers, sensitive information such as personal names has to be removed. De-identification is a process that makes this possible. Rule-based methods as well as statistical and machine-learning-based methods exist for de-identification, but the latter require annotated training material, which is very sparse for patient names. It is therefore necessary to use rule-based methods for de-identification of EPRs. Not much is known, however, about the order in which the various rules should be applied and how the different rules influence precision and recall. This paper aims to answer this research question by implementing and evaluating four common rules for de-identification of personal names in EPRs written in Swedish: (1) dictionary name matching, (2) title matching, (3) common words filtering and (4) learning from previous modules. The results show that to obtain the highest recall and precision, the rules should be applied in the following order: title matching, common words filtering and dictionary name matching.
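Why module order can change the outcome may be easier to see in a small sketch. The abstract does not specify how the modules interact, so the semantics below are assumptions for illustration only: each module either marks token positions as names or protects them from later marking, and the word lists (`TITLES`, `NAME_DICT`, `COMMON_WORDS`) and function names are hypothetical.

```python
# Hypothetical resources (assumptions, not the paper's actual lists).
TITLES = {"Dr", "Doctor"}
NAME_DICT = {"Anna", "Björn", "Lind"}
COMMON_WORDS = {"björn", "lind", "met", "patient"}  # "björn" = bear, "lind" = linden

def title_matching(tokens, names, protected):
    """Mark the token following a title as a name, unless it is protected."""
    for i in range(1, len(tokens)):
        if tokens[i - 1] in TITLES and i not in protected:
            names.add(i)

def common_words_filtering(tokens, names, protected):
    """Protect common words that have not already been marked as names."""
    for i, tok in enumerate(tokens):
        if i not in names and tok.lower() in COMMON_WORDS:
            protected.add(i)

def dictionary_matching(tokens, names, protected):
    """Mark unprotected tokens found in the name dictionary."""
    for i, tok in enumerate(tokens):
        if i not in protected and tok in NAME_DICT:
            names.add(i)

def run_pipeline(tokens, modules):
    """Apply the modules in the given order and return marked positions."""
    names, protected = set(), set()
    for module in modules:
        module(tokens, names, protected)
    return names

def mask(tokens, names):
    """Replace marked positions with a placeholder."""
    return ["<NAME>" if i in names else tok for i, tok in enumerate(tokens)]

tokens = ["Dr", "Björn", "met", "patient", "Lind"]
order_a = [title_matching, common_words_filtering, dictionary_matching]
order_b = [dictionary_matching, title_matching, common_words_filtering]
names_a = run_pipeline(tokens, order_a)
names_b = run_pipeline(tokens, order_b)
```

Under these assumed semantics the two orders disagree on the ambiguous token "Lind": running the common-word filter before dictionary matching leaves it unmasked, while running dictionary matching first masks it. Which behaviour counts as correct depends on the annotated data, which is what the paper's evaluation measures.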
Exploiting Structured Data, Negation Detection and SNOMED CT Terms in a Random Indexing Approach to Clinical Coding
The problem of providing effective computer support for clinical coding has been the target of many research efforts. A recently introduced approach, based on statistical data on co-occurrences of words in clinical notes and assigned diagnosis codes, is here developed further and improved upon. The ability of the word space model to detect and appropriately handle the function of negations is demonstrated to be important in accurately correlating words with diagnosis codes, although the data on which the model is trained needs to be sufficiently large. Moreover, weighting can be performed in various ways, for instance by giving additional weight to ‘clinically significant’ words or by filtering code candidates based on structured patient records data. The results demonstrate the usefulness of both weighting techniques, particularly the latter, yielding 27% exact matches for a general model (across clinic types), and 43% and 82% for two domain-specific models (ear-nose-throat and rheumatology clinics).

