Results 1 - 4 of 4
Automatically Detecting Corresponding Edit-Turn-Pairs in Wikipedia
Abstract - Cited by 1 (0 self)
In this study, we analyze links between edits in Wikipedia articles and turns from their discussion page. Our motivation is to better understand implicit details about the writing process and knowledge flow in collaboratively created resources. Based on properties of the involved edit and turn, we have defined constraints for corresponding edit-turn-pairs. We manually annotated a corpus of 636 corresponding and non-corresponding edit-turn-pairs. Furthermore, we show how our data can be used to automatically identify corresponding edit-turn-pairs. With the help of supervised machine learning, we achieve an accuracy of .87 for this task.
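The classification task this abstract describes can be sketched as a tiny supervised learner. Everything below is an illustrative assumption, not the paper's actual setup: a single word-overlap feature stands in for their feature set, four invented toy pairs stand in for the annotated corpus, and a one-feature decision stump stands in for their machine-learning model.

```python
def overlap(edit_text, turn_text):
    """Jaccard word overlap between an edit and a talk-page turn
    (an illustrative feature, not the paper's actual feature set)."""
    e = set(edit_text.lower().split())
    t = set(turn_text.lower().split())
    return len(e & t) / max(len(e | t), 1)

def train_stump(pairs):
    """Pick the overlap threshold with the best training accuracy."""
    best_th, best_acc = 0.0, -1.0
    for th in sorted({overlap(e, t) for e, t, _ in pairs}):
        acc = sum((overlap(e, t) >= th) == bool(y)
                  for e, t, y in pairs) / len(pairs)
        if acc > best_acc:
            best_th, best_acc = th, acc
    return best_th, best_acc

# Invented toy pairs: (edit, turn, 1 = corresponding / 0 = not)
pairs = [
    ("fixed the birth date to 1912", "the birth date looks wrong to me", 1),
    ("added an image to the infobox", "please add an image to the infobox", 1),
    ("reverted vandalism in the lead", "which source gives the population figure?", 0),
    ("fixed a typo", "discussion about moving the article title", 0),
]
th, acc = train_stump(pairs)
```

On these toy pairs the stump separates the two classes perfectly; the paper's reported .87 accuracy of course comes from a much harder, real corpus.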
Measuring the contextual fitness of a term in its context...
Abstract
While detecting simple language errors (e.g. misspellings, number agreement, etc.) is nowadays standard functionality in all but the simplest text editors, other more complicated language errors may go unnoticed. A difficult case is errors that come in the disguise of a valid word that fits syntactically into the sentence. We use the Wikipedia revision history to extract a dataset with such errors in their context. We show that the new dataset provides a more realistic picture of the performance of contextual fitness measures. The achieved error detection quality is generally sufficient for competent language users who are willing to accept a certain level of false alarms, but may be problematic for non-native writers who accept all suggestions made by the systems. We make the full experimental framework publicly available, which will allow other scientists to reproduce our experiments and to conduct follow-up experiments. RÉSUMÉ. While the detection of simple errors is nowadays a standard feature of all but the most basic word processors, many errors remain hard to spot. This is often the case when the correct form is replaced by another valid form that is syntactically plausible in context. We used Wikipedia revisions to automatically extract a list of errors of this type. These data make it possible to get a better
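The extraction idea in this abstract — mining revision pairs where exactly one valid word was replaced by another valid word — can be sketched minimally. The `vocab` set below is a stand-in for a real lexicon and the sentences are invented examples; the paper's actual pipeline is considerably more involved.

```python
def single_word_substitution(old_sent, new_sent, vocab):
    """Return (index, old_word, new_word) if the revision swaps exactly one
    word for another word that is also in the lexicon -- a candidate
    real-word ("contextual fitness") error -- else None.
    vocab is a stand-in for a real lexicon (assumption, not the paper's pipeline)."""
    o, n = old_sent.split(), new_sent.split()
    if len(o) != len(n):
        return None  # insertions/deletions are out of scope for this sketch
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(o, n)) if a != b]
    if len(diffs) == 1 and diffs[0][1] in vocab and diffs[0][2] in vocab:
        return diffs[0]
    return None

vocab = {"the", "letter", "came", "form", "from", "a", "friend"}
hit = single_word_substitution("the letter came form a friend",
                               "the letter came from a friend", vocab)
# hit records the classic form/from confusion at position 3
```

Revisions that change more than one word, or that swap in an out-of-vocabulary token (a plain misspelling), are filtered out, which mirrors the restriction to errors disguised as valid words.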
TITLE AND ABSTRACT IN GERMAN
Abstract
In this paper, we present a study of the collaborative writing process in Wikipedia. Our work is based on a corpus of 1,995 edits obtained from 891 article revisions in the English Wikipedia. We propose a 21-category classification scheme for edits based on Faigley and Witte's (1981) model. Example edit categories include spelling error corrections and vandalism. In a manual multi-label annotation study with 3 annotators, we obtain an inter-annotator agreement of α = 0.67. We further analyze the distribution of edit categories for distinct stages in the revision history of 10 featured and 10 non-featured articles. Our results show that the information content in featured articles tends to become more stable after their promotion. In contrast, this is not true for non-featured articles. We make the resulting corpus and the annotation guidelines freely available.
Structural Alignment as the Basis to Improve Significant Change Detection in Versioned Sentences
Abstract
Some revisions of documents can change the meaning of passages, while others merely rephrase or improve style. In a multi-author workflow, assisting readers to assess whether a revision changes meaning or not can be useful in prioritising revision. One challenge in this is how to detect and represent the revision changes in a meaningful way to assist users in assessing the impact of revision changes. This paper explores a segmentation approach which utilises the syntactic context of revisions to support assessment of significant changes. We observe that length-normalised edit distance, or Word Error Rate (WER), correlates better with the significance of the revision changes at sentence level compared to general sentence similarity approaches. We show that our proposed method, SAVeS, supports improved analysis of change significance through alignment of segments rather than words. SAVeS can be used as the basis for a computational approach to identify significant revision changes.
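The length-normalised edit distance this abstract builds on can be sketched as word-level Levenshtein distance divided by the reference length. This is a generic WER implementation, not the authors' SAVeS method, and the example sentences are invented.

```python
def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance normalised by reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = minimum edits to turn r[:i] into h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

old = "the cat sat on the mat"
minor = "the cat sat on a mat"          # stylistic tweak -> low WER
major = "the dog chased the cat away"   # meaning change  -> high WER
```

A stylistic one-word tweak yields a much lower WER than a rewrite that changes the meaning, which is the correlation the paper exploits at sentence level.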