Implementing an Efficient Part-of-Speech Tagger
- Software–Practice and Experience
, 1999
Cited by 25 (5 self)
An efficient implementation of a part-of-speech tagger for Swedish is described. The stochastic tagger uses a well-established Markov model of the language. The tagger tags 92% of unknown words correctly and up to 97% of all words. Several implementation and optimization considerations are discussed. The main contribution of this paper is the thorough description of the tagging algorithm and the addition of a number of improvements. The paper contains enough detail for the reader to construct a tagger for his own language.
Keywords: part-of-speech tagging, word tagging, optimization, hidden Markov models.
Introduction
In part-of-speech (POS) tagging of a text, each word and punctuation mark in the text is assigned its morphosyntactic tag. Different tagging systems use different sets of tags, but typically a tag describes a word class and some word class specific features, such as number and gender. The number of different tags varies between a dozen and several hundred. Constructing ...
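As an illustration of how a Markov-model tagger chooses tags, here is a minimal sketch of Viterbi decoding. It assumes a first-order (bigram) hidden Markov model, which may be simpler than the model in the paper; the function name `viterbi`, the probability floor, and the three-tag toy model are all hypothetical and exist only for this example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence under a first-order HMM.

    Probabilities of unseen events are floored (an assumption of this
    sketch) so that their logarithm is defined.
    """
    if not words:
        return []

    def logp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{t: (logp(start_p.get(t, 0.0)) + logp(emit_p[t].get(words[0], 0.0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p][0] + logp(trans_p[p].get(t, 0.0)))
            score = (best[i - 1][prev][0]
                     + logp(trans_p[prev].get(t, 0.0))
                     + logp(emit_p[t].get(words[i], 0.0)))
            column[t] = (score, prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    path.reverse()
    return path

# Toy model with made-up probabilities (not taken from the paper).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```

Working in log space avoids numeric underflow on long sentences. A real tagger would also need a separate model for unknown words, which this sketch replaces with a flat floor.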
Implementation Aspects and Applications of a Spelling Correction Algorithm
, 1998
Cited by 15 (5 self)
A method for detecting and correcting spelling errors in Swedish text was presented by Domeij, Hollman and Kann (1994). The objectives were to perform very fast detection and correction of errors and to use a full size word list. Our implementation of the method as a C program is called Stava. We have further refined this method and implemented ranking of corrections using word frequencies and editing distance. We also describe how the method can be used in several applications, for example when extending a part-of-speech lexicon, tagging unknown words, stemming and correcting search questions in information retrieval.
Keywords: spelling error detection, spelling error correction, Bloom filter.
1 Introduction
How to automatically detect and correct spelling errors is an old problem. Nowadays, most word processors include some sort of spelling error detection. The traditional way of detecting spelling errors is to use a word list, usually also containing some grammatical information, an...
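The keywords mention a Bloom filter, a compact set representation that answers "is this word in the list?" with no false negatives and a small rate of false positives. The following is a minimal sketch of the data structure only; the class name, the bit-array size, and the use of seeded SHA-256 as hash functions are assumptions made for illustration and are not taken from Stava.

```python
import hashlib

class BloomFilter:
    """Approximate set membership: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive num_hashes bit positions by prefixing the word with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, word):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(word))

word_list = BloomFilter()
for w in ["stava", "stavning", "stavfel"]:
    word_list.add(w)

print("stavning" in word_list)  # an added word is always reported as present
```

The appeal for spell checking is that a full-size word list fits in a fixed, small bit array and each lookup costs only a handful of hash computations. The price is that a misspelling is occasionally accepted as a word.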
Evaluating a Spelling Support in a Search Engine
- in Natural Language Processing and Information Systems, 6th International Conference on Applications of Natural Language to Information Systems, NLDB 2002 (Eds
, 2002
Cited by 13 (2 self)
The information in a database is usually accessed using SQL or some other query language, but with a free-text retrieval system, retrieving text-based information becomes much easier and more user-friendly, since one can use natural language techniques such as automatic spell checking and stemming. The free-text retrieval system first needs to index the database; after that, the database can simply be searched.
Finding the correct interpretation of Swedish compounds, a statistical approach
- In Proc. 4th Int. Conf. Language Resources and Evaluation (LREC
, 2004
Cited by 7 (0 self)
This paper treats compound splitting for Swedish, where compounding is productive and very common. A method for splitting compounds and several methods for choosing the correct interpretation of ambiguous compounds are presented. 99% of all compounds are split, and 97% of these are correctly interpreted.
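To make the ambiguity problem concrete, here is a rough sketch that enumerates every way to split a compound into lexicon words and then picks one interpretation. The selection rule (fewest parts, then highest product of word frequencies) is an assumption chosen for illustration, not one of the methods evaluated in the paper, and the frequencies are made up. The example word "glasskål" can be read as glas+skål ("glass bowl") or glass+kål ("ice cream cabbage").

```python
import math

def splits(word, lexicon, min_len=2):
    """Yield every way to write `word` as a concatenation of lexicon entries."""
    if word in lexicon:
        yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head in lexicon:
            for rest in splits(word[i:], lexicon, min_len):
                yield [head] + rest

def best_split(word, lexicon):
    """Pick one interpretation: fewest parts, then highest frequency product.

    This ranking is a placeholder heuristic, not the paper's method.
    """
    candidates = list(splits(word, lexicon))
    if not candidates:
        return None
    return max(candidates,
               key=lambda parts: (-len(parts), math.prod(lexicon[p] for p in parts)))

# Hypothetical lexicon mapping words to corpus frequencies.
LEXICON = {"glas": 120, "skål": 40, "glass": 60, "kål": 15}

print(list(splits("glasskål", LEXICON)))
print(best_split("glasskål", LEXICON))
```

This sketch ignores linking morphemes and the letter changes that occur at Swedish compound boundaries, so a practical splitter would need additional rules.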
Faking errors to avoid making errors: Very weakly supervised learning for error detection in writing
- In preparation
, 2005
Cited by 4 (1 self)
This paper describes a method to create a grammar checker “for free”. It requires no manual work, only unannotated text and a few basic NLP tools. The method used is to simply annotate a lot of errors in written text and train an off-the-shelf machine learning implementation to recognize such errors. To avoid manual annotation, artificially created errors are used for training. Recall is comparable to other grammar checkers but precision is lower. Our method also complements traditional grammar checkers, i.e. they do not always find the same errors. The evaluation is performed on real errors.
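The idea of training on artificially created errors can be sketched as follows. This example assumes one simple error type, swapping two adjacent words, and labels each token as erroneous (1) or correct (0); the error types used in the paper may differ, and the function names are hypothetical.

```python
def fake_word_order_errors(tokens):
    """Yield (corrupted_tokens, labels) pairs, one per adjacent-word swap."""
    for i in range(len(tokens) - 1):
        if tokens[i] == tokens[i + 1]:
            continue  # swapping identical tokens would not create an error
        corrupted = tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
        labels = [1 if i <= j <= i + 1 else 0 for j in range(len(tokens))]
        yield corrupted, labels

def training_examples(sentences):
    """Yield each original sentence labeled as correct, plus its corrupted variants."""
    for tokens in sentences:
        yield tokens, [0] * len(tokens)
        yield from fake_word_order_errors(tokens)

for tokens, labels in training_examples([["jag", "ser", "huset"]]):
    print(tokens, labels)
```

The labeled pairs could then be fed to any sequence classifier. Note that a swap can occasionally produce a sentence that is still grammatical, which adds noise to the training labels.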
Grammar checking for Swedish second language learners
- In CALL for the Nordic Languages, Copenhagen Studies in Language, pages 33–47. Copenhagen Business School, Samfundslitteratur. Harald Clahsen, Jürgen
, 2005
Cited by 4 (1 self)
Grammar errors and context-sensitive spelling errors in texts written by second language learners are hard to detect automatically. We have used three different approaches for grammar checking: manually constructed error detection rules, statistical differences between correct and incorrect texts, and machine learning of specific error types. The three approaches have been evaluated using a corpus of second language learner Swedish. We found that the three methods detect different errors and therefore complement each other.
Swedish summary (translated): Grammar errors and context-sensitive spelling errors (misspellings that form real words) in texts written by second language learners are hard to detect automatically. We have used three different approaches for the checking: manually constructed error detection rules, statistical differences between correct and incorrect text, and machine learning of specific error types. We have evaluated the three methods on a corpus consisting of Swedish essays by second language learners. We found that the methods detect different errors and therefore complement each other well.
Efficient generation and ranking of spelling error corrections.
- NADA REPORT TRITA-NA-E9621, 1996
, 1996
Cited by 3 (0 self)
An efficient method for generating and ranking spelling error corrections is described. This method can be used with dictionaries with only one operation: check if a given word is in the dictionary or not. The method is intended for Swedish, but can easily be modified for other languages. Given a misspelled word, i.e., a word not in the dictionary, the corrections are generated by applying editing operations on the word. An efficient algorithm to generate corrections for compound words is also described. The corrections are then ranked using a combination of edit distances and word frequencies.
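The generate-and-test idea described above can be sketched as follows: apply single editing operations to the misspelled word, keep the candidates that pass the dictionary's membership check, and rank them. The specific ranking used here (smaller edit distance first, then higher word frequency) is an assumption, since the abstract does not say how the two are combined; the function names, alphabet, and frequencies are likewise hypothetical.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def edits1(word):
    """All strings one editing operation away from `word`."""
    pairs = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in pairs if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in pairs if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in pairs if b for c in ALPHABET]
    inserts = [a + c + b for a, b in pairs for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts) - {word}

def corrections(word, in_dictionary, frequency, max_results=5):
    """Rank dictionary words within two edits of `word`.

    Only a membership test (`in_dictionary`) is required of the dictionary.
    Two-edit candidates are generated only if no one-edit candidate exists.
    """
    found = {cand: 1 for cand in edits1(word) if in_dictionary(cand)}
    if not found:
        for middle in edits1(word):
            for cand in edits1(middle):
                if cand != word and in_dictionary(cand):
                    found.setdefault(cand, 2)
    ranked = sorted(found, key=lambda w: (found[w], -frequency(w), w))
    return ranked[:max_results]

# Hypothetical word list with made-up frequencies.
FREQ = {"katt": 90, "natt": 70, "hatt": 40, "stavning": 50, "stava": 80}

print(corrections("xatt", FREQ.__contains__, lambda w: FREQ.get(w, 0)))
print(corrections("stavnig", FREQ.__contains__, lambda w: FREQ.get(w, 0)))
```

Because the dictionary is only ever asked a yes/no question, the same code works with an exact set, a hash table, or an approximate structure such as a Bloom filter.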
Improving Precision and Recall Using a Spellchecker in a Search Engine
- Stockholm University
, 2004
Cited by 2 (0 self)
Search engines constitute a key to finding specific information on the fast growing World Wide Web. Users query a search engine by using natural language to extract documents that refer to the desired subject. Sometimes no information is found because they make spelling and typing mistakes while entering their queries. Earlier reports suggest that 10-12 percent of all questions to a search engine are misspelled. The issue is how much the use of a query spellchecker affects the performance of a search engine. This Master’s thesis presents an evaluation of how much a query spellchecker improves precision and recall in information retrieval for Swedish texts. Evaluation results indicate that spellchecking improved precision and recall by 4 and 11.5 percent, respectively.
Swedish summary (translated), titled "Evaluation of a Spelling Support for a Search Engine": Search engines are a key to finding specific information on the rapidly growing Internet. Users typically use natural language in a search engine to find the information they are interested in. Sometimes the search fails because the user happens to misspell or mistype. Earlier studies show that 10-12 percent of all questions put to a search engine are misspelled. The question is how the spelling support affects the search results. This Master’s thesis evaluates how much a spellchecker can improve precision and recall in information retrieval in Swedish. The results show that the spellchecker improved precision and recall by 4 and 11.5 percent, respectively.
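For readers unfamiliar with the two measures, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch of the computation follows; the document identifiers are invented, and the numbers have nothing to do with the thesis results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: four documents returned, three documents actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

A misspelled query typically retrieves few or no relevant documents, so correcting it before searching mainly raises recall, which is consistent with the larger recall gain reported above.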
Detecting, Diagnosing and Correcting Low-Level Problems When Editing With and Without Computer Aids
- Wright State University
, 1997
Cited by 2 (2 self)
Can a tool for computer-aided editing be used to make bureaucratic writing clearer and less formal? The question is explored in a research project of which the study presented here is a part. In the study, 16 university students were asked to revise a letter in bureaucratic style, first using pen and paper, then using computer aids. The letter was prepared to contain 26 problems in mechanics and style, all of which could be analysed by the computer tool. The design made it possible to compare the number of changes subjects made to planted problems in mechanics and style with and without computer support. On average, subjects changed 99% of the mechanical problems when using the computer tool, compared to 52% without it. In contrast, subjects changed only 63% of the style problems when using the computer tool, compared to 27% without it. Thus, the computer tool had a strong influence on the total number of changes made to mechanical problems; for problems in style, the influence was considerably weaker. Interestingly, the influence of the tool on the number of changes made in style varied greatly between different subjects. While some writers changed many problems in style both with and without computer support, other writers made almost no changes in style, even though they were urged to attend to them by the computer tool. Among the writers who made few or no changes in style without aid, some were strongly influenced to change in style by the tool. It is suggested that these differences may be related to different revision strategies employed by the writers, some of which match the strategy embodied in the computer tool, some of which do not. Possible negative effects are discussed.
Practical background
The study presented here is part of a cooperation project...
A Swedish grammar checker
, 2002
Cited by 1 (0 self)
This article describes the construction and performance of Granska – a surface-oriented system for grammar checking of Swedish text. With the use of carefully constructed error detection rules, the system can detect and suggest corrections for a number of grammatical errors in Swedish texts. In this article, we specifically focus on how erroneously split compounds and noun phrase disagreement are handled in the rules. The system combines probabilistic and rule-based methods to achieve high efficiency and robustness. This is a necessary prerequisite for a grammar checker that will be used in real time in direct interaction with users. We hope to show that the Granska system with higher efficiency can achieve the same or better results than systems that use rule-based parsing alone. Parts of this work were presented at Nodalida-99 (Domeij et al., 1999).

