Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms
, 1997
Cited by 197 (10 self)
... this paper, we examine the idea of lexical chains as such a representation. We show how they can be constructed by means of WordNet, and how they can be applied to one particular linguistic task: the detection and correction of malapropisms.
Correcting Real-Word Spelling Errors by Restoring Lexical Cohesion
, 2001
Cited by 33 (2 self)
Spelling errors that happen to result in a real word in the lexicon cannot be detected by a conventional spelling checker. We present a method for detecting and correcting many such errors by identifying tokens that are semantically unrelated to their context and are spelling variations of words that would be related to the context. Relatedness to context is determined by a measure of semantic distance initially proposed by Jiang and Conrath (1997). We tested the method on an artificial corpus of errors; it achieved recall of up to 50% and precision of 18 to 25%, levels that approach practical usability.
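The Jiang and Conrath distance combines information content (IC) over a concept taxonomy: dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lso(c1, c2)), where IC(c) = -log p(c) and lso is the lowest common subsumer of the two concepts. A minimal sketch, assuming a hypothetical toy taxonomy and made-up concept probabilities in place of WordNet and real corpus counts:

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "car": "artifact",
}

# Hypothetical concept probabilities. Real ones come from corpus counts
# propagated up the hierarchy, so a parent is never rarer than its children.
PROB = {"entity": 1.0, "animal": 0.4, "dog": 0.1, "cat": 0.1, "artifact": 0.5, "car": 0.2}


def ic(concept: str) -> float:
    """Information content: -log p(c)."""
    return -math.log(PROB[concept])


def ancestors(concept: str) -> list[str]:
    """The concept itself followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def lowest_common_subsumer(c1: str, c2: str) -> str:
    """First ancestor of c1 (nearest first) that also subsumes c2."""
    upper = set(ancestors(c2))
    for a in ancestors(c1):
        if a in upper:
            return a
    raise ValueError("concepts share no ancestor")


def jc_distance(c1: str, c2: str) -> float:
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lso(c1, c2))."""
    return ic(c1) + ic(c2) - 2 * ic(lowest_common_subsumer(c1, c2))
```

With these numbers, `jc_distance("dog", "cat")` (2 ln 4, about 2.77) is smaller than `jc_distance("dog", "car")` (about 3.91), so a token like "cat" would count as closer to a "dog" context than "car" does.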
A Statistical Model for Domain-Independent Text Segmentation
- In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics
, 2001
Cited by 28 (0 self)
We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.
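Finding a maximum-probability segmentation with probabilities estimated from the text itself can be sketched as a minimum-cost dynamic program over sentence boundaries. The abstract does not state the model, so this sketch assumes one: each segment is scored by a Laplace-smoothed unigram model estimated from that segment, costing log((n_i + k) / (f_i(w) + 1)) per word (n_i = words in the segment, k = vocabulary size of the whole text, f_i(w) = the word's frequency in the segment), plus a penalty of log n per segment (n = words in the text).

```python
import math
from collections import Counter


def segment_cost(words: list[str], vocab_size: int) -> float:
    """Negative log-likelihood of a segment under a Laplace-smoothed unigram
    model estimated from the segment itself (assumed model)."""
    n_i = len(words)
    counts = Counter(words)
    return sum(math.log((n_i + vocab_size) / (counts[w] + 1)) for w in words)


def segment(sentences: list[list[str]]) -> list[int]:
    """Return the start index of each segment in a minimum-cost segmentation."""
    all_words = [w for s in sentences for w in s]
    k = len(set(all_words))
    penalty = math.log(len(all_words))  # assumed cost of opening one more segment
    m = len(sentences)
    best = [0.0] + [math.inf] * m  # best[j]: min cost of segmenting sentences[:j]
    back = [0] * (m + 1)  # back[j]: start of the last segment in that solution
    for j in range(1, m + 1):
        for i in range(j):
            words = [w for s in sentences[i:j] for w in s]
            cost = best[i] + segment_cost(words, k) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    starts, j = [], m
    while j > 0:  # follow back-pointers from the end of the text
        starts.append(back[j])
        j = back[j]
    return sorted(starts)
```

On four sentences where the first two repeat "cat"/"dog" and the last two repeat "car"/"road", the cheapest segmentation starts new segments at sentences 0 and 2. Because no training data is involved, the same code runs unchanged on any text, which is the property the abstract emphasizes.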
Detecting and Correcting Malapropisms with Lexical Chains
, 1995
Cited by 25 (0 self)
Because chains of semantically related words express semantic continuity, such lexical chains can play an important role in the detection of malapropisms. A malapropism is a correctly spelled word that does not fit the context in which it is used, because it results from a spelling error on the word that was actually intended. I first assume that such a word is much less likely to be inserted into any chain with other words. If this assumption is correct, words that could not be inserted into a chain with other words can be considered potential malapropisms. A mechanism that generates spelling replacements can then be used to produce replacement candidates. The second assumption is that whenever a spelling replacement can be inserted into a chain with other words, this replacement is likely to be the intended word for which the malapropism was substituted. The algorithm proposed here to detect lexical chains uses the on-line thesaurus WordNet to automatically quantify semantic relatio...
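The two assumptions above translate into a short procedure: flag real words that join no chain, then propose spelling variants that would join one. A minimal sketch, assuming a hypothetical `RELATED` table standing in for WordNet-based relatedness, a tiny hypothetical `LEXICON`, single-edit spelling variants, and chains simplified to "related to some other word in the text":

```python
import string

# Hypothetical relatedness pairs standing in for WordNet-based relations.
RELATED = {
    frozenset({"dog", "cat"}),
    frozenset({"cat", "pet"}),
    frozenset({"dog", "pet"}),
    frozenset({"dog", "bark"}),
}
# Hypothetical lexicon of real words, used to filter spelling variants.
LEXICON = {"dog", "cat", "pet", "bark", "park", "dock"}


def related(a: str, b: str) -> bool:
    return frozenset({a, b}) in RELATED


def in_some_chain(word: str, context: list[str]) -> bool:
    """Simplification: a word can join a chain if it relates to another context word."""
    return any(related(word, other) for other in context if other != word)


def spelling_variants(word: str) -> set[str]:
    """Real words one deletion, substitution or insertion away (a-z only)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    edits = {a + b[1:] for a, b in splits if b}  # deletions
    edits |= {a + c + b[1:] for a, b in splits if b for c in letters}  # substitutions
    edits |= {a + c + b for a, b in splits for c in letters}  # insertions
    edits.discard(word)
    return edits & LEXICON


def correct_malapropisms(words: list[str]) -> dict[str, list[str]]:
    """Map each chain-less real word to the spelling variants that would join a chain."""
    suggestions = {}
    for w in words:
        if w in LEXICON and not in_some_chain(w, words):
            fixes = [v for v in sorted(spelling_variants(w)) if in_some_chain(v, words)]
            if fixes:
                suggestions[w] = fixes
    return suggestions
```

For "my dog likes to park loudly", "park" relates to nothing in the text, but its variant "bark" relates to "dog", so the sketch suggests `{"park": ["bark"]}`. "dog" is also chain-less here, yet none of its variants joins a chain, so it is left alone.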
First Story Detection using a Composite Document Representation
- Proc. HLT01
, 2001
Cited by 13 (2 self)
In this paper, we explore the effects of data fusion on First Story Detection [1] in a broadcast news domain. The data fusion element of this experiment involves combining evidence derived from two distinct representations of document content in a single clustering run. Our composite document representation consists of a concept representation (based on the lexical chains derived from a text) and a free-text representation (using traditional keyword index terms). Using the TDT1 evaluation methodology, we evaluate a number of document representation strategies and propose reasons why our data fusion experiment shows performance improvements in the TDT domain.
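One way to fuse two representations is at the similarity level. A sketch under stated assumptions, not the paper's configuration: each story is a pair of sparse term-count vectors (lexical-chain concepts, keywords); the fused score is a weighted sum of the two cosine similarities with a hypothetical weight `alpha`; and a story is declared a first story when its best fused similarity to any earlier story falls below a hypothetical `threshold`. Comparing against individual earlier stories, rather than clusters, is a further simplification.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fused_similarity(x, y, alpha=0.5):
    """x and y are (concepts, keywords) pairs; alpha weights the concept view."""
    return alpha * cosine(x[0], y[0]) + (1 - alpha) * cosine(x[1], y[1])


def first_stories(stream, alpha=0.5, threshold=0.2):
    """Indices of stories whose best fused similarity to earlier stories is below threshold."""
    flagged = []
    for i, story in enumerate(stream):
        best = max((fused_similarity(story, prev, alpha) for prev in stream[:i]), default=0.0)
        if best < threshold:
            flagged.append(i)
    return flagged
```

Setting `alpha` to 1.0 or 0.0 recovers the concept-only or keyword-only run, which makes it easy to compare each single representation against the fused one.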
Segmenting Broadcast News Streams using Lexical Chains
- In Proceedings of 1st Starting AI Researchers Symposium (STAIRS 2002)
, 2002
Cited by 11 (4 self)
In this paper we propose a coarse-grained NLP approach to text segmentation based on the analysis of lexical cohesion within text. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e., distinct news stories from broadcast news programmes. Our system, SeLeCT, first builds a set of lexical chains in order to model the discourse structure of the text. A boundary detector then searches for breaking points in this structure, indicated by patterns of cohesive strength and weakness within the text. We evaluate this technique on a test set of concatenated CNN news story transcripts and compare it with TextTiling, an established statistical approach to segmentation.
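A boundary detector of this kind looks for gaps where many chains stop and many new ones start. The sketch below is a hypothetical stand-in, not SeLeCT's actual scoring. Each chain is represented by the span of sentences it covers, each gap between adjacent sentences scores one point per chain ending just before it and one per chain starting just after it, and gaps reaching a hypothetical `threshold` are proposed as story boundaries.

```python
def boundary_scores(chains: list[tuple[int, int]], num_sentences: int) -> list[int]:
    """chains: inclusive (first_sentence, last_sentence) spans.
    Returns one score per gap g, where gap g lies between sentence g and g + 1."""
    scores = [0] * (num_sentences - 1)
    for start, end in chains:
        if end < num_sentences - 1:
            scores[end] += 1  # chain ends just before gap `end`
        if start > 0:
            scores[start - 1] += 1  # chain begins just after gap `start - 1`
    return scores


def detect_boundaries(chains, num_sentences, threshold=2):
    """Gap indices whose end-plus-start score reaches the threshold."""
    scores = boundary_scores(chains, num_sentences)
    return [g for g, s in enumerate(scores) if s >= threshold]
```

With six sentences and chains spanning (0, 2), (1, 2), (3, 5) and (3, 4), the scores are `[1, 0, 4, 0, 1]`, so only the gap after sentence 2 is proposed as a boundary: two chains stop there and two begin.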
Not as Easy as It Seems: Automating the Construction of Lexical Chains Using Roget's Thesaurus
- In the Proceedings of the Canadian Conference on Artificial Intelligence
, 2003
Cited by 4 (0 self)
Morris and Hirst [10] present a method of linking significant words that are about the same topic. The resulting lexical chains are a means of identifying cohesive regions in a text, with applications in many natural language processing tasks, including text summarization. The first lexical chains were constructed manually using Roget’s International Thesaurus. Morris and Hirst wrote that automation would be straightforward given an electronic thesaurus. All applications so far have used WordNet to produce lexical chains, perhaps because adequate electronic versions of Roget’s were not available until recently. We discuss the building of lexical chains using an electronic version of Roget’s Thesaurus. We implement a variant of the original algorithm, and explain the necessary design decisions. We include a comparison with other implementations.
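Thesaurus-based chaining can be sketched as a greedy pass over the text. Each indexed word joins the first existing chain that contains a related word, and otherwise starts a new chain. This sketch assumes a hypothetical Roget's-style index mapping words to category labels and treats two words as related when they share a category. Any other thesaural relations, and any limits on the distance between chained words, are omitted.

```python
# Hypothetical Roget's-style index: word -> set of category labels.
CATEGORIES = {
    "car": {"travel"},
    "truck": {"travel"},
    "road": {"travel", "way"},
    "path": {"way"},
    "apple": {"food"},
}


def related(a: str, b: str) -> bool:
    """Two words are related if they share at least one thesaurus category."""
    return bool(CATEGORIES.get(a, set()) & CATEGORIES.get(b, set()))


def build_chains(words: list[str]) -> list[list[str]]:
    """Greedy chaining: add each word to the first chain holding a related word."""
    chains: list[list[str]] = []
    for w in words:
        if w not in CATEGORIES:
            continue  # words missing from the thesaurus index cannot be chained
        for chain in chains:
            if any(related(w, member) for member in chain):
                chain.append(w)
                break
        else:
            chains.append([w])  # no related chain found: start a new one
    return chains
```

For "the car, apple, road, path, truck", "path" joins the "car" chain through "road" (shared category "way"), giving `[["car", "road", "path", "truck"], ["apple"]]`. This transitive growth is one of the design decisions any implementation has to settle.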
Spoken and written news story segmentation using lexical chaining
- In the Proceedings of the Student Workshop at HLT-NAACL, Companion Volume
, 2003
Cited by 3 (2 self)
In this paper we describe a novel approach to lexical-chain-based segmentation of broadcast news stories. Our segmentation system, SeLeCT, is evaluated against two other lexical-cohesion-based segmenters, TextTiling and C99. Using the Pk and WindowDiff evaluation metrics, we show that SeLeCT outperforms both systems on spoken news transcripts (CNN), while the C99 algorithm performs best on the written newswire collection (Reuters). We also examine the differences between spoken and written news styles and how these differences can affect ...
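Both metrics slide a window of k units across the text and compare a reference segmentation with a hypothesized one; lower is better. Pk counts windows where the two disagree on whether the window's ends fall in the same segment. WindowDiff counts windows where the number of boundaries inside differs, so it also penalizes near-misses and extra boundaries within a window. This sketch assumes segmentations are given as 0/1 lists, with position i set to 1 when a boundary follows unit i, and leaves the choice of k to the caller (it is conventionally about half the mean reference segment length).

```python
def _boundaries_between(bounds: list[int], i: int, k: int) -> int:
    """Number of boundaries between unit i and unit i + k."""
    return sum(bounds[i:i + k])


def pk(ref: list[int], hyp: list[int], k: int) -> float:
    """Fraction of windows where ref and hyp disagree on 'same segment or not'."""
    windows = len(ref) - k
    errors = sum(
        (_boundaries_between(ref, i, k) > 0) != (_boundaries_between(hyp, i, k) > 0)
        for i in range(windows)
    )
    return errors / windows


def window_diff(ref: list[int], hyp: list[int], k: int) -> float:
    """Fraction of windows where ref and hyp contain different numbers of boundaries."""
    windows = len(ref) - k
    errors = sum(
        _boundaries_between(ref, i, k) != _boundaries_between(hyp, i, k)
        for i in range(windows)
    )
    return errors / windows
```

A hypothesis that places the single boundary one unit late (`[0, 0, 0, 1, 0, 0]` against `[0, 0, 1, 0, 0, 0]`, k = 2) scores 0.5 on both metrics. The two diverge when windows contain different non-zero boundary counts, which only WindowDiff penalizes.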
Lexical Similarity based on Quantity of Information Exchanged - Synonym Extraction
, 2004
Cited by 1 (0 self)
There are many approaches to measuring semantic similarity between words. This paper proposes a new method based on the analysis of a monolingual dictionary. We can view the word definitions of a dictionary as a network: its nodes are the headwords found in the dictionary, and its edges represent the relations between a headword and the words present in its definition. In this view, the meaning of a word is defined by the total quantity of information to which each element of its definition contributes. The similarity between two words is defined as the maximal quantity of information exchanged between them through the network.
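The network described here can be built directly from a dictionary: one node per headword, with an edge from each headword to every word in its definition. The abstract does not specify the information-exchange measure, so this sketch swaps in a much simpler stand-in, the Jaccard overlap of two headwords' definition neighbourhoods, only to show the graph in use. The mini-dictionary is hypothetical.

```python
def build_network(dictionary: dict[str, str]) -> dict[str, set[str]]:
    """Map each headword to the set of words appearing in its definition (its edges)."""
    return {head: set(defn.lower().split()) for head, defn in dictionary.items()}


def jaccard_similarity(network: dict[str, set[str]], a: str, b: str) -> float:
    """Stand-in similarity: overlap of definition neighbourhoods (NOT the paper's measure)."""
    na, nb = network[a], network[b]
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0
```

For "puppy" ("young dog") and "kitten" ("young cat"), the neighbourhoods share only "young", giving 1/3, while "puppy" and "car" ("road vehicle") share nothing. A measure that propagates information along paths, as the abstract describes, could also credit indirect links such as "dog" and "cat" being defined through common words.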
Broadcast News Gisting using Lexical Cohesion Analysis
- In the Proceedings of the 26th BCS-IRSG European Conference on Information Retrieval (ECIR-04)
, 2004
In this paper we describe an extractive method of creating very short summaries or gists that capture the essence of a news story using a linguistic technique called lexical chaining. The recent interest in robust gisting and title generation techniques originates from a need to improve the indexing and browsing capabilities of interactive digital multimedia systems. More specifically, these systems deal with streams of continuous data, like a news programme, that require further annotation before they can be presented to the user in a meaningful way. We automatically evaluate the performance of our lexical chaining-based gister with respect to four baseline extractive gisting methods on a collection of closed caption material taken from a series of news broadcasts. We also report results of a human-based evaluation of summary quality. Our results show that our novel lexical chaining approach to this problem outperforms standard extractive gisting methods.
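An extractive gister returns an existing sentence as the gist. A minimal sketch of a chain-driven version, assuming the chains are already built and given as lists of word occurrences. It uses a hypothetical chain-strength measure (a word's weight is the length of the longest chain containing it) and picks the sentence with the highest total weight. The abstract does not describe the paper's actual scoring.

```python
def gist(sentences: list[str], chains: list[list[str]]) -> str:
    """Return the sentence whose words carry the most lexical-chain weight."""
    weight: dict[str, int] = {}
    for chain in chains:
        for word in chain:
            # hypothetical: weight = length of the strongest chain containing the word
            weight[word] = max(weight.get(word, 0), len(chain))

    def score(sentence: str) -> int:
        # strip trailing punctuation so "coast." matches the chain word "coast"
        return sum(weight.get(w.strip(".,!?"), 0) for w in sentence.lower().split())

    return max(sentences, key=score)
```

Given a dominant chain such as `["storm", "flooding", "storm", "coast"]`, the sentence "The storm hit the coast." outscores sentences that touch the chain only once or not at all, so it is returned as the gist. Ties go to the earlier sentence, which favours the lead of the story.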

