Corpus-based and knowledge-based measures of text semantic similarity
- In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1
, 2006
Cited by 38 (1 self)
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g. text classification, information retrieval) or individual words (e.g. synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions, product descriptions), in this paper we focus on measuring the semantic similarity of short texts. Through experiments performed on a paraphrase data set, we show that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to 13% error rate reduction with respect to the traditional vector-based similarity metric.
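For orientation, the "traditional vector-based similarity metric" used as the baseline above is commonly a cosine similarity over bag-of-words vectors. The abstract does not give its exact form, so the following is only a minimal sketch under that assumption; the tokenizer and the function names are illustrative and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Illustrative tokenizer (an assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))
    # Only terms shared by both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two paraphrases that share only function words score 0.5 under pure
# lexical matching, even though their meaning is nearly identical.
score = cosine_similarity("the car is fast", "the automobile is quick")
```

A purely lexical measure such as this cannot see that "car" and "automobile" are related, which is the gap that corpus-based and knowledge-based word similarity measures are meant to close.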
Reading Level Assessment Using Support Vector Machines and Statistical Language Models
- Proceedings of the Annual Meeting of the Association for Computational Linguistics
, 2005
Cited by 18 (1 self)
Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate reading level for foreign and second language learners is a challenge for teachers. This task can be addressed with natural language processing technology to assess reading level. Existing measures of reading level are not well suited to this task, but previous work and our own pilot experiments have shown the benefit of using statistical language models. In this paper, we also use support vector machines to combine features from traditional reading level measures, statistical language models, and other language processing tools to produce a better method of assessing reading level.
Dependency-based sentence alignment for multiple document summarization
- In Proceedings of Coling 2004
, 2004
Cited by 8 (1 self)
In this paper, we describe a method of automatic sentence alignment for building extracts from abstracts in automatic summarization research. Our method is based on two steps. First, we introduce the “dependency tree path” (DTP). Next, we calculate the similarity between DTPs based on the ESK (Extended String Subsequence Kernel), which considers sequential patterns. By using these procedures, we can derive one-to-many or many-to-one correspondences among sentences. Experiments using different similarity measures show that DTP consistently improves the alignment accuracy and that ESK gives the best performance.
Text simplification for language learners: a corpus analysis
- In Proc. of Workshop on Speech and Language Technology for Education
, 2007
Cited by 6 (1 self)
Simplified texts are commonly used by teachers and students in bilingual education and other language-learning contexts. These texts are usually manually adapted, and teachers say this is a time-consuming and sometimes challenging task. Our goal is the development of tools to aid teachers by automatically proposing ways to simplify texts. As a first step, this paper presents a detailed analysis of a corpus of news articles and abridged versions written by a literacy organization in order to learn what kinds of changes people make when simplifying texts for language learners.
A Survey of Paraphrasing and Textual Entailment Methods
, 2010
Cited by 6 (3 self)
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
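The remark that paraphrasing can be seen as bidirectional textual entailment can be made concrete with a small sketch. The entailment test used here, `toy_entails`, is a hypothetical word-containment stand-in chosen only so the example runs; it is not a method from the survey, and a real recognizer would be plugged in through the `entails` parameter.

```python
import re


def tokens(text):
    # Illustrative tokenizer (an assumption): set of lowercase word forms.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_entails(text, hypothesis):
    # Hypothetical stand-in for an entailment recognizer: the hypothesis is
    # accepted when every one of its words also occurs in the text.
    return tokens(hypothesis) <= tokens(text)


def is_paraphrase(a, b, entails=toy_entails):
    # Paraphrase as bidirectional entailment: a entails b AND b entails a.
    return entails(a, b) and entails(b, a)


one_way = toy_entails("A dog barked loudly", "A dog barked")        # True
both_ways = is_paraphrase("A dog barked loudly", "A dog barked")    # False
```

The asymmetry is the point of the example: the longer sentence entails the shorter one, but not the reverse, so the pair is an entailment pair without being a paraphrase pair.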
A Computational Model of Text Reuse in Ancient Literary Texts
Cited by 4 (0 self)
We propose a computational model of text reuse tailored for ancient literary texts, available to us often only in small and noisy samples. The model takes into account source alternation patterns, so as to be able to align even sentences with low surface similarity. We demonstrate its ability to characterize text reuse in the Greek New Testament.
Content Modeling Using Latent Permutations
Cited by 3 (2 self)
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
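The abstract names the Generalized Mallows Model without stating it. As a rough illustration, a common formulation represents a permutation by its inversion counts and assigns each count an independent, exponentially decaying probability. The sketch below assumes that formulation; the function names and the dispersion values in `rho` are illustrative and not taken from the paper.

```python
import math
from itertools import permutations


def inversion_counts(perm):
    """For a permutation of 1..n, return v_1..v_{n-1}, where v_j is the
    number of items greater than j that appear before j."""
    n = len(perm)
    position = {item: i for i, item in enumerate(perm)}
    return [
        sum(1 for k in range(j + 1, n + 1) if position[k] < position[j])
        for j in range(1, n)
    ]


def gmm_probability(perm, rho):
    """Probability of perm under the assumed GMM form with one dispersion
    parameter per inversion count: a product of exp(-rho_j * v_j) terms,
    each normalized over the values v_j can take (0..n-j)."""
    n = len(perm)
    prob = 1.0
    for j, v_j in enumerate(inversion_counts(perm), start=1):
        normalizer = sum(math.exp(-rho[j - 1] * v) for v in range(n - j + 1))
        prob *= math.exp(-rho[j - 1] * v_j) / normalizer
    return prob


rho = [0.5, 1.0, 2.0]  # illustrative dispersion values (assumption)
total = sum(gmm_probability(p, rho) for p in permutations(range(1, 5)))
```

With positive dispersion values the identity ordering is the most probable permutation, and orderings with more inversions become progressively less likely, which is how such a model can bias topic orderings to be similar across related documents.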
New Functions for Unsupervised Asymmetrical Paraphrase Detection
, 2007
Cited by 3 (2 self)
Monolingual text-to-text generation is an emerging research area
Extracting Simplified Statements for Factual Question Generation
Cited by 2 (1 self)
We address the problem of automatically generating concise factual questions from linguistically complex sentences in reading materials. We discuss semantic and pragmatic issues that appear in complex sentences, and then we present an algorithm for extracting simplified sentences from appositives, subordinate clauses, and other constructions. We conjecture that our method is useful as a preliminary step in a larger question generation process. Experimental results indicate that our method is more suitable for factual question generation applications than an alternative text compression algorithm.
Textual Entailment Recognition Using a Linguistically-Motivated Decision Tree Classifier
Cited by 2 (0 self)
In this paper we present a classifier for Recognising Textual Entailment (RTE) and Semantic Equivalence. We evaluate the performance of this classifier using an evaluation framework provided by the PASCAL RTE Challenge Workshop. Sentence pairs are represented as a set of features, which are used by our decision tree classifier to determine if an entailment relationship exists between each sentence pair in the RTE test corpus.

