Modeling local coherence: An entity-based approach
- In Proceedings of ACL 2005
, 2005
Cited by 70 (5 self)
This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the proposed discourse representation supports the effective learning of a ranking function. Our experiments demonstrate that the induced model achieves significantly higher accuracy than a state-of-the-art coherence model.
Retrieval of authentic documents for reader-specific lexical practice
- In Proceedings of InSTIL/ICALL Symposium
, 2004
Cited by 18 (11 self)
When a teacher gives a reading assignment in today's language learning classrooms, all of the students are almost always reading the same text. Although students have different reading levels, it is impractical for a single teacher to seek out unique texts matched to each student's abilities. In this paper, we describe REAP, a system designed to assign each student individualized readings by combining detailed student and curriculum modelling with the large amount of authentic materials on the Web. REAP is designed to be used as an additional resource in teacher-led classes, as well as to be used by reading comprehension researchers for testing hypotheses on how to improve reading skills for L1 as well as L2 learners. Vocabulary acquisition is the primary factor we use in matching texts to a student's abilities. The system can also prioritise different criteria during the search. For instance, the system can retrieve documents based solely on the vocabulary terms needed to progress toward the next level, thereby focusing on curriculum. REAP can take into account other goals, such as student interests, special topics decided by the teacher, or an upcoming test, all represented as word histograms. This allows teachers to decide what they want the students to focus on each day. We also describe the contributions of this project, including an open-corpus, authentic-materials approach to reading practice and word-level modelling of norms and student skills. Finally, we describe how learning researchers can use this tool to get fine-grained control over the selection of reading materials, so that they can more easily test a variety of new learning hypotheses.
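The abstract says that retrieval goals are represented as word histograms but does not say how documents are scored against them. The following is a minimal sketch of one plausible approach, assuming cosine similarity between a document's word histogram and a goal histogram. The function names, the tokenizer, and the toy documents are hypothetical and are not taken from REAP.

```python
import math
import re
from collections import Counter


def word_histogram(text):
    """Count lowercase word tokens in a text (simple regex tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(documents, goal_histogram):
    """Return (score, doc_id) pairs sorted from best to worst match."""
    scored = [
        (cosine_similarity(word_histogram(text), goal_histogram), doc_id)
        for doc_id, text in documents.items()
    ]
    return sorted(scored, reverse=True)


# Toy data, invented for illustration only.
documents = {
    "volcano": "The volcano erupted and lava covered the island.",
    "recipe": "Mix the flour and sugar, then bake the cake.",
}
goal = Counter({"volcano": 2, "lava": 1, "erupted": 1})
ranking = rank_documents(documents, goal)
```

Under this assumption, a teacher's topic, a student's interests, or an upcoming test would each be a separate goal histogram, and several goals could be combined by adding their counts before ranking.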
Reading Level Assessment Using Support Vector Machines and Statistical Language Models
- Proceedings of the Annual Meeting of the Association for Computational Linguistics
, 2005
Cited by 18 (1 self)
Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate reading level for foreign and second language learners is a challenge for teachers. This task can be addressed with natural language processing technology to assess reading level. Existing measures of reading level are not well suited to this task, but previous work and our own pilot experiments have shown the benefit of using statistical language models. In this paper, we also use support vector machines to combine features from traditional reading level measures, statistical language models, and other language processing tools to produce a better method of assessing reading level.
A machine learning approach to reading level assessment
, 2006
Cited by 9 (1 self)
Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate reading level for foreign and second language learners is a challenge for teachers. Existing measures of reading level are not well suited to this task, where students may know some difficult topic-related vocabulary items but not have the same level of sophistication in understanding complex sentence constructions. Recent work in this area has shown the benefit of using statistical language processing techniques. In this paper, we use support vector machines to combine features from statistical language models, traditional reading level measures, and other language processing tools to produce a better method of assessing reading level. We also discuss the performance of human annotators on this task.
The Principles of Readability
- Costa Mesa, CA: Impact Information
, 2004
Cited by 7 (0 self)
The principles of readability are in every style manual. Readability formulas are in every word processor. What is missing is the research and theory on which they stand.
An Analysis of Statistical Models and Features for Reading Difficulty Prediction
Cited by 6 (1 self)
A reading difficulty measure can be described as a function or model that maps a text to a numerical value corresponding to a difficulty or grade level. We describe a measure of readability that uses a combination of lexical features and grammatical features that are derived from subtrees of syntactic parses. We also tested statistical models for nominal, ordinal, and interval scales of measurement. The results indicate that a model for ordinal regression, such as the proportional odds model, using a combination of grammatical and lexical features is most effective at predicting reading difficulty.
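The proportional odds model named in this abstract treats grade levels as ordered categories: one shared weight vector scores the text, and a set of increasing thresholds splits that score into levels, with P(Y <= k) = sigmoid(theta_k - w.x). The sketch below shows how level probabilities follow from that definition. The feature values, weights, and thresholds are invented for illustration and are not the paper's fitted parameters.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(features, weights, thresholds):
    """Return P(Y = k) for each ordered level.

    Assumes the standard parametrization P(Y <= k) = sigmoid(theta_k - w.x),
    with thresholds given in increasing order. len(thresholds) + 1 levels result.
    """
    score = sum(w * x for w, x in zip(weights, features))
    # Cumulative probabilities; the last level always has P(Y <= K) = 1.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Hypothetical inputs: two features (say, one lexical and one grammatical).
features = [0.5, 1.2]
weights = [1.0, 0.8]
thresholds = [0.0, 1.0, 2.0]  # three cut points give four ordered levels
level_probs = proportional_odds_probs(features, weights, thresholds)
predicted_level = max(range(len(level_probs)), key=level_probs.__getitem__)
```

Because every level shares the same weights and differs only in its threshold, the model respects the ordering of grade levels, which a nominal (unordered) classifier would ignore.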
A text corpora-based estimation of the familiarity of health terminology
- Proc ISBMDA 2005
Cited by 5 (1 self)
In a pilot effort to improve health communication we created a method for measuring the familiarity of various medical terms. To obtain term familiarity data, we recruited 21 volunteers who agreed to take medical terminology quizzes containing 68 terms. We then created predictive models for familiarity based on term occurrence in text corpora and reader’s demographics. Although the sample size was small, our preliminary results indicate that predicting the familiarity of medical terms based on an analysis of the frequency in text corpora is feasible. Further, individualized familiarity assessment is feasible when demographic features are included as predictors.
Assessing Readability of Consumer Health Information: An Exploratory Study
- Medinfo
, 2004
Cited by 4 (2 self)
Researchers and practitioners frequently use readability formulas to predict the suitability of health-related texts for consumers (e.g., patient instructions, informed consent documents). However, the appropriateness of using readability formulas (originally developed for students and educational texts) for lay audiences and health-related texts remains to be validated. In this exploratory study, we compared two methods of assessing the readability of consumer health materials: the Cloze procedure, using actual readers, and readability formulas, using our Readability Analyzer program. A statistically significant inverse correlation (r = -0.581, p = 0.01) was found, suggesting that the Readability Analyzer may provide a reasonable “first approximation” for predicting readability of consumer health texts. We also identified several linguistic factors associated with increased reading ease as candidates for improving the performance of the Readability Analyzer. Our ultimate objective is to develop tools to support the design and evaluation of health information that is comprehensible and accessible to laypersons.
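The inverse correlation reported here is expected if the formula predicts difficulty while the Cloze score measures comprehension: texts rated harder should yield lower Cloze scores. The sketch below computes a Pearson correlation coefficient on invented toy numbers to show the sign of the relationship. The data are hypothetical, are not the study's measurements, and the abstract does not state which correlation statistic was used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values for illustration only: formula grade level per text, and the
# fraction of Cloze blanks that readers filled in correctly for the same text.
grade_levels = [6.0, 8.5, 10.0, 12.5, 14.0]
cloze_scores = [0.62, 0.55, 0.57, 0.41, 0.38]
r = pearson_r(grade_levels, cloze_scores)  # negative: harder texts, lower scores
```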
Learning to Predict Readability using Diverse Linguistic Features
- In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
, 2010
Cited by 3 (0 self)
In this paper we consider the problem of building a system to predict readability of natural-language documents. Our system is trained using diverse features based on syntax and language models which are generally indicative of readability. The experimental results on a dataset of documents from a mix of genres show that the predictions of the learned system are more accurate than the predictions of naive human judges when compared against the predictions of linguistically-trained expert human judges. The experiments also compare the performances of different learning algorithms and different types of feature sets when used for predicting readability.
Statistical Estimation of Word Acquisition with Application to Readability Prediction
Cited by 3 (2 self)
Models of language learning play a central role in a wide range of applications: from psycholinguistic theories of how people acquire new word knowledge, to information systems that can automatically match content to users’ reading ability. We present a novel statistical approach that can infer the distribution of a word’s likely acquisition age automatically from authentic texts collected from the Web. We then show that combining these acquisition age distributions for all words in a document provides an effective semantic component for predicting reading difficulty of new texts. We also compare our automatically inferred acquisition ages with norms from existing oral studies, revealing interesting historical trends as well as differences between oral and written word acquisition processes.

