Finding high-quality content in social media with an application to community-based question answering
- In Proceedings of WSDM, 2008
Cited by 54 (10 self)
The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions (social media sites) becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.
Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing – indexing methods, linguistic
A Statistical Model for Scientific Readability
- In Proc. of CIKM, 2001
Cited by 20 (0 self)
This paper presents a new method of using statistical models to estimate the reading difficulty of Web pages. Language Models are used to represent the content typically associated with different readability levels. Reading level classifiers are created as linear combinations of a language model and surface linguistic features. Experiments show that this new method is more accurate than the widely used Flesch-Kincaid readability formula.
Keywords: Readability, Flesch-Kincaid, Unigram Language Model, EM.
Health and literacy: a review of medical and public health literature
- In Annual Review of Adult Learning and, 1999
Cited by 10 (0 self)
Literacy has recently emerged as a key item on the research agenda in medicine and public health. Researchers and practitioners are grappling with evidence that the reading ability of the average adult falls well below the reading level of educational materials, directives, forms, and informed-consent documents commonly used in the health field. The threats to effective communication and efficacious care have spurred interest in exploring strategies for more effective communication. In addition, increased attention to literacy may be driven by legal concerns for adequate protection of human subjects and ethical concerns for patient autonomy in informed-consent procedures. Methodological strides made since 1992, particularly in the form of new tools for rapid literacy measurement, have enabled a number of researchers to explore links between the literacy level of patients and health outcomes that will have critical policy implications. These investigations can best be undertaken through collaborative efforts between educators who understand the learning process and health professionals who understand the protocols used in health care and public health education. Findings will serve to enrich policy and practice.
Biasing web search results for topic familiarity
- In UMass Amherst CIIR Tech. Report – IR-393, 2005
Cited by 8 (0 self)
Depending on a web searcher’s familiarity with a query’s target topic, it may be more appropriate to show her introductory or advanced documents. The TREC HARD [1] track defined topic familiarity as meta-data associated with a user’s query. We instead define a user-independent and query-independent model of the topic familiarity required to read a document, so it can be matched to a given user in response to a query. An introductory web page is defined as "a web page that doesn’t presuppose any background knowledge of the topic it is on, and to an extent introduces or defines the key terms in the topic", while an advanced web page is defined as "a web page that assumes sufficient background knowledge of the topic it is on, and familiarity with the key technical / important terms in the topic, and potentially builds on them". We develop a method for biasing the initial mix of documents returned by a search engine to increase the number of documents of desired familiarity level up to position 5, and up to position 10. Our method involves building a supervised text classifier, incorporating features based on reading level, the distribution of stop-words in the text, and non-text features such as average line-length. Using this familiarity classifier, we achieve statistically significant improvements at reranking the result set to show introductory documents higher up the ranked list. Our classifier can be seamlessly integrated into current search engine technology without involving any major modifications to existing architectures.
Revisiting Readability: A Unified Framework for Predicting Text Quality
Cited by 8 (1 self)
We combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers’ judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are not very good predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting text readability or ranking the readability. Our experiments indicate that discourse relations are the one class of features that exhibits robustness across these two tasks.
The Principles of Readability
- Costa Mesa, CA: Impact Information, 2004
Cited by 7 (0 self)
The principles of readability are in every style manual. Readability formulas are in every word processor. What is missing is the research and theory on which they stand.
Applying Natural Language Generation to Indicative Summarization
- In Proc. of the EACL Workshop on Natural Language Generation, 2001
Cited by 6 (2 self)
The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis.
An Analysis of Statistical Models and Features for Reading Difficulty Prediction
Cited by 6 (1 self)
A reading difficulty measure can be described as a function or model that maps a text to a numerical value corresponding to a difficulty or grade level. We describe a measure of readability that uses a combination of lexical features and grammatical features that are derived from subtrees of syntactic parses. We also tested statistical models for nominal, ordinal, and interval scales of measurement. The results indicate that a model for ordinal regression, such as the proportional odds model, using a combination of grammatical and lexical features is most effective at predicting reading difficulty.
Information Retrieval for Education: Making Search Engines Language Aware
- Themes in Science and Technology Education, Special issue on computer-aided language analysis, teaching and learning: Approaches, perspectives and applications, 3(1–2), 9–30, 2010
Cited by 4 (3 self)
Search engines have been a major factor in making the web the successful and widely used information source it is today. Generally speaking, they make it possible to retrieve web pages on a topic specified by the keywords entered by the user. Yet web searching currently does not take into account which of the search results are comprehensible for a given user – an issue of particular relevance when considering students in an educational setting. And current search engines do not support teachers in searching for language properties relevant for selecting texts appropriate for language students at different stages in the second language acquisition process. At the same time, raising language awareness is a major focus in second language acquisition research and foreign language teaching practice, and research since the 20s has tried to identify indicators predicting which texts are comprehensible for readers at a particular level of ability. For example, the military has been interested in ensuring that workers at a given level of education can understand the manuals they need to read in order to perform their job. We present a new search engine approach which makes it possible for teachers to search for texts both in terms of contents and in terms of their reading difficulty and other language properties. The implemented prototype builds on state-of-the-art information retrieval technology and exemplifies how a range of readability measures can be integrated in a modular fashion.
Helping Our Own: Text Massaging for Computational Linguistics as a New Shared Task
Cited by 3 (1 self)
In this paper, we propose a new shared task called HOO: Helping Our Own. The aim is to use tools and techniques developed in computational linguistics to help people writing about computational linguistics. We describe a text-to-text generation scenario that poses challenging research questions, and delivers practical outcomes that are useful in the first case to our own community and potentially much more widely. Two specific factors make us optimistic that this task will generate useful outcomes: one is the availability of the ACL Anthology, a large corpus of the target text type; the other is that CL researchers who are non-native speakers of English will be motivated to use prototype systems, providing informed and precise feedback in large quantity. We lay out our plans in detail and invite comment and critique with the aim of improving the nature of the planned exercise.

