Results 1 - 7 of 7
A dependency parser for tweets
- In Proceedings of the Conference on Empirical Methods in Natural Language Processing
Cited by 18 (2 self)
We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions. Our dataset and parser can be found at
Event Detection and Factuality Assessment with Non-Expert Supervision
Events are communicated in natural language with varying degrees of certainty. For example, if you are "hoping for a raise," it may be somewhat less likely than if you are "expecting" one. To study these distinctions, we present scalable, high-quality annotation schemes for event detection and fine-grained factuality assessment. We find that non-experts, with very little training, can reliably provide judgments about what events are mentioned and the extent to which the author thinks they actually happened. We also show how such data enables the development of regression models for fine-grained scalar factuality predictions that outperform strong baselines.
CREDBANK: A Large-scale Social Media Corpus With Associated Credibility Annotations
Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this phenomenon, however, has been difficult due to the need to collect large-scale, unbiased data along with in-situ judgements of its accuracy. In this paper we present CREDBANK, a corpus designed to bridge this gap by systematically combining machine and human computation. Specifically, CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements. It is based on the real-time tracking of more than 1 billion streaming tweets over a period of more than three months, computational summarizations of those tweets, and intelligent routings of the tweet streams to human annotators, within a few hours of those events unfolding on Twitter. In total CREDBANK comprises more than 60 million tweets grouped into 1049 real-world events, each annotated by 30 human annotators. As an example, with CREDBANK one can quickly calculate that roughly 24% of the events in the global tweet stream are not perceived as credible. We have made CREDBANK publicly available, and hope it will enable new research questions related to online information credibility in fields such as social science, data mining and health.
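As a rough sketch of the kind of aggregate statistic the abstract mentions (the share of events not perceived as credible), one could reduce each event's per-annotator ratings to a single verdict. The rating scale, field layout, and credibility threshold below are assumptions for illustration, not CREDBANK's actual schema:

```python
# Hypothetical sketch: estimate the fraction of events not perceived as credible.
# Each event is a list of ~30 annotator ratings on an assumed -2..+2 scale;
# the "credible" rule (share of top ratings >= threshold) is invented here.

def fraction_not_credible(events, threshold=0.9):
    """An event counts as credible when at least `threshold` of its
    annotators gave the top rating (+2); return the share that fail."""
    not_credible = 0
    for ratings in events:
        top_share = sum(1 for r in ratings if r == 2) / len(ratings)
        if top_share < threshold:
            not_credible += 1
    return not_credible / len(events)

sample = [
    [2] * 28 + [1, 0],    # strongly credible event
    [2] * 15 + [0] * 15,  # mixed judgements
    [-1] * 20 + [0] * 10, # widely doubted event
]
print(fraction_not_credible(sample))  # 2 of 3 sample events fail the rule
```

The same loop over the full 1049-event corpus would yield the kind of headline figure (roughly 24%) the authors report, under whatever credibility rule they actually applied.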
Recognizing Complex Negation on Twitter
After the Great East Japan Earthquake in 2011, an abundance of false rumors were disseminated on Twitter that actually hindered rescue activities. This work presents a method for recognizing the negation of predicates on Twitter to find Japanese tweets that refute false rumors. We assume that the predicate “occur” is negated in the sentence “The guy who tweeted that a nuclear explosion occurred has watched too many SF movies.” The challenge is in the treatment of such complex negation. We have to recognize a wide range of complex negation expressions such as “it is theoretically impossible that...” and “The guy who... watched too many SF movies.” We tackle this problem using a combination of a supervised classifier and clusters of n-grams derived from large un-annotated corpora. The n-gram clusters give us a gain of about 22% in F-score for complex negations.
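A minimal sketch of the feature scheme this abstract describes: surface n-gram features combined with cluster-ID features induced from unannotated text, which let a supervised classifier generalize across rare negation cues. The cluster map and feature names below are invented for illustration; in practice the clusters would be learned from a large corpus:

```python
# Hypothetical sketch of classifier features combining n-grams with
# cluster IDs. The CLUSTERS map is invented; real cluster assignments
# would be induced from large un-annotated corpora as in the paper.

CLUSTERS = {  # word -> cluster id (stand-in for induced word clusters)
    "impossible": "C7", "unlikely": "C7",
    "watched": "C3", "occurred": "C1",
}

def features(tokens):
    """Unigram and bigram surface features, plus cluster-ID back-off
    features for any token found in the cluster map."""
    feats = set()
    for i, w in enumerate(tokens):
        feats.add("w=" + w)
        if w in CLUSTERS:
            feats.add("c=" + CLUSTERS[w])
        if i + 1 < len(tokens):
            feats.add("bi=" + w + "_" + tokens[i + 1])
    return feats

print(sorted(features(["theoretically", "impossible"])))
```

The cluster feature `c=C7` fires for both "impossible" and "unlikely", so evidence learned from one negation cue transfers to the other even if it was rare in the training data.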
Event Detection and Factuality
Events are communicated in natural language with varying degrees of certainty. For example, if you are “hoping for a raise,” it may be somewhat less likely than if you are “expecting” one. To study these distinctions, we present scalable, high-quality annotation schemes for event detection and fine-grained factuality assessment. We find that non-experts, with very little training, can reliably provide judgments about what events are mentioned and the extent to which the author thinks they actually happened. We also show how such data enables the development of regression models for fine-grained scalar factuality predictions that outperform strong baselines.
QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns
Given the extremely large pool of events and stories available, media outlets need to focus on a subset of issues and aspects to convey to their audience. Outlets are often accused of exhibiting a systematic bias in this selection process, with different outlets portraying different versions of reality. However, in the absence of objective measures and empirical evidence, the direction and extent of systematicity remains widely disputed. In this paper we propose a framework based on quoting patterns for quantifying and characterizing the degree to which media outlets exhibit systematic bias. We apply this framework to a massive dataset of news articles spanning the six years of Obama's presidency and all of his speeches, and reveal that a systematic pattern does indeed emerge from the outlets' quoting behavior. Moreover, we show that this pattern can be successfully exploited in an unsupervised prediction setting, to determine which new quotes an outlet will select to broadcast. By encoding bias patterns in a low-rank space we provide an analysis of the structure of political media coverage. This reveals a latent media bias space that aligns surprisingly well with political ideology and outlet type. A linguistic analysis exposes striking differences across these latent dimensions, showing how the different types of media outlets portray different realities even when reporting on the same events. For example, outlets mapped to the mainstream conservative side of the latent space focus on quotes that portray a presidential persona disproportionately characterized by negativity.