Results 1 - 9 of 9
A Unified Model for Topics, Events and Users on Twitter
Cited by 1 (0 self)
With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short, instant messages. On the one hand, people tweet about their daily lives; on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and that the event-topic affinity vectors are effective for event recommendation and grouping events by topics.
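As a rough illustration of the Recurrent Chinese Restaurant Process mentioned in the abstract above, the sketch below computes only the prior probability that a new tweet joins an existing event or starts a new one. It assumes the standard RCRP form, in which an event's weight is its count in the previous epoch plus its count so far in the current epoch, and a new event has weight `alpha`. The function name `rcrp_prior` and its arguments are hypothetical; the paper's full model also includes the topic component and the duration-based regularization, neither of which is shown here.

```python
def rcrp_prior(prev_counts, curr_counts, alpha):
    """Prior over event assignments for the next tweet in the current epoch.

    prev_counts: dict mapping event id -> number of tweets in the previous epoch
    curr_counts: dict mapping event id -> number of tweets so far in this epoch
    alpha:       concentration parameter (weight of opening a new event)

    Returns (probs, p_new): probs maps each existing event id to its prior
    probability, and p_new is the probability of starting a new event.
    """
    event_ids = set(prev_counts) | set(curr_counts)
    # Assumed RCRP weight: previous-epoch count plus current-epoch count.
    weights = {
        e: prev_counts.get(e, 0) + curr_counts.get(e, 0) for e in event_ids
    }
    total = sum(weights.values()) + alpha
    probs = {e: w / total for e, w in weights.items()}
    return probs, alpha / total


probs, p_new = rcrp_prior({"a": 3}, {"a": 1, "b": 2}, alpha=2.0)
```

Events that were popular in the previous epoch keep attracting tweets, while events that received none drop out of the prior, which is how the process lets events appear and die over time.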
TwitterNews: Real Time Event Detection from the Twitter Data Stream
Research in event detection from the Twitter streaming data has been gaining momentum in the last couple of years. Although such data is noisy and often contains misleading information, Twitter can be a rich source of information if harnessed properly. In this paper, we propose a scalable event detection system, TwitterNews, to detect and track newsworthy events in real time from Twitter. TwitterNews provides a novel approach, combining a random-indexing-based term vector model with locality-sensitive hashing, that aids in performing incremental clustering of tweets related to various events within a fixed time. TwitterNews also incorporates an effective strategy to deal with the cluster fragmentation issue prevalent in incremental clustering. The set of candidate events generated by TwitterNews is then filtered to report the newsworthy events, along with an automatically selected representative tweet from each event cluster. Finally, we evaluate the effectiveness of TwitterNews, in terms of recall and precision, using a publicly available corpus.
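The combination of random indexing, locality-sensitive hashing and incremental clustering described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the TwitterNews implementation: the constants, the random-hyperplane hashing scheme, the cosine threshold and the names `index_vector`, `tweet_vector`, `signature` and `IncrementalClusterer` are all assumptions, and the paper's handling of cluster fragmentation and its fixed time window are not modeled.

```python
import math
import random
from collections import defaultdict

DIM = 64         # dimensionality of the random-indexing space (assumed)
NONZERO = 6      # non-zero entries per term index vector (assumed)
BITS = 8         # number of hyperplanes per LSH signature (assumed)
THRESHOLD = 0.6  # cosine similarity needed to join a cluster (assumed)

rng = random.Random(0)
_hyperplanes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]
_index_vectors = {}


def index_vector(term):
    """Sparse ternary random index vector for a term, created on first use."""
    if term not in _index_vectors:
        vec = [0.0] * DIM
        for pos in rng.sample(range(DIM), NONZERO):
            vec[pos] = rng.choice((-1.0, 1.0))
        _index_vectors[term] = vec
    return _index_vectors[term]


def tweet_vector(text):
    """Represent a tweet as the sum of its terms' index vectors."""
    vec = [0.0] * DIM
    for term in text.lower().split():
        for i, value in enumerate(index_vector(term)):
            vec[i] += value
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def signature(vec):
    """Random-hyperplane LSH: one bit per hyperplane, by side of the plane."""
    return tuple(
        sum(h * v for h, v in zip(plane, vec)) >= 0 for plane in _hyperplanes
    )


class IncrementalClusterer:
    def __init__(self):
        self.buckets = defaultdict(list)  # signature -> [(vector, cluster id)]
        self.next_id = 0

    def add(self, text):
        """Assign a tweet to the most similar cluster in its bucket, or open one."""
        vec = tweet_vector(text)
        sig = signature(vec)
        best_id, best_sim = None, THRESHOLD
        for other_vec, cluster_id in self.buckets[sig]:
            sim = cosine(vec, other_vec)
            if sim >= best_sim:
                best_id, best_sim = cluster_id, sim
        if best_id is None:
            best_id = self.next_id
            self.next_id += 1
        self.buckets[sig].append((vec, best_id))
        return best_id


clusterer = IncrementalClusterer()
first = clusterer.add("earthquake")
second = clusterer.add("earthquake")  # identical text lands in the same cluster
```

Because each tweet is compared only against tweets in its own hash bucket, the cost of an insertion does not grow with the total number of tweets seen, at the price of occasionally missing a near neighbor that hashed elsewhere.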
GEAM: A General and Event-Related Aspects Model for Twitter Event Detection
Event detection on Twitter has become a promising research direction due to Twitter's popularity, up-to-date feature, free writing style and so on. Unfortunately, it is a challenge to analyze Twitter datasets for event detection, since the informal expressions of short messages comprise many abbreviations, Internet buzzwords, spelling mistakes, meaningless contents, etc. Previous techniques proposed for Twitter event detection mainly focus on clustering bursty words related to the events, while ignoring that these words may not be easily interpreted as clear event descriptions. In this paper, we propose a General and Event-related Aspects Model (GEAM), a new topic model for event detection from Twitter that associates General topics and Event-related Aspects with events. We then introduce a collapsed Gibbs sampling algorithm to estimate the word distributions of General topics and Event-related Aspects in GEAM. Our experiments based on over 7 million tweets demonstrate that GEAM outperforms the state-of-the-art topic model in terms of both Precision and DERate (measuring the rate of Duplicated Events detected). In particular, GEAM can get a better event representation by providing a 4-tuple (Time, Locations, Entities, Keywords) structure of the detected events. We show that GEAM can be used not only to effectively detect events but also to analyze event trends.
Fast Topic Discovery From Web Search Streams
Web search involves voluminous data streams that record millions of users' interactions with the search engine. Recently, latent topics in web search data have been found to be critical for a wide range of search engine applications such as search personalization and search history warehousing. However, the existing methods usually discover latent topics from web search data in an offline and retrospective fashion. Hence, they are increasingly ineffective in the face of the ever-increasing web search data that accumulate in the format of online streams. In this paper, we propose a novel probabilistic topic model, the Web Search Stream Model (WSSM), which is delicately calibrated for handling two salient features of the web search data: it is in the format of streams and in massive volume. We further propose an efficient parameter inference method, Stream Parameter Inference (SPI), to train WSSM with massive web search streams. Based on a large-scale search engine query log, we conduct extensive experiments to verify the effectiveness and efficiency of WSSM and SPI. We observe that WSSM together with SPI discovers latent topics from web search streams faster than the state-of-the-art methods while retaining a comparable topic modeling accuracy.
CREDBANK: A Large-scale Social Media Corpus With Associated Credibility Annotations
Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this phenomenon, however, has been difficult due to the need to collect large-scale, unbiased data along with in-situ judgements of its accuracy. In this paper we present CREDBANK, a corpus designed to bridge this gap by systematically combining machine and human computation. Specifically, CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements. It is based on the real-time tracking of more than 1 billion streaming tweets over a period of more than three months, computational summarizations of those tweets, and intelligent routings of the tweet streams to human annotators, within a few hours of those events unfolding on Twitter. In total CREDBANK comprises more than 60 million tweets grouped into 1049 real-world events, each annotated by 30 human annotators. As an example, with CREDBANK one can quickly calculate that roughly 24% of the events in the global tweet stream are not perceived as credible. We have made CREDBANK publicly available, and hope it will enable new research questions related to online information credibility in fields such as social science, data mining and health.
Place as topics: analysis of spatial and temporal evolution of topics from social networks data
Geography in a commonsense way is about place. Place is a term used to describe the meaning that humans give to a location. Characterising a location as a place requires a huge amount of time to collect and analyse data. Furthermore, a place definition associated with a location can rapidly become obsolete. Nowadays, social networks and social media have become very popular. People on social networks act like social sensors, reporting information about society, politics, economics, etc. Thus, many researchers have focused on the analysis of posts, combining them together with algorithms or extracting their meaning, keywords or users' interests. In this paper, I will describe my research project, a visual framework that aims to simplify the process of place definition using topics generated from the application of Blei et al.'s Latent Dirichlet Allocation
A Framework for Real-Time Twitter Data Analysis
Accepted version. It is advisable to refer to the publisher’s version if you intend to cite from the work.
Distant-supervised Language Model for Detecting Emotional Upsurge on Twitter
Event-specific Twitter streams often reveal sudden spikes triggered by users’ upsurge of emotions in response to crucial moments in the real world. Although an upsurge of emotion is usually identified by a sudden rise in the number of tweets, detection across diverse event streams is not a trivial task. In this paper, we propose a new method to extract spiking tweets with an upsurge of emotions based on characteristic expressions used in tweets. The core part of our method is to use a distant-supervised language model (Spike LM) built from tweets in spikes to capture such expressions. We investigate the performance of detecting emotional spiking tweets using language models including Spike LM. Our experimental results show that the natural language expressions used in emotional upsurge fit Spike LM particularly well.
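The count-based view of an upsurge that the abstract above takes as its starting point (a sudden rise in the number of tweets) can be sketched as a simple baseline. This is not the paper's Spike LM method; the function name `find_spikes`, the sliding-window rule and the default parameters are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def find_spikes(counts, window=5, k=3.0):
    """Return indices whose tweet count rises sharply above recent history.

    counts: tweet counts per time slot (for example, per minute)
    window: number of preceding slots used as the reference history
    k:      how many standard deviations above the mean counts as a spike
    """
    spikes = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = mean(history)
        # Floor the spread at 1.0 so a perfectly flat history does not make
        # every small fluctuation look like a spike.
        sigma = max(pstdev(history), 1.0)
        if counts[t] > mu + k * sigma:
            spikes.append(t)
    return spikes


spikes = find_spikes([10, 12, 11, 9, 10, 60, 11])  # -> [5]
```

A rule like this needs its window and threshold retuned for each event stream, which is one way to read the abstract's remark that detection across diverse streams is not trivial.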
Privacy Detective: Detecting Private Information and Collective Privacy Behavior in a Large Social Network
Detecting the presence and amount of private information being shared in online media is the first step towards analyzing information revealing habits of users in social networks and a useful method for researchers to study aggregate privacy behavior. In this work, we aim to find out if text contains private content by using our novel learning-based approach ‘privacy detective’ that combines topic modeling, named entity recognition, privacy ontology, sentiment analysis, and text normalization to represent privacy features. Privacy detective investigates a broader range of privacy concerns compared to previous approaches that focus on keyword searching or profile related properties. We collected 500,000 tweets from 100,000 Twitter users along with other information such as tweet linkages and follower relationships. We reach 95.45% accuracy in a two-class task classifying Twitter users who do not reveal much private information and Twitter users who share sensitive information. We score timelines according to three privacy levels after having Amazon Mechanical Turk (AMT) workers annotate collected tweets according to privacy categories. Supervised machine learning classification results on these annotations reach 69.63% accuracy on a three-class task. Inter-annotator agreement on timeline privacy scores between various AMT workers and our classifiers falls under the same positive agreement level. Additionally, we show that a user’s privacy level is correlated with her friends’ privacy scores and also with the privacy scores of people mentioned in her text but not with the number of her followers. As such, privacy in social networks appears to be socially constructed, which can have great implications for privacy enhancing technologies and educational interventions.
Keywords: privacy; detecting private information; sensitive information; text classification; privacy behavior; social network