
On-line trend analysis with topic models: #twitter trends detection topic model online (2012)

by J H Lau, N Collier, T Baldwin
Venue: COLING

Results 1 - 9 of 9

A Unified Model for Topics, Events and Users on Twitter

by Qiming Diao, Jing Jiang
Cited by 1 (0 self)
With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short, instant messages. On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and that the event-topic affinity vectors are effective for event recommendation and grouping events by topics.

Citation Context

...pics range from music and sports to more serious ones like politics and religion. Much work has been done to analyze topics on Twitter (Ramage et al., 2010; Hong and Davison, 2010; Zhao et al., 2011; Lau et al., 2012). (2) Events. These are things that take place at a certain time and attract many people’s short-term attention in social media. Example events include concerts, sports games, scandals and elections. ...

TwitterNews: Real Time Event Detection from the Twitter Data Stream

by Mahmud Hasan, Mehmet Orgun, Rolf Schwitter
Research in event detection from the Twitter streaming data has been gaining momentum in the last couple of years. Although such data is noisy and often contains misleading information, Twitter can be a rich source of information if harnessed properly. In this paper, we propose a scalable event detection system, TwitterNews, to detect and track newsworthy events in real time from Twitter. TwitterNews provides a novel approach, combining a random-indexing-based term vector model with locality sensitive hashing, that aids in performing incremental clustering of tweets related to various events within a fixed time. TwitterNews also incorporates an effective strategy to deal with the cluster fragmentation issue prevalent in incremental clustering. The set of candidate events generated by TwitterNews is then filtered, to report the newsworthy events along with an automatically selected representative tweet from each event cluster. Finally, we evaluate the effectiveness of TwitterNews, in terms of recall and precision, using a publicly available corpus.

Citation Context

...tc. In addition, most of the content found on Twitter is not related to any event and the high volume of data on a diverse number of topics poses a big challenge in terms of scalability. Despite these challenges, it would be beneficial to develop a system that can lead to real time detection and tracking of events as Twitter does not provide a tool to see event summaries except for searching on trending topics or using hashtags. Different approaches have been taken by the researchers to deal with the event detection task. The approaches based on term interestingness [2]–[6] and topic modeling [7]–[11] suffer from high computational cost among other things. However, incremental clustering based approaches [12]–[14] usually provide a low computational cost solution. Taking this into consideration, we propose an incremental clustering based end-to-end solution to detect newsworthy events from a stream of time ordered tweets. The problem of event detection from the Twitter data stream in an incremental clustering context can be divided into two major stages. The first stage involves detecting a burst in the number of tweets discussing a topic/event and the second stage involves grouping/c...
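
The idea of incremental clustering with locality sensitive hashing, as described in the TwitterNews abstract above, can be illustrated with a minimal Python sketch. This is not the TwitterNews implementation: feature hashing stands in for the random indexing term vectors, the hashing scheme is sign random projection, and the names `IncrementalClusterer`, `DIM`, `BITS` and `THRESHOLD` and their values are assumptions chosen for illustration only.

```python
import math
import random
import zlib
from collections import Counter, defaultdict

DIM = 64         # size of the hashed term vector (assumed value)
BITS = 8         # random hyperplanes per LSH signature (assumed value)
THRESHOLD = 0.5  # cosine similarity needed to join a cluster (assumed value)

# Fixed seed so that the hyperplanes, and therefore the buckets, are reproducible.
_rng = random.Random(0)
HYPERPLANES = [[_rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(BITS)]


def term_vector(counts):
    """Map token counts to a fixed-size vector (feature hashing, not random indexing)."""
    vec = [0.0] * DIM
    for token, count in counts.items():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    return vec


def signature(vec):
    """Sign random projection: one bit per hyperplane."""
    return tuple(
        sum(h * v for h, v in zip(plane, vec)) >= 0.0 for plane in HYPERPLANES
    )


def cosine(a, b):
    """Exact cosine similarity between two token-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class IncrementalClusterer:
    """Assigns each incoming tweet to a nearby cluster or opens a new one."""

    def __init__(self):
        self.buckets = defaultdict(list)  # LSH signature -> cluster ids
        self.centroids = []               # cluster id -> summed token counts

    def add(self, text):
        tokens = Counter(text.lower().split())
        sig = signature(term_vector(tokens))
        best_id, best_sim = None, 0.0
        # Only clusters in the same LSH bucket are compared.
        for cid in self.buckets[sig]:
            sim = cosine(tokens, self.centroids[cid])
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= THRESHOLD:
            # Simplification: the centroid grows, but its bucket is not recomputed.
            self.centroids[best_id].update(tokens)
            return best_id
        self.centroids.append(tokens)
        new_id = len(self.centroids) - 1
        self.buckets[sig].append(new_id)
        return new_id
```

Calling `add` on each tweet in time order returns a cluster id. The bucket lookup keeps the number of similarity comparisons small, which is the source of the low computational cost that the citation context attributes to incremental clustering approaches. Burst detection, cluster defragmentation and newsworthiness filtering are not covered by this sketch.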

GEAM: A General and Event-Related Aspects Model for Twitter Event Detection

by Yue You, Guangyan Huang, Jian Cao, Enhong Chen, Jing He, Yanchun Zhang, Liang Hu
Event detection on Twitter has become a promising research direction due to Twitter's popularity, up-to-date feature, free writing style and so on. Unfortunately, analyzing Twitter datasets for event detection is a challenge, since the informal expressions of short messages comprise many abbreviations, Internet buzzwords, spelling mistakes, meaningless contents etc. Previous techniques proposed for Twitter event detection mainly focus on clustering bursty words related to the events, while ignoring that these words may not be easily interpreted as clear event descriptions. In this paper, we propose a General and Event-related Aspects Model (GEAM), a new topic model for event detection from Twitter that associates General topics and Event-related Aspects with events. We then introduce a collapsed Gibbs sampling algorithm to estimate the word distributions of General topics and Event-related Aspects in GEAM. Our experiments based on over 7 million tweets demonstrate that GEAM outperforms the state-of-the-art topic model in terms of both Precision and DERate (measuring the rate of Duplicated Events detected). In particular, GEAM can get better event representation by providing a 4-tuple (Time, Locations, Entities, Keywords) structure of the detected events. We show that GEAM can be used not only to effectively detect events but also to analyze event trends.

Citation Context

... Obama’s victory speech, there are 327k tweets per second posted/reposted to discuss Obama’s re-election. One of the typical tweets like “Thank The Lord for that!! Well done Obama in Chicago, Illinois!! That was a Great victory speech!”. In this tweet: Time is “November 7, 2012”, “Chicago” and “Illinois” represent Locations, “Obama” represents Entity, Keywords are “Great”, “victory”, and “speech”, and the rest are General words. GEAM can provide a clearer description of events by separating General words from Event-related Aspects words. The most related work is the model proposed by Lau et al. [12]. They introduce an online processing variant of topic model Latent Dirichlet Allocation (LDA) [13, 14] to analyze tweets trend. Their graphic structure is the same as LDA, which models each tweet as a multinomial mixture of all topics or events. However, this assumption is obviously unreasonable due to Twitter’s short text length, most users only discuss one event in a tweet. In this paper, we improve LDA for Twitter event detection by assigning only one event to an event-related tweet and differentiating General words from Event-related Aspects words. To the best of our knowledge, we are the...

Fast Topic Discovery From Web Search Streams

by Di Jiang, Kenneth Wai-Ting Leung, Wilfred Ng
Web search involves voluminous data streams that record millions of users' interactions with the search engine. Recently, latent topics in web search data have been found to be critical for a wide range of search engine applications such as search personalization and search history warehousing. However, the existing methods usually discover latent topics from web search data in an offline and retrospective fashion. Hence, they are increasingly ineffective in the face of the ever-increasing web search data that accumulate in the format of online streams. In this paper, we propose a novel probabilistic topic model, the Web Search Stream Model (WSSM), which is delicately calibrated for handling two salient features of web search data: it arrives in the format of streams and in massive volume. We further propose an efficient parameter inference method, the Stream Parameter Inference (SPI), to train WSSM with massive web search streams. Based on a large-scale search engine query log, we conduct extensive experiments to verify the effectiveness and efficiency of WSSM and SPI. We observe that WSSM together with SPI discovers latent topics from web search streams faster than the state-of-the-art methods while retaining a comparable topic modeling accuracy.

Citation Context

...escribed an application of a partially observable Markov model to analyze a large-scale query log. Among the diverse types of probabilistic models, topic models are found to be an effective tool for query log analysis [6, 15, 17, 23]. For example, [17] proposed a topic-concept cube that supports online multidimensional mining of query logs. [34] presented a topic model that captures the latent structure of textual data and how the structure changes over time. More recently, with the popularity of microblogs, researchers are aware of the importance of analyzing text streams in real-life applications. [19] presented a topic model to track emerging events in microblog data such as tweets. [13] presented an algorithm to model diversity phenomena in tweet streams based on topical diversity and geographical diversity. [9, 8] described how to efficiently capture the statistics of stream data. However, to the best of our knowledge, none of the existing techniques are primarily proposed for processing massive web search streams. Their capabilities are limited in terms of the effectiveness and the efficiency of topic discovery from web search data. Besides designing effective probabilistic models, explo...

CREDBANK: A Large-scale Social Media Corpus With Associated Credibility Annotations

by Tanushree Mitra, Eric Gilbert
Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this phenomenon, however, has been difficult due to the need to collect large-scale, unbiased data along with in-situ judgements of its accuracy. In this paper we present CREDBANK, a corpus designed to bridge this gap by systematically combining machine and human computation. Specifically, CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements. It is based on the real-time tracking of more than 1 billion streaming tweets over a period of more than three months, computational summarizations of those tweets, and intelligent routings of the tweet streams to human annotators within a few hours of those events unfolding on Twitter. In total, CREDBANK comprises more than 60 million tweets grouped into 1049 real-world events, each annotated by 30 human annotators. As an example, with CREDBANK one can quickly calculate that roughly 24% of the events in the global tweet stream are not perceived as credible. We have made CREDBANK publicly available, and hope it will enable new research questions related to online information credibility in fields such as social science, data mining and health.

Place as topics: analysis of spatial and temporal evolution of topics from social networks data

by Giovanni Siragusa
Geography in a commonsense way is about place. Place is a term used to describe the meaning that humans give to a location. Characterising a location as a place requires a huge amount of time to collect and analyse data. Furthermore, a place definition associated with a location can rapidly become obsolete. Nowadays, social networks and social media have become very popular. People on social networks act like social sensors, reporting information about society, politics, economics, etc. Thus, many researchers have focused on the analysis of posts, combining them together with algorithms or extracting their meaning, keywords or users' interests. In this paper, I will describe my research project, a visual framework that aims to simplify the process of place definition using topics generated from the application of Blei et al.'s Latent Dirichlet Allocation

Citation Context

...e of the wave represents the number of co-occurrences between the main word and the other words inside a topic: high amplitudes represent elevated co-occurrence frequency. In (Wang and McCallum, 2006), Wang and McCallum proposed a modified LDA model, called Topics Over Time, where topic discovery is influenced both by word co-occurrences and temporal information. In their work, the authors model time as a continuous distribution, defined by a Beta distribution over a parameter Ψ associated with each topic, which is responsible for generating both patterns and topic distributions. Lau et al. in (Lau et al., 2012) proposed a novel method to track emerging events in microblogs (e.g., Twitter). Their method defines a window of time slices, where each time slice contains several documents, and updates parameters α and β for each old word and document. Novel words and documents are initialised using two parameters, α0 and β0, that are defined a priori. Another related work, not linked to the LDA model, is (Di Caro et al., 2011). Di Caro et al. proposed a framework called TMine which defines a navigable tag-flag. A tag-flag can be thought of as a topic because it contains a set of related words. 3. Research Qu...

A Framework for Real-Time Twitter Data Analysis

by S. Gaglio, G. Lo Re, M. Morana
Accepted version. It is advisable to refer to the publisher’s version if you intend to cite from the work.

Citation Context

...ar, often represent the basis of several works which analyze data from social networks in order to detect trending topics. A topic modeling technique based on an online variant of LDA is presented in [10]. The Twitter stream is processed in time slices and older tweets are discarded allowing the model to be constant in size. Moreover, differently from the on-line LDA approach [11], a dynamic vocabular...
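
The window bookkeeping described in this citation context (tweets processed in time slices, older slices discarded, a vocabulary that changes over time) can be sketched in Python as follows. This is a hedged illustration of the data handling only, not the model of Lau et al.: no topic inference is performed, and the class name `SlidingWindowVocabulary`, its methods and the whitespace tokenisation are assumptions introduced here.

```python
from collections import Counter, deque


class SlidingWindowVocabulary:
    """Keeps token counts for the most recent `window_size` time slices."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.slices = deque()    # one Counter per retained time slice
        self.counts = Counter()  # token counts over the whole window

    def add_slice(self, tweets):
        """Add one time slice of tweets and return the words that are new."""
        slice_counts = Counter(
            token for tweet in tweets for token in tweet.lower().split()
        )
        # Words absent from the current window would need fresh prior values.
        new_words = set(slice_counts) - set(self.counts)
        self.slices.append(slice_counts)
        self.counts.update(slice_counts)
        if len(self.slices) > self.window_size:
            # Discard the oldest slice so the window stays constant in size.
            expired = self.slices.popleft()
            self.counts.subtract(expired)
            self.counts = +self.counts  # unary plus drops zero counts
        return new_words

    def vocabulary(self):
        """Words that occur in at least one retained slice, sorted."""
        return sorted(self.counts)
```

With `window_size=2`, adding the slices `["a b"]`, `["b c"]` and `["c d"]` in turn leaves a vocabulary of `b`, `c` and `d`: the word `a` disappears when the first slice expires, while `d` is reported as new when the third slice arrives.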

Distant-supervised Language Model for Detecting Emotional Upsurge on Twitter

by Yoshinari Fujinuma, Hikaru Yokono, Pascual Martínez-Gómez, Akiko Aizawa
Event-specific Twitter streams often reveal sudden spikes triggered by users' upsurge of emotions in response to crucial moments in the real world. Although an upsurge of emotion is usually identified by a sudden rise in the number of tweets, detection for diverse event streams is not a trivial task. In this paper, we propose a new method to extract spiking tweets with an upsurge of emotions based on characteristic expressions used in tweets. The core part of our method is to use a distant-supervised language model (Spike LM) built from tweets in spikes to capture such expressions. We investigate the performance of detecting emotional spiking tweets using language models including Spike LM. Our experimental results show that the natural language expressions used in emotional upsurge fit specifically well to Spike LM.

Citation Context

...n languages used in emotional spiking tweets. Many tasks on Twitter including burst detection (Kleinberg, 2003; Diao et al., 2012), first story detection (Petrović et al., 2010), and topic tracking (Lau et al., 2012) failed to effectively incorporate the textual characteristics of tweets and regard it is out of their scope. Being able to characterize tweets from emotional upsurge would open a window to the ident...

Privacy Detective: Detecting Private Information and Collective Privacy Behavior in a Large Social Network

by Aylin Caliskan-Islam, Jonathan Walsh, Rachel Greenstadt
Detecting the presence and amount of private information being shared in online media is the first step towards analyzing information revealing habits of users in social networks and a useful method for researchers to study aggregate privacy behavior. In this work, we aim to find out if text contains private content by using our novel learning based approach ‘privacy detective’ that combines topic modeling, named entity recognition, privacy ontology, sentiment analysis, and text normalization to represent privacy features. Privacy detective investigates a broader range of privacy concerns compared to previous approaches that focus on keyword searching or profile related properties. We collected 500,000 tweets from 100,000 Twitter users along with other information such as tweet linkages and follower relationships. We reach 95.45% accuracy in a two-class task classifying Twitter users who do not reveal much private information and Twitter users who share sensitive information. We score timelines according to three privacy levels after having Amazon Mechanical Turk (AMT) workers annotate collected tweets according to privacy categories. Supervised machine learning classification results on these annotations reach 69.63% accuracy on a three-class task. Inter-annotator agreement on timeline privacy scores between various AMT workers and our classifiers falls under the same positive agreement level. Additionally, we show that a user’s privacy level is correlated with her friends’ privacy scores and also with the privacy scores of people mentioned in her text but not with the number of her followers. As such, privacy in social networks appears to be socially constructed, which can have great implications for privacy enhancing technologies and educational interventions. Keywords: privacy; detecting private information; sensitive information; text classification; privacy behavior; social network

Citation Context

...ive information in text and gives suggestions to sanitize sentences. Their method involves querying the Internet for detections and recommendations. There have been numerous studies on topic modeling [18], named entity recognition [25], and sentiment analysis [6] on Twitter as well as normalizing micro-text [30] though not focusing on tweets in particular. 3. PROBLEM STATEMENT AND THREAT MODEL The mai...
