Results 1 - 3 of 3
Pinterest Board Recommendation for Twitter Users
Abstract
Pinboard on Pinterest is an emerging medium for engaging online social media users, on which users post online images on specific topics. Despite its significance, there is little previous work specifically aimed at facilitating information discovery based on pinboards. This paper proposes a novel pinboard recommendation system for Twitter users. In order to associate content from the two social media platforms, we propose to use multi-label classification to map a Twitter user's followees to pinboard topics, and visual diversification to recommend pinboards given the user's topics of interest. A preliminary experiment on a dataset of 2,000 users validated the proposed system.
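The core idea of the abstract, mapping a user's followee set to pinboard topics with one independent binary decision per topic, can be sketched in a few lines. This is a hypothetical illustration only: the followee handles, topic labels, and the simple overlap-based classifier below are invented for clarity, not the paper's actual method.

```python
# One-vs-rest multi-label sketch: each pinboard topic gets its own
# binary decision over a Twitter user's followee set.
# All handles, topics, and the overlap classifier are hypothetical.

def train_topic_profiles(users):
    """For each topic, collect the followees seen among users
    labelled with that topic (a simple profile per topic)."""
    profiles = {}
    for followees, topics in users:
        for topic in topics:
            profiles.setdefault(topic, set()).update(followees)
    return profiles

def predict_topics(followees, profiles, threshold=0.6):
    """Assign every topic whose profile overlaps enough with the
    user's followees -- one independent decision per topic, so a
    user can receive several labels at once."""
    predicted = []
    for topic, profile in profiles.items():
        overlap = len(followees & profile) / max(len(followees), 1)
        if overlap >= threshold:
            predicted.append(topic)
    return sorted(predicted)

train = [
    ({"@nasa", "@esa"}, ["space"]),
    ({"@vogue", "@gucci"}, ["fashion"]),
    ({"@nasa", "@vogue"}, ["space", "fashion"]),
]
profiles = train_topic_profiles(train)
print(predict_topics({"@nasa", "@esa"}, profiles))    # ['space']
print(predict_topics({"@nasa", "@vogue"}, profiles))  # ['fashion', 'space']
```

The second query shows the multi-label case: a user whose followees match two topic profiles is assigned both topics, and each assigned topic could then seed the visual-diversification step the abstract mentions.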
Comparing Tweet Classifications by Authors' Hashtags, Machine Learning, and Human Annotators
Abstract
Over the last years, many papers have been published about how to use machine learning for classifying postings on microblogging platforms like Twitter, e.g., in order to help users reach tweets that interest them. Typically, the automatic classification results are then evaluated against a gold standard classification, which consists of either (i) the hashtags of the tweets' authors or (ii) manual annotations by independent human annotators. In this paper, we show that there are fundamental differences between these two kinds of gold standard classifications, i.e., human annotators are more likely to classify tweets like other human annotators than like the tweets' authors. Furthermore, we discuss how these differences may influence the evaluation of automatic classifications, such as those produced by Latent Dirichlet Allocation (LDA). We argue that researchers who conduct machine learning experiments for tweet classification should pay particular attention to the kind of gold standard they use. One may even argue that hashtags are not appropriate as a gold standard for tweet classification.
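The paper's central claim, that annotators agree more with each other than with the tweet authors, is the kind of difference typically quantified with a chance-corrected agreement measure such as Cohen's kappa. A minimal sketch, with invented label sequences (the abstract gives no data):

```python
# Cohen's kappa: chance-corrected agreement between two labelings.
# The author hashtags and annotator labels below are hypothetical.
from collections import Counter

def cohens_kappa(a, b):
    """Return (observed - expected) / (1 - expected) agreement
    for two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

author_tags = ["sports", "news", "music", "news", "sports", "music"]
annotator_1 = ["sports", "news", "news",  "news", "music",  "music"]
annotator_2 = ["sports", "news", "news",  "news", "music",  "sports"]

# Annotators agree more with each other than with the authors:
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.74
print(round(cohens_kappa(annotator_1, author_tags), 2))  # 0.5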
Sources of Evidence for Automatic Indexing of Political Texts
Abstract
Political texts on the Web, documenting laws and policies and the process leading to them, are of key importance to government, industry, and every individual citizen. Yet access to such texts is difficult due to the ever increasing volume and complexity of the content, prompting the need for indexing or annotating them with a common controlled vocabulary or ontology. In this paper, we investigate the effectiveness of different sources of evidence (such as labeled training data, textual glosses of descriptor terms, and the thesaurus structure) for automatically indexing political texts. Our main findings are the following. First, using a learning to rank (LTR) approach integrating all features, we observe significantly better performance than previous systems. Second, the analysis of feature weights reveals the relative importance of the various sources of evidence, also giving insight into the underlying classification problem. Third, a lean-and-mean system using only four features (text, title, descriptor glosses, descriptor term popularity) is able to perform at 97% of the large LTR model.
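The lean four-feature variant mentioned at the end can be pictured as a pointwise linear ranker: each candidate descriptor from the controlled vocabulary gets a weighted sum of its feature values, and the top-scoring descriptors are assigned as index terms. The descriptors, feature values, and weights below are invented for illustration; the paper's learned weights are not given in the abstract.

```python
# Pointwise linear ranking sketch over the four named features:
# text match, title match, descriptor gloss match, and descriptor
# popularity. All values and weights here are hypothetical.

def score(features, weights):
    """Linear ranking score: weighted sum over named features."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"text": 0.5, "title": 0.3, "gloss": 0.15, "popularity": 0.05}

# Candidate descriptors for one political document, with per-feature
# evidence scores in [0, 1].
candidates = {
    "taxation":    {"text": 0.9, "title": 1.0, "gloss": 0.4, "popularity": 0.7},
    "agriculture": {"text": 0.2, "title": 0.0, "gloss": 0.1, "popularity": 0.9},
    "budget":      {"text": 0.7, "title": 0.5, "gloss": 0.6, "popularity": 0.5},
}

ranked = sorted(candidates, key=lambda d: score(candidates[d], weights),
                reverse=True)
print(ranked)  # ['taxation', 'budget', 'agriculture']
```

In a full LTR setup the weights would be learned from the labeled training data rather than hand-set, and the feature list would include the additional thesaurus-structure evidence the paper evaluates.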