Corpus creation for new genres: A crowdsourced approach to PP attachment
- In NAACL Workshop on Creating Speech and Language Data With Amazon’s Mechanical Turk, 2010
Cited by 1 (0 self)
This paper explores the task of building an accurate prepositional phrase attachment corpus for new genres while avoiding a large investment in terms of time and money by crowdsourcing judgments. We develop and present a system to extract prepositional phrases and their potential attachments from ungrammatical and informal sentences and pose the subsequent disambiguation tasks as multiple choice questions to workers from Amazon’s Mechanical Turk service. Our analysis shows that this two-step approach is capable of producing reliable annotations on informal and potentially noisy blog text, and this semi-automated strategy holds promise for similar annotation projects in new genres.
Adopting Inference Networks for Online Thread Retrieval
Cited by 1 (0 self)
Online forums contain valuable human-generated information. End-users looking for information would like to find only those threads in forums where relevant information is present. Due to the distinctive characteristics of forum pages from generic web pages, special techniques are required to organize and search for information in these forums. Threads and pages in forums are different from other webpages in their hyperlinking patterns. Forum posts also have associated social and non-textual metadata. In this paper, we propose a model for online thread retrieval based on inference networks that utilizes the structural properties of forum threads. We also investigate the effects of incorporating various relevance indicators in our model. We empirically show the effectiveness of our proposed model using real-world data.
Learning online discussion structures by conditional random fields
- In SIGIR ’11, 2011
Cited by 1 (1 self)
Online forum discussions are emerging as valuable information repository, where knowledge is accumulated by the interaction among users, leading to multiple threads with structures. Such replying structure in each thread conveys important information about the discussion content. Unfortunately, not all the online forum sites would explicitly record such replying relationship, making it hard for both users and computers to digest the information buried in a discussion thread. In this paper, we propose a probabilistic model in the Conditional Random Fields framework to predict the replying structure for a threaded online discussion. Different from previous replying relation reconstruction methods, most of which fail to consider dependency between the posts, we cast the problem as a supervised structure learning problem to incorporate the features capturing the structural dependency and learn their relationship. Experiment results on three different online forums show that the proposed method can well capture the replying structures in online discussion threads, and multiple tasks such as forum search and question answering can benefit from the reconstructed replying structures.
Exploiting Thread Structure to Improve Smoothing of Language Models for Forum Post Retrieval
- In Proceedings of the 33rd ECIR, 2011
Cited by 1 (1 self)
Due to many unique characteristics of forum data, forum post retrieval is different from traditional document retrieval and web search, raising interesting research questions about how to optimize the accuracy of forum post retrieval. In this paper, we study how to exploit the naturally available raw thread structures of forums to improve retrieval accuracy in the language modeling framework. Specifically, we propose and study two different schemes for smoothing the language model of a forum post based on the thread containing the post. We explore several different variants of the two schemes to exploit thread structures in different ways. We also create a human annotated test data set for forum post retrieval and evaluate the proposed smoothing methods using this data set. The experiment results show that the proposed methods for leveraging forum threads to improve estimation of document language models are effective, and they outperform the existing smoothing methods for the forum post retrieval task.
Decomposing Discussion Forums using User Roles
Cited by 1 (0 self)
Discussion forums are a central part of Web 2.0 and Enterprise 2.0 infrastructures. The health and sustainability of forums is dependent on the information exchange behaviour of its members. Such behaviour needs to be better understood and characterised so that forums can be better managed, new services delivered and opportunities and risks detected. In this paper, we present a method for analysing user communication roles in discussion forums. We analyse the composition of several forums from a medium-sized national bulletin board in terms of these roles, demonstrating similarities between forums based on underlying user behaviour rather than topic. We suggest that analysing the evolution of role composition is an important step in developing a predictive model of forum health.
Incorporating Participant Reputation in Community-driven Question Answering Systems
gaining increasing attention with tens of millions of users and hundreds of millions of posts in recent years. Due to its size, there is a need for users to be able to search these large question answer archives and retrieve high quality content. Research work shows that user reputation modeling makes a contribution when incorporated with relevance models. However, the effectiveness of different link analysis approaches and how to embed topical information (as a user may have different expertise in various areas) are still open questions. In this work, we address these two research questions by first reviewing different link analysis schemes, especially discussing the use of PageRank-based methods since they are less commonly utilized in user reputation modeling. We also introduce Topical PageRank analysis for modeling user reputation on different topics. Comparative experimental results on data from Yahoo! Answers show that PageRank-based approaches are more effective than HITS-like schemes and other heuristics, and that topical link analysis can improve performance.
Decomposing Discussion Forums and Boards Using User Roles
- In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media
Discussion forums are a central part of Web 2.0 and Enterprise 2.0 infrastructures. The health and sustainability of forums is dependent on the information exchange behaviour of its contributors, which is expressed through online conversation. The increasing popularity and importance of forums requires a better understanding and characterisation of communication behaviour so that forums can be better managed, new services delivered and opportunities and risks detected. In this paper, we present an empirical analysis of user communication roles in a medium-sized bulletin board and we analyse the composition of several forums in terms of these roles, demonstrating similarities between forums based on underlying user behaviour rather than topic.

