On the collective classification of email speech acts
In Proceedings of SIGIR-2005, 2005
Cited by 52 (3 self)
We consider classification of email messages as to whether or not they contain certain “email acts”, such as a request or a commitment. We show that exploiting the sequential correlation among email messages in the same thread can improve email-act classification. More specifically, we describe a new text-classification algorithm based on a dependency-network-based collective classification method, in which the local classifiers are maximum entropy models based on words and certain relational features. We show that statistically significant improvements over a bag-of-words baseline classifier can be obtained for some, but not all, email-act classes. The performance improvement obtained by collective classification appears to be consistent across the email acts suggested by prior speech-act theory.
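The abstract describes collective classification, where each message's local classifier also sees relational features derived from neighboring messages in the thread. The sketch below is a simplified iterative-classification variant, not the paper's dependency-network algorithm. It assumes a single binary email act, uses only the parent message as a relational neighbor, and scores messages with a hand-set binary logistic model (the two-class form of a maximum entropy classifier). The word weights, `PARENT_WEIGHT`, and `BIAS` are hypothetical values, not learned parameters.

```python
from math import exp

# Hypothetical, hand-set weights standing in for a trained maxent model.
WORD_WEIGHTS = {"please": 1.5, "could": 1.0, "send": 0.8, "thanks": -0.5}
PARENT_WEIGHT = 1.2  # weight of the relational feature "parent has the act"
BIAS = -1.0


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def local_score(words, parent_label):
    """Probability that a message contains the act, given its words and the parent's label."""
    z = BIAS + sum(WORD_WEIGHTS.get(w, 0.0) for w in words)
    z += PARENT_WEIGHT * parent_label
    return sigmoid(z)


def collective_classify(thread, parents, iterations=5):
    """thread: list of token lists; parents[i]: index of message i's parent, or None."""
    # Bootstrap with the relational feature switched off (words only).
    labels = [int(local_score(words, 0) >= 0.5) for words in thread]
    for _ in range(iterations):
        # Recompute every label from the previous round's labels (synchronous update).
        new_labels = []
        for i, words in enumerate(thread):
            p = parents[i]
            parent_label = labels[p] if p is not None else 0
            new_labels.append(int(local_score(words, parent_label) >= 0.5))
        if new_labels == labels:  # fixed point reached
            break
        labels = new_labels
    return labels


thread = [["please", "send", "report"], ["ok", "will", "do"], ["thanks"]]
print(collective_classify(thread, [None, 0, 1]))  # -> [1, 1, 0]
```

Here the second message is labeled positive only because its parent was, which is the kind of thread-level signal a bag-of-words classifier cannot use. A dependency-network method would usually run approximate inference such as Gibbs sampling and could also draw on child messages.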
Talk to me: Foundations for successful individual-group interactions in online communities
In Proceedings of the ACM Conference on Human Factors in Computing Systems, 2006
Cited by 25 (8 self)
People come to online communities seeking information, encouragement, and conversation. When a community responds, participants benefit and become more committed. Yet interactions often fail. In a longitudinal sample of 6,172 messages from 8 Usenet newsgroups, 27% of posts received no response. The information context, posters’ prior engagement in the community, and the content of their posts all influenced the likelihood that they received a reply and, as a result, their willingness to continue active participation. Posters were less likely to get a reply if they were newcomers. Posting on-topic, introducing oneself via autobiographical testimonials, asking questions, using less complex language, and other features of the messages increased replies. Results suggest ways that developers might increase the ability of online communities to support successful individual-group interactions.
Author Keywords: Online communities, community success, contribution,
Locating Complex Named Entities in Web Text
In Proc. of IJCAI, 2007
Cited by 22 (3 self)
Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of predefined entity classes (e.g., people, locations, and organizations). However, NER on the Web is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a wide variety of entity classes, which are not known in advance. Thus, hand-tagging examples of each entity class is impractical. This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text. Our key observation is that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus. We show that this statistical method’s F1 score is 50% higher than that of supervised techniques, including Conditional Random Fields (CRFs) and Conditional Markov Models (CMMs), when applied to complex names. The method also outperforms CMMs and CRFs by 117% on entity classes absent from the training data. Finally, our method outperforms a semi-supervised CRF by 73%.
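The key idea above is that multiword units stand out in n-gram statistics: their parts co-occur far more often than chance predicts. As an illustration only, the sketch below computes symmetric conditional probability (SCP), one standard association measure for multiword units, over adjacent word pairs. The abstract does not say this is exactly the statistic the paper uses, and the toy corpus is hypothetical. SCP for a bigram is

$$\mathrm{SCP}(x, y) = \frac{p(xy)^2}{p(x)\,p(y)}.$$

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def scp_bigrams(tokens):
    """Symmetric conditional probability p(xy)^2 / (p(x) p(y)) for each adjacent pair."""
    unigrams = ngram_counts(tokens, 1)
    bigrams = ngram_counts(tokens, 2)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[(x,)] / n_uni
        p_y = unigrams[(y,)] / n_uni
        scores[(x, y)] = p_xy ** 2 / (p_x * p_y)
    return scores


tokens = "new york is big new york is old".split()
scores = scp_bigrams(tokens)
print(scores[("new", "york")] > scores[("is", "big")])  # -> True
```

On a Web-scale corpus the counts would come from a large n-gram index rather than an in-memory list, and longer candidate names need a generalization of the measure that considers each possible split point.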
Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported . . .
International Journal of Computer-Supported Collaborative Learning, 2008
Cited by 20 (6 self)
In this article, we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and, more specifically, in terms of a new publicly available tool set called TagHelper tools. Analyzing the many facets of learners’ interaction that are important for their learning is a time-consuming and effortful process. Improving automated analyses of these highly valued collaborative learning processes by adapting and applying recent text classification technologies would make it less arduous to obtain insights from corpus data. It also holds the potential for enabling substantially improved online instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by scaffolding technology, as in the emerging area of context-sensitive collaborative learning support triggered dynamically on an as-needed basis. In this article, we report on an interdisciplinary research project that has been investigating the effectiveness of applying text classification technology to a large CSCL discourse corpus that had been analyzed by human coders using a theory-based multi-dimensional coding scheme. We report promising results and include an in-depth discussion of important issues, such as reliability, validity, and efficiency, that should be considered when deciding whether to adopt a new technology such as TagHelper tools.
Improving “email speech acts” analysis via n-gram selection
2006
Cited by 16 (2 self)
In email conversational analysis, it is often useful to trace the intents behind each message exchange. In this paper, we consider classification of email messages as to whether or not they contain certain intents or email acts, such as “propose a meeting” or “commit to a task”. We demonstrate that exploiting the contextual information in the messages can noticeably improve email-act classification. More specifically, we describe a combination of n-gram sequence features with careful message preprocessing that is highly effective for this task. Compared to a previous study (Cohen et al., 2004), this representation reduces classification error rates by 26.4% on average. Finally, we introduce Ciranda, a new open-source toolkit for email speech act prediction.
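The abstract credits the gains to n-gram sequence features combined with careful preprocessing, but does not list the preprocessing steps. The sketch below is a hypothetical pipeline, not Ciranda's: it assumes that quoted reply lines (starting with `>`) should be dropped, that numbers are collapsed to a `<num>` placeholder, and that question and exclamation marks are kept as tokens. It then counts all 1- to 3-grams as features.

```python
import re
from collections import Counter


def preprocess(body):
    """Hypothetical cleanup: drop quoted reply lines, lowercase, normalize numbers."""
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    text = "\n".join(kept).lower()
    text = re.sub(r"\d+", "<num>", text)  # collapse every number to one placeholder
    # Tokens: the placeholder, words (with apostrophes), or ? / ! marks.
    return re.findall(r"<num>|[a-z']+|[?!]", text)


def ngram_features(tokens, max_n=3):
    """Bag of n-grams for n = 1..max_n, keyed by the space-joined n-gram."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


body = "Can we meet at 3 tomorrow?\n> Earlier quoted text\nThanks"
tokens = preprocess(body)
print(tokens)  # -> ['can', 'we', 'meet', 'at', '<num>', 'tomorrow', '?', 'thanks']
print(ngram_features(tokens)["can we meet"])  # -> 1
```

Dropping quoted text keeps one message's act from leaking into its replies, and n-grams such as "can we meet" capture phrasing that isolated words like "meet" miss.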
A comparative study of methods for transductive transfer learning
In ICDM Workshop on Mining and Management of Biological Data, 2007
Cited by 13 (3 self)
The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. In this paper, we address the subproblem of domain adaptation, in which a model trained over a source domain is generalized to perform well on a related target domain, where the two domains’ data are distributed similarly, but not identically. Previous work has studied the supervised version of this problem, in which labeled data from both source and target domains are available for training. In this work, however, we study the more challenging problem of unsupervised transductive transfer learning, where no labeled data from the target domain are available at training time, but unlabeled target test data are available during training. We describe some current state-of-the-art inductive and transductive approaches involving three popular learning models, namely maximum entropy, support vector machines, and naive Bayes. We then adapt these models to the problem of transfer learning for protein name extraction. In the process, we introduce a novel maximum-entropy-based technique, Iterative Feature Transformation (IFT), and show that it achieves performance comparable to state-of-the-art transductive SVMs. Finally, we compare the relative strengths and weaknesses of these models across the various learning settings, shedding light both on the algorithms examined and on the difficulty of the respective problems. In addition, we show how simple relaxations, such as providing additional information like the proportion of positive examples in the test data, can significantly improve the performance of some of the transductive transfer learners.
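The last sentence notes that knowing the proportion of positive examples in the test data can help some transductive learners. One simple way to use that information, shown here as an illustration and not as IFT or the paper's procedure, is to rank the unlabeled test examples by classifier score and label exactly the expected number of them positive. Transductive SVMs in the Joachims style similarly fix the fraction of unlabeled examples assigned to the positive class. The function name and inputs are hypothetical.

```python
def label_by_known_proportion(scores, positive_fraction):
    """Label the top-k scored test examples positive, where k = round(fraction * n)."""
    n = len(scores)
    k = round(positive_fraction * n)
    # Indices of test examples sorted from most to least confident positive.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = [0] * n
    for i in order[:k]:
        labels[i] = 1
    return labels


# Scores from a source-trained model, applied to four unlabeled target examples.
print(label_by_known_proportion([0.9, 0.2, 0.6, 0.4], 0.5))  # -> [1, 0, 1, 0]
```

This matters under domain shift because a model trained on the source domain can produce scores that are systematically too high or too low on the target data. A fixed 0.5 threshold then over- or under-predicts positives, whereas ranking only relies on the relative order of the scores.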
RADAR: A Personal Assistant that Learns to Reduce Email Overload
Cited by 13 (6 self)
Email client software is widely used for personal task management, a purpose for which it was not designed and is poorly suited. Past attempts to remedy the problem have focused on adding task management features to the client UI. RADAR uses an alternative approach modeled on a trusted human assistant who reads mail, identifies task-relevant message content, and helps manage and execute tasks. This paper describes the integration of diverse AI technologies and presents results from human evaluation studies comparing RADAR user performance to unaided COTS tool users and users partnered with a human assistant. As machine learning plays a central role in many system components, we also compare versions of RADAR with and without learning. Our tests show a clear advantage for
Sentiment Extraction From Unstructured Text Using Tabu Search-Enhanced Markov Blanket
In Proceedings of the Workshop on Mining the Semantic Web, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
Cited by 10 (3 self)
Extracting sentiments from unstructured text has emerged as an important problem in many disciplines. An accurate method would enable us, for example, to mine online opinions from the Internet and learn customers’ preferences for economic or marketing research, or to gain a strategic advantage. In this paper, we propose a two-stage Bayesian algorithm that captures the dependencies among words and, at the same time, finds a vocabulary that is efficient for the purpose of extracting sentiments. Experimental results on the Movie Reviews data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than the full data set and makes better predictions about sentiment orientation than several state-of-the-art machine learning methods. Our findings suggest that sentiments are captured by conditional dependence relations
Sampling algorithms for pure network topologies
2005
Cited by 9 (3 self)
In a time of information glut, observations about complex systems and phenomena of interest are available in several application areas, such as biology and text. As a consequence, scientists have started searching for patterns that involve interactions among the objects of analysis, to the effect that research on models and algorithms for network analysis has become a central theme for knowledge discovery and data mining (KDD). The intuitions behind the plethora of approaches rely upon a few basic types of networks, identified by specific local and global topological properties, which we term “pure” topology types. In this paper, (1) we survey pure topology types along with existing sampling algorithms that generate them, (2) we introduce novel algorithms that enhance the diversity of samples and address the case of cellular topologies, and (3) we perform statistical studies of the stability of the properties of pure types to alternative generative algorithms, together with a joint study of the separability of pure types in terms of their embedding in a space of network-analysis metrics widely adopted in the social and physical sciences. We conclude with a word of caution to practitioners who sample pure topology types to assess the “statistical significance” of their findings: for example, the p-value of the clustering coefficient is sensitive to the sampling algorithm used. We find that different pure types share similar topological properties. Further, real-world networks rarely exhibit the variability profile of a single pure type. We suggest the assumption of “mixtures of types” as an alternative starting point for developing models and algorithms for network analysis.
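The caution about p-values refers to a common Monte Carlo procedure: sample many networks from a null model, compute a statistic such as the clustering coefficient on each, and report the fraction of samples at least as extreme as the observed network. The sketch below shows that procedure with one assumed null model, the Erdős–Rényi G(n, p) sampler. It is not one of the paper's algorithms. Passing a different `sampler` changes the null distribution and can therefore change the p-value, which is the sensitivity the abstract warns about.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """G(n, p): include each of the n*(n-1)/2 possible edges independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among v's neighbors (closed triangles through v).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def empirical_p_value(observed, sampler, trials, rng):
    """Fraction of sampled null graphs whose average clustering is >= the observed value."""
    hits = sum(avg_clustering(sampler(rng)) >= observed for _ in range(trials))
    return hits / trials


rng = random.Random(0)
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
observed = avg_clustering(triangle)  # 1.0
print(empirical_p_value(observed, lambda r: erdos_renyi(3, 0.5, r), 200, rng))
```

Only the sampler argument would need to change to test the same observed network against, say, a preferential-attachment or lattice-like null model.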
Using Transactivity in Conversation for Summarization of Educational Dialogue
Cited by 7 (2 self)
We present our ongoing work towards using the concept of transactivity [1] for automatically assessing learning of students working together in a collaborative setting. Transactive segments of student dialogue are proposed as useful components of conversation summaries generated for instructors. Experimental evaluation of this hypothesis shows promising results. Further, initial results are presented for automatic identification of transactive contributions in student dialogue. Index Terms: transactivity, conversation summarization, educational dialogue

