Get out the vote: Determining support or opposition from Congressional floor-debate transcripts
In Proceedings of EMNLP, 2006
Cited by 56 (2 self)
We investigate whether one can determine from the transcripts of U.S. Congressional floor debates whether the speeches represent support of or opposition to proposed legislation. To address this problem, we exploit the fact that these speeches occur as part of a discussion; this allows us to use sources of information regarding relationships between discourse segments, such as whether a given utterance indicates agreement with the opinion expressed by another. We find that the incorporation of such information yields substantial improvements over classifying speeches in isolation.
Collective classification in network data
2008
Cited by 45 (17 self)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
Contextual Search and Name Disambiguation in Email using Graphs
SIGIR, 2006
Cited by 28 (10 self)
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closely connected to other documents, as well as other non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.
Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported . . .
International Journal of Computer-Supported Collaborative Learning, 2008
Cited by 20 (6 self)
In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and more specifically in terms of a new publicly available tool set called TagHelper tools. Analyzing the variety of different facets of learners’ interaction that are important for their learning is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. It also holds the potential for enabling substantially improved on-line instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by scaffolding technology, as in the emerging area of context-sensitive collaborative learning support triggered dynamically on an as-needed basis. In this article, we report on an interdisciplinary research project, which has been investigating the effectiveness of applying text classification technology to a large CSCL discourse corpus that had been analyzed by human coders using a theory-based multi-dimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools.
Learning to Detect Conversation Focus of Threaded Discussions
In Proceedings of HLT-NAACL 2006, 2006
Cited by 17 (2 self)
In this paper we present a novel feature-enriched approach that learns to detect the conversation focus of threaded discussions by combining NLP analysis and IR techniques. Using the graph-based algorithm HITS, we integrate different features such as lexical similarity, poster trustworthiness, and speech act analysis of human conversations with feature-oriented link generation functions. It is the first quantitative study to analyze human conversation focus in the context of online discussions that takes into account heterogeneous sources of evidence. Experimental results using a threaded discussion corpus from an undergraduate class show that it achieves significant performance improvements compared with the baseline system.
Email thread reassembly using similarity matching
In Proc. of CEAS, 2006
Cited by 15 (0 self)
Email thread reassembly is the task of linking messages by parent-child relationships. In this paper, we present two approaches to address this problem. One exploits previously undocumented header information from the Microsoft Exchange Protocol. The other uses string similarity metrics and a heuristic algorithm to reassemble threads in the absence of header information. The pros and cons of both methods are discussed. The similarity matching method is evaluated using the Enron email corpus and found to perform well.
Profiling Student Interactions in Threaded Discussions with Speech Act Classifiers, internal project report
2007
Mining and assessing discussions on the web through speech act analysis
In Proceedings of the ISWC’06 Workshop on Web Content Mining with Human Language Technologies, 2006
Cited by 8 (2 self)
Online discussion is a popular form of web-based computer-mediated communication, and is a dominant medium for cyber communities in areas of customer support and distributed education. Automatic tools for analyzing online discussions are highly desirable for better information management and assistance. This paper describes an extensive study of “speech acts” in discussions. We present an approach to classifying student discussions according to a set of speech act patterns and show how we use the patterns in assessing participant roles and identifying discussion threads that may have confusions and unanswered questions. We also show how speech act analysis can improve automatic question answering capabilities. This analysis of human conversation via online discussions provides a basis for the development of future information extraction and intelligent assistance techniques for online discussions.
A publicly available annotated corpus for supervised email summarization
2008
Cited by 8 (4 self)
Annotated email corpora are necessary for evaluation and training of machine learning summarization techniques. The scarcity of corpora has been a limiting factor for research in this field. We describe our process of creating a new annotated email thread corpus that will be made publicly available. We present the trade-offs of the different annotation methods that could be used.

