Results 1 - 6 of 6
Learning to Extract Signature and Reply Lines from Email
- In Proceedings of the Conference on Email and Anti-Spam, 2004
Abstract - Cited by 37 (3 self)
We describe methods for automatically identifying signature blocks and reply lines in plaintext email messages. This analysis has many potential applications, such as preprocessing email for text-to-speech systems
Deriving marketing intelligence from online discussion
- In KDD, 2005
Abstract - Cited by 34 (4 self)
Weblogs and message boards provide online forums for discussion that record the voice of the public. Woven into this mass of discussion is a wide range of opinion and commentary about consumer products. This presents an opportunity for companies to understand and respond to the consumer by analyzing this unsolicited feedback. Given the volume, format and content of the data, the appropriate approach to understand this data is to use large-scale web and text data mining technologies. This paper argues that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses. This paper presents such a system that gathers and annotates online discussion relating to consumer products using a wide variety of state-of-the-art techniques, including crawling, wrapping, search, text classification and computational linguistics. Marketing intelligence is derived through an interactive analysis framework uniquely configured to leverage the connectivity and content of annotated online discussion.
Combining Visual Layout and Lexical Cohesion Features for Text Segmentation
- 2001
Abstract - Cited by 4 (1 self)
We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive discourse structure on the topic level. We demonstrate a system based on this principle and use both an intrinsic evaluation as well as the task of genre classification to assess its performance.
Introduction
A document structure tree can be defined as a data structure that allows navigation of a document by sections. These trees can be hierarchically organized, having subsections of sections and may embed special items, such as figures, tables or hyperlinks. They may be used directly by an end user for document access, or indirectly through other applications. This paper describes a strategy to compute document structure using a framework that deals both with rich, semi-structured documents with layout features as well as impoverished, text stream-like documents. Our system, the Comb...
Treetables and Other Visualizations for Email Threads
- Xerox PARC, 2001
Abstract - Cited by 3 (0 self)
In this paper, we describe some new visualization methods for email threads. The methods concatenate initial message texts, or full texts shorn of extraneous material, into logical groupings embedded in, or closely aligned with, thread structure representations. The results are intended to provide useful thread overviews, and to enable coherent, efficient reading of thread content.
Keywords: Email archives, email threads, computer mediated communication, tree visualization, trees, narrow trees, treetables, email message analysis, threading, overview + detail, focus + context
Segmenting Email Message Text into Zones
Abstract - Cited by 2 (1 self)
In the early days of email, widely-used conventions for indicating quoted reply content and email signatures made it easy to segment email messages into their functional parts. Today, the explosion of different email formats and styles, coupled with the ad hoc ways in which people vary the structure and layout of their messages, means that simple techniques for identifying quoted replies that used to yield 95% accuracy now find less than 10% of such content. In this paper, we describe Zebra, an SVM-based system for segmenting the body text of email messages into nine zone types based on graphic, orthographic and lexical cues. Zebra performs this task with an accuracy of 87.01%; when the number of zones is abstracted to two or three zone classes, this increases to 93.60% and 91.53% respectively.
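The "simple techniques" the abstract above refers to are not spelled out there. As a rough illustration only, the sketch below assumes the two long-standing plaintext conventions: quoted reply lines begin with ">", and a signature follows a delimiter line of "-- ". The function name split_by_convention and the three zone labels are hypothetical, and this is not the Zebra system or any method taken from the listed papers.

```python
def split_by_convention(body: str) -> dict[str, list[str]]:
    """Split a plaintext email body into author, quoted, and signature lines.

    Assumed conventions (illustrative, not from the listed papers):
      - a line starting with ">" is quoted reply content
      - a line equal to "-- " (or "--") starts the signature block
    """
    zones: dict[str, list[str]] = {"author": [], "quoted": [], "signature": []}
    in_signature = False
    for line in body.splitlines():
        if not in_signature and line in ("-- ", "--"):
            # Delimiter line itself is dropped; everything after it is signature.
            in_signature = True
            continue
        if in_signature:
            zones["signature"].append(line)
        elif line.lstrip().startswith(">"):
            zones["quoted"].append(line)
        else:
            zones["author"].append(line)
    return zones


if __name__ == "__main__":
    sample = "Hi Bob,\n> old text\nSounds good.\n-- \nAlice\nExample Corp"
    for zone, lines in split_by_convention(sample).items():
        print(zone, lines)
```

A rule of this kind depends entirely on senders following the conventions, which is the weakness the abstract points to: messages that mark replies or signatures in other ways fall straight through to the "author" bucket.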
Learning to Extract Signature and Reply Lines from Email
- In Proceedings of the Conference on Email and Anti-Spam, 2004
Abstract
We describe methods for automatically identifying signature blocks and reply lines in plaintext email messages. This analysis has many potential applications, such as preprocessing email for text-to-speech systems

