A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field exhibits similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: the appearance of a topic in a document stream is signaled by a "burst of activity," with certain features rising sharply in frequency as the topic emerges.
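The "burst of activity" premise can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the method of the work summarized above: it groups a stream into fixed-length time windows and flags windows in which the share of documents containing a given term is well above that term's share over the whole stream. The function name `bursty_windows`, the fixed windowing, and the `ratio` threshold are all hypothetical choices made for this example.

```python
from collections import Counter


def bursty_windows(docs, term, window, ratio=2.0):
    """Return start times of windows in which `term` is unusually frequent.

    docs   : list of (timestamp, text) pairs; timestamps are non-negative numbers
    window : window length, in the same units as the timestamps
    ratio  : hypothetical threshold; a window is flagged when the share of its
             documents containing `term` is at least `ratio` times the share
             over the whole stream
    """
    term = term.lower()
    totals, hits = Counter(), Counter()
    for ts, text in docs:
        start = int(ts // window) * window  # start time of the enclosing window
        totals[start] += 1
        if term in text.lower().split():
            hits[start] += 1

    n_docs = sum(totals.values())
    n_hits = sum(hits.values())
    if n_docs == 0 or n_hits == 0:
        return []  # empty stream, or the term never occurs

    baseline = n_hits / n_docs  # share of all documents containing the term
    return sorted(s for s in totals if hits[s] / totals[s] >= ratio * baseline)
```

A fixed threshold of this kind is sensitive to the chosen window length and to noise in sparse windows, which is one reason a more formal model of bursts is desirable.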