Applying Data Mining Techniques in Text Analysis, 1997
Abstract
Cited by 12 (1 self)
A number of recent data mining techniques have been targeted especially for the analysis of sequential data. Traditional examples of sequential data involve telecommunication alarms, WWW log files, user action registration for HCI studies, or any other series of events consisting of an event type and a time of occurrence. Text can also be seen as sequential data, in many respects similar to the data collected by sensors, or other observation systems. Traditionally, texts have been analysed using various information retrieval related methods, such as full-text analysis, and natural language processing. However, only few examples of data mining in text, particularly in full text, are available. In this paper we show that general data mining methods are applicable to text analysis tasks under certain conditions. Moreover, we present a general framework for text mining. The framework follows the general KDD process, thus containing steps from preprocessing to the utilization of the results. The data mining method that we apply is based on generalized episodes and episode rules. We consider preprocessing of the text to be essential in text mining: by shifting the focus in the preprocessing phase, data mining can be used to obtain results for various purposes. We give concrete examples of how to preprocess texts based on the intended use of the discovered results and how to balance preprocessing with postprocessing. We also present example applications including search for key words, key phrases and other co-occurring words, e.g. collocations and generalized concordances. These applications are both common and relevant tasks in information retrieval and natural language processing. We also present results from real-life data experiments to show that our approach is applicable in practice.
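The idea of reading text as an event sequence can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a much simplified setting in which each word is an event whose "time" is its position in the text, only full sliding windows are counted, and only unordered word pairs are considered (a crude stand-in for generalized episodes). The function names `to_events`, `window_counts`, and `rule_confidence` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def to_events(text):
    """Preprocess text into (word, position) events.

    Assumption: position in the text plays the role of occurrence time.
    """
    stripped = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    words = [w for w in stripped if w]
    return [(w, i) for i, w in enumerate(words)]


def window_counts(events, width):
    """Count the sliding windows that contain each word and each word pair.

    Keys are sorted tuples of length 1 or 2. Only full windows are used,
    which is a simplification made for this sketch.
    """
    words = [w for w, _ in events]
    counts = Counter()
    for start in range(max(len(words) - width + 1, 1)):
        window = sorted(set(words[start:start + width]))
        counts.update(combinations(window, 1))
        counts.update(combinations(window, 2))
    return counts


def rule_confidence(counts, a, b):
    """Fraction of windows containing `a` that also contain `b`."""
    both = counts[tuple(sorted((a, b)))]
    return both / counts[(a,)] if counts[(a,)] else 0.0


if __name__ == "__main__":
    events = to_events("Data mining in text: data mining.")
    counts = window_counts(events, width=3)
    print(counts[("data", "mining")])
    print(rule_confidence(counts, "data", "mining"))
```

Changing what `to_events` emits (for example word stems or grammatical tags instead of surface words) would change what the counts describe, which is one way to picture the abstract's point that preprocessing determines the use of the results.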

