BibTeX
@MISC{Núñez_distributeddata,
author = {Marlon Núñez},
title = {Distributed Data Mining for Learning Predictive Knowledge},
year = {}
}
Abstract
We propose a distributed data mining method that learns to predict future events. The learnt predictive knowledge is generalized from the experiences of multiple situations in parallel and used as a single knowledge source for all situations. A distributed architecture is the most suitable environment for this data mining method, called DBPL. The proposed system is a new, distributed version of an existing system called BPL (Núñez, 2000), which was originally implemented as a single system and applied in several fields, such as communication networks for predicting faults (Núñez et al., 2002) and space weather for predicting solar flares (Núñez et al., 2005). We plan to implement this data mining system in a distributed infrastructure. The grid architecture seems to be the most appropriate one for running the knowledge discovery process. Several types of jobs may be sent to the grid network: collectors, learners and users.

- The Collectors detect events from one or several situations. Collectors analyze the input events to build summaries at specific times. Each summary, called a behavior summary, has a numeric class: the temporal distance from the moment of the summary to the occurrence of a target event. These summaries are sent to the learners.
- The Learners take the behavior summaries from all situation collectors, process them as training examples to construct a regression tree for each target event to be predicted, and translate these trees into behavior trees. Behavior trees are sent to the distributed network as a repository of the learnt knowledge.
- The Users detect their own events in real time and use the behavior trees to predict future events. Every user benefits from the experiences of all the monitored situations.
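The collector, learner and user roles described in the abstract can be illustrated with a small single-process sketch. This is not the DBPL or BPL algorithm: the event representation (timestamp and type pairs), the choice of event counts over a recent window as summary features, the window length, and the one-split regression tree standing in for full regression and behavior trees are all assumptions made for illustration, as are the function names `collect`, `learn_stump` and `predict`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BehaviorSummary:
    features: dict         # assumed features: event counts in a recent window
    time_to_target: float  # numeric class: time from the summary to the target event


def collect(events, target, summary_times, window):
    """Collector: build behavior summaries for one situation.

    `events` is a list of (timestamp, event_type) pairs (a hypothetical format).
    """
    summaries = []
    for t in summary_times:
        # Occurrences of the target event strictly after the summary time.
        future = [ts for ts, kind in events if kind == target and ts > t]
        if not future:
            continue  # no later target event, so the class is undefined
        counts = Counter(kind for ts, kind in events if t - window < ts <= t)
        summaries.append(BehaviorSummary(dict(counts), min(future) - t))
    return summaries


def mean(values):
    return sum(values) / len(values)


def learn_stump(summaries):
    """Learner: fit a one-split regression tree by minimizing squared error."""
    best = None
    names = sorted({name for s in summaries for name in s.features})
    for name in names:
        values = sorted({s.features.get(name, 0) for s in summaries})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [s.time_to_target for s in summaries
                    if s.features.get(name, 0) <= threshold]
            right = [s.time_to_target for s in summaries
                     if s.features.get(name, 0) > threshold]
            # Sum of squared errors around each leaf mean.
            sse = (sum((y - mean(left)) ** 2 for y in left)
                   + sum((y - mean(right)) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, name, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("no feature takes two distinct values; cannot split")
    _, name, threshold, left_mean, right_mean = best
    return name, threshold, left_mean, right_mean


def predict(tree, features):
    """User: estimate the time until the target event from current counts."""
    name, threshold, left_mean, right_mean = tree
    return left_mean if features.get(name, 0) <= threshold else right_mean


# Two monitored situations (made-up data) feed a single learner.
situation_a = [(1, "warning"), (2, "warning"), (3, "fault"), (10, "info"), (20, "fault")]
situation_b = [(5, "info"), (14, "warning"), (15, "warning"), (16, "fault")]

summaries = (collect(situation_a, "fault", [2, 10], window=5)
             + collect(situation_b, "fault", [5, 15], window=5))
tree = learn_stump(summaries)
```

A user that currently observes several warnings would call `predict(tree, {"warning": 3})` and receive a short estimated time to the next fault, learnt from both situations. In a grid deployment, each function would presumably run as a separate job, with summaries and trees exchanged over the network rather than passed in memory.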
Keyphrases
learning predictive knowledge, data mining, behavior tree, future event, data mining method, behavior summary, target event, single system, regression tree, data mining system, solar flare, proposed system, distributed infrastructure, single knowledge source, temporal distance, learnt predictive knowledge, communication network, distributed architecture, several situation, numeric class, space weather, learnt knowledge, several type, grid architecture, specific time, suitable environment, training example, building summary, grid network, multiple situation, monitored situation, situation collector, input event, distributed version, knowledge discovery process, several field, distributed network