Australasian Journal of Information Systems Special Issue 2003/2004 SUPPORTING TOPIC MAP CREATION USING DATA MINING TECHNIQUES
There is increasing interest in automating the creation of semantic structures, especially topic maps, by taking advantage of existing, structured information resources. This article gives an overview of the most popular method, which is based on RDF triples, and suggests a way to automate topic map creation from unstructured information sources. The method can be applied in the information systems development domain when analysing vast unstructured data repositories in preparation for system design, or when migrating large amounts of unstructured data from legacy systems. Two innovative methods are presented in the paper, Term Crawling (TC) and Clustering Hierarchy Projection (CHP), which are applied to build a topic map from free-text documents held in local repositories and downloaded from the Internet. The methods originate from data mining techniques for knowledge discovery. A sample tool that uses the described techniques has been implemented. The preliminary results achieved on the test collection are presented in the concluding sections of the article.

BACKGROUND

In today's enterprises, documentation is spread throughout the whole organisation. The locations may include corporate portals, document management systems, users' private folders, web servers and many others. In order to provide employees with access to all relevant documents, one should consider using a common data structure that is able to identify the location of any (or almost any) document in the enterprise. One such structure, discussed in this paper, is the topic map. However, introducing a new data structure imposes a requirement to enter data about all the documents that should be made available, and to fill in all the necessary attributes. Owing to the heterogeneity of data sources, fully automated techniques are still more a concept than a reality. On the other hand, tools that merely support the creation process may already be helpful.
When using topic maps as a data structure describing the whole document repository, one may apply concepts from the field of information retrieval. After providing background information, we explore selected techniques further and suggest extensions to some of them.
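
To make the idea of a topic map as a common structure for locating documents more concrete, the following is a minimal Python sketch. It is not the tool described in the article: the class names (`Topic`, `TopicMap`), the method names, the `related-to` association type and the example locations are all hypothetical and chosen only for illustration. It assumes the usual topic map building blocks of topics, occurrences (pointers to resources) and associations between topics.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A subject of interest plus the locations of documents about it."""
    name: str
    occurrences: set = field(default_factory=set)  # document locations (strings)


class TopicMap:
    """Hypothetical, simplified topic map: topics, occurrences, associations."""

    def __init__(self):
        self.topics = {}                       # topic name -> Topic
        self.associations = defaultdict(set)   # topic name -> {(kind, other name)}

    def add_occurrence(self, topic_name, location):
        """Record that the document at `location` is about `topic_name`."""
        topic = self.topics.setdefault(topic_name, Topic(topic_name))
        topic.occurrences.add(location)

    def associate(self, a, b, kind="related-to"):
        """Link two topics symmetrically, creating them if needed."""
        for name in (a, b):
            self.topics.setdefault(name, Topic(name))
        self.associations[a].add((kind, b))
        self.associations[b].add((kind, a))

    def locate(self, topic_name, follow_associations=False):
        """Return sorted document locations for a topic.

        With follow_associations=True, documents of directly
        associated topics are included as well.
        """
        if topic_name not in self.topics:
            return []
        found = set(self.topics[topic_name].occurrences)
        if follow_associations:
            for _kind, other in self.associations.get(topic_name, ()):
                found |= self.topics[other].occurrences
        return sorted(found)


# Illustrative data only: documents scattered over different repositories.
tm = TopicMap()
tm.add_occurrence("invoicing", "portal://finance/invoice-guide.html")
tm.add_occurrence("invoicing", "file:///home/anna/notes/invoices.txt")
tm.add_occurrence("billing", "dms://contracts/billing-policy.pdf")
tm.associate("invoicing", "billing")

for location in tm.locate("invoicing", follow_associations=True):
    print(location)
```

In this sketch every document is reachable through the topics it is about, regardless of which repository holds it. Populating such a structure by hand is exactly the data-entry burden noted above, which is why support for automating it is of interest.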