An Application of KEFIR to the Analysis of Healthcare Information
, 1994
Cited by 13 (2 self)
The Key Findings Reporter (KEFIR) is a system for discovering and explaining "key findings" in large relational databases. This paper describes an application of KEFIR to the analysis of health-care information. The system performs an automatic analysis of data along multiple dimensions to determine the most interesting deviations of specific quantitative measures relative to norms and previous values. It explains key findings through their relationship to other findings in the data, and, where possible, generates simple recommendations for correcting detected problems. A final written report, complete with business graphics, is produced for viewing remotely over the internet with Mosaic, or for printing to hardcopy. Keywords: knowledge discovery, databases, health care 1 Introduction Knowledge discovery techniques are being used successfully today to analyze and explore large databases in numerous scientific, financial, and manufacturing domains [Piatetsky-Shapiro, 1993, Matheus et ...
Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections
- IN PROCEEDINGS OF THE IEEE FORUM ON RESEARCH AND TECHNOLOGY ADVANCES IN DIGITAL LIBRARIES
, 1998
Cited by 11 (0 self)
Traditionally, texts have been analysed using various methods related to information retrieval, such as full-text analysis and natural language processing. However, only a few examples of data mining in text, particularly in full text, are available. In this paper we show that general data mining methods are applicable to text analysis tasks such as descriptive phrase extraction. Moreover, we present a general framework for text mining. The framework follows the general knowledge discovery process, thus containing steps from preprocessing to the utilization of the results. The data mining method that we apply is based on generalized episodes and episode rules. We give concrete examples of how to preprocess texts based on the intended use of the discovered results, and we introduce a weighting scheme that helps in pruning out redundant or non-descriptive phrases. We also present results from real-life data experiments.
Multiple Predicate Learning in Two Inductive Logic Programming Settings
, 1996
Cited by 8 (0 self)
Inductive logic programming (ILP) is a research area which has its roots in inductive machine learning and computational logic. The paper gives an introduction to this area based on a distinction between two different semantics used in inductive logic programming, and illustrates their application in knowledge discovery and programming. Whereas most research in inductive logic programming has focussed on learning single predicates from given datasets using the normal ILP semantics (e.g. the well-known ILP systems GOLEM and FOIL), the paper also investigates the non-monotonic ILP semantics and learning problems involving multiple predicates. The non-monotonic ILP setting avoids the order dependency problem of the normal setting when learning multiple predicates, extends the representation of the induced hypotheses to full clausal logic, and can be applied to different types of application. Keywords: inductive logic programming, induction, logic programming, machine learning 1 Intro...
Attribute Similarity and Event Sequence Similarity in Data Mining
, 1998
Cited by 5 (0 self)
In data mining and knowledge discovery, similarity between objects is one of the central concepts. A measure of similarity can be user-defined, but an important problem is defining similarity on the basis of data. In this thesis we consider two kinds of similarity notions: similarity between binary-valued attributes and similarity between event sequences. Traditional approaches for defining similarity between two attributes typically consider only the values of those two attributes, not the values of any other attributes in the relation. Such similarity measures are often useful, but they cannot reflect certain kinds of similarity. Therefore, we introduce a new attribute similarity measure that takes into account the values of the other attributes. The behavior of the different measures of attribute similarity is demonstrated by giving empirical results on two real-life data sets. We also present a simple model for defining similarity between event sequences. The model is based on ...
Understanding complex systems through examples: A framework for qualitative example finding
- Kingston University
, 2000
Cited by 2 (1 self)
Many complex systems have the characteristic that we can classify objects in the system in some way, but that these classifications are distributed through a parameter space in some complex fashion. In order for a human to get an understanding of the system, we would like to present this user with one example of an object for each class. Examples of such problems can be found in information retrieval, bioinformatics, computational geometry, computer-aided design, software testing and cellular automata. In this paper we will show how problems in all these areas can be put into a general framework of finding qualitative examples, and argue that general heuristic approaches to this type of problem are an important and neglected area of machine learning. We contrast this with some other well-studied problems, showing how this problem is distinct and investigating what we can learn from these problems. We then discuss some of the requirements for a heuristic to solve these problems,...
Data Summarization with Linguistic Labels: A Loss Less Decomposition Approach
- In IFSA
, 1997
Cited by 1 (0 self)
In this work we address the problem of applying the concept of fuzzy dependency to data summarization in databases. We recover the definition introduced by the authors in previous works, which was used to fuzzify the concept of classical dependency while maintaining properties like soundness and completeness of Armstrong Axioms. Now we relax it in such a way that we can apply it to compress the database in more cases than previously, although some properties like transitivity will be lost. Keywords: Data Summarization, Loss Less Decomposition 1 Preliminaries There is a growing interest in extracting general knowledge from a database, and techniques of knowledge discovery in databases have recently been developed. These are learning procedures which help to find some connections among attributes, for instance, that 90 percent of the consumers of product A are teenagers with higher education ([10, 8]). On the other hand, data compression [7, 12] is also an important issue in large databases. It ...
Data Mining With an Evolving Population of Database Queries
- In: Proceedings of MENDEL'95 - the International Conference on Genetic Algorithms, Brno (CR
, 1996
Cited by 1 (0 self)
With the increasing size and complexity of modern database systems, sophisticated methods of information retrieval are essential. Many institutions are ignorant of vital information stored in their databases because the query that would produce it has not been run. The goal of all data mining systems is to uncover significant information from a database without the need for an explicit request. This paper describes work done to perform data mining by using techniques borrowed from Genetic Programming to evolve a population of relational database queries, such that the system generates increasingly significant information. The resulting program has been tested with a selection of standard machine learning problems and appears to work well, outperforming many of the standard algorithms. The system described produces SQL queries for database information stored within an Ingres DBMS, and the techniques used are applicable to any relational database. 1 Introduction Data mining co...
Possibilistic Conditional Independence: a similarity-based measure and its application to causal network learning
, 1994
A definition for similarity between possibility distributions is introduced and discussed as a basis for detecting dependence between variables by measuring the similarity degree of their respective distributions. This definition is used to detect conditional independence relations in possibility distributions derived from data. This is the basis for a new hybrid algorithm for recovering possibilistic causal networks. The algorithm POSSCAUSE is presented and its applications discussed and compared with analogous developments in possibilistic and probabilistic causal networks learning. This work has been supported by project CICYT-TIC960878 of the Spanish Science and Technology Commission. Address correspondence to Dept. Llenguatges i Sistemes Informátics, Technical University of Catalonia, Campus Nord, Módul C5, Despatx 205, Gran Capitán, s/n, 08028 Barcelona, SPAIN. International Journal of Approximate Reasoning 1994 11:1-158. © 1994 Elsevier Science Inc. 655 Avenue of th...
Bulletin of the Technical Committee on
this paper, as in [9], we draw a distinction between the latter, which we call KDD, and "data mining". The term data mining has been mostly used by statisticians, data analysts, and the database communities. The earliest uses of the term come from statistics and its usage in most settings was associated with negative connotations of blind exploration of data without a priori hypotheses to be verified. However, notable exceptions can be found. For example, as early as 1978 [16], the term is used in a positive sense in a demonstration of how generalized linear regression can be used to solve problems that are very difficult for humans and the traditional statistical techniques
20 Selecting and Reporting What is Interesting: The KEFIR Application to Healthcare Data
"Information by itself is a pretty thin meal, if not mixed with other ingredients." (Internet quote) One of the most promising areas in Knowledge Discovery in Databases is the automatic analysis of deviations. Success in this task hinges on the ability to identify a few important and relevant events among the multitude of potentially interesting deviations. In this chapter we present our approach to determining the interestingness of a deviation via the potential benefit from a relevant action. This approach has been implemented in the Key Findings Reporter (KEFIR), a system for discovering and explaining "key findings" in large, changing databases, currently being applied to the analysis of healthcare data. The system performs an automatic drill-down through data along multiple dimensions to determine the most interesting deviations of specific quantitative measures relative to their previous and expected values. It explains "key" deviations through their relationship to other deviations in the data, and, where appropriate, generates recommendations for actions in response to these deviations. KEFIR uses Mosaic, a WWW browser, to present its findings in a hypertext report, using natural language and

