Results 1 - 10 of 46
Mining Association Rules between Sets of Items in Large Databases
- In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC (USA), 1993
Cited by 1953 (15 self)
Abstract
We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which show the effectiveness of the algorithm.
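To make "support" and "confidence" concrete, here is a minimal brute-force sketch that enumerates every itemset occurring in the data and keeps rules above hypothetical thresholds `min_support` and `min_confidence`. It is not the paper's algorithm: it omits the buffer management, estimation, and pruning the abstract mentions, and it is exponential in transaction length, so treat it only as an illustration of what the rules mean. The function names and the toy basket data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Brute force: every subset of every transaction is counted, so the cost
    grows exponentially with transaction length. Illustration only.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))  # sorted tuples give each itemset one canonical key
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    freq = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, support in freq.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(itemset, k):
                consequent = tuple(i for i in itemset if i not in antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent is guaranteed to be a key in freq.
                confidence = support / freq[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, support, confidence))
    return rules


baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
for a, c, s, conf in association_rules(baskets, 0.5, 0.6):
    print(f"{a} -> {c}  support={s:.2f} confidence={conf:.2f}")
```

On the toy baskets this yields four rules, including `('butter',) -> ('bread',)` with confidence 1.00, since every basket containing butter also contains bread.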
Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach
- Machine Learning and Data Mining: Methods and Applications, 1997
Cited by 24 (12 self)
Abstract
An enormous proliferation of databases in almost every area of human endeavor has created a great demand for new, powerful tools for turning data into useful, task-oriented knowledge. In efforts to satisfy this need, researchers have been exploring ideas and methods developed in machine learning, pattern recognition, statistical data analysis, data visualization, neural nets, etc. These efforts have led to the emergence of a new research area, frequently called data mining and knowledge discovery. The first part of this chapter is a compendium of ideas on the applicability of symbolic machine learning methods to this area. The second part describes a multistrategy methodology for conceptual data exploration, by which we mean the derivation of high-level concepts and descriptions from data through symbolic reasoning involving both data and background knowledge. The methodology, which has been implemented in the INLEN system, combines machine learning, database and knowledge-based techn...
Learning Qualitative Models of Dynamic Systems
- Machine Learning, 1997
Cited by 22 (0 self)
Abstract
The automated construction of dynamic system models is an important application area for ILP. We describe a method that learns qualitative models from time-varying physiological signals. The goal is to understand the complexity of the learning task when faced with numerical data, what signal processing techniques are required, and how this affects learning. The qualitative representation is based on Kuipers' Qsim. The learning algorithm for model construction is based on Coiera's Genmodel. We show that Qsim models are efficiently PAC learnable from positive examples only, and that Genmodel is an ILP algorithm for efficiently constructing a Qsim model. We describe both Genmodel, which performs RLGG on qualitative states to learn a Qsim model, and the front-end processing and segmentation stages that transform a signal into a set of qualitative states. Next, we describe the results of experiments on data from six cardiac bypass patients. Useful models were obtained, representing both normal and abnormal physiological states. Model variation across time and across different levels of temporal abstraction and fault tolerance is explored. This study does not support the assumption, made by many previous workers, that the abstraction of examples from data can be separated from the learning task. First, the effects of noise in the numerical data manifest themselves in the qualitative examples. Second, the models learned depend directly on the initial qualitative abstraction chosen.
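The claim that noise in the numerical data shows up in the qualitative examples can be illustrated with a deliberately simplified front end. The sketch below labels each step of a sampled signal as increasing, decreasing, or steady and merges repeated labels into states. The tolerance parameter `tol` and the function name are hypothetical; real Qsim states also track magnitudes relative to landmark values, which this sketch omits, and the paper's actual segmentation stage is not reproduced here.

```python
def qualitative_directions(signal, tol=0.0):
    """Abstract a sampled signal into a sequence of qualitative directions.

    Each step is labelled "inc", "dec" or "std" (steady) by comparing the
    change between consecutive samples against the noise tolerance tol;
    consecutive identical labels are merged into one state.
    """
    states = []
    for prev, cur in zip(signal, signal[1:]):
        delta = cur - prev
        if delta > tol:
            direction = "inc"
        elif delta < -tol:
            direction = "dec"
        else:
            direction = "std"
        if not states or states[-1] != direction:
            states.append(direction)
    return states


noisy = [0, 1, 2, 2.05, 2, 1, 0]
print(qualitative_directions(noisy))           # ['inc', 'dec']
print(qualitative_directions(noisy, tol=0.1))  # ['inc', 'std', 'dec']
```

A small blip of 0.05 at the plateau makes the zero-tolerance abstraction lose the steady state entirely, while a slightly wider tolerance recovers it. The learned model then depends on that choice, which echoes the abstract's second conclusion.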
Discovering Admissible Models of Complex Systems Based on Scale-Types and Identity Constraints
- In Proceedings of IJCAI'97, Vol. 2, 1997
Cited by 18 (7 self)
Abstract
SDS is a discovery system for numeric measurement data. It outperforms existing systems in every aspect: search efficiency, noise tolerance, credibility of the resulting equations, and the complexity of the target system it can handle. The power of SDS comes from using the scale-types of the measurement data and the mathematical property of identity to constrain the admissible solutions. Its algorithm is described with a complex working example, and a performance comparison with other systems is discussed.
Discovering interesting holes in data
- In Proceedings of IJCAI, 1997
Cited by 14 (2 self)
Abstract
Current machine learning and discovery techniques focus on discovering rules or regularities that exist in data. An important aspect of the research that has been ignored in the past is the learning or discovery of interesting holes in the database. If we view each case in the database as a point in a k-dimensional space, then a hole is simply a region in the space that contains no data point. Clearly, not every hole is interesting. Some holes are obvious because it is known that certain value combinations are not possible. Some holes exist because there are insufficient cases in the database. However, in some situations, empty regions do carry important information. For instance, they could warn us about missing value combinations that were either previously unknown or unexpected. Knowing these missing value combinations may lead to significant discoveries. In this paper, we propose an algorithm to discover holes in databases.
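The definition of a hole as an empty region of a k-dimensional space can be made concrete with a simple grid: split a bounding box into equal cells and report the cells no data point falls into. This is only a minimal sketch under assumed parameters (`lo`, `hi`, `bins` and the function name are hypothetical); it is not the paper's algorithm, it finds fixed-size empty cells rather than maximal empty regions, and it does nothing to separate interesting holes from obvious or sparsity-induced ones.

```python
from itertools import product


def empty_cells(points, lo, hi, bins):
    """Return the grid cells (as index tuples) that contain no data point.

    The box [lo, hi] in k dimensions is split into bins**k equal cells.
    Coordinates are clamped into the grid, so a point on the upper
    boundary falls into the last cell along that axis.
    """
    k = len(lo)
    width = [(hi[d] - lo[d]) / bins for d in range(k)]
    occupied = set()
    for p in points:
        cell = tuple(
            min(max(int((p[d] - lo[d]) / width[d]), 0), bins - 1)
            for d in range(k)
        )
        occupied.add(cell)
    return set(product(range(bins), repeat=k)) - occupied


pts = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9)]
print(empty_cells(pts, lo=(0.0, 0.0), hi=(1.0, 1.0), bins=2))  # {(1, 1)}
```

The grid resolution is the main design choice here: with too many bins almost every cell is empty simply because there are insufficient cases, which is exactly the kind of uninteresting hole the abstract warns about.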
Discovering and Reconciling Value Conflicts for Numerical Data Integration
- 2001
Cited by 14 (4 self)
Abstract
The build-up of information technology capital, fueled by the Internet and the cost-effectiveness of new telecommunications technologies, has led to a proliferation of information systems that are in dire need of exchanging information but are incapable of doing so due to a lack of semantic interoperability. It is now evident that physical connectivity (the ability to exchange bits and bytes) is no longer adequate: the integration of data from autonomous and heterogeneous systems calls for the prior identification and resolution of semantic conflicts that may be present. Unfortunately, this requires the system integrator to sift through the data from disparate systems in a painstaking manner. We suggest that this process can be partially automated by presenting a methodology and technique for discovering potential semantic conflicts as well as the underlying data transformations needed to resolve them. Our methodology begins by classifying data value conflicts into two categories: context independent and context dependent. While context independent conflicts are usually caused by unexpected errors, context dependent conflicts are primarily a result of the heterogeneity of the underlying data sources. To facilitate data integration, data value conversion rules are proposed to describe the quantitative relationships among data values involving context dependent conflicts. A general approach is proposed to discover data value conversion rules from the data. The approach consists of five major steps: relevant attribute analysis, candidate model selection, conversion function generation, conversion function selection, and conversion rule formation. It is being implemented in a prototype system, DIRECT, for business data using statistics-based techniques. Preliminary stu...
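As a rough illustration of the "conversion function generation" step, the sketch below assumes a single linear candidate model, target ≈ a * source + b, and fits it by ordinary least squares. It also lists the value pairs the fitted rule fails to explain, which under this sketch's assumption would be candidates for context independent conflicts (errors). The candidate models DIRECT actually considers are not given in the abstract; the function names, tolerance, and revenue figures are hypothetical.

```python
def fit_linear_conversion(source, target):
    """Least-squares fit of target ≈ a * source + b; returns (a, b)."""
    n = len(source)
    mean_x = sum(source) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in source)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(source, target))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def unreconciled(source, target, a, b, tol):
    """Indices of value pairs that the conversion rule does not explain."""
    return [i for i, (x, y) in enumerate(zip(source, target))
            if abs(a * x + b - y) > tol]


revenue_k = [10, 20, 30, 40]             # hypothetical source A: thousands
revenue = [10000, 20000, 30000, 40000]   # hypothetical source B: units
a, b = fit_linear_conversion(revenue_k, revenue)
print(a, b)                               # 1000.0 0.0
print(unreconciled([10, 20], [10000, 25000], a, b, tol=1.0))  # [1]
```

Here the discovered rule is "multiply by 1000", a context dependent scale difference. Note that a plain least-squares fit is sensitive to the very errors it is meant to expose, so a robust fit would be a reasonable alternative in practice.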
Robot Programming by Demonstration (RPD) - Using Machine Learning and User Interaction Methods for the Development of Easy and Comfortable Robot Programming Systems
- In Proceedings of the 24th International Symposium on Industrial Robots, 1994
Cited by 13 (1 self)
Abstract
Robot Programming by Demonstration is an intuitive method to program a robot. The programmer shows how a particular task is performed, using an interface device that allows the measurement and recording of the human's motions and other parameters relevant to performing the demonstrated task. This paper presents an analysis of the learning and interaction requirements that are characteristic of an RPD system. Based on these requirements, a new system architecture is proposed that supports all phases of the interactive programming process. Experimental results are given for an example task. Keywords: Programming by Demonstration, Man-Machine Interface, Machine Learning 1. INTRODUCTION Due to the costs involved in the development and maintenance of robot programs, automatic programming techniques and Programming by (Human) Demonstration (PbD) are currently attracting a lot of interest (e.g. [8], [15], [18]). Moreover, learning capabilities are becoming continuously attractive for...
Discovering and Reconciling Semantic Conflicts: A Data Mining Perspective
- 1997
Cited by 12 (4 self)
Abstract
Current approaches to semantic interoperability require human intervention in detecting potential conflicts and in defining how those conflicts may be resolved. This is a major impediment to achieving "logical connectivity", especially when the number of disparate sources is large. In this paper, we demonstrate that the detection and reconciliation of semantic conflicts can be automated using tools and techniques developed by the data mining community. We describe a process for discovering rules that represent the relationships among semantically related attributes and illustrate the effectiveness of our approach with examples. Keywords: Database Integration, Data Mining, Regression Analysis, Semantic Conflicts 1 INTRODUCTION A variety of online information sources and receivers (i.e., users and applications) has emerged at an unprecedented rate in the last few years, driven in large part by the exponential growth of the Internet as well as advances in telecommunications technolog...
Law discovery using neural networks
- Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, 1997
Cited by 11 (7 self)
Abstract
This paper proposes a new connectionist approach to numeric law discovery: neural networks (law candidates) are trained using a newly invented second-order learning algorithm based on a quasi-Newton method, called BPQ, and the MDL criterion selects the most suitable of the law candidates. The main advantage of our method over previous symbolic or connectionist approaches is that it can efficiently discover numeric laws whose power values are not restricted to integers. Experiments showed that the proposed method works well in discovering such laws even from data containing irrelevant variables or a small amount of noise.
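To show what a law with a non-integer power looks like and why it is recoverable from data, the sketch below fits y = c * x^w for a single variable by ordinary least squares on the log-transformed data. This is a swapped-in, much simpler technique: it is not the BPQ-trained network or the MDL-based selection described above, it handles only one input, and it assumes strictly positive, noise-free data. The function name and the synthetic law y = 2 * x^1.5 are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**w by ordinary least squares on log y = log c + w*log x.

    Requires strictly positive xs and ys. Returns (c, w).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    w = sxy / sxx
    c = math.exp(my - w * mx)
    return c, w


xs = [1, 2, 3, 4, 5]
ys = [2.0 * x ** 1.5 for x in xs]  # synthetic, noise-free law with w = 1.5
c, w = fit_power_law(xs, ys)
print(round(c, 6), round(w, 6))      # 2.0 1.5
```

The exponent is estimated as a real number, so nothing restricts it to integers. That is the property the abstract highlights, though the paper obtains it through network training rather than a log-linear fit.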
Non-Linear Decision Trees -- NDT
- In Int. Conf. on Machine Learning, 1996
Cited by 8 (1 self)
Abstract
Most decision tree algorithms focus on univariate (i.e., axis-parallel) tests at each internal node of a tree. Oblique decision trees use multivariate linear tests at each non-leaf node. This paper reports a novel approach to the construction of non-linear decision trees. The crux of this method is the generation of new features and the augmentation of the primitive features with these new ones. The resulting non-linear decision trees are more accurate than their axis-parallel or oblique counterparts. Experiments on several artificial and real-world data sets demonstrate this property.
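The idea of augmenting primitive features so that an axis-parallel test becomes non-linear in the original space can be sketched as follows. The abstract does not say which new features are generated; this sketch assumes all pairwise products x_i * x_j (including squares), and uses a single exhaustive decision stump instead of a full tree to keep things short. Both `augment` and `best_stump` are hypothetical names.

```python
from itertools import combinations_with_replacement


def augment(row):
    """Append all pairwise products x_i * x_j (i <= j) to the primitive features."""
    n = len(row)
    extra = [row[i] * row[j] for i, j in combinations_with_replacement(range(n), 2)]
    return list(row) + extra


def best_stump(X, y):
    """Exhaustively search single axis-parallel splits on 0/1 labels.

    Returns (accuracy, feature_index, threshold, label_if_at_or_below).
    """
    best = (0.0, None, None, None)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo_v, hi_v in zip(values, values[1:]):
            t = (lo_v + hi_v) / 2  # midpoint between adjacent distinct values
            for left in (0, 1):
                hits = sum((left if row[f] <= t else 1 - left) == label
                           for row, label in zip(X, y))
                acc = hits / len(y)
                if acc > best[0]:
                    best = (acc, f, t, left)
    return best


X = [(-1, -1), (1, 1), (-1, 1), (1, -1)]  # XOR-style toy data
y = [1, 1, 0, 0]                          # label 1 when x1 * x2 > 0
print(best_stump(X, y)[0])                            # 0.5
print(best_stump([augment(r) for r in X], y)[:2])     # (1.0, 3)
```

On the primitive features no single threshold does better than chance, while on the augmented features the split on index 3 (the product x1 * x2) separates the classes perfectly. That split is an axis-parallel test in the new space and a hyperbolic boundary in the original one.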

