Spanned Patterns for the Logical Analysis of Data
, 2002
Cited by 8 (5 self)
In a finite dataset consisting of positive and negative observations represented as real-valued n-vectors, a positive (negative) pattern is an interval in R^n with the property that it contains sufficiently many positive (negative) observations and sufficiently few negative (positive) ones. A pattern is spanned if it does not properly include any other interval containing the same set of observations. Although large collections of spanned patterns can provide highly accurate classification models within the framework of the Logical Analysis of Data, no efficient method for their generation is currently known. We propose in this paper an incrementally polynomial time algorithm for the generation of all spanned patterns in a dataset, which runs in linear time in the output; the algorithm closely resembles the Blake and Quine consensus method for finding the prime implicants of Boolean functions. The efficiency of the proposed algorithm is tested on various publicly available datasets. In the last part of the paper, we present the results of a series of computational experiments which show the high degree of robustness of spanned patterns.
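The definition of a spanned pattern in the abstract above can be illustrated with a minimal sketch. It is not the paper's generation algorithm; it only checks the definition for a single interval. An interval is assumed to be given as a pair of lower-bound and upper-bound vectors, and the names `covers`, `span_of`, and `is_spanned` are hypothetical. The check relies on the observation that an interval is spanned exactly when it equals the smallest interval (bounding box) containing the observations it covers.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Interval = Tuple[List[float], List[float]]  # (lower bounds, upper bounds)


def covers(interval: Interval, x: Point) -> bool:
    """True if x lies inside the closed interval in every coordinate."""
    lo, hi = interval
    return all(l <= v <= h for l, v, h in zip(lo, x, hi))


def span_of(points: List[Point]) -> Interval:
    """Smallest interval (bounding box) containing all given points."""
    if not points:
        raise ValueError("span of an empty set of points is undefined")
    lo = [min(coord) for coord in zip(*points)]
    hi = [max(coord) for coord in zip(*points)]
    return lo, hi


def is_spanned(interval: Interval, data: List[Point]) -> bool:
    """An interval is spanned if it equals the span of the points it covers."""
    covered = [x for x in data if covers(interval, x)]
    if not covered:
        return False  # assumption: an interval covering nothing is not a pattern
    lo, hi = span_of(covered)
    return list(interval[0]) == lo and list(interval[1]) == hi


data = [(1.0, 2.0), (3.0, 1.0), (5.0, 5.0)]
print(is_spanned(([1.0, 1.0], [3.0, 2.0]), data))  # True: tight around two points
print(is_spanned(([0.0, 0.0], [4.0, 3.0]), data))  # False: same points, looser box
```

The second interval covers the same two observations as the first but properly includes it, so it is not spanned.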
Pattern-based clustering and attribute analysis
- RUTCOR Research
, 2003
Cited by 5 (3 self)
Abstract. The Logical Analysis of Data (LAD) is a combinatorics-, optimization-, and logic-based methodology for the analysis of datasets with binary or numerical input variables and binary outcomes. It has been established in previous studies that LAD provides a competitive classification tool comparable in efficiency with the top classification techniques available. The goal of this paper is to show that the methodology of LAD can be useful in the discovery of new classes of observations and in the analysis of attributes. After a brief description of the main concepts of LAD, two efficient combinatorial algorithms are described for the generation of all prime and, respectively, all spanned patterns (rules) satisfying certain conditions. It is shown that the application of classic clustering techniques to the set of observations represented in prime pattern space leads to the identification of a subclass of, say, positive observations, which is accurately recognizable and is sharply distinct from the observations in the opposite, say, negative class. It is also shown that the set of all spanned patterns allows the introduction of a measure of significance and of a concept of monotonicity in the set of attributes. Acknowledgements: The partial support provided by ONR grant N00014-92-J-1375 and DIMACS is gratefully acknowledged.
Comprehensive vs. Comprehensible Classifiers in Logical Analysis of Data
- RUTCOR Research Report, RRR 9-2002; DIMACS Technical Report 2002-49; Annals of Operations Research (in print)
, 2002
Cited by 3 (2 self)
The main objective of this paper is to compare the classification accuracy provided by large, comprehensive collections of patterns (rules) derived from archives of past observations with that provided by small, comprehensible collections of patterns. This comparison is carried out here on the basis of an empirical study, using several publicly available datasets. The results of this study show that the use of comprehensive collections allows a slight increase of classification accuracy, and that the "cost of comprehensibility" is small.
PATTERN-BASED FEATURE SELECTION IN GENOMICS AND PROTEOMICS
, 2003
Cited by 1 (1 self)
Abstract. A major difficulty in data analysis is due to the size of the datasets, which frequently contain large numbers of irrelevant or redundant variables. In particular, in some of the most rapidly developing areas of bioinformatics, e.g., genomics and proteomics, the expressions of the intensity levels of tens of thousands of genes or proteins are reported for each observation, in spite of the fact that very small subsets of these features are sufficient for distinguishing positive observations from negative ones. In this study, we describe a two-step procedure for feature selection. In a first “filtering” stage, a relatively small subset of relevant features is identified on the basis of several combinatorial, statistical, and information-theoretical criteria. In the second stage, the importance of variables selected in the first step is evaluated based on the frequency of their participation in the set of all maximal patterns (defined as in the Logical Analysis of Data, and generated using an efficient, total-polynomial time algorithm), and low-impact variables are eliminated. This step is applied iteratively, until arriving at a Pareto-optimal “support set”, which balances the conflicting criteria of simplicity and accuracy.
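The second stage described in this abstract, ranking variables by how often they participate in patterns, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: each pattern is assumed to be reduced to the set of variable names it uses, pattern generation itself is not shown, and the function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def participation_frequency(patterns: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of patterns in which each variable appears."""
    patterns = list(patterns)
    if not patterns:
        return {}
    counts = Counter(var for pattern in patterns for var in pattern)
    return {var: n / len(patterns) for var, n in counts.items()}


def low_impact(variables: Iterable[str],
               patterns: Iterable[Set[str]],
               threshold: float) -> List[str]:
    """Variables whose participation frequency falls below the threshold."""
    freq = participation_frequency(patterns)
    return sorted(v for v in variables if freq.get(v, 0.0) < threshold)


# Hypothetical patterns over variables g1..g4 (g4 appears in none of them).
patterns = [{"g1", "g2"}, {"g1"}, {"g1", "g3"}, {"g2"}]
print(participation_frequency(patterns)["g1"])              # 0.75
print(low_impact(["g1", "g2", "g3", "g4"], patterns, 0.3))  # ['g3', 'g4']
```

In an iterative setting, the patterns would be regenerated on the reduced variable set after each elimination round; that step depends on the pattern-generation algorithm and is left out here.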
From Data Mining to Knowledge Mining
Cited by 1 (1 self)
In view of the tremendous production of computer data worldwide, there is a strong need for new powerful tools that can automatically generate useful knowledge from a variety of data, and present it in human-oriented forms. In efforts to satisfy this need, researchers have been exploring ideas and methods developed in machine learning, statistical data analysis, data mining, text mining, data visualization, pattern recognition, etc. The first part of this chapter is a compendium of ideas on the applicability of symbolic machine learning and logical data analysis methods toward this goal. The second part outlines a multistrategy methodology for an emerging research direction, called knowledge mining, by which we mean the derivation of high-level concepts and descriptions from data through symbolic reasoning involving both data and relevant background knowledge. The effective use of background as well as previously created knowledge in reasoning about new data makes it possible for the knowledge mining system to derive useful new knowledge not only from large amounts of data, but also from limited and weakly relevant data.
LOGICAL ANALYSIS OF COMPUTED TOMOGRAPHY
, 2004
Abstract. The aim of this paper is to analyze computed tomography (CT) data by using the Logical Analysis of Data (LAD) methodology in order to distinguish between three types of idiopathic interstitial pneumonias (IIPs). The paper demonstrates that LAD can distinguish with high accuracy different forms (IPF, NSIP and DIP) of IIPs. It also shows that the patterns developed by LAD techniques provide additional information about outliers, redundant features, and the relative significance of attributes, and make possible the identification of promoters and blockers of various forms of IIPs.

