Results 1-10 of 19
Hybrid rule extraction from support vector machines
in Proc. of IEEE Conference on Cybernetics and Intelligent Systems, 2004
Cited by 42 (0 self)
Abstract — Support vector machines (SVMs) have shown superior performance compared to other machine learning techniques, especially in classification problems. Yet one limitation of SVMs is the lack of an explanation capability, which is crucial in some applications, e.g. in the medical and security domains. In this paper, a novel approach for eclectic rule extraction from support vector machines is presented. This approach utilizes the knowledge acquired by the SVM and represented in its support vectors, as well as the parameters associated with them. The approach includes three stages: training, propositional rule extraction, and rule quality evaluation. Results from four different experiments demonstrate the value of the approach for extracting comprehensible rules of high accuracy and fidelity.
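The three-stage pipeline the abstract describes (train, extract propositional rules, evaluate rule quality) can be illustrated with a toy stand-in. This is a hedged sketch of the eclectic idea only: the hand-coded `black_box` classifier (standing in for a trained SVM) and the single-threshold rule learner are illustrative assumptions, not the paper's actual method.

```python
# Treat the trained model as a black box, label data with it, and fit a
# simple propositional rule; "fidelity" measures rule/model agreement.

def black_box(x):
    # Stand-in for a trained SVM's decision function (assumption).
    return 1 if 2.0 * x[0] - x[1] > 0.5 else 0

# Stage 1: a labeled grid of points, as the "training" output.
points = [(x0 / 4.0, x1 / 4.0) for x0 in range(5) for x1 in range(5)]
labels = [black_box(p) for p in points]

# Stage 2: extract the best single-feature threshold rule.
best = None
for feat in (0, 1):
    for thr in sorted({p[feat] for p in points}):
        for pol in (0, 1):
            preds = [pol if p[feat] > thr else 1 - pol for p in points]
            acc = sum(a == b for a, b in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, feat, thr, pol)

# Stage 3: rule quality, here as fidelity to the black-box model.
fidelity, feat, thr, pol = best
rule = f"IF x{feat} > {thr} THEN class {pol} ELSE class {1 - pol}"
print(rule, "fidelity:", round(fidelity, 2))
```

A real rule extractor would, as the abstract notes, work from the support vectors and their parameters rather than a dense grid; the fidelity measure, however, is the standard way to score extracted rules against the source model.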
On the Performance of Ant-Based Clustering
Proc. of the 3rd Int. Conf. on Hybrid Intelligent Systems, IOS, 2003
Cited by 19 (1 self)
Ant-based clustering and sorting is a nature-inspired heuristic for general clustering tasks. It has been applied variously, from problems arising in commerce, to circuit design, to text mining, all with some promise. However, although early results were broadly encouraging, there has been very limited analytical evaluation of the algorithm. Toward this end, we first propose a scheme that enables unbiased interpretation of the clustering solutions obtained, and then use this to conduct a full evaluation of the algorithm. Our analysis uses three sets each of real and artificial data, and four distinct analytical measures. These results are compared with those obtained using established clustering techniques, and we find evidence that ant-based clustering is a robust and viable alternative.
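The heuristic under evaluation rests on two local probability rules in the Lumer/Faieta style: unladen ants pick up items that sit in dissimilar neighborhoods, laden ants drop items among similar neighbors. A minimal sketch, with illustrative constants `k1` and `k2` (the exact values vary across the literature):

```python
# Pick/drop probabilities as functions of the local neighborhood
# similarity f in [0, 1]; clusters emerge on the grid from these rules.

def p_pick(f, k1=0.1):
    # An unladen ant picks up an item more readily when f is low.
    return (k1 / (k1 + f)) ** 2

def p_drop(f, k2=0.15):
    # A laden ant drops its item more readily when f is high.
    return (f / (k2 + f)) ** 2

# An isolated item (low f) is likely picked up; an item among similar
# neighbors (high f) is likely dropped.
print(round(p_pick(0.05), 2), round(p_pick(0.9), 3))
print(round(p_drop(0.05), 3), round(p_drop(0.9), 2))
```

The paper's contribution is precisely that solutions produced by such local rules need a principled, unbiased interpretation scheme before they can be compared with conventional clustering output.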
Learning hybrid Bayesian networks from data
1998
Cited by 11 (1 self)
We illustrate two different methodologies for learning hybrid Bayesian networks, that is, Bayesian networks containing both continuous and discrete variables, from data. The two methodologies differ in how they handle continuous data when learning the Bayesian network structure. The first methodology uses discretized data to learn the Bayesian network structure, and the original non-discretized data for the parameterization of the learned structure. The second methodology uses non-discretized data both to learn the Bayesian network structure and to parameterize it. For the direct handling of continuous data, we propose the use of artificial neural networks as probability estimators, used as an integral part of the scoring metric defined to search the space of Bayesian network structures. With both methodologies, we assume the availability of a complete dataset, with no missing values or hidden variables. We report experimental results aimed at comparing the two methodologies. These results provide evidence that learning with discretized data offers advantages, in both efficiency and the accuracy of the learned models, over the alternative of using non-discretized data.
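The first methodology's discretization step can be sketched with a simple equal-width binning pass; the bin count of 3 and the toy data are assumptions for illustration (the paper does not prescribe a particular scheme here):

```python
# Equal-width discretization of a continuous variable: structure search
# would run on the integer codes, while parameterization of the learned
# structure would use the original continuous values.

def discretize(values, bins=3):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # guard against constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]

data = [0.1, 0.2, 0.25, 1.4, 1.5, 2.7, 2.9, 3.0]
codes = discretize(data)
print(codes)
```

The key design point the abstract draws out is that the codes feed only the structure search; the continuous originals are retained so the final model's parameters are not degraded by the binning.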
Fast distributed outlier detection in mixed-attribute data sets
Data Min. Knowl. Discov., 2006
Cited by 7 (0 self)
Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and nonparametric approaches in a centralized setting. However, there are several challenges that must still be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. The categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in mixed-attribute data sets.
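The mixed-attribute setting the abstract motivates can be made concrete with a generic scoring sketch. This illustrates the problem, not the paper's tunable algorithm: rare categorical values and extreme continuous values each raise a record's outlier score; all names and the additive combination are assumptions.

```python
# Naive mixed-attribute outlier score: categorical rarity plus a
# continuous z-score, on a toy table of (category, value) records.
from collections import Counter
from statistics import mean, pstdev

records = [("a", 1.0), ("a", 1.1), ("a", 0.9), ("a", 1.0), ("z", 9.0)]
cats = Counter(r[0] for r in records)
nums = [r[1] for r in records]
mu, sigma = mean(nums), pstdev(nums)

def score(rec):
    rarity = 1.0 - cats[rec[0]] / len(records)  # rare category -> high
    z = abs(rec[1] - mu) / sigma                # extreme value -> high
    return rarity + z

scores = [score(r) for r in records]
print(max(scores) == score(("z", 9.0)))
```

Existing approaches that "ignore or incorrectly model" categorical attributes would, in this sketch, drop the `rarity` term entirely, losing exactly the information the paper highlights.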
Classification-Based Objective Functions
Machine Learning, 2007
Cited by 6 (2 self)
Abstract. Backpropagation, similar to most learning algorithms that can form complex decision surfaces, is prone to overfitting. This work presents classification-based objective functions, an intuitive approach to training artificial neural networks on classification problems. Classification-based learning attempts to guide the network directly to correct pattern classification rather than using an implicit search of common error minimization heuristics, such as sum-squared error (SSE) and cross-entropy (CE). CB1 is presented here as a novel objective function for learning classification problems. It seeks to directly minimize classification error by backpropagating error only on misclassified patterns from culprit output nodes. CB1 discourages weight saturation and overfitting and achieves higher accuracy on classification problems than optimizing SSE or CE. Experiments on a large OCR data set have shown CB1 to significantly increase generalization accuracy over SSE or CE optimization, from 97.86% and 98.10%, respectively, to 99.11%. Comparable results are achieved over several data sets from the UC Irvine Machine Learning Database Repository, with an average increase in accuracy from 90.7% and 91.3% using optimized SSE and CE networks, respectively, to 92.1% for CB1. Analysis indicates that CB1 performs a fundamentally different search of the feature space than optimizing SSE or CE and produces significantly different solutions.
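The core mechanism — error backpropagated only on misclassified patterns, and only from "culprit" output nodes — can be sketched as an output-layer error rule. The exact error magnitudes below are assumptions; the paper's precise CB1 formulation may differ:

```python
# Output-layer error signal in the spirit of CB1: a pattern classified
# correctly (target node uniquely has the max output) contributes no
# error; otherwise only the target node and the culprit nodes (outputs
# at or above the target's) receive error.

def cb1_error(outputs, target_idx):
    t = outputs[target_idx]
    culprits = [j for j, o in enumerate(outputs)
                if j != target_idx and o >= t]
    err = [0.0] * len(outputs)
    if not culprits:
        return err                        # correct: learn nothing
    for j in culprits:
        err[j] = t - outputs[j]           # negative: push culprits down
    err[target_idx] = max(outputs) - t    # positive: pull the target up
    return err

print(cb1_error([0.2, 0.9, 0.4], 0))  # misclassified: two culprits
print(cb1_error([0.9, 0.1], 0))       # correct: all-zero error
```

Because correctly classified patterns contribute zero error, weights stop growing once classification is right, which is consistent with the abstract's claim that CB1 discourages weight saturation.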
Bayesian Network Learning with Abstraction Hierarchies and Context-Specific Independence
Cited by 3 (0 self)
Abstract. Context-specific independence representations, such as tree-structured CPTs, reduce the number of parameters in Bayesian networks by capturing local independence relationships. We previously presented Abstraction-Based Search (ABS), a technique for using attribute value hierarchies during Bayesian network learning to remove unimportant distinctions within the CPTs. Recently, we have recognized that the abstraction performed by ABS is complementary to that of TCPTs. In this paper, we introduce TCPT ABS (TABS), which integrates ABS with TCPT learning. Since expert-provided hierarchies may not be available, or may not provide the most useful distinctions, we provide a clustering technique for deriving hierarchies from data. We present empirical results for three real-world domains, finding that (1) combining TCPTs and ABS provides a dramatic reduction in the number of parameters in the learned networks, without loss of accuracy, and (2) data-derived hierarchies perform as well or better than expert-provided hierarchies.
A Bounded Index for Cluster Validity
Cited by 2 (0 self)
Abstract. Clustering is one of the best-known forms of unsupervised learning. Evaluating the quality of results and determining the number of clusters in data is an important issue. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate the performance of the score function. The score function is tested against four existing validity indices. The proposed index is found to be always as good as or better than these indices in the case of hyperspheroidal clusters. It is shown to work well on multidimensional data sets and is able to accommodate unique-cluster and sub-cluster cases. Key words: clustering, cluster validity, validity index, k-means
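A bounded index of this kind can be sketched by combining between-cluster separation (`bcd`) and within-cluster compactness (`wcd`) through a double exponential, which squashes the result into (0, 1). The exact definitions below are illustrative assumptions in the spirit of the abstract, not necessarily the paper's formulas:

```python
# Toy bounded validity index: higher when clusters are compact and
# well separated; always strictly between 0 and 1.
from math import dist, exp

def score_function(clusters):
    pts = [p for c in clusters for p in c]
    centroid = lambda c: tuple(sum(x) / len(c) for x in zip(*c))
    z_tot = centroid(pts)
    k, n = len(clusters), len(pts)
    # Between-cluster distance: size-weighted centroid spread.
    bcd = sum(len(c) * dist(centroid(c), z_tot) for c in clusters) / (n * k)
    # Within-cluster distance: mean point-to-centroid distance per cluster.
    wcd = sum(dist(p, centroid(c)) / len(c) for c in clusters for p in c)
    return 1.0 - 1.0 / exp(exp(bcd - wcd))  # bounded in (0, 1)

good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
bad = [[(0.0, 0.0), (10.0, 11.0)], [(0.0, 1.0), (10.0, 10.0)]]
print(score_function(good) > score_function(bad))
```

Boundedness is the point of the construction: unlike ratio-style indices, values from different data sets and different cluster counts remain on a common (0, 1) scale.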
An Empirical Comparison of Outlier Detection Algorithms
Cited by 1 (0 self)
In recent years, researchers have proposed many different techniques for detecting outliers and other anomalies in data sets. In this paper we examine a subset of these techniques: those designed to discover outliers quickly. The algorithms in question are ORCA, LOADED, and RELOADED. We have performed an empirical evaluation of these algorithms, and here present our results as a guide to their strengths and weaknesses.
OFFD: Optimal Flexible Frequency Discretization for Naïve Bayes Classification
Cited by 1 (0 self)
Abstract. Incremental Flexible Frequency Discretization (IFFD) is a recently proposed discretization approach for Naïve Bayes (NB). IFFD performs satisfactorily by setting the minimal interval frequency for discretized intervals to a fixed number. In this paper, we first argue that this setting cannot guarantee optimal classification performance in terms of classification error. We observed empirically that an optimal minimal interval frequency exists for each dataset. We therefore propose a sequential-search, wrapper-based incremental discretization method for NB, named Optimal Flexible Frequency Discretization (OFFD). Experiments were conducted on 17 datasets from the UCI machine learning repository, comparing the performance of NB trained on data discretized by OFFD, IFFD, PKID, and FFD respectively. Results show that OFFD works better than these alternatives for NB. Experiments comparing NB trained on OFFD-discretized data against C4.5 show that our new method outperforms C4.5 on most of the datasets tested.
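The wrapper-style sequential search the abstract describes reduces to a loop over candidate minimal interval frequencies, keeping whichever minimizes an error estimate. In this sketch, `evaluate` is a placeholder assumption standing in for a cross-validated NB error measurement:

```python
# Sequential wrapper search over candidate minimal interval frequencies.

def wrapper_search(candidates, evaluate):
    best_f, best_err = None, float("inf")
    for f in candidates:       # try each frequency in sequence
        err = evaluate(f)      # e.g. cross-validated NB error (assumed)
        if err < best_err:
            best_f, best_err = f, err
    return best_f, best_err

# Toy stand-in error curve with a single optimum at 30 (assumption).
chosen, err = wrapper_search(range(5, 60, 5), lambda f: abs(f - 30) / 100)
print(chosen, err)
```

This is exactly the contrast with IFFD's fixed setting: rather than assuming one minimal frequency suits every dataset, the search pays the cost of repeated evaluation to find the dataset's own optimum.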
INTERACTIVE LEARNING PROTOCOLS FOR NATURAL LANGUAGE APPLICATIONS
2009
Statistical machine learning has become an integral technology for solving many informatics applications. In particular, corpus-based statistical techniques have emerged as the dominant paradigm for core natural language processing (NLP) tasks such as parsing, machine translation, and information extraction, amongst others. However, while supervised machine learning is well understood, its successful application to practical scenarios is predicated on obtaining large annotated corpora and performing significant feature engineering, both notably expensive undertakings. Interactive learning protocols offer one promising solution for reducing these costs by allowing the learner and domain expert to interact during learning in an effort to both reduce sample complexity and improve system performance. By specifying a method where the learner may request targeted information, the domain expert is focused on providing the most useful information. This work formalizes a general framework for interactive learning and examines two interactive learning protocols with particular attention to natural language scenarios. We first examine active learning for structured output spaces, the scenario where there are multiple predictions which must be composed into a structurally coherent global prediction. Secondly, we examine active learning for pipeline models, where a complex prediction is decomposed into a sequence of predictions
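The "learner requests targeted information" loop can be illustrated with least-confident uncertainty sampling, a standard active-learning query strategy used here purely as an illustration of the protocol, not as the thesis's exact formulation:

```python
# Pick the unlabeled example whose top class probability is lowest:
# that is the example the model is least certain about, so its label
# is (heuristically) the most useful one to request from the expert.

def least_confident(probs_by_example):
    confidence = lambda p: max(p)   # confidence = top class probability
    return min(range(len(probs_by_example)),
               key=lambda i: confidence(probs_by_example[i]))

pool = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
query = least_confident(pool)
print(query)  # index of the most uncertain example
```

In the structured-output and pipeline settings the abstract goes on to describe, the same idea applies per component prediction, with the added difficulty of composing component uncertainties into one query score for the whole structure.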