## Nomograms for visualizing support vector machines (2005)

### Download Links

- [eprints.fri.uni-lj.si]
- [stat.columbia.edu]
- [www.ailab.si]
- DBLP

### Other Repositories/Bibliography

Venue: Association for Computing Machinery

Citations: 7 (1 self)

### BibTeX

@INPROCEEDINGS{Jakulin05nomogramsfor,
  author    = {Aleks Jakulin},
  title     = {Nomograms for visualizing support vector machines},
  booktitle = {Association for Computing Machinery},
  year      = {2005},
  pages     = {108--117}
}

### Abstract

We propose a simple yet potentially very effective way of visualizing trained support vector machines. Nomograms are an established model visualization technique that can graphically encode the complete model on a single page. The dimensionality of the visualization does not depend on the number of attributes, but merely on the properties of the kernel. To represent the effect of each predictive feature on the log odds ratio scale as required for the nomograms, we employ logistic regression to convert the distance from the separating hyperplane into a probability. Case studies on selected data sets show that for a technique thought to be a black-box, nomograms can clearly expose its internal structure. By providing an easy-to-interpret visualization the analysts can gain insight and study the effects of predictive factors.
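The abstract's core mechanism — converting the distance from the SVM's separating hyperplane into a probability via logistic regression — can be sketched as a one-dimensional logistic fit on decision values. This is a minimal stand-in for the actual procedure, not the paper's implementation; the gradient-descent fit and the toy data are illustrative assumptions.

```python
import numpy as np

def fit_distance_to_prob(decision_values, labels, lr=0.1, n_iter=2000):
    """Fit p(y=1 | d) = 1 / (1 + exp(-(a*d + b))) by gradient descent
    on the logistic loss, where d is the signed distance from the
    separating hyperplane. Labels are expected in {0, 1}."""
    d = np.asarray(decision_values, dtype=float)
    y = np.asarray(labels, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * d + b)))
        grad = p - y                      # gradient of log-loss w.r.t. the logit
        a -= lr * np.mean(grad * d)
        b -= lr * np.mean(grad)
    return a, b

# Toy decision values: positive distances mostly belong to class 1
d = np.array([-2.0, -1.5, -0.5, 0.3, 1.0, 2.2])
y = np.array([0, 0, 0, 1, 1, 1])
a, b = fit_distance_to_prob(d, y)
prob_far_positive = 1.0 / (1.0 + np.exp(-(a * 2.2 + b)))
```

Once fitted, the same `(a, b)` map any new decision value onto the nomogram's probability axis.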

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation context: ...st important factors that determine the class of the instance?”, and “What is the magnitude of the effect of these?”, and “How do various factors interact?”, and alike. A support vector machine (SVM) [23, 22] is a popular and much applied supervised machine learning method. It is known for good predictive performance, but may be at a disadvantage in terms of intuitive presentation of the classifier, parti...

3436 | LIBSVM: A Library for Support Vector Machines, 2001. Software available at www.csie.ntu.edu.tw/~cjlin/libsvm
- Chang, Lin
Citation context: ...ent a nomogram-based comparison of SVM and the naïve Bayesian classifier model. 3.1 Accuracy As for earlier nomograms, all experiments were performed within the Orange toolkit [4]. We employed LIBSVM [3] with default settings for training the SVM classifiers, and iteratively re-weighted least squares fitting [18] of the logistic regression model, as implemented in the Orange extensions package [12]. ...

2028 | Learning with Kernels
- Scholkopf, Smola
- 2002
Citation context: ...st important factors that determine the class of the instance?”, and “What is the magnitude of the effect of these?”, and “How do various factors interact?”, and alike. A support vector machine (SVM) [23, 22] is a popular and much applied supervised machine learning method. It is known for good predictive performance, but may be at a disadvantage in terms of intuitive presentation of the classifier, parti...

1320 | Generalized Additive Models
- Hastie, Tibshirani
- 1990
Citation context: ....05, the tick corresponding to the value ‘high’ for the attribute Low status is aligned with 0.05 on the ‘Log OR’ axis. The class of models of the above type are the generalized additive models (GAM, [9]). When each effect function is linear, we speak of a generalized linear model (GLM). For a GLM, the response or the systematic component is written as β0 + Σ_j [β]_j [x]_j, where [x]_j is the j-th coordinat...
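The GLM/GAM distinction quoted above can be made concrete with a small sketch: a GLM sums linear per-attribute effects, while a GAM allows arbitrary one-dimensional effect functions. The coefficients and effect functions here are hypothetical.

```python
import numpy as np

def glm_response(x, beta0, beta):
    """GLM systematic component: beta_0 + sum_j beta_j * x_j."""
    return beta0 + float(np.dot(beta, x))

def gam_response(x, beta0, effects):
    """GAM generalization: beta_0 + sum_j f_j(x_j), with arbitrary
    one-dimensional effect functions f_j."""
    return beta0 + sum(f(xj) for f, xj in zip(effects, x))

x = np.array([1.0, 2.0])
lin = glm_response(x, 0.5, np.array([0.3, -0.2]))        # 0.5 + 0.3 - 0.4
nonlin = gam_response(x, 0.5, [lambda v: 0.3 * v,        # linear effect
                               lambda v: -0.1 * v * v])  # quadratic effect
```

When every `f_j` is linear, the GAM collapses to the GLM, which is why the nomogram axes generalize cleanly from straight scales to curved ones.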

749 | Applied logistic regression
- Hosmer, Lemeshow
- 1989
Citation context: ... are two classes of areas, the expensive with the median values above $21000, and the cheap. To make a prediction using a nomogram, the contributions of attributes on the scale of the log odds ratios [11] (topmost axis of the nomogram) are summed up, and used to determine the probability whether the price is less than $21000 (bottommost axis of the nomogram). For instance summing the effects of 6 room...
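The prediction procedure described in this context — summing per-attribute log odds ratio contributions and mapping the total onto the probability axis — can be sketched as follows. The attribute names and contribution values are invented for illustration, not read from the paper's nomogram.

```python
import math

# Hypothetical per-attribute contributions read off the nomogram's
# 'Log OR' axis (made-up names and values)
contributions = {
    "rooms=6":        -0.35,
    "low_status=high":  1.20,
    "crime=low":       -0.40,
}

log_odds = sum(contributions.values())        # total on the 'Log OR' axis
prob = 1.0 / (1.0 + math.exp(-log_odds))      # bottommost probability axis
```

The inverse-logit step at the end is what turns the additive nomogram reading into the probability that the price falls below the threshold.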

701 | Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
- Platt
- 1999
Citation context: ...he identity(p) = p, probit (the inverse of the cumulative Gaussian distribution) and logit(p) = log(p/(1 − p)). The inverse logit link function is F(d) = 1/(1 + exp(−d)), and it has been used in the past [21]. While the logistic regression too employs a generalized linear model with the logit link, the effect vector β is optimized directly in order to minimize the probabilistic loss (deviance) of the resu...
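The three link functions named in this context can be computed directly; Python's `statistics.NormalDist` supplies the Gaussian quantile behind the probit link.

```python
import math
from statistics import NormalDist

p = 0.8

identity = p                          # identity link
logit = math.log(p / (1 - p))         # logit link: log(p / (1 - p))
probit = NormalDist().inv_cdf(p)      # probit link: inverse Gaussian CDF

def inv_logit(d):
    """Inverse logit: maps a real-valued score d back to a probability."""
    return 1.0 / (1.0 + math.exp(-d))
```

Applying `inv_logit` to `logit` recovers `p`, which is exactly the round trip a nomogram performs between its log-odds axis and its probability axis.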

247 | Verification of forecasts expressed in terms of probability
- Brier
- 1950
Citation context: ...teria: classification accuracy, outcome probability estimation (as measured by Brier score, the mean square error of predicted class probabilities given the true class probabilities for each instance [2]), and the area under the receiver operating characteristic. Table 1 compares the naïve Bayesian classifier (NB), logistic regression (LR), support vector machines with RBF kernels (SVM), and support ...
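The two-class Brier score, as defined in this context, is the mean squared error between predicted class-1 probabilities and the 0/1 outcomes; the probabilities below are toy values.

```python
import numpy as np

def brier_score(pred_probs, true_labels):
    """Mean squared error between predicted class-1 probabilities
    and the observed 0/1 outcomes (two-class Brier score)."""
    p = np.asarray(pred_probs, dtype=float)
    y = np.asarray(true_labels, dtype=float)
    return float(np.mean((p - y) ** 2))

score = brier_score([0.9, 0.2, 0.7], [1, 0, 1])
# (0.01 + 0.04 + 0.09) / 3
```

Lower is better: a perfectly calibrated, perfectly confident classifier scores 0.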

150 | The UCI KDD archive
- Bay
- 1999
Citation context: ... its value changes from the lowest to the highest interval. This kind of presentation is also suitable for ordered discrete attributes. We have used the ‘Horse Colic’ data set from the UCI repository [10]. Among the attributes, we have chosen the respiratory rate and the body temperature, as they are continuous attributes with potentially non-linear effects. It is clear that there is a particular rang...

116 | Regression modeling strategies with application to linear models, logistic regression, and survival analysis
- Harrell
- 2001
Citation context: ...gression model, the use of nomograms was first proposed by Lubsen and coauthors [17]. With an excellent implementation of logistic regression nomograms in S-Plus and R statistical packages by Harrell [7], the idea has recently been picked up and the nomograms have been used much to present probabilistic classification models in, for instance, clinical medicine and oncology (e.g., [15]). A naïve Bayes...

103 | From experimental machine learning to interactive data mining, white paper, Faculty of Computer and Information
- Zupan, Demšar
- 2004
Citation context: ...additive models. We present a nomogram-based comparison of SVM and the naïve Bayesian classifier model. 3.1 Accuracy As for earlier nomograms, all experiments were performed within the Orange toolkit [4]. We employed LIBSVM [3] with default settings for training the SVM classifiers, and iteratively re-weighted least squares fitting [18] of the logistic regression model, as implemented in the Orange e...

91 | Hedonic prices and the demand for clean air
- Harrison, Rubinfeld
- 1978
Citation context: ...n. To illustrate the general idea, consider the nomogram in Figure 1 which represents a linear SVM model induced from the Boston Housing data set (StatLib, http://lib.stat.cmu.edu/datasets/, also see [8]). The Housing data set consists of 506 different instances (areas of Boston); about 50% of the areas have the median value of housing price lower than $21000. For convenience of this presentation we ...

74 | Kernel conditional random fields: representation and clique selection
- Lafferty
- 2004
Citation context: ...tions f(S_i), localized for each subset of interacting attributes. The dimensionality of the resulting nomogram visualization is max_i |S_i| + 1. The kernel (2) is a special case of the kernels proposed by [5, 1, 16]. In particular, [5] motivated the choice of these kernels through the ability to effectively visualize them. Visualizing Interaction Effects. Fig 4 shows the comparison between models that use intera...
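The subset-structured kernel discussed in this context — a sum of base kernels, each restricted to one subset S_i of interacting attributes — might be sketched as below. The RBF base kernel and the example subsets are assumptions, not the kernel (2) from the paper.

```python
import numpy as np

def rbf(u, v, gamma=1.0):
    """Gaussian RBF base kernel on a restricted coordinate subset."""
    return float(np.exp(-gamma * np.sum((u - v) ** 2)))

def subset_sum_kernel(x, z, subsets, gamma=1.0):
    """Sum of base kernels, each evaluated only on one subset S_i of
    attribute indices; the nomogram dimensionality is max_i |S_i| + 1."""
    return sum(rbf(x[list(S)], z[list(S)], gamma) for S in subsets)

x = np.array([0.1, 0.5, 0.9])
z = np.array([0.1, 0.4, 0.2])
subsets = [(0,), (1, 2)]     # one singleton effect, one pairwise interaction
k = subset_sum_kernel(x, z, subsets)
```

Because each summand depends only on its own subset of attributes, the learned decision function decomposes into per-subset effect functions, which is what makes it drawable as a nomogram.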

37 | Structural modelling with sparse kernels
- Gunn, Kandola
- 2002
Citation context: ...tions f(S_i), localized for each subset of interacting attributes. The dimensionality of the resulting nomogram visualization is max_i |S_i| + 1. The kernel (2) is a special case of the kernels proposed by [5, 1, 16]. In particular, [5] motivated the choice of these kernels through the ability to effectively visualize them. Visualizing Interaction Effects. Fig 4 shows the comparison between models that use intera...

30 | Analyzing attribute dependencies
- Jakulin, Bratko
- 2003
Citation context: ... attributes. If we hold the value of one attribute constant, the effects of other correlated attributes will decrease. This problem is referred to as a negative interaction or as attribute redundancy [13]. We illustrated this on an example in Fig. 7, where all attributes became irrelevant once the sex attribute was accounted for. It does not mean that the other attributes are irrelevant, just that sex...

22 | T.: Exponential families for conditional random fields
- Altun, Smola, et al.
- 2004
Citation context: ...tions f(S_i), localized for each subset of interacting attributes. The dimensionality of the resulting nomogram visualization is max_i |S_i| + 1. The kernel (2) is a special case of the kernels proposed by [5, 1, 16]. In particular, [5] motivated the choice of these kernels through the ability to effectively visualize them. Visualizing Interaction Effects. Fig 4 shows the comparison between models that use intera...

20 | Testing the significance of attribute interactions
- Jakulin, Bratko
- 2004
Citation context: ... of employment duration, controlling for residence duration. This pair of attributes in ‘German-credit’ has been identified as significantly interacting, using the methodology of interaction analysis [14]. Both nomograms are very similar when comparing the first four attributes. Among those four, the most influential attribute is the purpose of the credit: buying used cars incurs low risk, and educati...

11 | A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer
- Kattan, Eastham, et al.
- 1998
Citation context: ...kages by Harrell [7], the idea has recently been picked up and the nomograms have been used much to present probabilistic classification models in, for instance, clinical medicine and oncology (e.g., [15]). A naïve Bayesian classifier can too be visualized in the form of a nomogram [19]. The nomograms for support vector machines that we in... (Figure 1 caption: “A nomogram of the SVM model that predicts the probability of costly housing in a given Boston area. The dots illustrate the classificat...”)

8 | A practical device for the application of a diagnostic or prognostic function
- Lubsen, Pool, van der Does
- 1978
Citation context: ... Nomograms are not an uncertain novelty, but a milestone in the history of visualization [6]. To visualize a logistic regression model, the use of nomograms was first proposed by Lubsen and coauthors [17]. With an excellent implementation of logistic regression nomograms in S-Plus and R statistical packages by Harrell [7], the idea has recently been picked up and the nomograms have been used much to p...

7 | Nomograms for visualization of naive Bayesian classifier
- Možina, Demšar, et al.
- 2004
Citation context: ...een used much to present probabilistic classification models in, for instance, clinical medicine and oncology (e.g., [15]). A naïve Bayesian classifier can too be visualized in the form of a nomogram [19]. The nomograms for support vector machines that we in... (Figure 1 caption: “A nomogram of the SVM model that predicts the probability of costly housing in a given Boston area. The dots illustrate the classificat...”)

5 | Blood, dirt, and nomograms: A particular history of graphs
- Hankins
- 1999
Citation context: ...support vector machine*’ yielded fewer than 400). A search for nomograms on Google resulted in 77000 web pages. Nomograms are not an uncertain novelty, but a milestone in the history of visualization [6]. To visualize a logistic regression model, the use of nomograms was first proposed by Lubsen and coauthors [17]. With an excellent implementation of logistic regression nomograms in S-Plus and R stat...

4 | Extensions to the Orange data mining framework
- Jakulin
- 2002
Citation context: ...VM [3] with default settings for training the SVM classifiers, and iteratively re-weighted least squares fitting [18] of the logistic regression model, as implemented in the Orange extensions package [12]. We experimented on 16 well-known UCI [10] data sets with a binary outcome. For data sets with more than 1000 examples (‘mushroom’ and ‘spam base’) we have selected a stratified random subset of 1000...

4 | Algorithm AS 274: Least squares routines to supplement those of Gentleman
- Miller
- 1992
Citation context: ...mograms, all experiments were performed within the Orange toolkit [4]. We employed LIBSVM [3] with default settings for training the SVM classifiers, and iteratively re-weighted least squares fitting [18] of the logistic regression model, as implemented in the Orange extensions package [12]. We experimented on 16 well-known UCI [10] data sets with a binary outcome. For data sets with more than 1000 ex...

4 | Building sparse representations and structure determination on LS-SVM substrates
- Pelckmans, Suykens, et al.
- 2005
Citation context: ...f model selection. Finally, it is possible to express a preference for sparse and smooth kernels as a part of the optimization problem, combining the quest for decomposability and the actual learning [5, 20]. With the example of Sect. 3.2, we pointed out that nomograms may be the right tool for experimental comparison of different models and modelling techniques, as it allows to easily spot the similarit...