Regression Error Characteristic Surfaces
- In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'05), ACM, 2005
- Cited by 7 (1 self)
This paper presents a generalization of Regression Error Characteristic (REC) curves. REC curves describe the cumulative distribution function of the prediction error of models and can be seen as a generalization of ROC curves to regression problems. REC curves provide useful information for analyzing the performance of models, particularly when compared to error statistics such as the Mean Squared Error. We present Regression Error Characteristic (REC) surfaces, which add a further degree of detail by plotting the cumulative distribution function of the errors across the distribution of the target variable, i.e. the joint cumulative distribution function of the errors and the target variable. This provides a more detailed analysis of model performance than REC curves do. The extra detail is particularly relevant in applications with non-uniform error costs, where it is important to study the performance of models for specific ranges of the target variable. We introduce the notion of REC surfaces, describe how to use them to compare the performance of models, and illustrate their use with an important practical class of applications: the prediction of rare extreme values.
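The two quantities named in this abstract can be sketched in a few lines. This is an illustrative reading of the definitions only, not the authors' implementation: it assumes absolute error as the error measure (the abstract does not fix one), and the function names `rec_curve` and `rec_surface`, the tolerance grid, and the sample data are all hypothetical.

```python
import numpy as np


def rec_curve(y_true, y_pred):
    """Empirical REC curve: sorted absolute errors and, for each one,
    the fraction of cases whose error is at most that large."""
    errors = np.sort(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))
    accuracy = np.arange(1, errors.size + 1) / errors.size
    return errors, accuracy


def rec_surface(y_true, y_pred, tolerances, y_grid):
    """Empirical joint CDF of error and target on a grid.

    Entry [i, j] is the fraction of cases with
    error <= tolerances[i] AND target <= y_grid[j].
    """
    y_true = np.asarray(y_true, float)
    errors = np.abs(y_true - np.asarray(y_pred, float))
    within_tol = errors[:, None] <= np.asarray(tolerances, float)[None, :]  # (n, T)
    below_y = y_true[:, None] <= np.asarray(y_grid, float)[None, :]         # (n, G)
    # Combine the two indicator arrays per case, then average over cases.
    return (within_tol[:, :, None] & below_y[:, None, :]).mean(axis=0)      # (T, G)


# Made-up data, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 4.0]
errors, accuracy = rec_curve(y_true, y_pred)
surface = rec_surface(y_true, y_pred, tolerances=[0.0, 0.5, 1.0], y_grid=[2.0, 4.0])
```

When the last value in `y_grid` is the largest target, the last column of the surface reduces to the REC curve evaluated at the chosen tolerances, which is one way to see the surface as a refinement of the curve.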
Understanding geometric manipulations of images through BOVW-based hashing
- In IEEE International Workshop on Content Protection & Forensics, 2011
- Cited by 7 (4 self)
The increasing use of low-cost imaging devices and innovations in media distribution technologies have led to growing interest in technologies able to protect digital visual media against malicious manipulation of the visual content. One of the main problems addressed in this research area is the blind detection of traces of forgery in an image obtained through the internet. Specifically, in this paper we consider the context of communications, where malicious image manipulations should be detected by a receiver. In the proposed method, an image hash based on the Bag of Visual Words paradigm is attached as a signature to the image before transmission. The forensic hash is then analyzed at the destination to detect the geometric transformations that have been applied to the received image. This task is fundamental for further processing, which usually assumes that the received image is aligned with the original one, as in the case of tampering detection systems. Experiments show that the proposed approach outperforms state-of-the-art methods by a good margin.
Benchmarking of linear and nonlinear approaches for quantitative structure-property relationship studies of metal complexation with ionophores
- Cited by 4 (3 self)
property relationships (QSPR) of stability constants logK1 for the 1:1 (M:L) and logβ2 for 1:2 complexes of metal cations Ag+ and Eu3+ with diverse sets of organic molecules in water at 298 K and ionic strength 0.1 M. The methods were tested on three types of descriptors: molecular descriptors including E-state values, counts of atoms determined for E-state atom types, and substructural molecular fragments (SMF). Comparison of the models was performed using a 5-fold external cross-validation procedure. Robust statistical tests (bootstrap and Kolmogorov-Smirnov statistics) were employed to evaluate the significance of calculated models. The Wilcoxon signed-rank test was used to compare the performance of methods. Individual structure-complexation property models obtained with nonlinear methods demonstrated significantly better performance than the models built using multilinear regression analysis (MLRA). However, averaging several MLRA models based on SMF descriptors provided as good a prediction as the most efficient nonlinear techniques. Support Vector Machines and Associative Neural Networks contributed the largest number of significant models. Models based on fragments (SMF descriptors and E-state counts) had higher prediction ability than those based on E-state indices. The use of SMF descriptors and E-state counts provided
Lamb Meat Quality Assessment by Support Vector Machines
- Cited by 3 (0 self)
The correct assessment of meat quality (i.e., to fulfill the consumer's needs) is a crucial element within the meat industry. Although there are several factors that affect the perception of taste, tenderness is considered the most important characteristic. In this paper, a Feature Selection procedure, based on a Sensitivity Analysis, is combined with a Support Vector Machine in order to predict lamb meat tenderness. This real-world problem is defined in terms of two difficult regression tasks, by modeling objective (e.g. Warner-Bratzler Shear force) and subjective (e.g. human taste panel) measurements. In both cases, the proposed solution is competitive when compared with other neural (e.g. Multilayer Perceptron) and Multiple Regression approaches.
Multi-Objective Supervised Learning
- Cited by 2 (0 self)
This paper sets out a number of the popular areas from the literature in multi-objective supervised learning, along with simple examples. It continues by highlighting some specific areas of interest or concern when dealing with multi-objective supervised learning problems, and points to potential areas of future research.
Boosting for Regression Using Regression Error Characteristic Curves
- In Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning (ROCML), 2005
- Cited by 1 (1 self)
Boosting is one of the most popular methods for constructing ensembles. The objective of this work is to present a boosting algorithm for regression based on the Regressor-Boosting algorithm, in which we propose the use of REC curves to select a good threshold value, so that only residuals greater than that value are considered errors. The algorithm was empirically evaluated, and its results were also analyzed by means of REC curves.
The Unbalanced Classification Problem: Detecting Breaches in Security
- Doctoral dissertation, Rensselaer Polytechnic Institute, 2006
Modeling the Relationship between Software Effort and Size Using Deming Regression
- Cited by 1 (0 self)
Background: The relation between software effort and size has been modeled in the literature as exponential, in the sense that the natural logarithm of effort is expressed as a linear function of the logarithm of size. The common approach to estimating the parameters of the linear model is ordinary least squares regression, which has been extensively applied to various datasets. Least squares estimation takes into account only the error arising from the dependent variable (effort), while the measurement of the independent variable (size) is considered free of error. Aims: The basis of the study is that in practice the assumption of measuring size without error is hardly true, since the size of a software project depends on the precision of the measurement tool and often on the subjectivity of the rater. Moreover, the sizes of projects comprising a dataset have been measured by
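For orientation, the contrast this abstract draws between ordinary least squares and an errors-in-variables fit can be sketched with the standard closed-form Deming estimator. This is a generic textbook formula, not the procedure or data used in the paper: the function name `deming_fit`, the default `delta=1.0` (equal error variances in both variables, i.e. orthogonal regression), and the size and effort values are assumptions made for illustration.

```python
import numpy as np


def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression of y on x.

    delta is the assumed ratio of the error variance in y to the error
    variance in x; delta=1.0 corresponds to orthogonal regression.
    Returns (intercept, slope). Undefined when x and y are uncorrelated
    (sample covariance of zero).
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    x_mean, y_mean = x.mean(), y.mean()
    s_xx = np.mean((x - x_mean) ** 2)
    s_yy = np.mean((y - y_mean) ** 2)
    s_xy = np.mean((x - x_mean) * (y - y_mean))
    diff = s_yy - delta * s_xx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical project sizes and efforts, for illustration only.
size = np.array([12.0, 40.0, 95.0, 210.0, 480.0])
effort = np.array([30.0, 85.0, 160.0, 400.0, 700.0])

# Fit the log-log linear model: ln(effort) = a + b * ln(size).
a, b = deming_fit(np.log(size), np.log(effort))

# Ordinary least squares on the same data, for comparison.
b_ols, a_ols = np.polyfit(np.log(size), np.log(effort), deg=1)
```

The choice of `delta` encodes how much measurement error is attributed to each variable, so it should come from knowledge of the measurement process rather than from the default used here.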
Research into Multiple Outliers in Linear Regression Analysis
Studying the observations in regression analysis, it is seen that the output of regression is affected by outliers in the direction of the dependent and/or the independent variables. In this paper multiple outliers are examined in two real data sets. Which method determines multiple outliers better is examined with the help of some statistics and the REC curve, which can be used for determining efficiency. Also, an attempt is made to support the results by using Monte