## A Distance Based Regression Model for Prediction with Mixed Data (1990)

Venue: | Communications in Statistics A. Theory and Methods |

Citations: | 7 - 7 self |

### BibTeX

@ARTICLE{Cuadras90adistance,

author = {C. M. Cuadras and C. Arenas},

title = {A Distance Based Regression Model for Prediction with Mixed Data},

journal = {Communications in Statistics A. Theory and Methods},

year = {1990},

volume = {19},

pages = {2261--2279}

}

### Abstract

A multiple regression method based on distance analysis and metric scaling is proposed and studied. This method allows us to predict a continuous response variable from several explanatory variables, is compatible with the general linear model, and is found to be useful when the predictor variables are both continuous and categorical. Real data examples are given to illustrate the results obtained.

1 Introduction

Many authors have considered the problem, in regression or multivariate analysis, of having both qualitative and quantitative variables. Some procedures have been used in regression and association (Young et al. 1976; Daudin 1980; Roskam 1980; Lauritzen and Wermuth 1989), principal components (Young et al. 1978; Kiers 1989a, 1989b) and discriminant analysis (Krzanowski 1975, 1986; Knoke 1982). The methodologies are mainly based on optimal scaling, generalized correlation coefficients, the location model and distance-based analysis. Although statistical analysis on mixed data is ...
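As a rough illustration of the idea in the abstract, the following Python sketch regresses a response on principal coordinates obtained by metric scaling of a distance matrix. The identities it uses ($a_{ij} = -d_{ij}^2/2$, $B = HAH$, $B = XX'$ with $X'X$ diagonal) are the ones quoted in the Mardia et al. (1979) citation context further down this page. Everything else is an assumption made for the example, not the authors' implementation: the function names `principal_coordinates` and `db_regression`, the simulated data, and in particular the distance, which simply adds a squared difference for the continuous variable and a 0/1 mismatch for the categorical one.

```python
import numpy as np


def principal_coordinates(D, tol=1e-10):
    """Return (X, lam) with B = H A H = X X' and X'X = diag(lam)."""
    n = D.shape[0]
    A = -0.5 * D ** 2                      # a_ij = -d_ij^2 / 2
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = H @ A @ H
    lam, V = np.linalg.eigh(B)             # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]         # reorder to descending
    keep = lam > tol * lam[0]              # drop numerically zero eigenvalues
    lam, V = lam[keep], V[:, keep]
    return V * np.sqrt(lam), lam


def db_regression(D, y, k):
    """Regress y on the first k principal coordinates; return (fitted, R^2)."""
    X, lam = principal_coordinates(D)
    Xk, lam_k = X[:, :k], lam[:k]
    beta0 = y.mean()                       # columns of X are centered
    beta = (Xk.T @ y) / lam_k              # OLS decouples since X'X is diagonal
    fitted = beta0 + Xk @ beta
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return fitted, r2


# Hypothetical mixed data: one continuous and one categorical predictor.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
c = rng.integers(0, 3, size=30)
y = 1.5 * x + np.array([0.0, 2.0, -1.0])[c] + rng.normal(scale=0.3, size=30)

# Illustrative squared distance (an assumption, not the paper's choice):
# squared difference on x plus a 0/1 mismatch indicator on c.
D2 = (x[:, None] - x[None, :]) ** 2 + (c[:, None] != c[None, :])
D = np.sqrt(D2)

fitted, r2 = db_regression(D, y, k=3)
print(f"R^2 with 3 principal coordinates: {r2:.3f}")
```

When the distance is the ordinary Euclidean distance on continuous predictors and all principal coordinates are kept, the fitted values coincide with those of classical least squares, which is one way to read the abstract's remark that the method is compatible with the general linear model.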

### Citations

861 |
Multivariate Analysis
- Mardia, Kent, et al.
Citation Context ...milar, i.e., $d_{ij} \neq 0$ if $w_i \neq w_j$. Henceforth, we suppose that $D = (d_{ij})$ is a Euclidean distance matrix. Let $A = (a_{ij})$, where $a_{ij} = -d_{ij}^2/2$, and set $B = HAH$. It is well known (Mardia et al. 1979) that $B$ is positive semi-definite and, assuming $\operatorname{rank}(B) = m$, we have $B = XX'$ (9), where $X$ is an $n \times m$ matrix of rank $m$. If, in addition, $X'X = \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_m)$ (10), where $\lambda_i$, $i = 1$... |

233 | Multivariate Observations - Seber - 1984 |

164 |
Graphical models for associations between variables, some of which are qualitative and some quantitative, Annals of Statistics
- Lauritzen, Wermuth
- 1989
Citation Context ...n regression or multivariate analysis of having both qualitative and quantitative variables. Some procedures have been used in regression and association (Young et al. 1976; Daudin 1980; Roskam 1980; Lauritzen and Wermuth 1989), principal components (Young et al. 1978; Kiers 1989a, 1989b) and discriminant analysis (Krzanowski 1975, 1986; Knoke 1982). The methodologies are mainly based on optimal scaling, generalized corre... |

147 | Information and accuracy attainable in the estimation of statistical parameters - Rao - 1945 |

145 | Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika - Gower - 1966 |

123 | Principal Component Analysis - Jolliffe - 2002 |

106 | A General Coefficient of Similarity and Some of its Properties - Gower - 1971 |

70 | SAS/STAT user’s guide - SAS Institute - 1992 |

38 | Entropy differential metric, distance and divergence measures in probability spaces: a unified approach - Burbea, Rao - 1982 |

33 |
Quantitative Analysis of Qualitative Data
- Young
- 1981
Citation Context ...and indolene (2). We have 110 cases and three variables: 0 binary, 1 qualitative and 2 continuous. In this case we also compute the results for the transformed model obtained by the MORALS algorithm (see Young 1981), which uses four transformed variables instead of the three initial variables. Note that DB improves CR for untransformed data, and DB for transformed data yields a very close result to MORALS. Howe... |

24 | Applied Linear Regression, Second Edition - Weisberg - 1985 |

19 | Adding a point to vector diagrams in multivariate analysis - Gower - 1968 |

13 | Distance Analysis in discrimination and classification using both continuous and categorical variables - Cuadras - 1989 |

11 | Regression with qualitative and quantitative variables: An alternating least squares method with optimal scaling features - Young, Leeuw, et al. - 1976 |

8 | Rao's distance measure - Atkinson, Mitchell - 1981 |

8 | Discrimination and classification using both binary and continuous variables - Krzanowski - 1975 |

8 |
Distance Between Populations Using Mixed Continuous and Categorical Variables
- Krzanowski
- 1983
Citation Context ...l way of proceeding could be to subject all the qualitative variables to some scoring system (optimal scaling, for example) and consider all the variables as quantitative. Other options are possible (Krzanowski 1983), but a very satisfactory option does not exist. However, such mixtures of variables arise frequently in the applications (medicine, biometry, psychology, econometrics) and only comparatively few mod... |

4 | Statistical Data Analysis and Inference - Dodge - 1989 |

4 | Component selection norms for principal components regression - Hill, Fomby, et al. - 1977 |

4 | Some geometrical aspects of data analysis and statistics. See Dodge - Oller - 1989 |

4 | Rao’s distance for negative multinomial distributions - Oller, Cuadras - 1985 |

3 |
Discriminant Analysis with Discrete and Continuous Variables
- Knoke
- 1982
Citation Context ... association (Young et al. 1976; Daudin 1980; Roskam 1980; Lauritzen and Wermuth 1989), principal components (Young et al. 1978; Kiers 1989a, 1989b) and discriminant analysis (Krzanowski 1975, 1986; Knoke 1982). The methodologies are mainly based on optimal scaling, generalized correlation coefficients, the location model and distance-based analysis. Although statistical analysis on mixed data is commonly ... |

2 |
Three-Way Methods for the Analysis of Qualitative and Quantitative Data, no
- Kiers
- 1989
Citation Context ...uantitative variables. Some procedures have been used in regression and association (Young et al. 1976; Daudin 1980; Roskam 1980; Lauritzen and Wermuth 1989), principal components (Young et al. 1978; Kiers 1989a, 1989b) and discriminant analysis (Krzanowski 1975, 1986; Knoke 1982). The methodologies are mainly based on optimal scaling, generalized correlation coefficients, the location model and distance-b... |

2 | Multiple Discriminant Analysis in the presence of mixed continuous and categorical data - Krzanowski - 1986 |

2 | The optimal set of principal component restrictions on a least squares regression - Lott - 1973 |

2 |
Principal component regression under exchangeability
- Soofi
- 1988
Citation Context ...77). In fact, this choice is an open question (Jolliffe 1986, p. 238) and a coherent criterion for the dimension reduction does not exist in the classical formulation of principal component regression (Soofi 1988). 3.2 Computing the coefficient of determination. We consider model (11) and the OLS estimates given in (13). Let $\hat{Y}_k = \hat{\beta}_0 \mathbf{1} + X_{(k)} \hat{\beta}_{(k)}$ (15). The coefficient of determination is given by... |

1 | Health status age: an age predictive health status index - Abrahamse, Kisch - 1975 |

1 | Distancias estadísticas (with discussion) - Cuadras - 1988 |

1 | Métodos estadísticos aplicables a la reconstrucción prehistórica - Cuadras - 1988 |

1 |
Regression qualitative: choix de l'espace predicteur. See Diday
- Daudin
- 1980
Citation Context ...e considered the problem in regression or multivariate analysis of having both qualitative and quantitative variables. Some procedures have been used in regression and association (Young et al. 1976; Daudin 1980; Roskam 1980; Lauritzen and Wermuth 1989), principal components (Young et al. 1978; Kiers 1989a, 1989b) and discriminant analysis (Krzanowski 1975, 1986; Knoke 1982). The methodologies are mainly ba... |

1 | Data analysis and informatics - Diday - 1989 |

1 | The many faces of multivariate analysis - Jansen, Schuur - 1989 |

1 |
Principal component analysis on a mixture of quantitative and qualitative data based on generalized correlation coefficients. See Jansen and van Schuur
- Kiers
- 1989
Citation Context ...uantitative variables. Some procedures have been used in regression and association (Young et al. 1976; Daudin 1980; Roskam 1980; Lauritzen and Wermuth 1989), principal components (Young et al. 1978; Kiers 1989a, 1989b) and discriminant analysis (Krzanowski 1975, 1986; Knoke 1982). The methodologies are mainly based on optimal scaling, generalized correlation coefficients, the location model and distance-b... |

1 |
Nonmetric analysis of linear models. See Diday
- Roskam
- 1980
Citation Context ...the problem in regression or multivariate analysis of having both qualitative and quantitative variables. Some procedures have been used in regression and association (Young et al. 1976; Daudin 1980; Roskam 1980; Lauritzen and Wermuth 1989), principal components (Young et al. 1978; Kiers 1989a, 1989b) and discriminant analysis (Krzanowski 1975, 1986; Knoke 1982). The methodologies are mainly based on optima... |