## The VGAM Package for Categorical Data Analysis

### Abstract

Classical categorical regression models such as the multinomial logit and proportional odds models are shown to be readily handled by the vector generalized linear and additive model (VGLM/VGAM) framework. Additionally, the framework admits natural extensions, such as reduced-rank VGLMs for dimension reduction and covariates whose values are specific to each linear/additive predictor (e.g., for consumer choice modeling). This article describes some of the framework behind the VGAM R package, as well as its usage and implementation details.

### Citations

Bias-reduction (Firth 1993) is a method for removing the $O(n^{-1})$ bias from a maximum likelihood estimate. For a substantial class of models, including GLMs, it can be formulated as a minor adjustment of the score vector within an IRLS algorithm (Kosmidis and Firth 2009).
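For logistic regression with the canonical logit link, the adjustment can be written explicitly. The adjusted score is $U^*(\beta) = X^\top\{y - \pi + h \circ (1/2 - \pi)\}$, where $\pi$ holds the fitted probabilities, $W = \operatorname{diag}\{\pi(1-\pi)\}$, $h$ is the diagonal of the hat matrix $W^{1/2}X(X^\top W X)^{-1}X^\top W^{1/2}$, and $\circ$ denotes elementwise multiplication. The following is a minimal base-R sketch that iterates Fisher-scoring steps $\beta \leftarrow \beta + (X^\top W X)^{-1}U^*(\beta)$, assuming a 0/1 response and a full-rank model matrix. The helper `firth_logit()` is hypothetical and is not the brglm or VGAM implementation. The example uses completely separated data, for which the ordinary MLE is infinite.

```r
# Hypothetical helper: Firth (1993) bias-reduced logistic regression.
# Assumes y is a 0/1 vector and X is a full-rank model matrix with an intercept.
firth_logit <- function(X, y, tol = 1e-10, maxit = 100) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    p <- plogis(drop(X %*% beta))          # fitted probabilities
    w <- p * (1 - p)                       # IRLS weights for the logit link
    info_inv <- solve(crossprod(X, w * X)) # (X'WX)^{-1}
    # Diagonal of the hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}
    h <- w * rowSums((X %*% info_inv) * X)
    # Firth-adjusted score: ordinary score plus the term h * (1/2 - p)
    score <- crossprod(X, y - p + h * (0.5 - p))
    step <- drop(info_inv %*% score)       # Fisher-scoring step
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  names(beta) <- colnames(X)
  beta
}

# Completely separated data: all y = 0 for x <= 3 and all y = 1 for x >= 4
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind("(Intercept)" = 1, x = x)
b <- firth_logit(X, y)
b
```

Both estimates come out finite. An ordinary `glm()` fit on the same data would instead push the slope toward infinity. Flipping x to 7 - x and y to 1 - y leaves these data unchanged, so the bias-reduced fit should satisfy intercept = -3.5 times the slope. This gives a convenient sanity check.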

In SAS, proc logistic handles the multinomial logit and proportional odds models, as well as exact logistic regression (see Stokes et al. 2000, which covers Version 8 of SAS). Because the proportional odds model can be fitted by proc logistic, proc genmod and proc probit, users are arguably more likely to be confused than to benefit from a real choice.

A package for visualizing categorical data in R is vcd (Meyer et al. 2006, 2009).

#### 2. VGLM/VGAM overview

This section summarizes the VGLM/VGAM framework, with particular emphasis on categorical models, since the classes encapsulate many multivariate response models.

Using vgam() with smoothing is very similar to using gam() (Hastie 2008). For example, to fit a nonparametric proportional odds model (cf. p. 179 of McCullagh and Nelder 1989) to the pneumoconiosis data, one could try

```
R> pneumo <- transform(pneumo, let = log(exposure.time))
R> fi...
```
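To see what a (parametric) proportional odds model implies for three ordered categories, the base-R sketch below converts two linear predictors into category probabilities. It assumes the cumulative-logit convention $\operatorname{logit} P(Y \ge j+1) = \alpha_j + \beta x$ for $j = 1, 2$, with a single slope $\beta$ shared by both linear predictors (the parallelism assumption). A nonparametric version fitted with vgam() would replace $\beta x$ by a smooth function of the covariate. The function `propodds_probs()` and all parameter values are hypothetical and are not estimates from the pneumoconiosis data.

```r
# Hypothetical helper: category probabilities under a proportional odds model
# with 3 ordered levels, assuming logit P(Y >= j + 1) = alpha[j] + beta * x.
propodds_probs <- function(x, alpha, beta) {
  # alpha[1] >= alpha[2] keeps P(Y >= 2) >= P(Y >= 3) for every x
  stopifnot(length(alpha) == 2, alpha[1] >= alpha[2])
  ge2 <- plogis(alpha[1] + beta * x)  # P(Y >= 2)
  ge3 <- plogis(alpha[2] + beta * x)  # P(Y >= 3)
  cbind(P1 = 1 - ge2, P2 = ge2 - ge3, P3 = ge3)
}

# Hypothetical log exposure times and coefficients, for illustration only
let <- log(c(5, 15, 30, 45))
probs <- propodds_probs(let, alpha = c(-9, -10), beta = 2.5)
round(probs, 3)
```

Because the slope is shared, the two cumulative logits are parallel lines in `let`. The middle-category probability therefore stays non-negative everywhere.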

Like brat(), one can choose a different reference group and reference value. Other R packages for the Bradley-Terry model include BradleyTerry2 by H. Turner and D. Firth (with and without ties; Firth 2005, 2008) and prefmod (Hatzinger 2009).

#### 4.4. Genetic models

Quite a number of population genetic models are based on the multinomial distribution; see, e.g., Weir (1996) and Lange (2002).
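One of the simplest such models is Hardy-Weinberg equilibrium at a locus with two alleles. The genotype counts $(n_{AA}, n_{Aa}, n_{aa})$ are multinomial with probabilities $(p^2, 2p(1-p), (1-p)^2)$, and the MLE of the allele frequency is $\hat p = (2n_{AA} + n_{Aa})/(2n)$. The base-R sketch below computes that closed form and checks it against a numerical maximization of the multinomial log-likelihood. The counts and helper names are hypothetical, and this is not how VGAM's genetic family functions are implemented.

```r
# Hardy-Weinberg equilibrium: genotype counts (AA, Aa, aa) are multinomial
# with probabilities p^2, 2p(1 - p), (1 - p)^2, where p = P(allele A).
hwe_probs <- function(p) c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)

# Closed-form MLE: proportion of A alleles among the 2n sampled alleles
hwe_mle <- function(counts) (2 * counts[["AA"]] + counts[["Aa"]]) / (2 * sum(counts))

# Hypothetical genotype counts, for illustration only
counts <- c(AA = 30, Aa = 50, aa = 20)
p_hat <- hwe_mle(counts)

# Numerical check: maximize the multinomial log-likelihood over p in (0, 1)
loglik <- function(p) dmultinom(counts, prob = hwe_probs(p), log = TRUE)
p_num <- optimize(loglik, c(0.001, 0.999), maximum = TRUE)$maximum
c(closed_form = p_hat, numerical = p_num)
```

Up to an additive constant, the log-likelihood is $(2n_{AA} + n_{Aa})\log p + (n_{Aa} + 2n_{aa})\log(1-p)$. This function is concave, so the one-dimensional search finds the same maximum as the closed form.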

One by-product, for logistic regression, is that while the maximum likelihood estimate (MLE) can be infinite, the bias-reducing adjustment always leads to finite estimates. At present the R package brglm (Kosmidis 2008) implements bias-reduction for a number of models. Bias-reduction might be implemented by adding an argument bred = FALSE, say, to some existing VGAM family functions.

2. Nested logit models were developed...

Imai et al. (2008) present another perspective on the xij problem, with illustrations from Zelig (Imai et al. 2009). VGAM handles variables whose values depend on $\eta_j$ (see (22)) via the xij argument, which is assigned an S formula or a list of S formulas.
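The following sketch shows why a covariate can take a different value in each linear predictor. Consider a consumer-choice setting in which the price of alternative $j$ enters the utility of alternative $j$ only. Take the last alternative $M$ as the baseline. The multinomial logit predictors are then $\eta_j = \log(P_j/P_M) = \alpha_j + \gamma\,(x_j - x_M)$, so the covariate attached to $\eta_j$ is $x_j - x_M$, which changes with $j$. This is the situation the xij argument addresses. The base-R code below does not use VGAM, and the function name, prices and coefficients are all hypothetical.

```r
# Hypothetical helper: multinomial logit probabilities with an
# alternative-specific covariate x (e.g. price). With alternative M as baseline,
#   eta_j = log(P_j / P_M) = alpha[j] + gamma * (x[j] - x[M]),  j = 1, ..., M - 1,
# so the covariate value differs from one linear predictor to the next.
choice_probs <- function(x, alpha, gamma) {
  M <- length(x)
  stopifnot(length(alpha) == M - 1)
  eta <- alpha + gamma * (x[-M] - x[M])  # M - 1 linear predictors
  expo <- exp(c(eta, 0))                 # the baseline alternative has eta = 0
  setNames(expo / sum(expo), names(x))
}

# Hypothetical prices for three travel modes and hypothetical coefficients
price <- c(car = 10, bus = 4, train = 7)
p0 <- choice_probs(price, alpha = c(0.5, -0.2), gamma = -0.3)
# Raise only the bus price: with gamma < 0 the bus share should fall
p1 <- choice_probs(replace(price, "bus", 6), alpha = c(0.5, -0.2), gamma = -0.3)
rbind(before = p0, after = p1)
```

A single coefficient $\gamma$ is shared across all linear predictors, but each $\eta_j$ receives its own covariate value. A per-predictor covariate of this kind cannot be expressed with one ordinary model matrix.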

With such data, all R implementations that I know of give vague warnings, if any at all, which is rather unacceptable (Allison 2004). The safeBinaryRegression package (Konis 2009) overloads glm() so that the existence of the MLE is checked before a binary response GLM is fitted. In closing, the VGAM package is under continual development, so some future changes are expected.
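The simplest case of such an existence check is a logistic regression with an intercept and one numeric covariate. There, the MLE fails to exist exactly when the two response groups are completely or quasi-completely separated. In other words, the largest x in one group is no greater than the smallest x in the other. The hypothetical helper below covers only this special case. It is not the safeBinaryRegression algorithm, and general design matrices need a more elaborate check.

```r
# Hypothetical helper: does the logistic-regression MLE exist for y ~ 1 + x?
# Assumes y is 0/1 with both values present and x is not constant (full rank).
# In this special case the MLE exists iff the groups overlap, i.e. there is no
# cut point c with x <= c for one group and x >= c for the other.
mle_exists_1d <- function(x, y) {
  stopifnot(all(y %in% c(0, 1)), length(unique(y)) == 2, length(unique(x)) > 1)
  x0 <- x[y == 0]
  x1 <- x[y == 1]
  separated <- max(x0) <= min(x1) || max(x1) <= min(x0)
  !separated
}

mle_exists_1d(x = 1:6, y = c(0, 0, 0, 1, 1, 1))        # FALSE: complete separation
mle_exists_1d(x = c(1, 2, 2, 3), y = c(0, 0, 1, 1))    # FALSE: quasi-complete separation
mle_exists_1d(x = 1:6, y = c(0, 0, 1, 0, 1, 1))        # TRUE: the groups overlap
```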

Neither polr() nor lrm() appears able to fit the nonproportional odds model. There are also non-CRAN packages, such as gnlm (Lindsey 2007), whose modeling function nordr() can fit the proportional odds, continuation ratio and adjacent categories models; however, it calls nlm().
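The proportional and nonproportional odds models differ only in whether the cumulative logits share a slope. The base-R sketch below assumes three ordered categories with $\operatorname{logit} P(Y \ge j+1) = \alpha_j + \beta_j x$ for $j = 1, 2$. When $\beta_1 \ne \beta_2$, the two linear predictors are straight lines that must cross at some $x$. Beyond that point, the implied probability of the middle category, $P(Y \ge 2) - P(Y \ge 3)$, becomes negative, which is one practical complication of the nonproportional model. The helper name and parameter values are hypothetical.

```r
# Hypothetical helper: middle-category probability under a nonproportional odds
# model with 3 ordered levels, assuming logit P(Y >= j + 1) = alpha[j] + beta[j] * x.
nonprop_middle_prob <- function(x, alpha, beta) {
  plogis(alpha[1] + beta[1] * x) - plogis(alpha[2] + beta[2] * x)  # P(Y = 2)
}

alpha <- c(1, -1)   # hypothetical intercepts
beta  <- c(0.5, 1)  # hypothetical, non-parallel slopes
# The linear predictors are equal where alpha[1] + beta[1] x = alpha[2] + beta[2] x
x_cross <- (alpha[1] - alpha[2]) / (beta[2] - beta[1])
p_mid <- nonprop_middle_prob(c(0, x_cross, x_cross + 1), alpha, beta)
p_mid  # positive, then zero at the crossing, then negative
```

With a common slope ($\beta_1 = \beta_2$) and $\alpha_1 \ge \alpha_2$, the lines never cross. The middle-category probability is then non-negative for every $x$.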

We reproduce some of the analyses of Anderson (1984) regarding the progress of 101 patients with back pain, using the data frame backPain from gnm (Turner and Firth 2007, 2009). The three prognostic variables are length of previous attack (x1 = 1, 2), pain change (x2 = 1, 2, 3) and lordosis (x3 = 1, 2). Like Anderson, we treat these as numerical variables, then standardize and negate them.
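In base R, standardizing and negating numeric predictors takes one line per variable. The sketch below uses a small hypothetical data frame with the same codings as the three prognostic variables, not the actual backPain data from gnm. Negating a standardized predictor simply reverses the sign of its fitted coefficient.

```r
# Hypothetical predictors coded like x1 (1, 2), x2 (1, 2, 3) and x3 (1, 2)
bp <- data.frame(x1 = c(1, 2, 1, 2, 1, 2),
                 x2 = c(1, 2, 3, 1, 2, 3),
                 x3 = c(2, 1, 1, 2, 2, 1))

# Treat the codes as numeric, standardize (mean 0, sd 1), then negate
std_neg <- function(v) -as.vector(scale(v))
bp_std <- bp
bp_std[c("x1", "x2", "x3")] <- lapply(bp[c("x1", "x2", "x3")], std_neg)
bp_std
```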

