## Stochastic Complexity Based Estimation of Missing Elements in Questionnaire Data (1998)

Venue: | Annual American Educational Research Association Meeting, SIG Educational Statisticians |

Citations: | 2 - 0 self |

### BibTeX

@INPROCEEDINGS{Tirri98stochasticcomplexity,

author = {Henry Tirri and Tomi Silander},

title = {Stochastic Complexity Based Estimation of Missing Elements in Questionnaire Data},

booktitle = {Annual American Educational Research Association Meeting, SIG Educational Statisticians},

year = {1998}

}

### Abstract

In this paper we study a new information-theoretically justified approach to missing data estimation for multivariate categorical data. The approach discussed is a model-based imputation procedure relative to a model class (i.e., a functional form for the probability distribution of the complete data matrix), which in our case is the set of multinomial models with some independence assumptions. Based on the given model class assumption, an information-theoretic criterion can be derived to select between the different complete data matrices. Intuitively, this general criterion, called stochastic complexity, represents the shortest code length needed for coding the complete data matrix relative to the model class chosen. Using this information-theoretic criterion, the missing data problem is reduced to a search problem, i.e., finding the data completion with minimal stochastic complexity. In the experimental part of the paper we present empirical results of the approach using two real data sets, and compare these results to those achieved by commonly used techniques such as case deletion and imputing sample averages.
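The search formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fully independent categorical columns (a simplification of the paper's "multinomial models with some independence assumptions"), a symmetric Dirichlet prior, and the Bayesian marginal likelihood as the approximation of stochastic complexity. The function names, the `alpha` parameter, and the use of `None` to mark missing cells are all hypothetical, and the exhaustive search is only feasible for a handful of missing cells.

```python
import math
from itertools import product


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of one categorical column.

    Assumes a multinomial model with a symmetric Dirichlet(alpha) prior
    (an illustrative choice, not taken from the paper).
    """
    k = len(counts)
    n = sum(counts)
    result = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        result += math.lgamma(alpha + c) - math.lgamma(alpha)
    return result


def stochastic_complexity(rows, n_values, alpha=1.0):
    """Code length (in nats) of a complete data matrix.

    Columns are treated as independent, so the total code length is the
    sum of the per-column negative log marginal likelihoods.
    """
    total = 0.0
    for j, k in enumerate(n_values):
        counts = [0] * k
        for row in rows:
            counts[row[j]] += 1
        total -= log_marginal(counts, alpha)
    return total


def best_completion(rows, n_values, alpha=1.0):
    """Try every completion of the missing cells (marked None) and
    return the one with minimal stochastic complexity."""
    holes = [(i, j) for i, row in enumerate(rows)
             for j, v in enumerate(row) if v is None]
    best, best_sc = None, math.inf
    for values in product(*(range(n_values[j]) for _, j in holes)):
        candidate = [list(row) for row in rows]
        for (i, j), v in zip(holes, values):
            candidate[i][j] = v
        sc = stochastic_complexity(candidate, n_values, alpha)
        if sc < best_sc:
            best, best_sc = candidate, sc
    return best, best_sc


if __name__ == "__main__":
    data = [[0, 1], [0, 1], [0, 0], [None, 1], [1, None]]
    completed, sc = best_completion(data, n_values=[2, 2])
    print(completed, sc)
```

Under the full-independence assumption used here, the minimizing completion simply fills each gap with the most frequent observed value of its column. Richer model classes, such as those with dependencies between variables, are what let the criterion choose more informative completions, and they require a heuristic search rather than enumeration.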

### Citations

9132 | Elements of Information Theory - Cover, Thomas - 1991 |

8919 | Maximum likelihood from incomplete data via the EM algorithm (with discussion) - Dempster, Laird, et al. - 1977 |

7440 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |
Citation Context: ...6 52.84 41.24 50 20.70 41.18 44.80 22.62 5 61.80 61.84 61.96 61.79 10 60.34 61.47 61.80 61.06 250 20 53.80 59.86 60.74 56.91 35 40.49 54.74 57.87 46.85 50 27.01 48.19 51.40 30.80 works (Jensen, 1996; Pearl, 1988)) allows better "compression" of the data, i.e., better completions can be made. It is important to notice that in this sense the information-theoretic approach to modeling based on compression is ak...

3884 | Optimization by simulated annealing - Kirkpatrick, Gelatt, et al. - 1983 |

2689 | Estimating the dimension of a model - Schwarz - 1978 |
Citation Context: ...an be approximated by $SC(D_{obs}, D_{mis} \mid M) \sim \log P(D_{obs}, D_{mis} \mid \tilde{\Theta}, M)\, P(\tilde{\Theta} \mid M) - C$, (7) where $C$ is a constant depending on the number of the model parameters, and the number of the data vectors (Schwarz, 1978; Rissanen, 1989). As the expectation of the first term of this approximation is maximized during the EM process, it can be argued that the EM optimizes the stochastic complexity indirectly by optimiz...

2490 | Equation of state calculations by fast computing machines - Metropolis, Rosenbluth, et al. - 1953 |

1460 | Bayesian data analysis - Gelman, Carlin, et al. - 1995 |
Citation Context: ...values to elements in $D_{mis}$, in optimal manner with respect to the inference tasks for which the data is to be used. Here we restrict ourselves to predictive inference tasks (Bernardo & Smith, 1994; Gelman et al., 1995), i.e., we aim at developing augmentation methods that produce completions which result in good predictive performance, when the completed data is used to build a predictive model. Completion criteri...

1374 | Statistical Decision Theory and Bayesian Analysis - Berger - 1985 |
Citation Context: ...iscussed here is similar in the sense that it is always relative to a model class, and the criterion used for finding the values to be imputed can be approximated by the Bayesian marginal likelihood (Berger, 1985; Bernardo & Smith, 1994; O'Hagan, 1994). However, we are interested in the problem of finding a single optimal completion of the incomplete data set instead of a set of completions typical to multipl...

1149 | Bayesian theory - Bernardo, Smith - 1994 |

1132 | A Bayesian method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992 |

968 | Introduction to Bayesian Networks - Jensen - 1996 |
Citation Context: ... 35 33.04 49.66 52.84 41.24 50 20.70 41.18 44.80 22.62 5 61.80 61.84 61.96 61.79 10 60.34 61.47 61.80 61.06 250 20 53.80 59.86 60.74 56.91 35 40.49 54.74 57.87 46.85 50 27.01 48.19 51.40 30.80 works (Jensen, 1996; Pearl, 1988)) allows better "compression" of the data, i.e., better completions can be made. It is important to notice that in this sense the information-theoretic approach to modeling based on comp...

948 | Learning bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995 |

920 | Multiple Imputation for Nonresponse in Surveys - Rubin - 1987 |
Citation Context: ... are an important aspect of quantitative data analysis. The problem of missing data estimation has been addressed widely in the statistics literature (see e.g., (Gelman, Carlin, Stern, & Rubin, 1995; Rubin, 1987, 1996; Schafer, 1995)). The last quarter of a century has seen many developments in this area. The EM algorithm together with its extensions (Dempster, Laird, & Rubin, 1977; McLachlan & Thriyambakam,...

806 | Optimal Statistical Decisions - DeGroot - 1970 |
Citation Context: ...onal distributions $P(X_i \mid X_s = k)$ are multinomial, i.e., $X_s \sim \mathrm{Multi}(1; \alpha_1, \ldots, \alpha_K)$, and $X_{i|k} \sim \mathrm{Multi}(1; \phi_{ki1}, \ldots, \phi_{kin_i})$. Since the family of Dirichlet densities is conjugate (see e.g., (DeGroot, 1970)) to the family of multinomials, it is convenient to assume that the prior distributions of the parameters are from this family (see, e.g., (Heckerman, Geiger, & Chickering, 1995)). More precisely, l...

698 | Statistical Analysis of Finite Mixture Distributions - Titterington, Smith, et al. - 1985 |

626 | Markov chain Monte Carlo in practice - Gilks, Richardson, et al. - 1996 |

533 | Stochastic Complexity - Rissanen - 1989 |
Citation Context: ...heoretic criterion can be derived to select between the different complete data matrices for the more "likely" one (in an abstract sense). Intuitively, this general criterion, called stochastic complexity (Rissanen, 1987, 1989, 1996), represents the shortest code length needed for coding the complete data matrix relative to the model class chosen. Unfortunately, in general the exact criterion is very hard to compute for...

521 | Analysis of Incomplete Multivariate Data - Schafer - 1997 |
Citation Context: ...put our work in perspective, however, we would like to remind the reader that the proposed methods can essentially be categorized into two general approaches: case deletion and imputation (Little & Rubin, 1987; Schafer, 1997). In case deletion all the cases with missing data are omitted and the analysis is performed only using the complete cases. Obviously this approach is a reasonable solution only if the incomplete cas...

325 | Stochastic Complexity in Statistical Inquiry - Rissanen |
Citation Context: ...on involves the use of a description method or code, which is a one-to-one mapping from datasets to their descriptions. Without loss of generality, these descriptions may be taken to be binary strings (Rissanen, 1989). Intuitively, the shorter the description or codelength of a set of D, the more regular or simpler the set D is. Rissanen (1987) defines the stochastic complexity informally as follows: The stochast...

296 | Universal coding, information prediction and estimation - Rissanen - 1984 |
Citation Context: ...study a new information-theoretically justified approach to missing data estimation. The method discussed is deeply related to Bayesian inference, but originates from the research on universal coding (Rissanen, 1984), which aims at finding good (short) encodings of data. We do not make an attempt to provide a survey of the aforementioned developments in missing data estimation; an interested reader can consult ...

290 | Fisher information and stochastic complexity - Rissanen - 1996 |
Citation Context: ...gth of D should be short. However, it turns out to be very hard to define "with the help of" in a formal manner. Indeed, a completely satisfactory formal definition has only been found very recently (Rissanen, 1996). Note that the informal definition of stochastic complexity (SC) as given above presumes the existence of a code: by definition, the SC of a data set D is the length of the encoding of D where the e...

267 | Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing - Aarts, Korst - 1989 |

67 | Kendall’s Advanced Theory of Statistics, Volume 2B: Bayesian Inference - O’Hagan - 1994 |
Citation Context: ...that it is always relative to a model class, and the criterion used for finding the values to be imputed can be approximated by the Bayesian marginal likelihood (Berger, 1985; Bernardo & Smith, 1994; O'Hagan, 1994). However, we are interested in the problem of finding a single optimal completion of the incomplete data set instead of a set of completions typical to multiple imputation procedures. Moreover, as w...

32 | MDL and MML: Similarities and differences - Baxter, Oliver |

20 | Comparing predictive inference methods for discrete domains - Kontkanen, Myllymäki, et al. - 1997 |

16 | Maximally Maintained Inequality: Expansion, Reform and Opportunity - Raftery, Hout - 1993 |

5 | Bayesian and information-theoretic priors for Bayesian network parameters - Kontkanen, Myllymaki, et al. - 1998 |

3 | On the accuracy of stochastic complexity approximations - Kontkanen, Myllymäki, et al. - 1997 |

2 | Bayes factors (Tech - Kass - 1994 |

2 | Comparing stochastic complexity minimization algorithms in estimating missing data - Kontkanen, Myllymaki, et al. - 1997 |

2 | Model-based imputation of census short-form items - Schafer - 1995 |
Citation Context: ...spect of quantitative data analysis. The problem of missing data estimation has been addressed widely in the statistics literature (see e.g., (Gelman, Carlin, Stern, & Rubin, 1995; Rubin, 1987, 1996; Schafer, 1995)). The last quarter of a century has seen many developments in this area. The EM algorithm together with its extensions (Dempster, Laird, & Rubin, 1977; McLachlan & Thriyambakam, 1997), multiple impu...

1 | Equality of opportunity in Irish schools - Kelleghan - 1984 |
Citation Context: ...lved a classification task of correctly identifying one of the six schools using the other variables as predictors. The second data set was the "Irish educational transitions data" (IRISH) (Greaney & Kelleghan, 1984) reanalyzed by Raftery and Hout (1993). Subjects of this data set were 500 Irish schoolchildren aged 11 in 1967. The data were also used, in a simplified form, as an example to illustrate Bayesian mo...

1 | The EM algorithm and extensions - Thriyambakam - 1997 |
Citation Context: ...; Rubin, 1987, 1996; Schafer, 1995)). The last quarter of a century has seen many developments in this area. The EM algorithm together with its extensions (Dempster, Laird, & Rubin, 1977; McLachlan & Thriyambakam, 1997), multiple imputation (Rubin, 1987, 1996; Schafer, 1995) and Markov Chain Monte Carlo (Gilks, Richardson, & J., 1996) all provide tools for inference in large classes of missing data problems. In pra...

1 | Multiple imputation after 18+ years - Rubin - 1996 |