Assessment of the performance of imputation techniques in observational studies with two measurements

Friday, March 2nd, 2018
11:30-12:30, TIMON Room at DeustoTech

Urko Aguirre

Research Unit, Hospital Galdakao-Usansolo, REDISSEC: Red de Investigación en Servicios Sanitarios y Enfermedades Crónicas, 48960 Galdakao, Spain.

Pre-post studies of health-related quality of life (HRQoL) variables aim to identify potential predictors of the mean change in the outcome of interest. Missing data are very common in such studies and can bias the results. How to analyze the whole sample appropriately when missingness is nonignorable is an issue that statisticians must address. Imputation techniques such as K-Nearest Neighbour (K-NN), Markov Chain Monte Carlo (MCMC) and Propensity Score (PS) have been suggested as alternatives to naive methods, namely Complete Case (CC) and Available Case (AC), for handling missing outcomes. The goal of the study was to compare the performance of various imputation techniques under different missingness mechanisms and rates.

Five analysis approaches – CC, AC, K-NN, MCMC and PS – combined with mixed models were compared under different settings (missingness rates: 10% and 30%; mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR)). These strategies were applied to a pre-post study of 400 patients with chronic obstructive pulmonary disease (COPD). We analyzed the relationship between the changes in subjects' HRQoL over one year and their clinical and sociodemographic characteristics. A simulation study was performed (500 and 1000 runs), in which the standardized bias of the regression coefficient for the interaction between the Time effect and the covariate was computed.
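
The three missingness mechanisms and the standardized bias can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the function names make_missing and standardized_bias, the rule of dropping the follow-up values with the lowest (noisy) scores, and the definition of standardized bias as 100 times the mean error divided by the empirical standard deviation of the estimates across runs.

```python
import random
import statistics


def make_missing(baseline, followup, mechanism, rate, rng):
    """Return a copy of `followup` with a fraction `rate` of values set to None.

    Hypothetical helper: the records with the lowest score are dropped,
    and the mechanism decides what the score depends on.
    """
    n = len(followup)
    if mechanism == "MCAR":
        # Score is pure noise: missingness is unrelated to any data.
        scores = [rng.random() for _ in range(n)]
    elif mechanism == "MAR":
        # Score depends only on the fully observed baseline value.
        scores = [b + rng.gauss(0, 1) for b in baseline]
    elif mechanism == "MNAR":
        # Score depends on the follow-up value that will itself go missing.
        scores = [f + rng.gauss(0, 1) for f in followup]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    n_missing = round(rate * n)
    order = sorted(range(n), key=scores.__getitem__)
    drop = set(order[:n_missing])
    return [None if i in drop else y for i, y in enumerate(followup)]


def standardized_bias(estimates, true_value):
    """Assumed definition: 100 * (mean estimate - truth) / SD of the estimates."""
    spread = statistics.stdev(estimates)
    return 100.0 * (statistics.mean(estimates) - true_value) / spread


if __name__ == "__main__":
    rng = random.Random(2018)
    true_mean = 52.0  # assumed true follow-up mean of the toy data
    for mechanism in ("MCAR", "MAR", "MNAR"):
        estimates = []
        for _ in range(500):
            baseline = [rng.gauss(50, 10) for _ in range(400)]
            followup = [b + rng.gauss(2, 5) for b in baseline]
            observed = make_missing(baseline, followup, mechanism, 0.30, rng)
            # Complete-case estimate of the follow-up mean.
            complete = [y for y in observed if y is not None]
            estimates.append(statistics.mean(complete))
        print(mechanism, round(standardized_bias(estimates, true_mean), 1))
```

In this toy setting the estimate is a plain complete-case mean of the follow-up score, not the Time-by-covariate interaction coefficient of a mixed model, so the numbers it prints are not comparable to the results reported in the talk.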

In both the 500-run and 1000-run simulations, CC with mixed models showed the lowest standardized bias of the coefficients in the MCAR and MAR scenarios. However, in the MNAR setting, both approaches provided biased coefficients. PS was the worst imputation method.
MCMC offers no additional benefit over CC when handling missing data in the MCAR and MAR settings. In the MNAR setting, all methods showed biased results.
