Abstract
Missing data is a common problem in practice that can degrade the performance of predictive models and distort statistical inference. Many advanced imputation techniques lack interpretability and operate as “black boxes”. This paper introduces Parameter-based Imputation via Cycle-ensemble Averaging (PICA), a novel framework that first estimates distributional parameters directly from incomplete data and then performs imputation, offering a robust and transparent solution. PICA uses the direct parameter estimation method to compute the mean vector and covariance matrix. Missing values are then imputed using the conditional expectation within a cycle-ensemble strategy, which aggregates imputations from overlapping, cyclically structured feature subsets to enhance robustness. We prove theoretically that the resulting estimator is the best linear unbiased estimator. Comprehensive experiments on multiple datasets, with missingness rates ranging from 15% to 75%, show that PICA achieves superior accuracy and stability compared with a wide range of state-of-the-art imputation methods, particularly at high missing rates.
Vo, T.L., Dang, U., Hua, V., Nguyen, X.-H., Nguyen, T. and Huynh, B. (2026) Information Sciences, 749, p. 123532.
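
The two steps named in the abstract, conditional-expectation imputation from a mean vector and covariance matrix, followed by averaging over overlapping cyclic feature subsets, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the mean `mu` and covariance `sigma` are already available (the direct parameter estimation step is not reproduced here), it uses the standard Gaussian conditional mean mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o), and it takes windows of `width` consecutive features (wrapping around) as one hypothetical reading of "cyclically structured feature subsets". All function and parameter names are illustrative.

```python
import numpy as np

def conditional_mean_impute(X, mu, sigma):
    """Fill NaNs in each row with E[x_missing | x_observed] for given mu, sigma."""
    X_filled = X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        if not obs.any():
            # Nothing observed in this row: fall back to the marginal mean.
            X_filled[i] = mu
            continue
        s_oo = sigma[np.ix_(obs, obs)]   # covariance among observed features
        s_mo = sigma[np.ix_(miss, obs)]  # cross-covariance missing vs observed
        # Solve a linear system rather than inverting s_oo explicitly.
        w = np.linalg.solve(s_oo, row[obs] - mu[obs])
        X_filled[i, miss] = mu[miss] + s_mo @ w
    return X_filled

def cycle_subsets(n_features, width):
    """Hypothetical cyclic windows: features j, j+1, ..., j+width-1 (mod p)."""
    return [np.arange(j, j + width) % n_features for j in range(n_features)]

def cycle_ensemble_impute(X, mu, sigma, width):
    """Average conditional-mean imputations over overlapping cyclic subsets.

    Assumes 1 <= width <= number of features, so no window repeats a feature.
    """
    n, p = X.shape
    total = np.zeros((n, p))
    count = np.zeros((n, p))
    for idx in cycle_subsets(p, width):
        part = conditional_mean_impute(X[:, idx], mu[idx], sigma[np.ix_(idx, idx)])
        total[:, idx] += part
        count[:, idx] += 1
    averaged = total / count
    # Observed entries are kept exactly; only missing entries are replaced.
    return np.where(np.isnan(X), averaged, X)
```

With `width` equal to the number of features, every window is a rotation of the full feature set, so the ensemble average reduces to plain conditional-mean imputation. Smaller windows trade some information per imputation for smaller linear systems and averaging across several overlapping estimates.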

