Abstract
Counterfactual explanations (CE) can make face classifiers more transparent, yet existing methods implicitly target a single attribute at a time, produce unnatural imagery, make overly broad or off-target edits, and offer weak interpretability. We introduce a unified framework that (i) constrains edits with FacePart-based segmentation masks, (ii) performs localized blended latent diffusion on Stable Diffusion 2.1, (iii) re-ranks candidates via a semantic filter that combines CLIP with a trained classifier, and (iv) applies a controlled PGD adversarial step to expose decisive regions. Together, these components leverage prompts, spatial masks, and classification gradients to steer generation while preserving identity. On CelebA-HQ with 40 attributes, the method attains a 98% flip ratio with perfect face-verification agreement (FVA = 100%) and high SimSiam similarity (S3 = 0.99), while averaging 60 s per image on an NVIDIA T4; the pipeline remains stable across diverse multi-label settings. In ablations, FacePart masks improve blended-diffusion realism and control (lower FID, lower MNAC), whereas Grad-CAM++ masks maximize flip ratio during the adversarial step, evidence that the system balances visual quality with causal focus. These findings suggest a practical balance among realism, locality, and validity on commodity hardware.
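As a rough illustration of two ideas named in the abstract, the sketch below shows a mask-restricted blend (edited values kept inside the mask, original values outside it) and a flip-ratio computation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `blend_latents` and `flip_ratio` are hypothetical, flat Python lists stand in for latent tensors, and the flip ratio is assumed here to be the share of samples whose predicted label changes after editing, which may differ from the definition used in the paper.

```python
from typing import List, Sequence


def blend_latents(
    edited: Sequence[float],
    original: Sequence[float],
    mask: Sequence[float],
) -> List[float]:
    """Blend two latents elementwise (hypothetical helper).

    Where mask is 1 the edited value is kept, where mask is 0 the
    original value is kept; values in between interpolate linearly.
    """
    if not (len(edited) == len(original) == len(mask)):
        raise ValueError("edited, original, and mask must have equal length")
    return [m * e + (1.0 - m) * o for e, o, m in zip(edited, original, mask)]


def flip_ratio(
    original_preds: Sequence[int],
    counterfactual_preds: Sequence[int],
) -> float:
    """Fraction of samples whose predicted label changed (assumed definition)."""
    if len(original_preds) != len(counterfactual_preds):
        raise ValueError("prediction lists must have equal length")
    if not original_preds:
        raise ValueError("at least one prediction is required")
    flipped = sum(
        1 for before, after in zip(original_preds, counterfactual_preds)
        if before != after
    )
    return flipped / len(original_preds)


if __name__ == "__main__":
    # Toy latent: only the first position lies fully inside the edit mask.
    print(blend_latents([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.5]))
    # Three of four toy predictions change after editing.
    print(flip_ratio([0, 1, 1, 0], [1, 1, 0, 1]))
```

In a diffusion setting the blend would presumably be applied at each denoising step to latent tensors rather than once to a flat list; the elementwise rule is the same.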
Vo-Hoang, H.-V., Do, K.-H. and Le, B. (2026) Expert Systems with Applications, 314, p. 131612.

