Deformable tissue reconstruction in endoscopy is vital for surgery, yet current methods struggle to reconstruct irreversible tissue deformations with high fidelity. We present D4Recon, a framework for real-time, high-fidelity endoscopic reconstruction. D4Recon combines dual-stage deformation modeling (spatial and temporal) with dual-scale depth guidance (hard and soft constraints) in a dynamic 3D Gaussian Splatting paradigm. Extensive experiments on diverse endoscopic datasets show that D4Recon achieves superior geometric coherence and photorealism at real-time rendering speed, outperforming existing methods on the PSNR, SSIM, and LPIPS metrics.
 
D4Recon workflow: Gaussians are initialized from the input endoscopic video and then passed through dual-stage deformation modeling, in which a spatial deformation model corrects multiview inconsistencies and a temporal model captures dynamic tissue interactions. Dual-scale depth guidance then combines hard (global) and soft (local) constraints to refine depth and preserve fine-grained color accuracy. Optimization uses spatiotemporal Score Distillation Sampling (SDS) losses together with the dual-scale depth guidance loss, enabling stable, high-fidelity 3D reconstructions.
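
The idea of pairing a hard (global) and a soft (local) depth constraint can be illustrated with a minimal NumPy sketch. This is not the D4Recon implementation: the source does not give the loss formulas, so the function names, the L1 form of the hard term, the patch-normalized form of the soft term, and the weights `w_hard` and `w_soft` are all assumptions made for illustration.

```python
import numpy as np


def hard_depth_loss(rendered, reference, mask):
    """Assumed 'hard' (global) term: mean absolute depth error over valid pixels."""
    valid = mask.astype(bool)
    if not valid.any():
        return 0.0
    return float(np.abs(rendered[valid] - reference[valid]).mean())


def soft_depth_loss(rendered, reference, patch=8, eps=1e-6):
    """Assumed 'soft' (local) term: compare depth patches after per-patch
    normalization, so only the relative structure inside each patch matters."""
    h, w = rendered.shape
    h_crop, w_crop = h - h % patch, w - w % patch
    if h_crop == 0 or w_crop == 0:
        return 0.0  # image smaller than one patch: nothing to compare

    def normalized_patches(depth):
        d = depth[:h_crop, :w_crop]
        p = d.reshape(h_crop // patch, patch, w_crop // patch, patch)
        p = p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
        mean = p.mean(axis=1, keepdims=True)
        std = p.std(axis=1, keepdims=True)
        return (p - mean) / (std + eps)

    diff = normalized_patches(rendered) - normalized_patches(reference)
    return float(np.abs(diff).mean())


def dual_scale_depth_loss(rendered, reference, mask, w_hard=1.0, w_soft=0.5):
    """Weighted sum of the two terms; the weights are illustrative, not from the paper."""
    return (w_hard * hard_depth_loss(rendered, reference, mask)
            + w_soft * soft_depth_loss(rendered, reference))
```

Under these assumptions, a rendered depth map that differs from the reference only by a global scale and shift is penalized by the hard term but almost not at all by the soft term. This shows why the two terms are complementary: one anchors absolute depth, the other local structure.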
 
Quantitative Results: D4Recon achieves state-of-the-art performance on both dynamic and static endoscopic datasets. On the EndoNeRF and StereoMIS datasets, D4Recon outperforms prior methods in PSNR, SSIM, and LPIPS while maintaining real-time rendering speeds. On static datasets, it improves by a significant margin over NeRF, SLAM, and previous 3DGS-based methods. The results highlight D4Recon's ability to deliver sharper tissue boundaries, stable geometry, and robust depth estimation under occlusion and deformation.
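
For readers unfamiliar with the metrics, PSNR is the simplest of the three and follows the standard definition PSNR = 10 · log10(MAX² / MSE). The sketch below is a generic NumPy version, not the evaluation code behind the reported results. The function name `psnr` and the assumption that images are scaled to [0, 1] (`max_val=1.0`) are illustrative choices.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a rendered and a reference image.

    Assumes both arrays have the same shape and share the intensity range
    [0, max_val]; higher is better.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

SSIM and LPIPS measure structural and learned perceptual similarity respectively, and are normally computed with existing library implementations rather than by hand.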
 
Qualitative Results: Visual comparisons show that D4Recon produces high-fidelity reconstructions with superior structural coherence and photorealism, mitigating the artifacts and flickering common in previous approaches.
 
Ablation of Key Components: Removing dual-stage deformation or dual-scale depth guidance leads to notable drops in reconstruction quality. The combination of spatial and temporal SDS losses, along with both hard and soft depth guidance, yields the best performance, demonstrating the importance of each component for robust reconstruction in dynamic surgical environments.
@inproceedings{basak2025d4recon,
  title={D4Recon: Dual-stage Deformation and Dual-scale Depth Guidance for Endoscopic Reconstruction},
  author={Basak, Hritam and Yin, Zhaozheng},
  booktitle={Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
  year={2025}
}