gQIR: Generative Quanta Image Reconstruction

CVPR 2026

University of Wisconsin-Madison, Snap Inc.

gQIR is a general image signal processor (ISP) for color SPADs that leverages a large-scale text-to-image (T2I) prior to work even under extreme scene motion!

Abstract

Capturing high-quality images from only a few detected photons is a fundamental challenge in computational imaging. Single-photon avalanche diodes (SPADs) promise high-quality imaging in regimes where conventional cameras fail, but raw quanta frames contain only sparse, noisy, binary photon detections. Recovering a coherent image from a burst of such frames requires handling alignment, denoising, and demosaicing under noise statistics far outside those assumed by standard restoration pipelines or modern generative models. We present an approach that adapts large text-to-image latent diffusion models to the photon-limited domain of quanta burst imaging. Our method leverages the structural and semantic priors of internet-scale diffusion models while introducing mechanisms to handle Bernoulli photon statistics. By integrating latent-space restoration with burst-level spatio-temporal reasoning, our approach produces reconstructions that are both photometrically faithful and perceptually pleasing, even under extreme motion. We evaluate the method on synthetic benchmarks and new real-world datasets, including the first color SPAD dataset and a challenging eXtreme Deformable (XD) video benchmark. Across all settings, the approach substantially improves perceptual quality over classical and modern learning-based baselines, demonstrating the promise of adapting large generative priors to extreme photon-limited sensing.
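To make the "Bernoulli photon statistics" mentioned above concrete, here is a minimal sketch of the standard single-photon forward model, not the gQIR pipeline itself. It assumes a static scene, Poisson photon arrivals, no dark counts, and no color filter array, so a pixel with mean flux λ (photons per frame) fires with probability 1 − exp(−λ). The function names `simulate_quanta_burst` and `estimate_flux` and the flux values are hypothetical, chosen only for illustration.

```python
import math
import random


def simulate_quanta_burst(flux, num_frames, rng):
    """Sample binary frames: each pixel fires with probability 1 - exp(-flux)."""
    probs = [[1.0 - math.exp(-f) for f in row] for row in flux]
    return [
        [[1 if rng.random() < p else 0 for p in row] for row in probs]
        for _ in range(num_frames)
    ]


def estimate_flux(frames, eps=1e-6):
    """Invert the Bernoulli model: flux_hat = -ln(1 - mean detection rate)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    estimate = []
    for y in range(height):
        row = []
        for x in range(width):
            rate = sum(frame[y][x] for frame in frames) / n
            rate = min(rate, 1.0 - eps)  # avoid log(0) when every frame fired
            row.append(-math.log(1.0 - rate))
        estimate.append(row)
    return estimate


rng = random.Random(0)
# Illustrative mean photon counts per pixel per frame (assumed values).
true_flux = [[0.05, 0.2], [0.5, 1.0]]
burst = simulate_quanta_burst(true_flux, num_frames=4000, rng=rng)
flux_hat = estimate_flux(burst)
```

With thousands of perfectly aligned frames, this per-pixel inversion recovers the flux reasonably well. With only a few frames, or when the scene moves between frames, the per-pixel average breaks down, which is the regime where alignment and a strong image prior become necessary.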

Extreme Motion Imaging Unlocked!

gQIR enables high perceptual quality and plausibility under extreme motion and deformation!

Color-burst reconstructions from the XD dataset (extreme motion and deforming scenes):

Color Burst Reconstructions of High Speed Scenes.

Monochrome comparisons with existing literature:

Monochrome Burst and comparisons with existing baselines.

Prompts For Semantic Editing of Photons

As a byproduct of our base generative prior, we can also use natural language to guide reconstruction in ambiguous regions.

Prompt Guided Semantic Photon Editing.

gQIR's Architecture

As shown, gQIR is built in three stages: a) simultaneous pre-degradation removal and SPAD alignment, b) adversarial distillation of SD2.1-zsnr-L5 for perceptual enhancement, and c) burst extension for higher fidelity using our hybrid-3D FusionViT.

Figure: gQIR model architecture.

BibTeX


@InProceedings{garg_2026_gqir,
    author    = {Garg, Aryan and Ma, Sizhuo and Gupta, Mohit},
    title     = {gQIR: Generative Quanta Image Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
}