NVIDIA researchers released PiD, a unified pixel-diffusion decoder

The system merges latent decoding and upscaling into a single pixel-diffusion module, with authors reporting a sixfold speed gain over cascaded alternatives, though all results are self-reported and unreviewed.

Thursday, May 28, 2026

NVIDIA Research on May 25 released PiD, a pixel-diffusion decoder that collapses latent decoding and image upscaling into a single generative module, alongside a paper, open-source code, and research model weights. The decoder was developed by the company’s Toronto AI Lab and supports 4× and 8× resolution increases.

The work addresses a structural bottleneck in high-resolution image generation. Standard latent diffusion pipelines, such as those built on Stable Diffusion and Flux, first produce a low-resolution latent, decode it to pixels, and then route the result through a separate super-resolution network. PiD replaces this decode-then-upscale cascade with a single conditional pixel-diffusion process that decodes directly to the target output size, potentially cutting latency and pipeline hand-offs.

The architecture uses a lightweight sigma-aware adapter that injects noise-corrupted latents into a pixel diffusion backbone, enabling the module to work on latents that are only partially denoised. This allows an early exit from the base latent diffusion model, truncating inference before the full denoising schedule completes. The released checkpoints are distilled with DMD2 and run in four inference steps, according to the paper.
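The early-exit idea can be illustrated with a minimal Python sketch. It is not the authors' code: the linear noise schedule, the 20-step base sampler, the exit threshold of 0.25, and the function names `linear_sigmas` and `steps_until_exit` are all hypothetical, chosen only to show how handing off a partially denoised latent shortens the base model's run. Only the four-step decoder figure comes from the paper as reported.

```python
from typing import List

DECODER_STEPS = 4  # reported step count of the distilled checkpoints


def linear_sigmas(num_steps: int) -> List[float]:
    """Noise level remaining after each base-model step.

    Assumption: a simple linear schedule that ends at 0.0. Real samplers
    use other schedules; this one only keeps the arithmetic easy to follow.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    return [1.0 - (i + 1) / num_steps for i in range(num_steps)]


def steps_until_exit(sigmas: List[float], exit_sigma: float) -> int:
    """Number of base-model steps run before the latent is handed off.

    The base model stops as soon as the remaining noise level is at or
    below exit_sigma; the still-noisy latent and its sigma would then be
    passed to a sigma-aware decoder.
    """
    for i, sigma in enumerate(sigmas):
        if sigma <= exit_sigma:
            return i + 1
    return len(sigmas)


if __name__ == "__main__":
    schedule = linear_sigmas(20)                 # hypothetical 20-step base sampler
    full_run = steps_until_exit(schedule, 0.0)   # denoise completely
    early_run = steps_until_exit(schedule, 0.25) # hand off a partly noisy latent
    print(f"base steps, full schedule: {full_run}")
    print(f"base steps, early exit:    {early_run}")
    print(f"decoder steps either way:  {DECODER_STEPS}")
```

Base-model steps and decoder steps do not cost the same, so the step counts show where work is skipped rather than how much time is saved.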

All speed and quality figures are self-reported by the authors. They state that decoding a 512×512 latent to a 2048×2048 image completes in under one second on an NVIDIA RTX 5090 with 13 GB of peak memory, and in as little as 210 milliseconds on an NVIDIA GB200 GPU. The team claims PiD is roughly six times faster than the cascaded diffusion-based super-resolution pipeline SeedVR2. The authors also said PiD's outputs were rated higher for visual fidelity in judge evaluations, but no independent benchmarks have been published.
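For a sense of scale, the reported figures can be turned into simple arithmetic. The short Python sketch below takes the 512×512 and 2048×2048 sizes and the 210-millisecond timing at face value; the helper names are illustrative, and the throughput number is a derived back-of-envelope value, not a measurement from the paper.

```python
def scale_factor(src_side: int, dst_side: int) -> int:
    """Per-side upscaling factor between two square resolutions."""
    if src_side <= 0 or dst_side % src_side != 0:
        raise ValueError("dst_side must be a positive multiple of src_side")
    return dst_side // src_side


def megapixels(side: int) -> float:
    """Pixel count of a square image, in millions of pixels."""
    return side * side / 1_000_000


def throughput_mp_per_s(side: int, seconds: float) -> float:
    """Output megapixels produced per second of decoding."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return megapixels(side) / seconds


if __name__ == "__main__":
    factor = scale_factor(512, 2048)
    print(f"per-side scale: {factor}x, pixel count: {factor ** 2}x")
    print(f"output size: {megapixels(2048):.2f} MP")
    # 0.210 s is the fastest time the authors report (GB200).
    print(f"implied throughput: {throughput_mp_per_s(2048, 0.210):.1f} MP/s")
```

A 4× increase per side means 16 times as many output pixels, which is why decode time and peak memory at the target resolution are the figures the authors emphasize.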

The code is available on GitHub under an Apache 2.0 license. Model weights were posted on Hugging Face under the non-commercial NSCLv1 license, with checkpoints for 4× upscaling of Flux, Flux2, Stable Diffusion 3, and DINOv2 latents, plus an 8× variant for SigLIP and Scale-RAE representations. The module operates on both conventional VAE latents and the semantic latents used in RAE-based models.

Because all performance and quality claims originate from the research team and have not been peer-reviewed or independently replicated, the results should be treated as preliminary. The non-commercial license on the model weights precludes production deployment without a separate agreement, and the project page contains no product announcement. A training-data figure of 2.6 million high-quality images, reported by a Japanese tech outlet citing the paper, has not been independently verified.

The release points to intensifying research interest in end-to-end diffusion decoders as demand for high-resolution imagery strains traditional multi-stage pipelines. Whether the reported gains hold under independent scrutiny will shape how quickly the approach gains traction in the broader research community.
