LatentCRF: Continuous CRF for Efficient Latent Diffusion

Kanchana Ranasinghe
Sadeep Jayasumana
Andreas Veit
Ayan Chakrabarti
Daniel Glasner
Michael S. Ryoo
Srikumar Ramalingam
Sanjiv Kumar



Intro Image
LatentCRF replaces multiple LDM reverse diffusion iterations with a lightweight conditional random field (CRF) module containing ten times fewer parameters. The overall pipeline achieves a 33% inference speed-up. The LatentCRF architecture contains strong inductive biases motivated by natural image priors, to which we attribute its strong performance despite the smaller parameter count.



Abstract

Latent Diffusion Models (LDMs) produce high-quality, photo-realistic images; however, the latency incurred by multiple costly inference iterations can restrict their applicability. We introduce LatentCRF, a continuous Conditional Random Field (CRF) model, implemented as a neural network layer, that models the spatial and semantic relationships among the latent vectors in the LDM. By replacing some of the computationally intensive LDM inference iterations with our lightweight LatentCRF, we achieve a superior balance between quality, speed, and diversity. We increase inference efficiency by 33% with no loss in image quality or diversity compared to the full LDM. LatentCRF is an easy add-on that does not require modifying the LDM.



Methodology

We speed up reverse diffusion using a continuous Conditional Random Field (CRF) model formulated as a trainable neural network layer. Our continuous CRF layer is an order of magnitude less expensive than the LDM's U-Net. Our method is easy to apply: it can be trained with relatively few resources and does not require modifying the LDM. Moreover, it can be combined with other efficiency-improving methods such as alternative samplers or model compression. In fact, we apply LatentCRF on top of a DDIM-scheduler-based LDM.
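
To make this concrete, the sketch below illustrates the hybrid sampling loop in PyTorch. Note that hybrid_sample, ddim_step, and the placeholder LatentCRF module are illustrative stand-ins rather than our actual implementation; the point is only that the final few costly U-Net evaluations are replaced by one pass through a much cheaper module.

import torch
import torch.nn as nn

class LatentCRF(nn.Module):
    """Illustrative stand-in for the lightweight CRF refinement module."""
    def __init__(self, channels: int = 4, num_iters: int = 4):
        super().__init__()
        self.num_iters = num_iters  # probabilistic inference uses < 5 iterations
        # A cheap convolution stands in for the learned CRF message passing;
        # the real module has roughly 10x fewer parameters than the U-Net.
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_iters):
            z = z + self.refine(z)  # iterative residual refinement
        return z

@torch.no_grad()
def hybrid_sample(unet, ddim_step, crf, z, timesteps, num_replaced: int):
    """Run DDIM reverse diffusion for all but the last `num_replaced`
    timesteps (assumed >= 1), then substitute those final steps with
    a single pass through the CRF module."""
    for t in timesteps[:-num_replaced]:
        eps = unet(z, t)          # costly U-Net noise prediction
        z = ddim_step(z, eps, t)  # deterministic DDIM update
    return crf(z)                 # cheap CRF refinement replaces the rest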

Intro Image

As illustrated in the figure above, LatentCRF replaces several LDM reverse diffusion iterations with a lightweight CRF module. While the CRF's probabilistic inference involves iterative refinement, it is limited to fewer than 5 iterations, so the entire inference process remains much faster than the LDM iterations it replaces.
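
The refinement itself can be sketched as a mean-field-style update for a continuous CRF with Gaussian potentials. The following is a generic illustration of that idea, not our exact update equations: the unary term anchors each latent to the LDM's current estimate, while the pairwise term aggregates messages from neighboring latents.

import torch
import torch.nn.functional as F

def crf_mean_field(z0: torch.Tensor, num_iters: int = 4,
                   w_unary: float = 1.0, w_pair: float = 0.5) -> torch.Tensor:
    """z0: (B, C, H, W) latent estimate from the LDM.
    Fewer than 5 iterations suffice in practice."""
    channels = z0.shape[1]
    # Fixed 3x3 box filter as a stand-in for learned pairwise kernels that
    # would weight neighbors by spatial and semantic affinity.
    kernel = torch.full((channels, 1, 3, 3), 1.0 / 9.0, device=z0.device)
    z = z0.clone()
    for _ in range(num_iters):
        # Depthwise convolution implements pairwise message passing.
        msg = F.conv2d(z, kernel, padding=1, groups=channels)
        # Closed-form mean-field update for Gaussian potentials: a weighted
        # average of the unary anchor z0 and the aggregated messages.
        z = (w_unary * z0 + w_pair * msg) / (w_unary + w_pair)
    return z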

The following video describes our methodology in more detail.


We refer the reader to our paper for further details on our continuous-space CRF formulation, energy functions, and training procedure.
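
As a generic reference point (our exact potentials and their learned parameterization are given in the paper), a continuous CRF over latent vectors typically minimizes an energy of the form

E(\mathbf{z}) = \sum_i \psi_u(\mathbf{z}_i) + \sum_{i<j} \psi_p(\mathbf{z}_i, \mathbf{z}_j),
\quad
\psi_u(\mathbf{z}_i) = \lVert \mathbf{z}_i - \hat{\mathbf{z}}_i \rVert^2,
\quad
\psi_p(\mathbf{z}_i, \mathbf{z}_j) = w_{ij}\, \lVert \mathbf{z}_i - \mathbf{z}_j \rVert^2

where \hat{\mathbf{z}}_i is the LDM's current latent estimate at position i, and w_{ij} is an affinity that is large for spatially close, semantically similar latents. Minimizing the quadratic pairwise term encourages similar latents to agree, reflecting the natural-image smoothness priors behind the module's inductive biases.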




Qualitative Results

LatentCRF retains Full-LDM Quality

Comparison Image


Effect of LatentCRF Inference

Diff Image


Diversity of Generated Images

For a common textual prompt, multiple images are generated from different starting noise values, and the same noise tensor is used for every method so that outputs are directly comparable. For the two examples below, we use the following textual prompts respectively (a sketch of this fixed-noise setup follows the list):
  • A cinematic shot of a baby racoon wearing an intricate italian priest robe.
  • A plant with orange flowers shaped like stars.
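
The sketch below shows this fixed-noise comparison setup; the samplers argument maps method names to hypothetical prompt-plus-noise sampling functions and is not part of our implementation.

import torch

def compare_methods(samplers: dict, prompt: str, num_samples: int = 4,
                    seed: int = 0, latent_shape=(1, 4, 64, 64)):
    """Generate `num_samples` images per method, drawing each starting
    noise tensor once and reusing it across every method, so that output
    differences reflect the method rather than the random seed."""
    g = torch.Generator().manual_seed(seed)
    noises = [torch.randn(*latent_shape, generator=g)
              for _ in range(num_samples)]
    return {name: [sample(prompt, z) for z in noises]
            for name, sample in samplers.items()}

# Example usage with hypothetical sampling functions:
# images = compare_methods(
#     {"full_ldm": sample_full_ldm, "latentcrf": sample_latentcrf},
#     prompt="A plant with orange flowers shaped like stars.",
# )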




Conclusion

We propose LatentCRF to learn spatial and semantic relationships among the latent feature vectors used for image generation. By carefully analyzing LDM inference iterations, we identified an opportunity to replace several costly iterations with a lightweight CRF module, yielding a 33% speedup with virtually no loss in image quality or diversity. The inductive biases we baked into the CRF help it learn to mimic multiple iterations of the LDM's U-Net using an order of magnitude fewer parameters. Although our method achieves better FID scores than the full LDM, we observe some failure patterns where LatentCRF produces artifacts, such as broken lines in man-made structures. On the other hand, we also notice several instances of LatentCRF generating more natural-looking images than the Full-LDM.




Citation

@inproceedings{Ranasinghe2024LatentCRF,
title={LatentCRF: Continuous CRF for Efficient Latent Diffusion},
author={Kanchana Ranasinghe and Sadeep Jayasumana and Andreas Veit and Ayan Chakrabarti and Daniel Glasner and Michael S. Ryoo and Srikumar Ramalingam and Sanjiv Kumar},
year={2024}
}