Course Resources

This page contains curated resources to support your learning throughout the course. Resources are organized into books, courses, tutorials, and research papers, followed by compute resources. This list will be updated throughout the class.


📚 Books

Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon
2025
Comprehensive monograph covering diffusion models, flow matching, and transport-based generative modeling from first principles.

🎓 Courses

Stefano Ermon, Aditya Grover
2023
Stanford course on generative models including VAEs, GANs, EBMs, normalizing flows, diffusion models, and autoregressive models.
Matt Gormley, Yuanzhi Li, Henry Chai, Pat Virtue, Aran Nayebi
2025
CMU course on generative models including LLMs, GANs, and diffusion models.
Beidi Chen, Xun Huang
2025
CMU course on generative models including LLMs, VAEs, and diffusion models.
Peter Holderrieth, Ezra Erives
2025
MIT class on diffusion and flow matching from a flow-based theoretical perspective.
Andrej Risteski, Albert Gu
2025
CMU course that focuses on probabilistic modeling (including some deep generative models from a more theoretical perspective).
Stefano Ermon
2024
Stanford course that focuses on probabilistic modeling.

📝 Tutorials

Yang Song
2021
Introduction to score-based generative models and their connection to diffusion models.
Lilian Weng
2021
Comprehensive introduction to diffusion models with clear explanations and intuitive visualizations.
Calvin Luo
2022
Unifies VAEs, hierarchical VAEs, and diffusion models under a single framework.
Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T. Q. Chen, David Lopez-Paz, Heli Ben-Hamu, Itai Gat
2024
Comprehensive guide to flow matching with code examples and applications.

📄 Key Papers

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
2015
The original paper introducing diffusion probabilistic models for generative modeling.
Jonathan Ho, Ajay Jain, Pieter Abbeel
2020
Landmark paper that made diffusion models practical and effective for high-quality image generation.
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
2021
Unifies score-based models and diffusion models using SDEs, enabling new sampling methods.
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le
2022
Introduces flow matching as a simulation-free approach to training continuous normalizing flows.
Xingchao Liu, Chengyue Gong, Qiang Liu
2022
Introduces rectified flow to learn straight trajectories between distributions for fast sampling.
Michael S. Albergo, Eric Vanden-Eijnden
2022
General framework for building normalizing flows via stochastic interpolation between distributions.
Jiaming Song, Chenlin Meng, Stefano Ermon
2021
Introduces DDIM, the original fast sampling algorithm for diffusion models.

🚀 Advanced Papers

Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
2022
Systematic analysis of design choices in diffusion models with improved sampling.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
2022
Fast high-order ODE solver for diffusion models.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
2022
Improved solver with guided sampling support.
Alex Nichol, Prafulla Dhariwal
2021
Improved DDPM with learned variance and cosine noise schedule.
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
2021
Continuous-time diffusion with learned noise schedule.
Tim Salimans, Jonathan Ho
2022
Introduces v-prediction parameterization and progressive distillation.
Jonathan Ho, Tim Salimans
2022
Guidance method that combines conditional and unconditional models.
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
2021
Introduces latent diffusion models, the foundation of Stable Diffusion.
Prafulla Dhariwal, Alex Nichol
2021
Introduces classifier guidance for conditional generation.
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
2021
Training-free image editing by hijacking the diffusion process with user inputs.
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool
2022
Introduces a "time-travel" resampling technique for training-free inpainting.
Hyungjin Chung, Jeongsol Kim, Michael T. McCann, Marc L. Klasky, Jong Chul Ye
2022
Uses off-the-shelf discriminative models to guide diffusion sampling.
Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, Jong Chul Ye
2022
Shows that the noisy data manifolds of diffusion models form concentric, shell-like layers.
Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang
2023
ICCV 2023 paper analyzing which range of time steps matters most for controllable generation.
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon
2023
Guidance method that preserves the data manifold during sampling, making guided sampling faster and improving sample quality.
Hang Shao, Abhishek Kumar, P. Thomas Fletcher
2017
Shows that the tangent space of the data manifold can be accessed through the Jacobian of an autoencoder's decoder.
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
2017
Introduces vector quantized variational autoencoders for learning discrete latent representations.
Patrick Esser, Robin Rombach, Björn Ommer
2020
Combines VQ-VAE with adversarial training and autoregressive transformers for high-resolution image synthesis.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
2021
Learns visual representations from natural language supervision, widely used as the text encoder in text-to-image models.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019
A text-to-text transformer used as a text encoder in many text-to-image models, including Stable Diffusion 3.
William Peebles, Saining Xie
2022
Replaces the U-Net backbone with a transformer, establishing the foundation for modern diffusion transformer architectures.
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach
2024
Combines rectified flow matching with a multimodal DiT architecture (MMDiT).
Black Forest Labs
2024
Rectified flow transformer for text-to-image generation; no official paper, open-weight model released August 2024.
Black Forest Labs
2025
32B flow matching transformer that couples a vision language model with a rectified flow transformer; supports multi-reference generation and up to 4MP output.
Z-Image Team, Tongyi
2025
Efficient 6B-parameter model built on a single-stream DiT architecture; trained in only 314K H800 GPU hours.
Tencent Hunyuan Foundation Model Team
2025
Native multimodal MoE model with 80B+ total parameters (13B activated per token); unifies multimodal understanding and generation within an autoregressive framework.
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
2024
Trains a single transformer over mixed-modality sequences by combining next-token prediction loss with diffusion loss.
Google DeepMind
2025
Google's native multimodal image generation model built into Gemini.
OpenAI
2025
GPT-4o's native autoregressive image generation capability.
Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever
2023
Introduces consistency models, which enable single-step generation while maintaining sample quality.
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon
2023
Generalizes consistency models to learn the full trajectory of the probability flow ODE.
Nicholas M. Boffi, Michael S. Albergo, Eric Vanden-Eijnden
2025
Provides a theoretical framework for building consistency models via flow map learning and self-distillation.
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
2025
Scales continuous-time flow map distillation for high-quality few-step generation.
Xinyue Ai, Yutong He, Albert Gu, Ruslan Salakhutdinov, J. Zico Kolter, Nicholas M. Boffi, Max Simchowitz
2025
Joint distillation framework that enables both fast sampling and likelihood evaluation in flow-based models.

Compute Resources

Access to GPU compute resources is essential for training your generative models. Below are the available platforms with tutorials and guides to help you get started.

Modal

Available
Serverless cloud computing platform for running GPU workloads
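As a taste of the workflow the Modal guides cover, here is a minimal, hypothetical sketch of running a single GPU function with the modal Python SDK. It assumes a recent SDK version where modal.App, @app.function, and @app.local_entrypoint are the entry points; the app name and dependencies are illustrative only.

```python
# Minimal sketch (not the course's official setup): run one GPU function on Modal.
# Assumes the `modal` Python SDK is installed and authenticated (`modal token new`).
import modal

# Container image with the dependencies this toy function needs.
image = modal.Image.debian_slim().pip_install("torch")

app = modal.App("gpu-check-demo")  # hypothetical app name

@app.function(gpu="any", image=image)
def gpu_check() -> str:
    # This body runs remotely inside the container on a GPU worker.
    import torch
    if torch.cuda.is_available():
        return f"GPU visible: {torch.cuda.get_device_name(0)}"
    return "No GPU visible"

@app.local_entrypoint()
def main():
    # `modal run this_file.py` executes main() locally and gpu_check() remotely.
    print(gpu_check.remote())
```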

AWS (Amazon Web Services)

Coming Soon
Cloud computing platform with EC2 GPU instances

SLURM Clusters

Available
Tutorials for navigating SLURM-based clusters (like the ones we use at CMU)
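For quick orientation before diving into those tutorials, here is a minimal sbatch job script sketch. Every resource value, the environment name, and the train.py entry point are placeholder assumptions, not a prescription for any particular cluster.

```bash
#!/bin/bash
# Minimal example sbatch script; all resource values and file names below
# are placeholders -- adapt them to your cluster's partitions and limits.
#SBATCH --job-name=gen-model-train
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=04:00:00            # wall-clock limit (hh:mm:ss)
#SBATCH --output=logs/%x-%j.out    # %x = job name, %j = job id

# Activate your environment, then launch training (train.py is a placeholder).
source ~/.bashrc
conda activate gen-models
python train.py

# Submit with:   sbatch job.sh
# Check status:  squeue -u $USER
```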

Additional Resources

  • Discord: Communication server for discussions, questions and collaboration
  • Office Hours: In-person and virtual sessions directly with the instructor; check the home page and the schedule page for times and locations.

Note: This list will be updated throughout the course. Additional readings specific to each lecture can be found on the schedule page.