Course Resources

This page contains curated resources to support your learning throughout the course. Resources are organized into books, courses, tutorials, and research papers, and the list will be updated as the course progresses.


📚 Books

Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon
2025
Comprehensive monograph covering diffusion models, flow matching, and transport-based generative modeling from first principles.

🎓 Courses

Stefano Ermon, Aditya Grover
2023
Stanford course on generative models including VAEs, GANs, EBMs, normalizing flows, diffusion models, and autoregressive models.
Matt Gormley, Yuanzhi Li, Henry Chai, Pat Virtue, Aran Nayebi
2025
CMU course on generative models including LLMs, GANs, and diffusion models.
Beidi Chen, Xun Huang
2025
CMU course on generative models including LLMs, VAEs, and diffusion models.
Peter Holderrieth, Ezra Erives
2025
MIT class on diffusion and flow matching from a flow-based theoretical perspective.
Andrej Risteski, Albert Gu
2025
CMU course that focuses on probabilistic modeling (including some deep generative models from a more theoretical perspective).
Stefano Ermon
2024
Stanford course that focuses on probabilistic modeling.

📝 Tutorials

Yang Song
2021
Introduction to score-based generative models and their connection to diffusion models.
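To fix ideas, the "score" in these models is the gradient of the log-density. A toy illustration (my own, not from the tutorial) for a Gaussian, where the score has a closed form:

```python
def gaussian_score(x, mu=0.0, sigma=1.0):
    """Score of N(mu, sigma^2): grad_x log p(x) = -(x - mu) / sigma**2.
    Score-based models train a network to approximate this quantity for
    the noised data distribution, where no closed form is available."""
    return -(x - mu) / sigma**2
```

Note how the score points back toward the mode, which is why following it (plus noise, as in Langevin dynamics) moves samples toward high-density regions.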
Lilian Weng
2021
Comprehensive introduction to diffusion models with clear explanations and intuitive visualizations.
Calvin Luo
2022
Unifies VAEs, hierarchical VAEs, and diffusion models under a single framework.
Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T. Q. Chen, David Lopez-Paz, Heli Ben-Hamu, Itai Gat
2024
Comprehensive guide to flow matching with code examples and applications.
Aaron Lou
2024
Written by the first author of SEDD (ICML 2024 Best Paper). Explains the motivation for discrete diffusion, the concrete score, score entropy loss, and sampling from first principles.
Subham Sekhar Sahoo
2024
Blog post and video walkthrough by the MDLM authors. Connects absorbing-state diffusion to masked language modeling (BERT); the video tutorial is especially helpful.
Christopher Beckham
2022
Detailed walkthrough of D3PM with the author's own clarifications. Great for understanding the transition matrix formulation and why the network predicts x_0 instead of x_{t-1}.

📄 Key Papers

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
2015
The original paper introducing diffusion probabilistic models for generative modeling.
Jonathan Ho, Ajay Jain, Pieter Abbeel
2020
Landmark paper that made diffusion models practical and effective for high-quality image generation.
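A minimal NumPy sketch of the closed-form forward (noising) step from this paper; function and variable names are illustrative, not from the authors' code:

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I),
    where abar_t is the cumulative product of alpha = 1 - beta."""
    abar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Linear schedule as in the paper: 1000 steps from 1e-4 to 2e-2.
betas = np.linspace(1e-4, 2e-2, 1000)
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
xt = ddpm_forward(x0, 999, betas, rng)  # near-pure noise at the final step
```

Because q(x_t | x_0) is Gaussian in closed form, training can sample any timestep directly instead of simulating the chain step by step.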
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
2021
Unifies score-based models and diffusion models using SDEs, enabling new sampling methods.
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le
2022
Introduces flow matching as a simulation-free approach to training continuous normalizing flows.
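As a rough illustration of the simulation-free objective (a sketch in my own notation, not the paper's code), the conditional loss for straight-line paths regresses a network onto the constant velocity x1 - x0:

```python
import torch

def cfm_loss(model, x1):
    """Conditional flow matching loss for straight-line paths.
    x0 ~ N(0, I) is noise, x1 is data; along the interpolant
    x_t = (1 - t) * x0 + t * x1 the target velocity is x1 - x0,
    so no ODE simulation is needed during training."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1
    return ((model(xt, t) - (x1 - x0)) ** 2).mean()

# Toy check; a real `model` would be a neural network taking (x, t).
zero_model = lambda x, t: torch.zeros_like(x)
loss = cfm_loss(zero_model, torch.randn(8, 2))
```

At sampling time, the learned velocity field is integrated with an off-the-shelf ODE solver from noise to data.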
Xingchao Liu, Chengyue Gong, Qiang Liu
2022
Introduces rectified flow to learn straight trajectories between distributions for fast sampling.
Michael S. Albergo, Eric Vanden-Eijnden
2022
General framework for building normalizing flows via stochastic interpolation between distributions.
Jiaming Song, Chenlin Meng, Stefano Ermon
2021
Introduces DDIM, the original fast deterministic sampling algorithm for diffusion models.
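A minimal sketch of the deterministic (eta = 0) DDIM-style update, assuming an epsilon-prediction network; names are illustrative:

```python
import numpy as np

def ddim_step(xt, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0): recover the x0
    estimate from the noise prediction, then jump directly to the
    earlier timestep along the implied trajectory."""
    x0_pred = (xt - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred
```

Because the update is deterministic, the timestep grid can be subsampled aggressively (e.g. 50 steps instead of 1000) with modest quality loss.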

🚀 Advanced Papers

Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
2022
Systematic analysis of design choices in diffusion models with improved sampling.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
2022
Fast high-order ODE solver for diffusion models.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
2022
Improved solver with guided sampling support.
Alex Nichol, Prafulla Dhariwal
2021
Improved DDPM with learned variance and cosine noise schedule.
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
2021
Continuous-time diffusion with learned noise schedule.
Tim Salimans, Jonathan Ho
2022
Introduces v-prediction parameterization and progressive distillation.
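The v-parameterization mentioned above is a one-line reparameterization of the prediction target (a sketch in DDPM-style notation, with abar the squared signal scale):

```python
import numpy as np

def v_target(x0, eps, abar_t):
    """v-prediction target: v = sqrt(abar_t) * eps - sqrt(1 - abar_t) * x0.
    At abar_t = 1 (no noise) v equals eps; at abar_t = 0 (pure noise)
    v equals -x0, so the target stays well-scaled across all timesteps."""
    return np.sqrt(abar_t) * eps - np.sqrt(1.0 - abar_t) * x0
```

This well-scaled target is what makes v-prediction better behaved than epsilon-prediction at high noise levels, which matters for the paper's progressive distillation.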
Jonathan Ho, Tim Salimans
2022
Guidance method that combines conditional and unconditional models.
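The combination rule is a one-liner; a sketch with illustrative names:

```python
def cfg_combine(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one with guidance scale w.
    w = 0 recovers unconditional sampling, w = 1 conditional sampling,
    and w > 1 amplifies the conditioning signal."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

In practice both predictions come from the same network, trained with the condition randomly dropped, so guidance needs two forward passes per step and no separate classifier.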
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
2021
Introduces latent diffusion models, the foundation of Stable Diffusion.
Prafulla Dhariwal, Alex Nichol
2021
Introduces classifier guidance for conditional generation.
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
2021
Training-free image editing by hijacking the diffusion process with user inputs.
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool
2022
Introduces a resampling ("time travel") technique for training-free image inpainting.
Hyungjin Chung, Jeongsol Kim, Michael T. McCann, Marc L. Klasky, Jong Chul Ye
2022
Uses off-the-shelf discriminative models for diffusion guidance.
Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, Jong Chul Ye
2022
Shows that noisy data at each diffusion timestep concentrates on a thin shell around the clean data manifold, motivating manifold-constrained guidance.
Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang
2023
ICCV 2023 paper analyzing which range of timesteps matters most for controllable generation.
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon
2023
Guidance method that preserves the data manifold during sampling, improving both the speed and quality of guided generation.
Hang Shao, Abhishek Kumar, P. Thomas Fletcher
2017
Shows that the tangent space of the data manifold can be accessed via the Jacobian of an autoencoder's decoder.
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
2017
Introduces vector quantized variational autoencoders for learning discrete latent representations.
Patrick Esser, Robin Rombach, Björn Ommer
2020
Combines VQ-VAE with adversarial training and autoregressive transformers for high-resolution image synthesis.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
2021
Learns visual representations from natural language supervision, widely used as the text encoder in text-to-image models.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019
A text-to-text transformer used as a text encoder in many text-to-image models, including Imagen and Stable Diffusion 3.
William Peebles, Saining Xie
2022
Replaces the U-Net backbone with a transformer, establishing the foundation for modern diffusion transformer architectures.
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach
2024
Combines rectified flow matching with a multimodal DiT architecture (MMDiT).
Black Forest Labs
2024
Rectified flow transformer for text-to-image generation; no official paper, open-weight model released August 2024.
Black Forest Labs
2025
32B flow matching transformer that couples a vision language model with a rectified flow transformer; supports multi-reference generation and up to 4MP output.
Z-Image Team, Tongyi
2025
Efficient 6B-parameter model built on a single-stream DiT architecture; trained in only 314K H800 GPU hours.
Tencent Hunyuan Foundation Model Team
2025
Native multimodal MoE model with 80B+ total parameters (13B activated per token); unifies multimodal understanding and generation within an autoregressive framework.
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
2024
Trains a single transformer over mixed-modality sequences by combining next-token prediction loss with diffusion loss.
Google DeepMind
2025
Google's native multimodal image generation model built into Gemini.
OpenAI
2025
GPT-4o's native autoregressive image generation capability.
Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever
2023
New family of models that enable single-step generation while maintaining sample quality.
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon
2023
Generalizes consistency models to learn the full trajectory of the probability flow ODE.
Nicholas M. Boffi, Michael S. Albergo, Eric Vanden-Eijnden
2025
Provides a theoretical framework for building consistency models via flow map learning and self-distillation.
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
2025
Scales continuous-time flow map distillation for high-quality few-step generation.
Xinyue Ai, Yutong He, Albert Gu, Ruslan Salakhutdinov, J. Zico Kolter, Nicholas M. Boffi, Max Simchowitz
2025
Joint distillation framework that enables both fast sampling and likelihood evaluation in flow-based models.
Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg
2021
The D3PM paper; introduces a family of discrete diffusion models with structured transition matrices including absorbing (mask), uniform, and embedding-based diffusion.
Andrew Campbell, Joe Benton, Valentin De Bortoli, Thomas Rainforth, George Deligiannidis, Arnaud Doucet
2022
Extends discrete diffusion to continuous time using Continuous Time Markov Chains (CTMCs), enabling more principled training and sampling.
Aaron Lou, Chenlin Meng, Stefano Ermon
2023
The SEDD paper (ICML 2024 Best Paper); introduces score entropy as a training objective for discrete diffusion by estimating ratios of the data distribution, analogous to score matching in continuous diffusion.
Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov
2024
Proposes MDLM, a simple masked diffusion language model with an efficient training objective and absorbing-state noise schedule that matches or outperforms autoregressive models on language benchmarks.
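A sketch of the absorbing-state forward process these masked diffusion models use (illustrative code, not the authors'): each token is independently replaced by a mask token with a probability set by the noise schedule:

```python
import random

MASK = "[MASK]"

def absorb(tokens, t, rng):
    """Absorbing-state forward process: each token is independently
    replaced by the mask token with probability t in [0, 1]; at t = 1
    the entire sequence has been absorbed into the mask state."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
fully_masked = absorb(["the", "cat", "sat"], 1.0, rng)  # all tokens masked
untouched = absorb(["the", "cat", "sat"], 0.0, rng)     # nothing masked
```

The reverse model then predicts the original tokens at masked positions, which is what connects this family of models to BERT-style masked language modeling.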
Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias
2024
Unifies and simplifies masked diffusion models, showing that a simple masked diffusion objective generalizes prior work and yields strong performance on text generation.
Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li
2025
A masked diffusion language model trained from scratch at scale that matches LLaMA3 8B in instruction following, demonstrating the viability of discrete diffusion for LLMs.
Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola
2024
Extends flow matching to discrete state-spaces using CTMCs; demonstrates applications in protein co-design.
Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman
2024
Proposes a flow matching framework for discrete data that enables flexible noise schedules and competitive language modeling performance.
Marton Havasi, Brian Karrer, Itai Gat, Ricky T. Q. Chen
2025
Introduces edit flows, which define flows on discrete state-spaces via structured edit operations (insert, delete, substitute), enabling more expressive discrete generative models.
John Nguyen, Marton Havasi, Tariq Berrada, Luke Zettlemoyer, Ricky T. Q. Chen
2025
Builds on edit flows to enable unified mixed-modal (text and image) generation and editing within a single model.
Marianne Arriola, Aaron Gokaslan, Justin T. Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov
2025
Introduces block diffusion, which interpolates between autoregressive and masked diffusion LMs by denoising blocks of tokens, enabling flexible context length and KV-cache compatibility.

Compute Resources

Access to GPU compute resources is essential for training your generative models. Below are the available platforms with tutorials and guides to help you get started.

Modal

Available
Serverless cloud computing platform for running GPU workloads

AWS (Amazon Web Services)

Coming Soon
Cloud computing platform with EC2 GPU instances

SLURM Clusters

Available
Tutorials for navigating SLURM-based clusters (like the ones we use at CMU)

Additional Resources

  • Discord: Communication server for discussions, questions, and collaboration
  • Office Hours: In-person and virtual time with the instructors; check the home page and the schedule page for times and locations.

Note: This list will be updated throughout the course. Additional readings specific to each lecture can be found on the schedule page.