Course Resources

This page contains curated resources to support your learning throughout the course. Resources are organized into books, courses, tutorials, and research papers, and the list will be updated as the course progresses.


📚 Books

Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon
2025
Comprehensive monograph covering diffusion models, flow matching, and transport-based generative modeling from first principles.

🎓 Courses

Stefano Ermon, Aditya Grover
2023
Stanford course on generative models including VAEs, GANs, EBMs, normalizing flows, diffusion models, and autoregressive models.
Matt Gormley, Yuanzhi Li, Henry Chai, Pat Virtue, Aran Nayebi
2025
CMU course on generative models including LLMs, GANs, and diffusion models.
Beidi Chen, Xun Huang
2025
CMU course on generative models including LLMs, VAEs, and diffusion models.
Peter Holderrieth, Ezra Erives
2025
MIT class on diffusion and flow matching from a flow-based theoretical perspective.
Andrej Risteski, Albert Gu
2025
CMU course that focuses on probabilistic modeling (including some deep generative models from a more theoretical perspective).
Stefano Ermon
2024
Stanford course that focuses on probabilistic modeling.

📝 Tutorials

Yang Song
2021
Introduction to score-based generative models and their connection to diffusion models.
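To fix ideas, the "score" in these models is the gradient of the log-density. A toy illustration (my own, not from the tutorial) for a Gaussian, where the score has a closed form:

```python
def gaussian_score(x, mu=0.0, sigma=1.0):
    """Score of N(mu, sigma^2): grad_x log p(x) = -(x - mu) / sigma**2.
    Score-based models train a network to approximate this quantity for
    the noised data distribution, where no closed form is available."""
    return -(x - mu) / sigma**2
```

Note how the score points back toward the mode, which is why following it (plus noise, as in Langevin dynamics) moves samples toward high-density regions.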
Lilian Weng
2021
Comprehensive introduction to diffusion models with clear explanations and intuitive visualizations.
Calvin Luo
2022
Unifies VAEs, hierarchical VAEs, and diffusion models under a single framework.
Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T. Q. Chen, David Lopez-Paz, Heli Ben-Hamu, Itai Gat
2024
Comprehensive guide to flow matching with code examples and applications.
Aaron Lou
2024
Written by the first author of SEDD (ICML 2024 Best Paper). Explains the motivation for discrete diffusion, the concrete score, score entropy loss, and sampling from first principles.
Subham Sekhar Sahoo
2024
Blog post and video walkthrough by the MDLM authors. Connects absorbing-state diffusion to masked language modeling (BERT); the video tutorial is especially helpful.
Christopher Beckham
2022
Detailed walkthrough of D3PM with the author's own clarifications. Great for understanding the transition matrix formulation and why the network predicts x_0 instead of x_{t-1}.

📄 Key Papers

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
2015
The original paper introducing diffusion probabilistic models for generative modeling.
Jonathan Ho, Ajay Jain, Pieter Abbeel
2020
Landmark paper that made diffusion models practical and effective for high-quality image generation.
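A minimal NumPy sketch of the closed-form forward (noising) step from this paper; function and variable names are illustrative, not from the authors' code:

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I),
    where abar_t is the cumulative product of alpha = 1 - beta."""
    abar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Linear schedule as in the paper: 1000 steps from 1e-4 to 2e-2.
betas = np.linspace(1e-4, 2e-2, 1000)
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
xt = ddpm_forward(x0, 999, betas, rng)  # near-pure noise at the final step
```

Because q(x_t | x_0) is Gaussian in closed form, training can sample any timestep directly instead of simulating the chain step by step.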
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
2021
Unifies score-based models and diffusion models using SDEs, enabling new sampling methods.
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le
2022
Introduces flow matching as a simulation-free approach to training continuous normalizing flows.
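As a rough illustration of the simulation-free objective (a sketch in my own notation, not the paper's code), the conditional loss for straight-line paths regresses a network onto the constant velocity x1 - x0:

```python
import torch

def cfm_loss(model, x1):
    """Conditional flow matching loss for straight-line paths.
    x0 ~ N(0, I) is noise, x1 is data; along the interpolant
    x_t = (1 - t) * x0 + t * x1 the target velocity is x1 - x0,
    so no ODE simulation is needed during training."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1
    return ((model(xt, t) - (x1 - x0)) ** 2).mean()

# Toy check; a real `model` would be a neural network taking (x, t).
zero_model = lambda x, t: torch.zeros_like(x)
loss = cfm_loss(zero_model, torch.randn(8, 2))
```

At sampling time, the learned velocity field is integrated with an off-the-shelf ODE solver from noise to data.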
Xingchao Liu, Chengyue Gong, Qiang Liu
2022
Introduces rectified flow to learn straight trajectories between distributions for fast sampling.
Michael S. Albergo, Eric Vanden-Eijnden
2022
General framework for building normalizing flows via stochastic interpolation between distributions.
Jiaming Song, Chenlin Meng, Stefano Ermon
2021
Introduces DDIM, the original fast deterministic sampling algorithm for diffusion models.
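A minimal sketch of the deterministic (eta = 0) DDIM-style update, assuming an epsilon-prediction network; names are illustrative:

```python
import numpy as np

def ddim_step(xt, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0): recover the x0
    estimate from the noise prediction, then jump directly to the
    earlier timestep along the implied trajectory."""
    x0_pred = (xt - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred
```

Because the update is deterministic, the timestep grid can be subsampled aggressively (e.g. 50 steps instead of 1000) with modest quality loss.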

🚀 Advanced Papers

Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
2022
Systematic analysis of design choices in diffusion models with improved sampling.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
2022
Fast high-order ODE solver for diffusion models.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
2022
Improved solver with guided sampling support.
Alex Nichol, Prafulla Dhariwal
2021
Improved DDPM with learned variance and cosine noise schedule.
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
2021
Continuous-time diffusion with learned noise schedule.
Tim Salimans, Jonathan Ho
2022
Introduces v-prediction parameterization and progressive distillation.
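The v-parameterization mentioned above is a one-line reparameterization of the prediction target (a sketch in DDPM-style notation, with abar the squared signal scale):

```python
import numpy as np

def v_target(x0, eps, abar_t):
    """v-prediction target: v = sqrt(abar_t) * eps - sqrt(1 - abar_t) * x0.
    At abar_t = 1 (no noise) v equals eps; at abar_t = 0 (pure noise)
    v equals -x0, so the target stays well-scaled across all timesteps."""
    return np.sqrt(abar_t) * eps - np.sqrt(1.0 - abar_t) * x0
```

This well-scaled target is what makes v-prediction better behaved than epsilon-prediction at high noise levels, which matters for the paper's progressive distillation.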
Jonathan Ho, Tim Salimans
2022
Guidance method that combines conditional and unconditional models.
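The combination rule is a one-liner; a sketch with illustrative names:

```python
def cfg_combine(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one with guidance scale w.
    w = 0 recovers unconditional sampling, w = 1 conditional sampling,
    and w > 1 amplifies the conditioning signal."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

In practice both predictions come from the same network, trained with the condition randomly dropped, so guidance needs two forward passes per step and no separate classifier.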
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
2021
Introduces latent diffusion models, the foundation of Stable Diffusion.
Prafulla Dhariwal, Alex Nichol
2021
Introduces classifier guidance for conditional generation.
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
2021
Training-free image editing by hijacking the diffusion process with user inputs.
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool
2022
Introduces a resampling ("time travel") technique for training-free image inpainting.
Hyungjin Chung, Jeongsol Kim, Michael T. McCann, Marc L. Klasky, Jong Chul Ye
2022
Uses off-the-shelf discriminative models for diffusion guidance.
Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, Jong Chul Ye
2022
Shows that noisy data at each diffusion timestep concentrates on a thin shell around the clean data manifold, motivating manifold-constrained guidance.
Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang
2023
ICCV 2023 paper analyzing which range of timesteps matters most for controllable generation.
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon
2023
Guidance method that preserves the data manifold during sampling, improving both the speed and quality of guided generation.
Hang Shao, Abhishek Kumar, P. Thomas Fletcher
2017
Shows that the tangent space of the data manifold can be accessed via the Jacobian of an autoencoder's decoder.
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
2017
Introduces vector quantized variational autoencoders for learning discrete latent representations.
Patrick Esser, Robin Rombach, Björn Ommer
2020
Combines VQ-VAE with adversarial training and autoregressive transformers for high-resolution image synthesis.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
2021
Learns visual representations from natural language supervision, widely used as the text encoder in text-to-image models.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019
A text-to-text transformer used as a text encoder in many text-to-image models, including Imagen and Stable Diffusion 3.
William Peebles, Saining Xie
2022
Replaces the U-Net backbone with a transformer, establishing the foundation for modern diffusion transformer architectures.
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach
2024
Combines rectified flow matching with a multimodal DiT architecture (MMDiT).
Black Forest Labs
2024
Rectified flow transformer for text-to-image generation; no official paper, open-weight model released August 2024.
Black Forest Labs
2025
32B flow matching transformer that couples a vision language model with a rectified flow transformer; supports multi-reference generation and up to 4MP output.
Z-Image Team, Tongyi
2025
Efficient 6B-parameter model built on a single-stream DiT architecture; trained in only 314K H800 GPU hours.
Tencent Hunyuan Foundation Model Team
2025
Native multimodal MoE model with 80B+ total parameters (13B activated per token); unifies multimodal understanding and generation within an autoregressive framework.
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
2024
Trains a single transformer over mixed-modality sequences by combining next-token prediction loss with diffusion loss.
Google DeepMind
2025
Google's native multimodal image generation model built into Gemini.
OpenAI
2025
GPT-4o's native autoregressive image generation capability.
Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever
2023
New family of models that enable single-step generation while maintaining sample quality.
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon
2023
Generalizes consistency models to learn the full trajectory of the probability flow ODE.
Nicholas M. Boffi, Michael S. Albergo, Eric Vanden-Eijnden
2025
Provides a theoretical framework for building consistency models via flow map learning and self-distillation.
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
2025
Scales continuous-time flow map distillation for high-quality few-step generation.
Xinyue Ai, Yutong He, Albert Gu, Ruslan Salakhutdinov, J. Zico Kolter, Nicholas M. Boffi, Max Simchowitz
2025
Joint distillation framework that enables both fast sampling and likelihood evaluation in flow-based models.
Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg
2021
The D3PM paper; introduces a family of discrete diffusion models with structured transition matrices including absorbing (mask), uniform, and embedding-based diffusion.
Andrew Campbell, Joe Benton, Valentin De Bortoli, Thomas Rainforth, George Deligiannidis, Arnaud Doucet
2022
Extends discrete diffusion to continuous time using Continuous Time Markov Chains (CTMCs), enabling more principled training and sampling.
Aaron Lou, Chenlin Meng, Stefano Ermon
2023
The SEDD paper (ICML 2024 Best Paper); introduces score entropy as a training objective for discrete diffusion by estimating ratios of the data distribution, analogous to score matching in continuous diffusion.
Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov
2024
Proposes MDLM, a simple masked diffusion language model with an efficient training objective and absorbing-state noise schedule that matches or outperforms autoregressive models on language benchmarks.
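A sketch of the absorbing-state forward process these masked diffusion models use (illustrative code, not the authors'): each token is independently replaced by a mask token with a probability set by the noise schedule:

```python
import random

MASK = "[MASK]"

def absorb(tokens, t, rng):
    """Absorbing-state forward process: each token is independently
    replaced by the mask token with probability t in [0, 1]; at t = 1
    the entire sequence has been absorbed into the mask state."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
fully_masked = absorb(["the", "cat", "sat"], 1.0, rng)  # all tokens masked
untouched = absorb(["the", "cat", "sat"], 0.0, rng)     # nothing masked
```

The reverse model then predicts the original tokens at masked positions, which is what connects this family of models to BERT-style masked language modeling.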
Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias
2024
Unifies and simplifies masked diffusion models, showing that a simple masked diffusion objective generalizes prior work and yields strong performance on text generation.
Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li
2025
A masked diffusion language model trained from scratch at scale that matches LLaMA3 8B in instruction following, demonstrating the viability of discrete diffusion for LLMs.
Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola
2024
Extends flow matching to discrete state-spaces using CTMCs; demonstrates applications in protein co-design.
Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman
2024
Proposes a flow matching framework for discrete data that enables flexible noise schedules and competitive language modeling performance.
Marton Havasi, Brian Karrer, Itai Gat, Ricky T. Q. Chen
2025
Introduces edit flows, which define flows on discrete state-spaces via structured edit operations (insert, delete, substitute), enabling more expressive discrete generative models.
John Nguyen, Marton Havasi, Tariq Berrada, Luke Zettlemoyer, Ricky T. Q. Chen
2025
Builds on edit flows to enable unified mixed-modal (text and image) generation and editing within a single model.
Marianne Arriola, Aaron Gokaslan, Justin T. Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov
2025
Introduces block diffusion, which interpolates between autoregressive and masked diffusion LMs by denoising blocks of tokens, enabling flexible context length and KV-cache compatibility.

Compute Resources

Access to GPU compute resources is essential for training your generative models. Below are the available platforms with tutorials and guides to help you get started.

Modal

Available
Serverless cloud computing platform for running GPU workloads

AWS (Amazon Web Services)

Coming Soon
Cloud computing platform with EC2 GPU instances

SLURM Clusters

Available
Tutorials for navigating SLURM-based clusters (like the ones we use at CMU)

Additional Resources

  • Discord: Communication server for discussions, questions, and collaboration
  • Office Hours: In-person and virtual time with the instructors; check the home page and the schedule page for times and locations.

Note: This list will be updated throughout the course. Additional readings specific to each lecture can be found on the schedule page.