Schedule Overview
This is a 7-week half-semester course (Mini 3) that meets on Tuesdays and Thursdays for 80 minutes per session. The schedule below is tentative and subject to change.
Office Hours
Kelly hosts regular office hours every week, in person in Gates and virtually on Discord.
- In-person: Wednesdays 1:00 PM - 2:00 PM, Gates 8th Floor common area near the printer
- Virtual: Fridays 11:00 AM - 12:00 PM, Discord
Krish also hosts regular office hours in person in Gates.
- In-person: Tuesdays 4:00 PM - 5:00 PM, Gates 8th Floor common area near the printer
Lecture Schedule
Below is the tentative schedule of the course (subject to change).
| Lecture | Date | Topic | Resources | Deliverables |
|---|---|---|---|---|
| 1 | 01/13 | Basics of Probabilistic & Generative Modeling | 📖 View readings | |
| 2 | 01/15 | Denoising Diffusion Models | 📖 View readings | |
| 3 | 01/16 | Sponsor Lecture (Modal): How to train & serve your models on Modal | | |
| 4 | 01/20 | Score-Based Models | 📖 View readings | |
| 5 | 01/22 | Flow Matching | 📖 View readings | |
| 6 | 01/27 | The Design Space of Diffusion Models & Solvers for Fast Sampling | 📖 View readings | |
| 7 | 01/29 | Guidance & Controllable Generation | 📖 View readings | |
| 8 | 02/03 | Guest Lecture: Q&A with Max Simchowitz, Diffusion & Flow for Robotics, Control & Decision Making | | |
| 9 | 02/05 | SOTA Diffusion/Flow Models for Text-to-Image Generation | 📖 View readings | |
| 10 | 02/10 | Distillation, Consistency Models & Flow Maps | 📖 View readings | |
| 11 | 02/12 | Guest Lecture: Linqi (Alex) Zhou from Luma AI | | |
| 12 | 02/17 | Discrete Diffusion & Masked Diffusion | | |
| 13 | 02/19 | Discrete Flow Matching & Edit Flow | | |
| 14 | 02/24 | No Class | | |
| 15 | 02/26 | Final Poster Presentation | | |
Readings and Resources by Lecture
Below are the related papers and tutorials for each lecture. All readings are optional; they are additional resources to deepen your understanding. The reading list will be updated throughout the course.
Lecture 1: Basics of Probabilistic & Generative Modeling
Tutorials
- Stanford CS236: Deep Generative Models
  Stanford course on generative models including VAEs, GANs, EBMs, normalizing flows, diffusion models, and autoregressive models.
- CMU 10-423/10-623: Generative AI
  CMU course on generative models including LLMs, GANs, and diffusion models.
- CMU 18-789: Deep Generative Modeling
  CMU course on generative models including LLMs, VAEs, and diffusion models.
- CMU 10-708: Probabilistic Graphical Models
  CMU course that focuses on probabilistic modeling (including some deep generative models from a more theoretical perspective).
- Stanford CS228: Probabilistic Graphical Models
  Stanford course that focuses on probabilistic modeling.
- The Principles of Diffusion Models - Chapter 1: Deep Generative Modeling
- Deep Learning - Chapter 3: Probability and Information Theory
- Deep Learning - Chapter 20: Deep Generative Models
- An Introduction to Variational Autoencoders
  Tutorial paper on VAEs.
- Tutorial on Variational Autoencoders
  Another tutorial paper on VAEs.
Papers
- Auto-Encoding Variational Bayes
  The foundational VAE paper (see the ELBO sketch after this list).
- Generative Adversarial Networks
  The foundational GAN paper.
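To ground the VAE readings, here is a minimal sketch of the ELBO objective from Auto-Encoding Variational Bayes, assuming a Gaussian encoder and a Bernoulli decoder; the module layout and dimensions are illustrative placeholders, not taken from any of the listed papers' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal VAE: Gaussian encoder q(z|x), Bernoulli decoder p(x|z)."""
    def __init__(self, x_dim=784, z_dim=16, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)    # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def neg_elbo(x, logits, mu, logvar):
    # Negative ELBO = reconstruction term + KL(q(z|x) || N(0, I))
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / x.shape[0]

x = torch.rand(32, 784)                  # toy batch with values in [0, 1]
model = TinyVAE()
logits, mu, logvar = model(x)
loss = neg_elbo(x, logits, mu, logvar)   # minimize this to maximize the ELBO
```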
Lecture 2: Denoising Diffusion Models
Tutorials
- The Principles of Diffusion Models - Chapter 2: Variational Perspective: From VAEs to DDPMs
- What are Diffusion Models?
  Comprehensive blog post on diffusion models.
- Understanding Diffusion Models: A Unified Perspective
  Unifies VAEs, hierarchical VAEs, and diffusion models under a single framework.
Papers
- Denoising Diffusion Probabilistic Models
  The foundational DDPM paper (see the loss sketch after this list).
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics
  The original diffusion paper.
- Elucidating the Design Space of Diffusion-Based Generative Models
  In-depth investigation of the design space of diffusion models.
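To make the DDPM readings concrete, here is a minimal sketch of the simple noise-prediction training loss from the DDPM paper, assuming a linear beta schedule; `eps_theta` stands in for a real U-Net or transformer backbone.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)     # \bar{alpha}_t

def ddpm_loss(eps_theta, x0):
    """Simple DDPM loss: predict the noise added at a random timestep."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    a = alpha_bars[t].view(b, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    # Forward process in closed form: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps
    return F.mse_loss(eps_theta(x_t, t), eps)

# Placeholder standing in for a trained noise-prediction network.
eps_theta = lambda x_t, t: torch.zeros_like(x_t)
loss = ddpm_loss(eps_theta, torch.randn(8, 3, 32, 32))
```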
Lecture 4: Score-Based SDEs
Tutorials
- Generative Modeling by Estimating Gradients of the Data Distribution
  Blog post introduction from the score-based generative modeling perspective.
- The Principles of Diffusion Models - Appendix A: Crash Course on Differential Equations
  Refresher on differential equations.
- The Principles of Diffusion Models - Chapter 3: Score-Based Perspective: From EBMs to NCSN
- The Principles of Diffusion Models - Chapter 4: Diffusion Models Today: Score SDE Framework
Papers
- Estimation of Non-Normalized Statistical Models by Score Matching
  The original score matching paper.
- A Connection Between Score Matching and Denoising Autoencoders
  The original denoising score matching paper (see the sketch after this list).
- Generative Modeling by Estimating Gradients of the Data Distribution
  The paper that proposed annealed Langevin dynamics and pointed out many common pitfalls of score-based models.
- Score-Based Generative Modeling through Stochastic Differential Equations
  The foundational paper that unifies score-based models and diffusion models using SDEs.
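Here is a minimal sketch of denoising score matching at a single noise level, together with one Langevin update, in the spirit of the NCSN paper; the score network, sigma, and step size below are placeholders.

```python
import torch
import torch.nn.functional as F

def dsm_loss(score_net, x0, sigma):
    """Denoising score matching: for x_t = x0 + sigma * eps, the score of the
    perturbed conditional is -(x_t - x0) / sigma^2 = -eps / sigma."""
    eps = torch.randn_like(x0)
    x_t = x0 + sigma * eps
    target = -eps / sigma
    return F.mse_loss(score_net(x_t, sigma), target) * sigma**2  # sigma^2 weighting

@torch.no_grad()
def langevin_step(score_net, x, sigma, step_size):
    # x <- x + (step/2) * score(x) + sqrt(step) * z,  z ~ N(0, I)
    z = torch.randn_like(x)
    return x + 0.5 * step_size * score_net(x, sigma) + step_size**0.5 * z

score_net = lambda x, s: -x                 # stand-in: the exact score of N(0, I)
loss = dsm_loss(score_net, torch.randn(8, 2), sigma=0.5)
x = torch.randn(8, 2)
for _ in range(10):                         # a few Langevin sampling steps
    x = langevin_step(score_net, x, sigma=0.5, step_size=0.01)
```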
Lecture 5: Flow Matching
Tutorials
- Flow Matching Guide and Code
  Comprehensive guide to flow matching with code examples and applications (a minimal loss sketch also follows this list).
- The Principles of Diffusion Models - Chapter 5: Flow-Based Perspective: From NFs to Flow Matching
- MIT 6.S184: Introduction to Flow Matching and Diffusion Models
  MIT class on diffusion and flow matching.
- An Introduction to Flow Matching
  Blog post introduction.
- Flow Matching: A visual introduction
  Blog post with visualizations and code demos.
- Flow With What You Know
  Blog post with visualizations, code demos, and great intuition from physics.
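A minimal sketch of the conditional flow matching loss under the linear (rectified-flow) interpolation path, with t = 0 as noise and t = 1 as data; `v_theta` is a placeholder for the velocity network.

```python
import torch
import torch.nn.functional as F

def cfm_loss(v_theta, x1):
    """Conditional flow matching with the linear path x_t = (1 - t) x0 + t x1,
    whose conditional velocity is simply x1 - x0."""
    x0 = torch.randn_like(x1)                           # noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    x_t = (1 - t) * x0 + t * x1
    return F.mse_loss(v_theta(x_t, t), x1 - x0)

v_theta = lambda x_t, t: torch.zeros_like(x_t)          # placeholder network
loss = cfm_loss(v_theta, torch.randn(8, 2))
```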
Lecture 6: The Design Space of Diffusion Models & Solvers for Fast Sampling
Papers
- Denoising Diffusion Implicit Models
  First fast deterministic sampling paper for diffusion models (see the sampler sketch after this list).
- DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
  Fast high-order ODE solver for diffusion models.
- DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
  Improved solver with guided sampling support.
- Elucidating the Design Space of Diffusion-Based Generative Models
  Systematic analysis of diffusion model design choices.
- Improved Denoising Diffusion Probabilistic Models
  Improved DDPM with learned variance and a cosine noise schedule.
- Variational Diffusion Models
  Continuous-time diffusion with a learned noise schedule.
- Progressive Distillation for Fast Sampling of Diffusion Models
  Introduces the v-prediction parameterization and progressive distillation.
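A minimal sketch of the deterministic DDIM update (the eta = 0 sampler) on a coarse timestep grid, reusing the linear schedule from the DDPM sketch above; `eps_theta` is again a placeholder network.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def ddim_step(eps_theta, x_t, t, t_prev):
    """One deterministic DDIM step (eta = 0) from timestep t to t_prev."""
    a_t, a_prev = alpha_bars[t], alpha_bars[t_prev]
    eps = eps_theta(x_t, t)
    # Predict x0 from the noise estimate, then re-noise to the earlier level.
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps

# Coarse grid: 50 sampling steps instead of the full 1000.
eps_theta = lambda x, t: torch.zeros_like(x)    # placeholder network
x = torch.randn(4, 3, 32, 32)
steps = torch.linspace(T - 1, 0, 50).long()
for t, t_prev in zip(steps[:-1], steps[1:]):
    x = ddim_step(eps_theta, x, t, t_prev)
```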
Lecture 7: Guidance & Controllable Generation
Papers
- Diffusion Models Beat GANs on Image Synthesis
  Introduces classifier guidance for conditional generation.
- Classifier-Free Diffusion Guidance
  Guidance method that combines conditional and unconditional models (see the sketch after this list).
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
  Training-free image editing by hijacking the diffusion process with user inputs.
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models
  Training-free inpainting via a "time travel" resampling technique.
- Diffusion Posterior Sampling for General Noisy Inverse Problems
  Uses off-the-shelf discriminative models for diffusion guidance.
- Manifold Preserving Guided Diffusion
  Guidance method that preserves the data manifold during sampling, making guided sampling faster and more accurate.
- FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
  Analyzes which range of time steps matters most for controllable generation.
- The Riemannian Geometry of Deep Generative Models
  Shows that the tangent space of the data manifold can be accessed via the Jacobian of an autoencoder's decoder.
- Improving Diffusion Models for Inverse Problems using Manifold Constraints
  Observes that the noisy data manifolds of diffusion form layered, shell-like structures and exploits manifold constraints for inverse problems.
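A minimal sketch of classifier-free guidance at sampling time: run the network with and without the condition and extrapolate from the unconditional prediction toward the conditional one. The guidance scale of 7.5 is a common default, not a prescription, and the placeholder network is illustrative.

```python
import torch

def cfg_eps(eps_theta, x_t, t, cond, scale=7.5):
    """Classifier-free guidance: eps_u + scale * (eps_c - eps_u)."""
    eps_c = eps_theta(x_t, t, cond)
    eps_u = eps_theta(x_t, t, None)     # None = unconditional / null prompt
    return eps_u + scale * (eps_c - eps_u)

eps_theta = lambda x, t, c: torch.zeros_like(x)    # placeholder network
guided = cfg_eps(eps_theta, torch.randn(4, 3, 64, 64),
                 torch.tensor(500), cond="a cat")
```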
Lecture 9: SOTA Diffusion/Flow Models for Text-to-Image Generation
Papers
- High-Resolution Image Synthesis with Latent Diffusion Models
  The Stable Diffusion paper; introduces latent diffusion models, which run the diffusion process in the latent space of a pretrained autoencoder (see the recipe sketch after this list).
- Neural Discrete Representation Learning
  The VQ-VAE paper; introduces vector-quantized variational autoencoders for learning discrete latent representations.
- Taming Transformers for High-Resolution Image Synthesis
  The VQ-GAN paper; combines VQ-VAE with adversarial training and autoregressive transformers for high-resolution image synthesis.
- Learning Transferable Visual Models From Natural Language Supervision
  The CLIP paper; learns visual representations from natural language supervision, widely used as the text encoder in text-to-image models.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transfer Transformer
  The T5 paper; a text-to-text transformer used as the text encoder in many text-to-image models, including Stable Diffusion 3.
- Scalable Diffusion Models with Transformers
  The DiT paper; replaces the U-Net backbone with a transformer, establishing the foundation for modern diffusion transformer architectures.
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
  The Stable Diffusion 3 paper; combines rectified flow matching with a multimodal DiT architecture (MMDiT).
- FLUX.1
  Rectified flow transformer for text-to-image generation; no official paper, open-weight model released August 2024.
- FLUX.2: Frontier Visual Intelligence
  32B flow matching transformer that couples a vision-language model with a rectified flow transformer; supports multi-reference generation and up to 4MP output.
- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
  Efficient 6B-parameter model built on a single-stream DiT architecture; trained in only 314K H800 GPU hours.
- HunyuanImage 3.0 Technical Report
  Native multimodal MoE model with 80B+ total parameters (13B activated per token); unifies multimodal understanding and generation within an autoregressive framework.
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Trains a single transformer over mixed-modality sequences by combining a next-token prediction loss with a diffusion loss.
- Nano Banana (Gemini 2.5 Flash Image)
  Google's native multimodal image generation model built into Gemini; no official paper, released August 2025.
- Introducing 4o Image Generation
  GPT-4o's native autoregressive image generation capability; no official paper, released March 2025.
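A structural sketch of the latent diffusion recipe shared by several of the models above: sample in the autoencoder's latent space, condition on text embeddings, and decode at the end. All components below are placeholders; the latent shape follows the Stable Diffusion convention but is purely illustrative.

```python
import torch

def latent_diffusion_sample(vae_decode, text_encode, denoise_loop, prompt):
    """The latent-diffusion recipe: run a diffusion/flow sampler in the
    latent space of a pretrained autoencoder, conditioned on text embeddings."""
    cond = text_encode(prompt)              # e.g. CLIP or T5 embeddings
    z = torch.randn(1, 4, 64, 64)           # latent-space noise (shape illustrative)
    z = denoise_loop(z, cond)               # any sampler from Lecture 6
    return vae_decode(z)                    # decode latents back to pixels

# Placeholders standing in for the pretrained components.
img = latent_diffusion_sample(
    vae_decode=lambda z: z,
    text_encode=lambda p: torch.zeros(1, 77, 768),
    denoise_loop=lambda z, c: z,
    prompt="a photo of an astronaut")
```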
Lecture 10: Distillation, Consistency Models & Flow Maps
Papers
- Progressive Distillation for Fast Sampling of Diffusion Models
  Introduces progressive distillation, which reduces sampling steps by repeatedly halving their number.
- Consistency Models
  Proposes consistency models, which map any point on the ODE trajectory to the trajectory's origin, enabling single-step generation (see the distillation sketch after this list).
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
  Generalizes consistency models to learn the full trajectory of the probability flow ODE.
- How to build a consistency model: Learning flow maps via self-distillation
  Provides a theoretical framework for building consistency models via flow map learning and self-distillation.
- Align Your Flow: Scaling Continuous-Time Flow Map Distillation
  Scales continuous-time flow map distillation for high-quality few-step generation.
- Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
  Joint distillation framework that enables both fast sampling and likelihood evaluation in flow-based models.
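A minimal sketch of the consistency distillation objective: the student's output at one point of the probability flow ODE should agree with an EMA teacher's output at the adjacent point produced by a pretrained solver, so the learned map sends every trajectory point to the same origin. All components below are placeholders.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(f_student, f_teacher_ema, ode_solver_step,
                                  x_t, t, t_next):
    """Match the student at (x_t, t) to the EMA teacher at the adjacent
    ODE point (x_{t_next}, t_next) produced by a pretrained solver."""
    with torch.no_grad():
        x_next = ode_solver_step(x_t, t, t_next)   # one step of a pretrained ODE solver
        target = f_teacher_ema(x_next, t_next)
    return F.mse_loss(f_student(x_t, t), target)

# Placeholder components for illustration.
f = lambda x, t: x
step = lambda x, t, t_next: x
loss = consistency_distillation_loss(
    f, f, step, torch.randn(8, 2), torch.tensor(0.8), torch.tensor(0.7))
```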