Schedule Overview
This is a 7-week half-semester course (Mini 3) that meets on Tuesdays and Thursdays for 80 minutes per session. The schedule below is tentative and subject to change.
Office Hours
Kelly hosts regular office hours every week, in person in Gates and virtually on Discord.
- In-person: Wednesdays 1:00 PM - 2:00 PM, Gates 8th Floor common area near the printer
- Virtual: Fridays 11:00 AM - 12:00 PM, Discord
Krish also hosts regular office hours in person in Gates.
- In-person: Tuesdays 4:00 PM - 5:00 PM, Gates 8th Floor common area near the printer
Lecture Schedule
Below is the tentative schedule of the course (subject to change).
| Lecture | Date | Topic | Resources | Deliverables |
|---|---|---|---|---|
| 1 | 01/13 | Basics of Probabilistic & Generative Modeling | 📖 View readings | |
| 2 | 01/15 | Denoising Diffusion Models | 📖 View readings | |
| 3 | 01/16 | Sponsor Lecture (Modal): How to train & serve your models on Modal | | |
| 4 | 01/20 | Score-Based Models | 📖 View readings | |
| 5 | 01/22 | Flow Matching | 📖 View readings | |
| 6 | 01/27 | The Design Space of Diffusion Models & Solvers for Fast Sampling | 📖 View readings | |
| 7 | 01/29 | Guidance & Controllable Generation | 📖 View readings | |
| 8 | 02/03 | Guest Lecture: Q&A with Max Simchowitz, Diffusion & Flow for Robotics, Control & Decision Making | | |
| 9 | 02/05 | SOTA Diffusion/Flow Models for Text-to-Image Generation | 📖 View readings | |
| 10 | 02/10 | Distillation, Consistency Models & Flow Maps | 📖 View readings | |
| 11 | 02/12 | Guest Lecture: Linqi (Alex) Zhou from Luma AI | | |
| 12 | 02/17 | Discrete Diffusion & Masked Diffusion | | |
| 13 | 02/19 | Discrete Flow Matching & Edit Flow | | |
| 14 | 02/24 | No Class | | |
| 15 | 02/26 | Final Poster Presentation | | |
Readings and Resources by Lecture
Below are the related papers and tutorials for each lecture. All readings are optional; they are additional resources to deepen your understanding. The reading list will be updated throughout the course.
Lecture 1: Basics of Probabilistic & Generative Modeling
Tutorials
- Stanford CS236: Deep Generative Models
  Stanford course on generative models including VAEs, GANs, EBMs, normalizing flows, diffusion models, and autoregressive models.
- CMU 10-423/10-623: Generative AI
  CMU course on generative models including LLMs, GANs, and diffusion models.
- CMU 18-789: Deep Generative Modeling
  CMU course on generative models including LLMs, VAEs, and diffusion models.
- CMU 10-708: Probabilistic Graphical Models
  CMU course that focuses on probabilistic modeling (including some deep generative models from a more theoretical perspective).
- Stanford CS228: Probabilistic Graphical Models
  Stanford course that focuses on probabilistic modeling.
- The Principles of Diffusion Models - Chapter 1: Deep Generative Modeling
- Deep Learning - Chapter 3: Probability and Information Theory
- Deep Learning - Chapter 20: Deep Generative Models
- An Introduction to Variational Autoencoders
  Tutorial paper on VAEs.
- Tutorial on Variational Autoencoders
  Another tutorial paper on VAEs.
Papers
- Auto-Encoding Variational Bayes
  The foundational VAE paper (see the ELBO sketch after this list).
- Generative Adversarial Networks
  The foundational GAN paper.
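To ground the VAE readings, here is a minimal sketch of the ELBO objective from Auto-Encoding Variational Bayes, assuming a Gaussian encoder and a Bernoulli decoder; the module layout and dimensions are illustrative placeholders, not taken from any of the listed papers' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal VAE: Gaussian encoder q(z|x), Bernoulli decoder p(x|z)."""
    def __init__(self, x_dim=784, z_dim=16, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)    # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def neg_elbo(x, logits, mu, logvar):
    # Negative ELBO = reconstruction term + KL(q(z|x) || N(0, I))
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / x.shape[0]

x = torch.rand(32, 784)                  # toy batch with values in [0, 1]
model = TinyVAE()
logits, mu, logvar = model(x)
loss = neg_elbo(x, logits, mu, logvar)   # minimize this to maximize the ELBO
```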
Lecture 2: Denoising Diffusion Models
Tutorials
- The Principles of Diffusion Models - Chapter 2: Variational Perspective: From VAEs to DDPMs
- What are Diffusion Models?
  Comprehensive blog post on diffusion models.
- Understanding Diffusion Models: A Unified Perspective
  Unifies VAEs, hierarchical VAEs, and diffusion models under a single framework.
Papers
- Denoising Diffusion Probabilistic Models
  The foundational DDPM paper (see the loss sketch after this list).
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics
  The original diffusion paper.
- Elucidating the Design Space of Diffusion-Based Generative Models
  In-depth investigation of the design space of diffusion models.
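To make the DDPM readings concrete, here is a minimal sketch of the simple noise-prediction training loss from the DDPM paper, assuming a linear beta schedule; `eps_theta` stands in for a real U-Net or transformer backbone.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)     # \bar{alpha}_t

def ddpm_loss(eps_theta, x0):
    """Simple DDPM loss: predict the noise added at a random timestep."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    a = alpha_bars[t].view(b, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    # Forward process in closed form: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps
    return F.mse_loss(eps_theta(x_t, t), eps)

# Placeholder standing in for a trained noise-prediction network.
eps_theta = lambda x_t, t: torch.zeros_like(x_t)
loss = ddpm_loss(eps_theta, torch.randn(8, 3, 32, 32))
```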
Lecture 4: Score-Based SDEs
Tutorials
- Generative Modeling by Estimating Gradients of the Data Distribution
  Blog post introduction from the score-based generative modeling perspective.
- The Principles of Diffusion Models - Appendix A: Crash Course on Differential Equations
  Refresher on differential equations.
- The Principles of Diffusion Models - Chapter 3: Score-Based Perspective: From EBMs to NCSN
- The Principles of Diffusion Models - Chapter 4: Diffusion Models Today: Score SDE Framework
Papers
- Estimation of Non-Normalized Statistical Models by Score Matching
  The original score matching paper.
- A Connection Between Score Matching and Denoising Autoencoders
  The original denoising score matching paper (see the sketch after this list).
- Generative Modeling by Estimating Gradients of the Data Distribution
  The paper that proposed annealed Langevin dynamics and pointed out many common pitfalls of score-based models.
- Score-Based Generative Modeling through Stochastic Differential Equations
  The foundational paper that unifies score-based models and diffusion models using SDEs.
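Here is a minimal sketch of denoising score matching at a single noise level, together with one Langevin update, in the spirit of the NCSN paper; the score network, sigma, and step size below are placeholders.

```python
import torch
import torch.nn.functional as F

def dsm_loss(score_net, x0, sigma):
    """Denoising score matching: for x_t = x0 + sigma * eps, the score of the
    perturbed conditional is -(x_t - x0) / sigma^2 = -eps / sigma."""
    eps = torch.randn_like(x0)
    x_t = x0 + sigma * eps
    target = -eps / sigma
    return F.mse_loss(score_net(x_t, sigma), target) * sigma**2  # sigma^2 weighting

@torch.no_grad()
def langevin_step(score_net, x, sigma, step_size):
    # x <- x + (step/2) * score(x) + sqrt(step) * z,  z ~ N(0, I)
    z = torch.randn_like(x)
    return x + 0.5 * step_size * score_net(x, sigma) + step_size**0.5 * z

score_net = lambda x, s: -x                 # stand-in: the exact score of N(0, I)
loss = dsm_loss(score_net, torch.randn(8, 2), sigma=0.5)
x = torch.randn(8, 2)
for _ in range(10):                         # a few Langevin sampling steps
    x = langevin_step(score_net, x, sigma=0.5, step_size=0.01)
```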
Lecture 5: Flow Matching
Tutorials
- Flow Matching Guide and Code
  Comprehensive guide to flow matching with code examples and applications (a minimal loss sketch also follows this list).
- The Principles of Diffusion Models - Chapter 5: Flow-Based Perspective: From NFs to Flow Matching
- MIT 6.S184: Introduction to Flow Matching and Diffusion Models
  MIT class on diffusion and flow matching.
- An Introduction to Flow Matching
  Blog post introduction.
- Flow Matching: A visual introduction
  Blog post with visualizations and code demos.
- Flow With What You Know
  Blog post with visualizations, code demos, and great intuition from physics.
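A minimal sketch of the conditional flow matching loss under the linear (rectified-flow) interpolation path, with t = 0 as noise and t = 1 as data; `v_theta` is a placeholder for the velocity network.

```python
import torch
import torch.nn.functional as F

def cfm_loss(v_theta, x1):
    """Conditional flow matching with the linear path x_t = (1 - t) x0 + t x1,
    whose conditional velocity is simply x1 - x0."""
    x0 = torch.randn_like(x1)                           # noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    x_t = (1 - t) * x0 + t * x1
    return F.mse_loss(v_theta(x_t, t), x1 - x0)

v_theta = lambda x_t, t: torch.zeros_like(x_t)          # placeholder network
loss = cfm_loss(v_theta, torch.randn(8, 2))
```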
Lecture 6: The Design Space of Diffusion Models & Solvers for Fast Sampling
Papers
- Denoising Diffusion Implicit Models
  First fast deterministic sampling paper for diffusion models (see the sampler sketch after this list).
- DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
  Fast high-order ODE solver for diffusion models.
- DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
  Improved solver with guided sampling support.
- Elucidating the Design Space of Diffusion-Based Generative Models
  Systematic analysis of diffusion model design choices.
- Improved Denoising Diffusion Probabilistic Models
  Improved DDPM with learned variance and a cosine noise schedule.
- Variational Diffusion Models
  Continuous-time diffusion with a learned noise schedule.
- Progressive Distillation for Fast Sampling of Diffusion Models
  Introduces the v-prediction parameterization and progressive distillation.
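A minimal sketch of the deterministic DDIM update (the eta = 0 sampler) on a coarse timestep grid, reusing the linear schedule from the DDPM sketch above; `eps_theta` is again a placeholder network.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def ddim_step(eps_theta, x_t, t, t_prev):
    """One deterministic DDIM step (eta = 0) from timestep t to t_prev."""
    a_t, a_prev = alpha_bars[t], alpha_bars[t_prev]
    eps = eps_theta(x_t, t)
    # Predict x0 from the noise estimate, then re-noise to the earlier level.
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps

# Coarse grid: 50 sampling steps instead of the full 1000.
eps_theta = lambda x, t: torch.zeros_like(x)    # placeholder network
x = torch.randn(4, 3, 32, 32)
steps = torch.linspace(T - 1, 0, 50).long()
for t, t_prev in zip(steps[:-1], steps[1:]):
    x = ddim_step(eps_theta, x, t, t_prev)
```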
Lecture 7: Guidance & Controllable Generation
Papers
- Diffusion Models Beat GANs on Image Synthesis
  Introduces classifier guidance for conditional generation.
- Classifier-Free Diffusion Guidance
  Guidance method that combines conditional and unconditional models (see the sketch after this list).
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
  Training-free image editing by hijacking the diffusion process with user inputs.
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models
  Training-free inpainting via a "time travel" resampling technique.
- Diffusion Posterior Sampling for General Noisy Inverse Problems
  Uses off-the-shelf discriminative models for diffusion guidance.
- Manifold Preserving Guided Diffusion
  Guidance method that preserves the data manifold during sampling, making guided sampling faster and more accurate.
- FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
  Analyzes which range of time steps matters most for controllable generation.
- The Riemannian Geometry of Deep Generative Models
  Shows that the tangent space of the data manifold can be accessed via the Jacobian of an autoencoder's decoder.
- Improving Diffusion Models for Inverse Problems using Manifold Constraints
  Observes that the noisy data manifolds of diffusion form layered, shell-like structures and exploits manifold constraints for inverse problems.
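A minimal sketch of classifier-free guidance at sampling time: run the network with and without the condition and extrapolate from the unconditional prediction toward the conditional one. The guidance scale of 7.5 is a common default, not a prescription, and the placeholder network is illustrative.

```python
import torch

def cfg_eps(eps_theta, x_t, t, cond, scale=7.5):
    """Classifier-free guidance: eps_u + scale * (eps_c - eps_u)."""
    eps_c = eps_theta(x_t, t, cond)
    eps_u = eps_theta(x_t, t, None)     # None = unconditional / null prompt
    return eps_u + scale * (eps_c - eps_u)

eps_theta = lambda x, t, c: torch.zeros_like(x)    # placeholder network
guided = cfg_eps(eps_theta, torch.randn(4, 3, 64, 64),
                 torch.tensor(500), cond="a cat")
```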
Lecture 9: SOTA Diffusion/Flow Models for Text-to-Image Generation
Papers
- High-Resolution Image Synthesis with Latent Diffusion Models
  The Stable Diffusion paper; introduces latent diffusion models, which run the diffusion process in the latent space of a pretrained autoencoder (see the recipe sketch after this list).
- Neural Discrete Representation Learning
  The VQ-VAE paper; introduces vector-quantized variational autoencoders for learning discrete latent representations.
- Taming Transformers for High-Resolution Image Synthesis
  The VQ-GAN paper; combines VQ-VAE with adversarial training and autoregressive transformers for high-resolution image synthesis.
- Learning Transferable Visual Models From Natural Language Supervision
  The CLIP paper; learns visual representations from natural language supervision, widely used as the text encoder in text-to-image models.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transfer Transformer
  The T5 paper; a text-to-text transformer used as the text encoder in many text-to-image models, including Stable Diffusion 3.
- Scalable Diffusion Models with Transformers
  The DiT paper; replaces the U-Net backbone with a transformer, establishing the foundation for modern diffusion transformer architectures.
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
  The Stable Diffusion 3 paper; combines rectified flow matching with a multimodal DiT architecture (MMDiT).
- FLUX.1
  Rectified flow transformer for text-to-image generation; no official paper, open-weight model released August 2024.
- FLUX.2: Frontier Visual Intelligence
  32B flow matching transformer that couples a vision-language model with a rectified flow transformer; supports multi-reference generation and up to 4MP output.
- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
  Efficient 6B-parameter model built on a single-stream DiT architecture; trained in only 314K H800 GPU hours.
- HunyuanImage 3.0 Technical Report
  Native multimodal MoE model with 80B+ total parameters (13B activated per token); unifies multimodal understanding and generation within an autoregressive framework.
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Trains a single transformer over mixed-modality sequences by combining a next-token prediction loss with a diffusion loss.
- Nano Banana (Gemini 2.5 Flash Image)
  Google's native multimodal image generation model built into Gemini; no official paper, released August 2025.
- Introducing 4o Image Generation
  GPT-4o's native autoregressive image generation capability; no official paper, released March 2025.
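A structural sketch of the latent diffusion recipe shared by several of the models above: sample in the autoencoder's latent space, condition on text embeddings, and decode at the end. All components below are placeholders; the latent shape follows the Stable Diffusion convention but is purely illustrative.

```python
import torch

def latent_diffusion_sample(vae_decode, text_encode, denoise_loop, prompt):
    """The latent-diffusion recipe: run a diffusion/flow sampler in the
    latent space of a pretrained autoencoder, conditioned on text embeddings."""
    cond = text_encode(prompt)              # e.g. CLIP or T5 embeddings
    z = torch.randn(1, 4, 64, 64)           # latent-space noise (shape illustrative)
    z = denoise_loop(z, cond)               # any sampler from Lecture 6
    return vae_decode(z)                    # decode latents back to pixels

# Placeholders standing in for the pretrained components.
img = latent_diffusion_sample(
    vae_decode=lambda z: z,
    text_encode=lambda p: torch.zeros(1, 77, 768),
    denoise_loop=lambda z, c: z,
    prompt="a photo of an astronaut")
```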
Lecture 10: Distillation, Consistency Models & Flow Maps
Papers
- Progressive Distillation for Fast Sampling of Diffusion Models
  Introduces progressive distillation, which reduces sampling steps by repeatedly halving their number.
- Consistency Models
  Proposes consistency models, which map any point on the ODE trajectory to the trajectory's origin, enabling single-step generation (see the distillation sketch after this list).
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
  Generalizes consistency models to learn the full trajectory of the probability flow ODE.
- How to build a consistency model: Learning flow maps via self-distillation
  Provides a theoretical framework for building consistency models via flow map learning and self-distillation.
- Align Your Flow: Scaling Continuous-Time Flow Map Distillation
  Scales continuous-time flow map distillation for high-quality few-step generation.
- Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
  Joint distillation framework that enables both fast sampling and likelihood evaluation in flow-based models.
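A minimal sketch of the consistency distillation objective: the student's output at one point of the probability flow ODE should agree with an EMA teacher's output at the adjacent point produced by a pretrained solver, so the learned map sends every trajectory point to the same origin. All components below are placeholders.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(f_student, f_teacher_ema, ode_solver_step,
                                  x_t, t, t_next):
    """Match the student at (x_t, t) to the EMA teacher at the adjacent
    ODE point (x_{t_next}, t_next) produced by a pretrained solver."""
    with torch.no_grad():
        x_next = ode_solver_step(x_t, t, t_next)   # one step of a pretrained ODE solver
        target = f_teacher_ema(x_next, t_next)
    return F.mse_loss(f_student(x_t, t), target)

# Placeholder components for illustration.
f = lambda x, t: x
step = lambda x, t, t_next: x
loss = consistency_distillation_loss(
    f, f, step, torch.randn(8, 2), torch.tensor(0.8), torch.tensor(0.7))
```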