The course will cover foundation model training pipelines; mechanistic interpretability; RLHF and goal misgeneralization; safety evaluations and red teaming; scalable oversight and control; and policy and career pathways in AI safety.
The format will emphasize hands-on notebooks, live demos, and paper-driven discussion to help students build both conceptual understanding and practical skills.