Learning resources for AI safety
Curated material spanning introductions, technical papers, policy work, newsletters, and fellowships.
Non-Technical Introduction to AI Safety • Newsletters, Podcasts, and Blogs • Research Fellowships • Technical Papers • Policy Papers
Non-Technical Introduction to AI Safety
For a high-level, non-technical overview of arguments for caution around advanced AI systems, start here.
Blogs and YouTube
- Planned Obsolescence: Blog by Ajeya Cotra and Kelsey Piper
- Cold Takes: Blog by Holden Karnofsky
- Is Power-Seeking AI an Existential Risk? By Joe Carlsmith
- Robert Miles (YouTube): Accessible AI safety explainers
AI Safety in the News
- A.I. Poses Risk of Extinction, Industry Experts Warn (New York Times)
- Geoffrey Hinton tells us why he is now scared of the tech he helped build (MIT Technology Review)
- The Aliens Have Landed, and We Created Them (Bloomberg Opinion)
Newsletters, Podcasts, and Blogs
Stay up to date with the latest developments in AI safety, policy, and governance through these newsletters, podcasts, and blogs.
Newsletters and Blogs
- Transformer News: Weekly briefing on the power and politics of transformative AI
- Import AI: Newsletter by Jack Clark (Anthropic co-founder)
- Rising Tide: Newsletter by Helen Toner on navigating advanced AI
- Hyperdimensional: Blog by Dean Ball on emerging tech and governance
- Geopolitics of AGI: Newsletter by RAND on strategic implications of advanced AI
- HLS AI Association: Harvard Law School AI and policy community
- Epoch AI Newsletter: Research and weekly commentary on AI trends
- SemiAnalysis: In-depth semiconductor and AI industry analysis
- AI Futures Project: Nonprofit research group forecasting the future of AI
- Nikola Jurkovic: AISST alum writing on AI safety topics
- METR Substack: Research updates from Model Evaluation & Threat Research
- Astral Codex Ten: Blog by Scott Alexander
- Obsolete: AI journalism newsletter
- Anthropic Alignment Science Blog: Technical AI safety research from Anthropic
Podcasts
- 80,000 Hours Podcast: In-depth conversations on the world's most pressing problems
- Emerging Tech Policy Podcast: Narrated articles from the Emerging Tech Policy website
- Dwarkesh Podcast: Interviews with leading thinkers, hosted by Dwarkesh Patel
Research Fellowships
Fellowships and programs for students and professionals interested in AI safety, governance, and policy research.
AI Safety and Governance Fellowships
- SPAR Fellowship: Part-time, remote research fellowship that connects rising talent with experts in AI safety, policy, or biosecurity for 3-month research projects.
- Pivotal Research Fellowship: 9-week, in-person London fellowship focused on AI safety and governance research with mentorship, workshops, and stipend support.
- RAND CAST Fellowship (formerly TASP): Develops new generations of policy analysts and implementers at the intersection of technology and security issues. Fellows receive mentorship from RAND policy experts.
- LawAI Seasonal Research Fellowships: Winter and summer fellowships offering law students, professionals, and academics paid, cutting-edge AI law research with close mentorship from LawAI's research staff.
- GovAI Summer and Winter Fellowships: Structured program designed to help researchers transition to working on AI governance full-time.
- ML Alignment and Theory Scholars (MATS): 12-week residential program of independent research and educational seminars connecting scholars with top mentors in AI alignment, governance, and security.
- IAPS Fellowship: Fully funded, 3-month program for professionals from varied backgrounds at the Institute for AI Policy and Strategy.
- Vista AI Law and Policy Fellowship: Sponsors students and recent graduates for independent research with mentor guidance, or as research assistants with law professors and AI policy experts.
- UChicago Existential Risk Laboratory Summer Research Fellowship: 10-week, in-person program for undergraduate and graduate students to produce high-impact research on emerging threats from AI and other existential risks.
- ERA Fellowship: 8 weeks of fully funded AI safety research with weekly mentorship from expert researchers. Work on technical safety or governance projects.
- Astra Fellowship: Fully funded, 3–6 month, in-person program at Constellation's Berkeley research center for AI safety research.
- Vitalik Buterin Fellowships: Funds PhD students and postdocs working on AI safety and/or US-China AI governance research, administered by the Future of Life Institute.
- Foundation for American Innovation Conservative AI Policy Fellowship: 8-week, fully funded, work-compatible program designed for conservative policy professionals.
- PIBBSS Fellowship: 3-month interdisciplinary fellowship for researchers studying complex and intelligent behavior in natural and social systems, mathematics, philosophy, or engineering.
International Fellowships
- LASR Labs (London AI Safety Research Labs): 13-week, in-person London technical AI safety research fellowship where participants work in teams on publication-oriented projects.
- EU Tech Policy Fellowship: Programme empowering ambitious graduates to launch European policy careers focused on emerging technology.
- Talos Fellowship: Three-part program to accelerate European AI policy careers: 8-week online fundamentals course, 7-day Brussels policymaking summit, and optional 4–6 month paid placement at leading EU policy organizations.
Technical Papers
Intended for researchers considering a transition to AI safety and for advanced undergraduates who want to start technical work.
Mechanistic Interpretability
Mechanistic interpretability studies trained neural networks by reverse engineering the algorithms encoded in weights and activations.
- Anthropic Transformer Circuits Thread
- Indirect Object Identification (IOI) in GPT-2 Small
- Neel Nanda starter materials
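To make the idea of reverse engineering a network's internals concrete, here is a minimal NumPy sketch of activation patching, a common mechanistic interpretability technique: copy one hidden activation from a "clean" run into a "corrupted" run and see which unit restores the original behavior. The network, its weights, and all names are hypothetical toy choices for illustration, not taken from any of the resources above.

```python
import numpy as np

# Toy one-hidden-layer ReLU network with hand-set weights (hypothetical).
# By construction, the output reads only from hidden unit 0.
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
b1 = np.zeros(2)
W2 = np.array([[2.0, 0.0]])

def forward(x, patch=None):
    """Run the network; optionally overwrite one hidden activation."""
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value                # activation patching step
    return float((W2 @ h)[0])

clean = np.array([1.0, 1.0])       # output 2.0
corrupted = np.array([0.0, 1.0])   # output 0.0
h_clean = np.maximum(W1 @ clean + b1, 0.0)

# Patch each hidden unit of the corrupted run with its clean activation
# and check which patch restores the clean output.
for i in range(2):
    restored = forward(corrupted, patch=(i, h_clean[i]))
    print(f"patch unit {i}: output {restored:.1f}")
# Patching unit 0 restores the clean output (2.0); patching unit 1
# changes nothing, localizing the behavior to unit 0.
```

Real interpretability work applies the same patch-and-compare logic to transformer components (attention heads, MLP neurons) rather than a two-unit toy, as in the IOI paper listed above.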
Eliciting Latent Knowledge and Hallucinations
AI Evaluations and Standards
- Model Evaluations for Extreme Risks (Shevlane et al.)
- GPT-4 System Card (OpenAI)
Goal Misgeneralization and Specification Gaming
- Goal Misgeneralization in Deep Reinforcement Learning (Langosco et al.)
- Goal Misgeneralization (Shah et al.)
- Specification Gaming (DeepMind)
Emergent Abilities
- Emergent Abilities of Large Language Models (Wei et al.)
- Are Emergent Abilities a Mirage? (Schaeffer, Miranda, and Koyejo)
Survey and General Reading
- Catastrophic Risks from AI
- Interpretability
- Adversaries
- Specification Learning
- Recommender Systems
- Embedded Agency
- AI Alignment Problem introduction (Ngo, Chan, and Mindermann)
- Constitutional AI: Harmlessness from AI Feedback (Anthropic)
Policy Papers
For students and practitioners interested in public policy, law, governance, and economics approaches to reducing AI risk.
Overviews and Surveys
- The Role of Cooperation in Responsible AI Development (Askell et al., 2019)
- AI Policy Levers (Fischer et al., 2021)
- AI Chips: What They Are and Why They Matter (Khan and Mann, 2020)
- Towards Best Practices in AGI Safety and Governance (Schuett et al., 2023)
- 12 Tentative Ideas for U.S. AI Policy (Muehlhauser, 2023)
Licensing, Auditing, and Standards
- Auditing Large Language Models: A Three-Layered Approach (Mökander et al., 2023)
- Towards Trustworthy AI Development (Brundage et al., 2020)
- Nuclear Arms Control Verification and Lessons for AI Treaties (Baker, 2023)
- Verifying Rules on Large-Scale Neural Network Training (Shavit, 2023)
Misuse and Conflict
- How does the offense-defense balance scale? (Garfinkel and Dafoe, 2019)
- The Malicious Use of Artificial Intelligence (Brundage et al., 2018)
- Protecting Society from AI Misuse (Anderljung and Hazell, 2023)
Structural Risk
- Thinking About Risks From AI: Accidents, Misuse and Structure (Zwetsloot and Dafoe, 2019)
- The Windfall Clause: Distributing the Benefits of AI (O'Keefe et al., 2020)
- Algorithmic Black Swans (Kolt, 2023)