Learning resources for AI safety
Curated material spanning introductions, technical papers, policy work, newsletters, and fellowships.
Non-Technical Introduction to AI Safety • Newsletters, Podcasts, and Blogs • Research Fellowships • Technical Papers • Policy Papers
Non-Technical Introduction to AI Safety
For a high-level, non-technical overview of arguments for caution around advanced AI systems, start here.
Blogs and YouTube
- Planned Obsolescence: Blog by Ajeya Cotra and Kelsey Piper
- Cold Takes: Blog by Holden Karnofsky
- Is Power-Seeking AI an Existential Risk? By Joe Carlsmith
- Robert Miles (YouTube): Accessible AI safety explainers
AI Safety in the News
- A.I. Poses Risk of Extinction, Industry Experts Warn (New York Times)
- Geoffrey Hinton tells us why he is now scared of the tech he helped build (MIT Technology Review)
- The Aliens Have Landed, and We Created Them (Bloomberg Opinion)
Newsletters, Podcasts, and Blogs
Stay up to date with the latest developments in AI safety, policy, and governance through these newsletters, podcasts, and blogs.
Newsletters and Blogs
- Transformer News: Weekly briefing on the power and politics of transformative AI
- Import AI: Newsletter by Jack Clark (Anthropic co-founder)
- Rising Tide: Newsletter by Helen Toner on navigating advanced AI
- Hyperdimensional: Blog by Dean Ball on emerging tech and governance
- Geopolitics of AGI: Newsletter by RAND on strategic implications of advanced AI
- HLS AI Association: Harvard Law School AI and policy community
- Epoch AI Newsletter: Research and weekly commentary on AI trends
- SemiAnalysis: In-depth semiconductor and AI industry analysis
- AI Futures Project: Nonprofit research group forecasting the future of AI
- Nikola Jurkovic: AISST alum writing on AI safety topics
- METR Substack: Research updates from Model Evaluation & Threat Research
- Astral Codex Ten: Blog by Scott Alexander
- Obsolete: AI journalism newsletter
- Anthropic Alignment Science Blog: Technical AI safety research from Anthropic
Podcasts
- 80,000 Hours Podcast: In-depth conversations on the world's most pressing problems
- Emerging Tech Policy Podcast: Narrated articles from the Emerging Tech Policy website
- Dwarkesh Podcast: Interviews with leading thinkers, hosted by Dwarkesh Patel
Research Fellowships
Fellowships and programs for students and professionals interested in AI safety, governance, and policy research.
AI Safety and Governance Fellowships
- SPAR Fellowship: Part-time, remote research fellowship that connects rising talent with experts in AI safety, policy, or biosecurity for 3-month research projects.
- Pivotal Research Fellowship: 9-week, in-person London fellowship focused on AI safety and governance research with mentorship, workshops, and stipend support.
- RAND CAST Fellowship (formerly TASP): Develops new generations of policy analysts and implementers at the intersection of technology and security issues. Fellows receive mentorship from RAND policy experts.
- LawAI Seasonal Research Fellowships: Winter and summer fellowships offering law students, professionals, and academics paid, cutting-edge AI law research with close mentorship from LawAI's research staff.
- GovAI Summer and Winter Fellowships: Structured program designed to help researchers transition to working on AI governance full-time.
- ML Alignment and Theory Scholars (MATS): 12-week residential program of independent research and educational seminars connecting scholars with top mentors in AI alignment, governance, and security.
- IAPS Fellowship: Fully funded, 3-month program for professionals from varied backgrounds at the Institute for AI Policy and Strategy.
- Vista AI Law and Policy Fellowship: Sponsors students and recent graduates for independent research with mentor guidance, or as research assistants with law professors and AI policy experts.
- UChicago Existential Risk Laboratory Summer Research Fellowship: 10-week, in-person program for undergraduate and graduate students to produce high-impact research on emerging threats from AI and other existential risks.
- ERA Fellowship: 8 weeks of fully funded AI safety research with weekly mentorship from expert researchers. Work on technical safety or governance projects.
- Astra Fellowship: Fully funded, 3–6 month, in-person program at Constellation's Berkeley research center for AI safety research.
- Vitalik Buterin Fellowships: Funds PhD students and postdocs working on AI safety and/or US-China AI governance research, administered by the Future of Life Institute.
- Foundation for American Innovation Conservative AI Policy Fellowship: 8-week, fully funded, work-compatible program designed for conservative policy professionals.
- PIBBSS Fellowship: 3-month interdisciplinary fellowship for researchers studying complex and intelligent behavior in natural and social systems, mathematics, philosophy, or engineering.
International Fellowships
- LASR Labs (London AI Safety Research Labs): 13-week, in-person London technical AI safety research fellowship where participants work in teams on publication-oriented projects.
- EU Tech Policy Fellowship: Programme empowering ambitious graduates to launch European policy careers focused on emerging technology.
- Talos Fellowship: Three-part program to accelerate European AI policy careers: 8-week online fundamentals course, 7-day Brussels policymaking summit, and optional 4–6 month paid placement at leading EU policy organizations.
Technical Papers
Intended for researchers considering a transition to AI safety and for advanced undergraduates who want to start technical work.
Mechanistic Interpretability
Mechanistic interpretability studies trained neural networks by reverse engineering the algorithms encoded in weights and activations.
- Anthropic Transformer Circuits Thread
- Indirect Object Identification (IOI) in GPT-2 Small
- Neel Nanda starter materials
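To make the idea of reverse engineering a network's internals concrete, here is a minimal NumPy sketch of activation patching, a common mechanistic interpretability technique: copy one hidden activation from a "clean" run into a "corrupted" run and see which unit restores the original behavior. The network, its weights, and all names are hypothetical toy choices for illustration, not taken from any of the resources above.

```python
import numpy as np

# Toy one-hidden-layer ReLU network with hand-set weights (hypothetical).
# By construction, the output reads only from hidden unit 0.
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
b1 = np.zeros(2)
W2 = np.array([[2.0, 0.0]])

def forward(x, patch=None):
    """Run the network; optionally overwrite one hidden activation."""
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value                # activation patching step
    return float((W2 @ h)[0])

clean = np.array([1.0, 1.0])       # output 2.0
corrupted = np.array([0.0, 1.0])   # output 0.0
h_clean = np.maximum(W1 @ clean + b1, 0.0)

# Patch each hidden unit of the corrupted run with its clean activation
# and check which patch restores the clean output.
for i in range(2):
    restored = forward(corrupted, patch=(i, h_clean[i]))
    print(f"patch unit {i}: output {restored:.1f}")
# Patching unit 0 restores the clean output (2.0); patching unit 1
# changes nothing, localizing the behavior to unit 0.
```

Real interpretability work applies the same patch-and-compare logic to transformer components (attention heads, MLP neurons) rather than a two-unit toy, as in the IOI paper listed above.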
Eliciting Latent Knowledge and Hallucinations
AI Evaluations and Standards
- Model Evaluations for Extreme Risks (Shevlane et al.)
- GPT-4 System Card (OpenAI)
Goal Misgeneralization and Specification Gaming
- Goal Misgeneralization in Deep Reinforcement Learning (Langosco et al.)
- Goal Misgeneralization (Shah et al.)
- Specification Gaming (DeepMind)
Emergent Abilities
- Emergent Abilities of Large Language Models (Wei et al.)
- Are Emergent Abilities a Mirage? (Schaeffer, Miranda, and Koyejo)
Survey and General Reading
- Catastrophic Risks from AI
- Interpretability
- Adversaries
- Specification Learning
- Recommender Systems
- Embedded Agency
- AI Alignment Problem introduction (Ngo, Chan, and Mindermann)
- Constitutional AI: Harmlessness from AI Feedback (Anthropic)
Policy Papers
For students and practitioners interested in public policy, law, governance, and economics approaches to reducing AI risk.
Overviews and Surveys
- The Role of Cooperation in Responsible AI Development (Askell et al., 2019)
- AI Policy Levers (Fischer et al., 2021)
- AI Chips: What They Are and Why They Matter (Khan and Mann, 2020)
- Towards Best Practices in AGI Safety and Governance (Schuett et al., 2023)
- 12 Tentative Ideas for U.S. AI Policy (Muehlhauser, 2023)
Licensing, Auditing, and Standards
- Auditing Large Language Models: A Three-Layered Approach (Mökander et al., 2023)
- Towards Trustworthy AI Development (Brundage et al., 2020)
- Nuclear Arms Control Verification and Lessons for AI Treaties (Baker, 2023)
- Verifying Rules on Large-Scale Neural Network Training (Shavit, 2023)
Misuse and Conflict
- How does the offense-defense balance scale? (Garfinkel and Dafoe, 2019)
- The Malicious Use of Artificial Intelligence (Brundage et al., 2018)
- Protecting Society from AI Misuse (Anderljung and Hazell, 2023)
Structural Risk
- Thinking About Risks From AI: Accidents, Misuse and Structure (Zwetsloot and Dafoe, 2019)
- The Windfall Clause: Distributing the Benefits of AI (O'Keefe et al., 2020)
- Algorithmic Black Swans (Kolt, 2023)