Job Description

AI Safety Data Scientist – New Grad – Anthropic

Anthropic is the AI safety company building Claude, the most capable and safest large language model in the world. Founded by former OpenAI researchers including Dario Amodei and Daniela Amodei, Anthropic's mission is the responsible development of AI for the long-term benefit of humanity. Claude powers hundreds of applications for millions of users, and Anthropic is at the forefront of Constitutional AI, interpretability research, and AI alignment.

We are hiring New Grad AI Safety Data Scientists in San Francisco to analyze Claude's behavior, measure safety properties, and develop quantitative frameworks for evaluating AI alignment.

Responsibilities

Design and run large-scale experiments evaluating Claude's capabilities, safety properties, and alignment across diverse task categories
Develop statistical frameworks to measure and track AI safety metrics — including harmful output rates, refusal quality, and helpfulness-harmlessness tradeoffs
Analyze human feedback datasets from Anthropic's RLHF training pipeline to identify patterns in human preferences and evaluate label quality
Build data pipelines processing Claude's conversation logs to surface failure modes, capability regressions, and unexpected behaviors at scale
Collaborate with Anthropic's interpretability and policy teams to translate quantitative safety findings into model improvements and deployment guidelines
Conduct red-teaming data analysis — measuring the effectiveness of adversarial prompting techniques and the robustness of Constitutional AI guardrails

Requirements

Bachelor's or Master's degree in Statistics, Computer Science, Machine Learning, or Cognitive Science
Strong statistical and probabilistic reasoning skills for experimental design and hypothesis testing
Proficiency in Python for data analysis (pandas, numpy, scipy, matplotlib/seaborn)
Experience with ML model evaluation, benchmarking, or NLP data analysis
Genuine commitment to AI safety and understanding of LLM behavior, alignment, and risks

Benefits

One of the most competitive compensation packages in AI, including Anthropic equity
Work at the frontier of AI safety research — the most important challenge in technology
Comprehensive medical, dental, and vision benefits with 100% premium coverage
401(k) with Anthropic matching
San Francisco headquarters with Anthropic's mission-driven, research-first culture

Job Details

Salary	$42 – $62 / month
Job Type	Full-time
Work Mode	Hybrid
Location	San Francisco, CA
Apply Before	Jul 20, 2026

Important: We never charge any fee at any stage of the hiring process. If anyone asks for money, report it to [email protected].
