AI Safety Data Scientist – New Grad

Anthropic

Full-time · Hybrid · United States

Job Description

Anthropic is the AI safety company building Claude, the most capable and safest large language model in the world. Founded by former OpenAI researchers including Dario Amodei and Daniela Amodei, Anthropic's mission is the responsible development of AI for the long-term benefit of humanity. Claude powers hundreds of applications for millions of users, and Anthropic is at the forefront of Constitutional AI, interpretability research, and AI alignment.

We are hiring New Grad AI Safety Data Scientists in San Francisco to analyze Claude's behavior, measure safety properties, and develop quantitative frameworks for evaluating AI alignment.

Responsibilities

  • Design and run large-scale experiments that evaluate Claude's benchmark performance, safety properties, and alignment across diverse task categories
  • Develop statistical frameworks to measure and track AI safety metrics — including harmful output rates, refusal quality, and helpfulness-harmlessness tradeoffs
  • Analyze human feedback datasets from Anthropic's RLHF training pipeline to identify patterns in human preferences and evaluate label quality
  • Build data pipelines processing Claude's conversation logs to surface failure modes, capability regressions, and unexpected behaviors at scale
  • Collaborate with Anthropic's interpretability and policy teams to translate quantitative safety findings into model improvements and deployment guidelines
  • Conduct red-teaming data analysis — measuring the effectiveness of adversarial prompting techniques and the robustness of Constitutional AI guardrails

Requirements

  • Bachelor's or Master's degree in Statistics, Computer Science, Machine Learning, or Cognitive Science
  • Strong statistical and probabilistic reasoning skills for experimental design and hypothesis testing
  • Proficiency in Python for data analysis (pandas, numpy, scipy, matplotlib/seaborn)
  • Experience with ML model evaluation, benchmarking, or NLP data analysis
  • Genuine commitment to AI safety and understanding of LLM behavior, alignment, and risks

Benefits

  • Among the most competitive compensation packages in AI with Anthropic equity
  • Work at the frontier of AI safety research — the most important challenge in technology
  • Comprehensive medical, dental, and vision benefits with 100% premium coverage
  • 401(k) with Anthropic matching
  • San Francisco headquarters with Anthropic's mission-driven, research-first culture

Job Details

Salary: $42 – $62 / month
Job Type: Full-time
Work Mode: Hybrid
Location: San Francisco, CA
Apply Before: Jul 20, 2026
Important: We never charge any fee at any stage of the hiring process. If anyone asks for money, report it to [email protected].