Applied Safety Research Engineer, Safeguards

Anthropic

New York, CA

Category: Engineering

Job Description

Anthropic is looking for a research-oriented engineer to develop methods for evaluating the safety of AI models. The role sits at the intersection of applied ML research and engineering: you will design experiments to improve evaluation quality, research model safety behavior, and collaborate with policy experts to translate real-world harm patterns into measurable evaluations.

Responsibilities

Design and run experiments to improve evaluation quality
Research how different factors impact model safety behavior
Analyze evaluation coverage to identify gaps and determine where better measurement is needed
Productionize successful research into evaluation pipelines
Collaborate with Policy and Enforcement to translate real-world harm patterns into measurable evaluations
Build tooling that enables policy experts to create and iterate on evaluations
Surface findings to research and training teams to drive upstream model improvements

Benefits

Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours
Lovely office space
