Data Engineer – New Grad – Scale AI
Scale AI is the data platform powering the AI revolution, providing high-quality training data annotation, evaluation, and RLHF (Reinforcement Learning from Human Feedback) for the world's leading AI labs and enterprises. Scale's customers include OpenAI, Anthropic, Meta AI, the US Department of Defense, and 500+ enterprises training AI models across autonomous vehicles, natural language processing, computer vision, and robotics. Scale processes billions of data annotations annually, making it a critical piece of data infrastructure for the AI era. Scale AI has raised $1.5B at a $14B+ valuation and sits at the center of one of the most transformative technology moments in decades. We are hiring New Grad Data Engineers to build the data pipelines powering Scale's AI training data platform.
Responsibilities
- Build Scale's AI training data ingestion pipelines — processing raw customer datasets (images, text, video, LiDAR point clouds) into Scale's annotation task management platform
- Develop Scale's quality assurance data pipelines — implementing statistical sampling, annotator performance scoring, and consensus-based gold label generation for AI training datasets
- Implement Scale's RLHF data pipeline, which processes human preference feedback from Scale's evaluator network for use in model training
- Build Scale's model evaluation data infrastructure — generating benchmark datasets, adversarial test cases, and red-teaming prompts for LLM safety and capability evaluation
- Develop Scale's enterprise customer data onboarding pipelines — securely ingesting, deduplicating, and anonymizing sensitive customer datasets for proprietary AI model training
- Build internal analytics datasets tracking annotator quality, task throughput, and data pipeline SLAs across Scale's global annotation operations
Requirements
- Bachelor's degree in Computer Science, Data Engineering, or Machine Learning
- Strong Python and SQL skills for data pipeline development
- Understanding of ML data concepts: training data, annotations, data quality, and model evaluation
- Familiarity with cloud data platforms (AWS, GCP) and distributed data processing frameworks (Spark, Dask)
- Passion for AI development and understanding of how data quality impacts model performance
Benefits
- Highly competitive salary with Scale AI pre-IPO equity at a $14B+ valuation
- Work at the epicenter of AI development — powering models used by 500M+ people
- Medical, dental, and vision benefits
- 401(k) with Scale AI matching
- San Francisco headquarters with hybrid flexibility and AI-native engineering culture