Data Engineer – New Grad

OpenAI

Full-time, Onsite, United States

Job Description

OpenAI is the world's leading AI safety and research company and the creator of ChatGPT, GPT-4, DALL-E, Sora, and Whisper. With 200+ million weekly ChatGPT users and $3B+ in annual revenue, OpenAI is building artificial general intelligence (AGI) to benefit all of humanity. Data infrastructure is central to that mission: the quality, scale, and diversity of training data directly determine the capabilities of the world's most powerful AI models. Data engineers at OpenAI work on some of the highest-stakes data pipelines ever built, processing petabytes of web data, human feedback, and model-generated content that shape the frontier of AI. We are hiring New Grad Data Engineers to build the data infrastructure that enables OpenAI's mission.

Responsibilities

  • Build OpenAI's pre-training data pipelines — processing and filtering web-scale text corpora (Common Crawl, books, code, scientific papers) for large language model training
  • Develop OpenAI's RLHF data pipeline — collecting, processing, and quality-scoring human preference feedback used to align GPT models with human values through reinforcement learning
  • Implement OpenAI's data deduplication and contamination detection pipelines — identifying and removing duplicate and low-quality content from training datasets at petabyte scale
  • Build OpenAI's model evaluation data infrastructure — generating and managing benchmark datasets for capability evaluation across coding, reasoning, and safety dimensions
  • Develop OpenAI's usage analytics data platform — processing ChatGPT user interaction logs for product analytics, content policy enforcement, and model improvement signals
  • Implement data privacy pipelines ensuring compliance with GDPR, CCPA, and OpenAI's responsible data use commitments across training and product data systems

Requirements

  • Bachelor's degree in Computer Science, Data Engineering, or Machine Learning
  • Strong Python skills for large-scale data processing (PySpark, Dask, Ray)
  • SQL proficiency and experience with cloud data platforms (GCP BigQuery, Snowflake)
  • Understanding of ML data concepts: pre-training data, RLHF, data quality, and model evaluation
  • Genuine passion for AI safety and the responsible development of artificial general intelligence

Benefits

  • Compensation among the most competitive in the technology industry, including OpenAI equity
  • Work at the most consequential technology company in the world
  • Comprehensive medical, dental, and vision benefits with 100% premium coverage
  • 401(k) with OpenAI matching
  • San Francisco Mission District headquarters and OpenAI's collaborative, mission-driven culture

Job Details

Salary: $40 – $60 / month
Job Type: Full-time
Work Mode: Onsite
Location: San Francisco, CA
Apply Before: Jul 19, 2026
Important: We never charge any fee at any stage of the hiring process. If anyone asks for money, report it to [email protected].