Lead Infrastructure and Reliability Engineer (Systems & Scale)

Luma AI
Palo Alto, CA
Job Description
We are hiring a Lead Infrastructure and Reliability Engineer to define the direction of our Infrastructure Engineering team. The role requires deep expertise in Linux and distributed systems, experience operating GPU / accelerator clusters, and strong fluency in Kubernetes and modern open-source infrastructure.

Requirements

  • Deep expertise in Linux and distributed systems
  • Experience operating GPU / accelerator clusters in real production environments
  • Strong fluency in Kubernetes and modern open-source infrastructure
  • Comfortable debugging across hardware → kernel → runtime → orchestration
  • Strong leadership and partnership skills

Benefits

  • Competitive salary
  • Equity
  • 401(k) matching
  • Health, dental, and vision insurance
  • Paid time off