Member of Technical Staff, Inference & Serving Infra

Inception
Palo Alto, CA
Category Engineering
Job Description
We are looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs. Your work will make inference faster, more cost-effective, and more reliable.

Responsibilities

  • Build and optimize high-performance serving systems for low-latency diffusion LLM inference.
  • Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
  • Collaborate with ML researchers to translate theoretical requirements into practical system designs.

Benefits

  • Competitive salary
  • Equity in a rapidly growing startup
  • Access to the latest GPU hardware and cloud resources
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • A collaborative and inclusive culture