Member of Technical Staff, Inference & Serving Infra

Inception
Palo Alto, CA
Category Engineering
Job Description
We are looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs. Your work will make inference faster, more cost-effective, and more reliable.

Responsibilities

  • Build and optimize high-performance serving systems for low-latency diffusion LLM inference.
  • Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
  • Collaborate with ML researchers to translate theoretical requirements into practical system designs.

Benefits

  • Competitive salary
  • Equity in a rapidly growing startup
  • Access to the latest GPU hardware and cloud resources
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • A collaborative and inclusive culture