Member of Technical Staff, Backend Systems

Inception
Palo Alto, CA
Job Description
Inception creates the world’s fastest, most efficient AI models. We are seeking experienced engineers to own the systems that serve our diffusion LLMs in production, optimizing for latency, throughput, cost, and reliability.

Responsibilities

  • Design, build, and operate scalable model serving infrastructure for our diffusion LLMs
  • Optimize inference pipelines for latency, throughput, and cost efficiency across GPU hardware
  • Implement and manage load balancing, autoscaling, and traffic routing for model endpoints
  • Build systems for model versioning, canary deployments, and zero-downtime rollouts
  • Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response
  • Collaborate with ML researchers to translate model advances into production-ready serving improvements
  • Benchmark and evaluate serving frameworks and hardware configurations to inform infrastructure decisions

Benefits

  • Competitive salary
  • Equity in a rapidly growing startup
  • Access to the latest GPU hardware and cloud resources
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • A collaborative and inclusive culture