Member of Technical Staff, Model Hosting

Inception
Palo Alto, CA
Job Description
Inception creates the world’s fastest, most efficient AI models. We seek experienced engineers to own the systems that serve our diffusion LLMs in production.

Responsibilities

  • Design, build, and operate scalable model serving infrastructure for our diffusion LLMs
  • Optimize inference pipelines for latency, throughput, and cost efficiency across GPU hardware
  • Implement and manage load balancing, autoscaling, and traffic routing for model endpoints
  • Build systems for model versioning, canary deployments, and zero-downtime rollouts
  • Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response

Benefits

  • Competitive salary
  • Equity in a rapidly growing startup
  • Access to the latest GPU hardware and cloud resources
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • A collaborative and inclusive culture