Model Serving Engineer

Bright Vision Technologies
Any Location, IL
Remote
Job Description
Bright Vision Technologies is seeking a Model Serving Engineer to design, build, and operate high-performance, highly reliable inference platforms for serving large machine learning models in production.

Responsibilities

  • Design and operate model serving platforms supporting diverse workloads including LLMs, vision models, and recommendation systems.
  • Optimize inference performance using continuous batching, paged attention, speculative decoding, and request multiplexing.
  • Implement multi-tenant routing, rate limiting, and quality-of-service policies across model endpoints.
  • Build autoscaling and capacity management systems that balance latency, throughput, and cost.
  • Tune GPU utilization, memory management, and KV cache strategies for LLM serving workloads.
  • Integrate model serving with API gateways, identity systems, and observability platforms.
  • Implement caching, prompt deduplication, and response reuse strategies where appropriate.
  • Drive end-to-end observability including latency histograms, queue dynamics, GPU utilization, and error tracking.
  • Develop deployment workflows including canary releases, shadow testing, and automated rollback.
  • Handle incident response for high-availability AI services and drive durable reliability improvements.
  • Collaborate with ML and product teams to support new model releases and capability rollouts.
  • Implement security controls including request signing, content filtering, and abuse detection at the serving layer.
  • Document operational procedures, performance characteristics, and tuning guidance for internal teams.
  • Stay current with AI serving research and translate advances into production capabilities.

Benefits

  • Competitive base salary commensurate with experience, plus benefits.