CUDA Kernel Engineer

PRAGMATIKE
Cambridge, MA
Category: Data Analyst
Job Description
We are searching for a CUDA Kernel Engineer who has hands-on experience developing and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale, high-throughput AI systems used by Fortune 500 customers.

Responsibilities

  • Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs
  • Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK
  • Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, register pressure, and PCIe transfer overhead
  • Improve GPU memory pipelines (global, shared, L2, texture memory) and ensure proper memory coalescing
  • Collaborate closely with AI systems, model acceleration, and backend distributed systems teams
  • Contribute to GPU architecture decisions, kernel libraries, and internal performance-engineering best practices

Benefits

  • Competitive salary & equity options
  • Sign-on bonus
  • Health, Dental, and Vision
  • 401(k)