CUDA Kernel Engineer

PRAGMATIKE
Cambridge, NY
Category: Research
Job Description
We are searching for a CUDA Kernel Engineer who has hands-on experience developing and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale, high-throughput AI systems used by Fortune 500 customers.

Responsibilities

  • Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs, with a focus on maximizing occupancy, memory throughput, and warp efficiency.
  • Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK.
  • Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, register pressure, and PCIe transfer overhead.
  • Optimize data movement across the GPU memory hierarchy (global, shared, L2 cache, and texture memory) and ensure coalesced memory access.
  • Collaborate closely with AI systems, model acceleration, and backend distributed systems teams.
  • Contribute to GPU architecture decisions, kernel libraries, and internal performance-engineering best practices.

Benefits

  • Competitive salary & equity options
  • Sign-on bonus
  • Health, Dental, and Vision
  • 401(k)