Role Overview
We are building the next generation of AI-driven game experiences, running generative models on-device, right where the players are: on phones, tablets, laptops, and desktops. As a Senior Machine Learning Engineer for On-Device & Mobile AI, you will take state-of-the-art multi-modal models (transformers, diffusion networks, and vision-language models, or VLMs) and make them run fast, small, and reliably on mobile and constrained hardware.
What You Will Do
Inference & On-Device Optimization: Own the optimization pipeline for the models you ship: model export, graph transformation, operator fusion, memory-layout planning, and hardware-specific tuning across NPU, mobile GPU, and desktop/laptop GPU. Apply quantization (INT4/INT8/FP16), weight sharing, structured/unstructured pruning, and knowledge distillation to hit hard latency, memory, and power budgets, and validate the optimized models against quality bars.
Why It Might Be a Fit
This role is for an engineer who is energized by the gap between a research model and a shipping, on-device product. If you enjoy profilers, frame captures, op-fusion, and shaving milliseconds and megabytes, this is your role.
Requirements
- 5+ years in software/ML engineering, with meaningful time focused on on-device / edge inference or real-time, performance-critical systems
- Production deployment of transformer- and/or diffusion-based models (e.g., ViT, Stable Diffusion, CLIP/SigLIP-style encoders) on mobile, desktop, or embedded hardware — shipped, not just prototyped
- Hands-on experience with at least one major inference runtime (ONNX Runtime / ORT Web, CoreML, TFLite, ExecuTorch) and a working understanding of operator fusion, memory layout, and runtime scheduling
- Low-level performance engineering: solid command of at least one GPU/compute API — WebGPU/WGSL, Metal, Vulkan, D3D12, or CUDA — and the profiling tools to go with it
- Working knowledge of model-optimization techniques — quantization (INT4/INT8/FP16), weight sharing, pruning, and distillation — and the judgment to apply them to hit latency and memory budgets
- Understanding of target hardware: mobile SoCs (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and/or desktop/laptop GPUs (Apple Silicon, NVIDIA, AMD, Intel)
- Strong Python for export pipelines and training-side tooling; familiarity with the core languages of a browser-native runtime (TypeScript/JavaScript, WGSL) is a plus
- Working fluency with the models you deploy — enough to read an architecture, modify it for deployment, and reason about accuracy trade-offs
- A collaborative working style: clear communication, reliable delivery, and a willingness to support and learn from teammates
Benefits
- Comprehensive health, life, and disability insurance
- Commuter subsidy
- Employee stock ownership
- Competitive retirement/pension plans
- Generous vacation and personal days
- Support for new parents through leave and family-care programs
- Office food and snacks
- Mental health and wellbeing programs and support
- Employee Resource Groups
- Global Employee Assistance Program
- Training and development programs
- Volunteering and donation matching program