[Remote] Staff Machine Learning Engineer
Note: This is a remote position open to candidates in the USA. Unity Technologies, the company behind the world’s leading game engine, is seeking a Staff Machine Learning Engineer to optimize and deploy AI-driven game experiences on mobile and constrained hardware. The role involves hands-on work with state-of-the-art models, focusing on performance, integration, and collaboration with research scientists.
Responsibilities
- Own the optimization pipeline for the models you ship: model export, graph transformation, operator fusion, memory-layout planning, and hardware-specific tuning across NPU, mobile GPU, and desktop/laptop GPU
- Apply quantization (INT4/INT8/FP16), weight sharing, structured/unstructured pruning, and knowledge distillation to hit hard latency, memory, and power budgets, and validate the optimized models against quality bars
- Do low-level performance work: write and tune WebGPU compute shaders (WGSL) and, where relevant, native kernels (Metal, Vulkan/SPIR-V compute, CUDA); profile with browser and platform tools (Chrome/Dawn GPU traces, PIX, Instruments/Metal System Trace, Snapdragon Profiler, Nsight, RenderDoc), and eliminate bottlenecks at the op and memory-bandwidth level
- Apply efficiency techniques — dynamic resolution, token reduction, cross-frame caching/reuse, reduced-step diffusion samplers — as engineering levers to meet budgets on target SKUs
- Work with WebGPU-targeted inference runtimes (ONNX Runtime Web, Transformers.js, WebLLM, TensorFlow.js) alongside native options (CoreML, ONNX Runtime, TFLite, ExecuTorch), and extend or build glue code where off-the-shelf options fall short of our diffusion and VLM workloads
- Build parts of the integration between the ML runtime and the game engine: real-time scheduling, memory pooling, zero-copy buffer sharing between the inference and render paths, and frame-budget management alongside the renderer
- Build supporting engineering for your components: model packaging and asset pipelines, on-device fallbacks and SKU-aware capability tiers, crash/quality telemetry, and automated on-device benchmarking in CI
- Partner with research scientists to turn novel CV and multi-modal architectures into implementations that are deployable, debuggable, and fast on device
- Provide a feedback loop into research: surface hardware constraints, op-support gaps, and cost models early so model design and deployment converge
- Track breakthroughs in efficient inference (efficient attention, distillation, reduced-step diffusion) and assess them pragmatically: what actually moves latency/memory/power on our target devices
- Contribute to engineering best practices, code-review standards, performance-regression gates, and on-device benchmarking methodology
- Support a culture of measurement: track KPIs for latency, quality, memory, and power for the systems you work on, across the device matrix
- Partner with platform engineers, product managers, and runtime teams to align your work with device-SKU constraints and product roadmaps
- Share knowledge and mentor junior and mid-level engineers through code review, pairing, and design discussion
Skills
- 5+ years in software/ML engineering, with meaningful time focused on on-device / edge inference or real-time, performance-critical systems
- Production deployment of transformer- and/or diffusion-based models (e.g., ViT, Stable Diffusion, CLIP/SigLIP-style encoders) on mobile, desktop, or embedded hardware — shipped, not just prototyped
- Hands-on experience with at least one major inference runtime (ONNX Runtime / ORT Web, CoreML, TFLite, ExecuTorch) and a working understanding of operator fusion, memory layout, and runtime scheduling
- Low-level performance engineering: solid command of at least one GPU/compute API — WebGPU/WGSL, Metal, Vulkan, D3D12, or CUDA — and the profiling tools to go with it. You can read a frame capture and a kernel trace and reason about where the time and memory go
- Working knowledge of model-optimization techniques — quantization (INT4/INT8/FP16), weight sharing, pruning, and distillation — and the judgment to apply them to hit latency and memory budgets. You use them effectively as engineering tools
- Understanding of target hardware: mobile SoCs (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and/or desktop/laptop GPUs (Apple Silicon, NVIDIA, AMD, Intel)
- Strong Python for export pipelines and training-side tooling; familiarity with the core languages of a browser-native runtime (TypeScript/JavaScript, WGSL) is a plus
- Working fluency with the models you deploy — enough to read an architecture, modify it for deployment, and reason about accuracy trade-offs
- A collaborative working style: clear communication, reliable delivery, and a willingness to support and learn from teammates
- Experience shipping world-model, neural-rendering, or real-time generative pipelines (NeRF, 3DGS, real-time diffusion, or similar) on device
- Hands-on experience deploying models through WebGPU (e.g., ONNX Runtime Web WebGPU EP, Transformers.js, WebLLM, or TensorFlow.js) including writing/tuning WGSL compute shaders
- Game-engine or real-time-graphics background (Unity, Unreal, or a custom engine; Metal/Vulkan/D3D/OpenGL ES render pipelines), especially integrating compute workloads alongside a renderer
- Contributions to open-source ML inference frameworks, runtimes, or GPU/compute libraries, especially in the WebGPU ecosystem (Dawn, wgpu, ORT Web, Transformers.js, WebLLM)
- Familiarity with compiler stacks (MLIR, TVM, IREE, XLA) for custom kernel generation and graph optimization
- Experience with on-device benchmarking infrastructure, performance-regression CI, and device-farm matrices
- Proficiency in C++/Objective-C/Swift for runtime integration
Benefits
- Comprehensive health, life, and disability insurance
- Commute subsidy
- Employee stock ownership
- Competitive retirement/pension plans
- Generous vacation and personal days
- Support for new parents through leave and family-care programs
- Office food and snacks
- Mental Health and Wellbeing programs and support
- Employee Resource Groups
- Global Employee Assistance Program
- Training and development programs
- Volunteering and donation matching program
Company Overview