[Remote] Staff Machine Learning Engineer

Work from home · Full-time

Note: This is a remote role open to candidates in the USA. Unity Technologies, maker of the world’s leading game engine, is seeking a Staff Machine Learning Engineer to optimize and deploy AI-driven game experiences on mobile and constrained hardware. The role involves hands-on work with state-of-the-art models and focuses on performance, engine integration, and collaboration with research scientists.

Responsibilities

Own the optimization pipeline for the models you ship: model export, graph transformation, operator fusion, memory-layout planning, and hardware-specific tuning across NPU, mobile GPU, and desktop/laptop GPU
Apply quantization (INT4/INT8/FP16), weight sharing, structured/unstructured pruning, and knowledge distillation to hit hard latency, memory, and power budgets — and validate them against quality bars
Do low-level performance work: write and tune WebGPU compute shaders (WGSL) and, where relevant, native kernels (Metal, Vulkan/SPIR-V compute, CUDA); profile with browser and platform tools (Chrome/Dawn GPU traces, PIX, Instruments/Metal System Trace, Snapdragon Profiler, Nsight, RenderDoc), and eliminate bottlenecks at the op and memory-bandwidth level
Apply efficiency techniques — dynamic resolution, token reduction, cross-frame caching/reuse, reduced-step diffusion samplers — as engineering levers to meet budgets on target SKUs
Work with WebGPU-targeted inference runtimes (ONNX Runtime Web, Transformers.js, WebLLM, TensorFlow.js) alongside native options (CoreML, ONNX Runtime, TFLite, ExecuTorch), and extend them or build glue code where off-the-shelf options fall short for our diffusion and VLM workloads
Build parts of the integration between the ML runtime and the game engine: real-time scheduling, memory pooling, zero-copy buffer sharing between the inference and render paths, and frame-budget management alongside the renderer
Build supporting engineering for your components: model packaging and asset pipelines, on-device fallbacks and SKU-aware capability tiers, crash/quality telemetry, and automated on-device benchmarking in CI
Partner with research scientists to turn novel CV and multi-modal architectures into implementations that are deployable, debuggable, and fast on device
Provide a feedback loop into research: surface hardware constraints, op-support gaps, and cost models early so model design and deployment converge
Track breakthroughs in efficient inference (efficient attention, distillation, reduced-step diffusion) and assess them pragmatically: what actually moves latency/memory/power on our target devices
Contribute to engineering best practices, code-review standards, performance-regression gates, and on-device benchmarking methodology
Support a culture of measurement: track KPIs for latency, quality, memory, and power for the systems you work on, across the device matrix
Partner with platform engineers, product managers, and runtime teams to align your work with device-SKU constraints and product roadmaps
Share knowledge and mentor junior and mid-level engineers through code review, pairing, and design discussion

Skills

5+ years in software/ML engineering, with meaningful time focused on on-device / edge inference or real-time, performance-critical systems
Production deployment of transformer- and/or diffusion-based models (e.g., ViT, Stable Diffusion, CLIP/SigLIP-style encoders) on mobile, desktop, or embedded hardware — shipped, not just prototyped
Hands-on experience with at least one major inference runtime (ONNX Runtime / ORT Web, CoreML, TFLite, ExecuTorch) and a working understanding of operator fusion, memory layout, and runtime scheduling
Low-level performance engineering: solid command of at least one GPU/compute API — WebGPU/WGSL, Metal, Vulkan, D3D12, or CUDA — and the profiling tools to go with it. You can read a frame capture and a kernel trace and reason about where the time and memory go
Working knowledge of model-optimization techniques — quantization (INT4/INT8/FP16), weight sharing, pruning, and distillation — and the judgment to apply them to hit latency and memory budgets. You use them effectively as engineering tools
Understanding of target hardware: mobile SoCs (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and/or desktop/laptop GPUs (Apple Silicon, NVIDIA, AMD, Intel)
Strong Python for export pipelines and training-side tooling; familiarity with the core languages of a browser-native runtime (TypeScript/JavaScript, WGSL) is a plus
Working fluency with the models you deploy — enough to read an architecture, modify it for deployment, and reason about accuracy trade-offs
A collaborative working style: clear communication, reliable delivery, and a willingness to support and learn from teammates
Experience shipping world-model, neural-rendering, or real-time generative pipelines (NeRF, 3DGS, real-time diffusion, or similar) on device
Hands-on experience deploying models through WebGPU (e.g., ONNX Runtime Web WebGPU EP, Transformers.js, WebLLM, or TensorFlow.js), including writing and tuning WGSL compute shaders
Game-engine or real-time-graphics background (Unity, Unreal, or a custom engine; Metal/Vulkan/D3D/OpenGL ES render pipelines), especially integrating compute workloads alongside a renderer
Contributions to open-source ML inference frameworks, runtimes, or GPU/compute libraries, especially in the WebGPU ecosystem (Dawn, wgpu, ORT Web, Transformers.js, WebLLM)
Familiarity with compiler stacks (MLIR, TVM, IREE, XLA) for custom kernel generation and graph optimization
Experience with on-device benchmarking infrastructure, performance-regression CI, and device-farm matrices
Proficiency in C++/Objective-C/Swift for runtime integration

Benefits

Comprehensive health, life, and disability insurance
Commute subsidy
Employee stock ownership
Competitive retirement/pension plans
Generous vacation and personal days
Support for new parents through leave and family-care programs
Office food and snacks
Mental health and wellbeing programs and support
Employee Resource Groups
Global Employee Assistance Program
Training and development programs
Volunteering and donation matching program

Company Overview

Unity (NYSE: U) is the world’s leading platform for creating and operating real-time 3D (RT3D) content. It is headquartered in Singapore, SG, and has a workforce of 5,001-10,000 employees. Its website is http://www.unity3d.com.