[Remote] Staff Network Site Reliability Engineer

Work from home | Full-time role | Hiring

Note: This is a remote position open to candidates in the USA. Nebius is leading a new era in cloud infrastructure for the global AI economy. The company is seeking a Network Site Reliability Engineer to build and run its network infrastructure, ensuring reliability and operational efficiency as it scales.

Responsibilities

Define and own reliability goals for network services and critical paths (SLIs/SLOs, availability targets, error budgets where it makes sense)
Drive reliability improvements across the whole network: not only services, but also site readiness, inter-site connectivity (DCI), and operational standards
Own incident response for your areas, lead investigations/postmortems, and turn failures into durable fixes (not repeated firefighting)
Build and evolve observability: actionable metrics/logs/traces, alerting, and faster debug loops during and after incidents
Design safer change workflows: automation, CI/CD, test/staging environments, canarying, rollbacks, and auditability for network changes
Work closely with network engineers and platform teams to embed operability into designs and keep operations practical and fast

Skills

Strong production Linux fundamentals and a structured approach to debugging complex systems
Solid understanding of networking basics and how real networks fail (control plane vs data plane, latency/loss, failure domains, etc.)
Hands-on experience operating high-availability systems and improving them over time (not just "keeping the lights on")
Ability to write and maintain software/automation (Go is common for us; Python is also welcome)
Experience with modern infrastructure tooling (e.g., IaC, CI/CD, container platforms) and comfort automating operational workflows
Experience with high-throughput traffic processing: load balancers, tunneling/decap, NAT64, or similar datapath-heavy systems
Low-level networking performance/debug background (eBPF/XDP, DPDK, perf/ftrace, kernel networking internals)
Experience building network-safe delivery pipelines (testing labs, staged rollouts, automated verification, drift detection)
Background with large-scale network observability/telemetry (e.g., routing/flow telemetry, regression detection at scale)

Benefits

Competitive compensation
Career growth and learning opportunities
Flexibility and ownership
Collaborative and innovative culture
Opportunity to work on impactful AI projects
International environment and talented teams

Company Overview

The Nebius AI Cloud provides full-stack infrastructure for AI developers and practitioners across startups, enterprises, and scientific institutes. It lets them build and deploy generative AI applications and deliver scientific breakthroughs faster by training and running ML models in a secure, high-performance, cost-optimized cloud environment. Nebius was founded in 2022 and is headquartered in Amsterdam, NL, with a workforce of 1001-5000 employees. Its website is https://nebius.com.
