[Remote] Software Engineer, NVIDIA OpenShell
Note: This is a remote position open to candidates in the USA. NVIDIA is defining the next era of computing by tapping into the unlimited potential of AI. The Software Engineer role on the OpenShell team involves working on a sophisticated platform that supports autonomous AI agents, with a focus on full-stack development, network security, and system observability.
Responsibilities
- Work across the full stack of a distributed systems platform, from crafting gRPC contracts to building secure sandbox runtimes
- Implement and harden network security features, including policy enforcement, L4/L7 proxies, and secure inter-service communication using mTLS
- Develop core platform components such as inference routing, ensuring model provider adapters, credential management, and protocol translation integrate seamlessly with the sandbox and gateway
- Build reliable configuration and control plane systems that handle state divergence, implement reconciliation loops, and support safe merging and hot-reloading of policies
- Own the operability experience by creating effective CLI tools, managing release automation, and instrumenting all systems for observability with structured logging and distributed tracing
Skills
- Minimum of a Bachelor's degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience
- 8+ years of relevant experience
- Proficiency in systems programming, including building and debugging long-running services, async runtimes, and handling OS-level integration
- Deep knowledge of distributed systems/control planes, including reasoning about state divergence, building reconciliation loops, and designing crash recovery paths
- Experience with container and sandbox internals, including managing isolated workloads, process lifecycle, capabilities, and network namespaces
- Familiarity with gRPC and Protobuf, including crafting machine-to-machine APIs with clean streaming semantics and version safety
- Experience operating and extending workloads on Kubernetes, including working with compute drivers, image management, and detailed debugging
- Ability to secure inter-service communication using mTLS, gateway registration flows, and non-browser identity verification
- Proficiency in instrumenting systems with structured logging, health checks, and distributed tracing for production observability
- Familiarity with virtualization technologies and alternative runtimes, such as microVMs (e.g., libkrun)
- Experience improving operator experience through CLI/TUI development, status reporting, and clear error messages
- Comfort working at cross-language boundaries, specifically between Rust, Python, protobuf codegen, and shell scripting
Benefits
- You will be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).