[Remote] Senior HPC Support Engineer - Ethernet and AI Infrastructure
Note: This is a remote job open to candidates in the USA. NVIDIA is a leading technology company specializing in AI and networking solutions. They are seeking a highly motivated Senior HPC Support Engineer to provide comprehensive solutions for sophisticated installations and technical support for their networking products, acting as the primary point of contact for customers.
Responsibilities
- Resolving sophisticated customer concerns and technical issues through meticulous research, reproduction, and problem solving for customers installing our products and supporting systems on Linux operating systems (multi-distro), with a focus on NVIDIA Ethernet switching technologies and our end-to-end solutions such as NVIDIA Spectrum-X
- Responding to customer product support inquiries via telephone, email or conference calls
- Resolving customer issues during installation, operation, maintenance or product application or interoperability with other vendors
- Participating in multi-functional team meetings and giving feedback to engineering and marketing regarding product requirements, customer experience, support tools, etc.
- As a technical resource, developing, redefining, and documenting standard methodologies for internal teams (Support/R&D) to improve support processes
Skills
- 5+ years of experience providing in-depth customer support and debugging for hardware and software products
- An academic degree from an accredited university or college in Networking, Computer Science/Engineering, or Electrical/IT (or equivalent experience)
- Demonstrated use of established AI technologies in day-to-day job responsibilities
- Established knowledge of enterprise platform and systems engineering, including Linux triage and server knowledge, with the ability to resolve hardware and/or OS internal issues
- Intellectual curiosity, positive attitude, flexibility, analytical ability, self-motivation, and team orientation, including professional-level communication and interpersonal skills, with the ability to maintain and lead the overall resolution of any critical issue raised by our customers, under all circumstances
- Profound knowledge of, and problem-solving experience in, networking technologies, protocols, and routing, including TCP, UDP, Ethernet, IP, L2, L3 (ARP, STP, LACP, MLAG, IGMP, PIM, BGP, OSPF), at enterprise level
- Linux OS including System Administration and Networking (LFCS / RHCSA)
- Able to debug networking protocols using tools such as tcpdump and Wireshark or similar packet generation and analysis tools
- Deep understanding of at least two of the following: data centers, servers, distributed systems, virtualization, deep learning frameworks, containers/containerization (e.g., Docker, Kubernetes)
- Adoption of AI solutions like Cursor, Gemini, ChatGPT, Copilot, Glean, etc. in your daily work routine
- Experience in solving problems in large-scale networking and AI infrastructure environments with routing and overlay technologies (BGP, OSPF, VXLAN, EVPN), RoCE, and QoS concepts
- Linux, networking, and NVIDIA AI Infrastructure and Operations certifications such as CCNP, CCIE, JNCIE-DC/ENT, RHCE, LFCS, NCP-AII/AIO/AIN
- Scripting and automation (Python, Bash, Ansible, YAML, etc.)
- Effective and comprehensive troubleshooting and debugging methodology
Benefits
- Equity
- Benefits