[Remote] AI Principal Consultant
Note: This is a remote position open to candidates in the USA. Computacenter is a leading independent technology partner, trusted by large corporate and public sector organizations. They are looking for an AI Principal Consultant to join their professional services team, focusing on designing and implementing large-scale networking projects and providing expertise in high-performance computing and data center environments.
Responsibilities
- Partner with business leaders to deliver services that support company objectives and that are consistent with Winning Together values
- Design, architect, and implement distributed InfiniBand networks for high-performance computing (HPC) and data center environments
- Provide Ethernet and routing expertise to customers during project delivery to design, architect, and test Ethernet networking solutions
- Design, implement, and optimize high-performance fabric architectures for data center and infrastructure projects
- Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting
- Focus primarily on understanding the AI workload and how it interacts with other parts of the system, such as networking, storage, deep learning frameworks, and data cleaning tools
- Work on multi-functional teams to provide Ethernet network expertise to server infrastructure builds, accelerated computing workloads, and GPU-enabled AI applications
- Implement tasks related to network configuration and validation for data centers
- Create methods of procedure and deployment documents
- Embrace and support Computacenter’s mission and core values
Skills
- Bachelor's degree in Information Technology, Engineering, or related field (or equivalent experience)
- Strong understanding of NVIDIA technologies, including DGX Cloud, NVIDIA AI Enterprise software, Base Command Manager, NeMo, and NVIDIA Inference Microservices (NIM)
- Deep understanding of Kubernetes‑based GPU scheduling, GPU virtualization concepts (fractional GPUs, MIG awareness), and policy‑driven resource allocation in multi‑tenant clusters
- Experience optimizing cluster‑level GPU utilization, workload throughput, and job latency using Run:AI in conjunction with NVIDIA GPU platforms
- Strong hands-on routing experience, including BGP, VXLAN, and EVPN
- Knowledge of cluster management technologies, including Base Command Manager (BCM)
- Legally eligible to work in the United States
- Experience with IT service delivery lifecycle and methodologies
- Demonstrated experience designing, deploying, or operating Run:AI-based GPU orchestration platforms in production environments
- Ability to design in-depth, complex technical solutions
- In-depth knowledge of IT Infrastructure technology and its business application
- Excellent communication and presentation skills, with the ability to present to large internal and external audiences and at board level
Benefits
- Competitive compensation plans
- Long-term career opportunities
- Benefit plans to contribute to your good health
- Benefit plans to contribute to your future financial security
- Benefit plans to contribute to your peace of mind
Company Overview