[Remote] Senior Platform Engineer
Note: This is a remote position open to candidates in the USA. Quantiphi is an award-winning, AI-first digital engineering and consulting company focused on delivering high-impact services and solutions that help organizations solve what truly matters. The Senior Platform Engineer will design, optimize, and scale infrastructure for GenAI and LLM workloads, collaborating closely with data science, MLOps, and application teams to deliver cutting-edge AI solutions.
Responsibilities
- Design and implement scalable infrastructure for LLM and GenAI workloads across multi-GPU environments
- Perform GPU profiling, benchmarking, and performance optimization for distributed training workloads
- Manage and schedule compute-intensive jobs using Slurm-based clusters and OpenShift/Kubernetes environments
- Enable and optimize the NVIDIA GPU stack (CUDA, cuDNN, NCCL, Triton, RAPIDS, etc.)
- Collaborate with cross-functional teams to deploy models in research and production environments
- Build and support GenAI pipelines (fine-tuning, RAG, multi-modal inferencing, LLMOps)
- Develop reusable infrastructure templates using tools like Terraform and Helm
- Contribute to internal innovation (PoCs, workshops) and support client-facing delivery engagements
Skills
- Strong experience with Slurm and distributed training environments
- Hands-on expertise with Red Hat OpenShift and/or Kubernetes
- Deep knowledge of the NVIDIA GPU ecosystem (CUDA, cuDNN, NCCL, Nsight, Triton/TensorRT)
- Strong foundation in Linux systems, performance tuning, and multi-GPU optimization
- Experience deploying GenAI workloads (LLM fine-tuning, RAG pipelines, multi-modal systems)
- Familiarity with Infrastructure-as-Code tools (Terraform, Ansible)
- Experience with cloud GPU environments (GCP, Azure, AWS, OCI) and/or on-prem GPU clusters
- Experience with NVIDIA NIMs, DGX systems, or GPU-accelerated containers
- Knowledge of LLMOps frameworks and MLOps integration
- Familiarity with vector databases and retrieval systems for RAG architectures
- Comfortable working in client-facing environments and collaborating with AI solution teams
- Experience working with FHIR R4, HL7 v2, or SMART on FHIR
- Experience integrating with EHR systems (e.g., Epic)
- Understanding of HIPAA compliance and healthcare data privacy
- Exposure to clinical workflows, CDS Hooks, or patient-facing applications
- Experience building clinical decision support systems or healthcare interoperability solutions
Benefits
- Make an impact at one of the world’s fastest-growing AI-first digital engineering companies.
- Upskill and discover your potential as you solve complex challenges in cutting-edge areas of technology alongside passionate, talented colleagues.
- Work where innovation happens - work with disruptive innovators in a research-focused organization with 60+ patents filed across various disciplines.
- Stay ahead of the curve - immerse yourself in breakthrough AI, ML, data, and cloud technologies and gain exposure working with Fortune 500 companies.