[Remote] Platform Engineer
Note: This is a remote position open to candidates in the USA. Sand Technologies is a global Physical AI company that uses data and AI to enhance critical industries. The company is seeking a Platform Engineer to own its Kubernetes platform, ensuring reliability and performance for health data systems used by ministries across various countries.
Responsibilities
- Kubernetes platform ownership: Design, deploy, and operate production Kubernetes clusters across AWS EKS and on-premises / bare-metal environments — owning cluster lifecycle, upgrades, multi-cluster topology, RBAC, networking, ingress, storage, and workload security
- Platform engineering: Build golden paths and self-service capabilities — Helm charts, operators/CRDs, reusable manifests, and internal tooling — that make it fast and safe for product and delivery teams to ship onto the platform
- GitOps & automation: Implement declarative, GitOps-driven delivery (Argo CD / Flux) and treat all infrastructure as code so environments are reproducible, auditable, and recoverable
- Reliability & SRE: Define SLOs and error budgets, lead incident response and blameless post-mortems, drive capacity and autoscaling strategy, and continuously improve the resilience and security posture of critical workloads
- Observability: Build and own monitoring, logging, tracing, and alerting (Prometheus, Grafana, OpenTelemetry, and similar) so the health of every deployed system is clear and actionable
- CI/CD pipelines: Develop and maintain CI/CD pipelines that streamline build, test, and deployment across services and environments
- R&D and problem-solving: Investigate infrastructure-level challenges, prototype solutions, and bring them into production
- Debugging & issue resolution: Troubleshoot and resolve infrastructure, networking, and cluster issues promptly to protect system integrity and performance
Skills
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field — or equivalent practical experience
- Minimum of 5 years in cloud/platform engineering, with significant hands-on time operating Kubernetes in production
- Proven track record of designing and running scalable, secure, and reliable cloud- and container-based platforms
- Experience managing on-premises and bare-metal Kubernetes cluster deployments
- Excellent problem-solving skills, with the ability to research deeply and develop innovative solutions to complex infrastructure challenges
- Strong communication skills and a genuine ability to collaborate across cross-functional, distributed teams
- Willingness to travel across the African continent to support our in-country teams when needed
Company Overview