[Remote] Site Reliability Engineer (SRE) Kubernetes Platform (FedRAMP High / IL5) - Grade 8
Note: This is a remote position open to candidates in the USA. Dice is seeking a Site Reliability Engineer to support the development and operation of its Kubernetes-based platform in regulated environments. The role involves improving reliability, scalability, and compliance while working closely with senior engineers and technical leaders.
Responsibilities
- Contribute to the design, implementation, and operation of Kubernetes platforms in FedRAMP High / IL5 environments
- Support day-to-day reliability and performance of platform services, including monitoring and alerting
- Implement automation and tooling to improve operational efficiency and reduce manual effort
- Work with senior engineers to define and track SLIs, SLOs, and error budgets
- Assist in maintaining compliance and security requirements, including support for audits and continuous monitoring
- Contribute to infrastructure as code and CI/CD pipeline improvements
- Collaborate with cross-functional teams (Security, Platform, and Application) to resolve issues and deliver platform capabilities
- Participate in on-call rotations, responding to customer requests and paging alerts
Skills
- 4-6 years of experience in SRE, DevOps, or platform engineering roles
- Experience with Kubernetes in production environments
- Familiarity with cloud platforms (AWS, Azure, or similar; GovCloud experience a plus)
- Solid understanding of Linux systems, networking, and containerization
- Experience with Infrastructure as Code (e.g., Terraform)
- Proficiency in scripting or programming (e.g., Python, Go)
- Exposure to observability tools (Prometheus, Grafana, logging systems)
- Experience working in FedRAMP High or DoD IL5 environments
- Exposure to CI/CD systems and deployment automation (e.g., ArgoCD)
- Familiarity with container security practices and tools
- Experience supporting regulated or audited systems