[Remote] Senior Cloud Services Engineer – Plex
Note: This is a remote position open to candidates in the USA. Rockwell Automation is a global technology leader focused on helping the world’s manufacturers be more productive, sustainable, and agile. They are seeking a Senior Cloud Services Engineer to support their Plex Cloud Operations team, focusing on maintaining and scaling their Kubernetes-based platform while ensuring high availability, security, and performance.
Responsibilities
- Maintain and improve our Kubernetes platform, ensuring high availability and scalability
- Implement infrastructure/configuration as code to automate operations (Terraform, Ansible, Helm, Flux, Kustomize)
- Enhance observability and logging using OpenTelemetry and Elastic
- Build automated solutions that enable resiliency and self-healing of applications
- Manage server operating systems (Windows and Linux)
- Manage web servers (IIS 10)
- Troubleshoot production incidents, perform root cause analysis, and drive reliability improvements
- Evaluate and implement cloud-native technologies to enhance platform efficiency
- Collaborate with security teams to ensure best practices for container security and compliance
- Work with multi-cluster management solutions such as Rancher, Cluster API (CAPI), or other Kubernetes fleet management tools
- Manage Kubernetes infrastructure on Azure and vSphere
- Participate in an on-call rotation to support platform operations and respond to incidents
Skills
- Bachelor's Degree or equivalent years of relevant work experience
- Legal authorization to work in the U.S. We will not sponsor individuals for employment visas, now or in the future, for this job opening
- Typically requires 5+ years of relevant professional experience in a cloud infrastructure, platform engineering, or operations role
- 3+ years managing multi-cluster Kubernetes environments (Rancher and Cluster API)
- Hands-on experience with Azure and vSphere as Kubernetes infrastructure providers
- Experience with Linux administration and container runtimes (Docker, containerd)
- Solid understanding of RBAC, security policies, and secrets management in Kubernetes
- Proficiency with Terraform and Ansible
- Familiarity with observability tools (OpenTelemetry, Elastic, PRTG, and Dynatrace)
- Public Cloud experience (Microsoft Azure or Amazon Web Services)
- Knowledge of .NET website functionality
- Load balancer experience (F5 LTM, Azure Load Balancer)
- Understanding of IPv4/IPv6, FTP, HTTP, SSL/TLS, HTML, XML
- The ability to participate in an on-call rotation for platform support
- Prior experience in SRE or Platform Engineering roles
- Degree in Computer Science or related area
Benefits
- Health Insurance including Medical, Dental and Vision
- 401k
- Paid Time Off
- Parental and Caregiver Leave
- Flexible work schedule, arranged with your manager to fit around your personal life
Company Overview