[Remote] Site Reliability Engineer
Note: This is a remote position open to candidates in the USA. Aorzon Technologies Inc is seeking an experienced Site Reliability Engineer (SRE) to join its high-performing engineering team. The role involves building and maintaining highly available, scalable, and secure cloud infrastructure.
Responsibilities
- Design, implement, and maintain highly available production systems
- Automate infrastructure provisioning using Infrastructure as Code (Terraform, CloudFormation, etc.)
- Build and optimize CI/CD pipelines for rapid and reliable deployments
- Monitor application and infrastructure health using observability tools
- Troubleshoot production incidents and perform root cause analysis
- Improve system reliability, performance, scalability, and security
- Collaborate closely with Software Engineering, DevOps, and Security teams
- Implement disaster recovery, backup, and high availability strategies
Skills
- 5+ years of Site Reliability Engineering / DevOps experience
- Strong experience with AWS, Azure, or Google Cloud Platform (GCP)
- Kubernetes, Docker, Helm, and container orchestration
- Terraform, Ansible, or other Infrastructure as Code tools
- CI/CD tools (GitHub Actions, Jenkins, GitLab CI, Azure DevOps)
- Linux system administration and scripting (Bash/Python)
- Monitoring & Observability: Prometheus, Grafana, Datadog, Splunk, ELK, New Relic, or Dynatrace
- Experience with incident management, production support, and on-call rotations
- Strong networking fundamentals (DNS, load balancing, TCP/IP, HTTP, SSL/TLS)
- Experience supporting microservices architecture
- Knowledge of service mesh (Istio/Linkerd)
- Experience with Kafka, Redis, RabbitMQ, or other messaging platforms
- Security best practices and cloud compliance experience