[Remote] Site Reliability Engineer
Note: This is a remote position open to candidates in the USA. Harbor Compliance is a leading technology platform for entity compliance, helping businesses and nonprofits manage licensing and legal requirements. The Site Reliability Engineer is responsible for designing and managing Linux infrastructure and for collaborating with cross-functional teams to ensure system performance and reliability.
Responsibilities
- Design and execute a comprehensive infrastructure strategy that proactively supports evolving business requirements and operational excellence
- Own the predictable delivery of high-complexity technical solutions through deep automation using Kubernetes and sophisticated CI/CD pipelines
- Maintain superior portal availability and system health by implementing advanced observability and distributed tracing strategies
- Lead high-severity incident response efforts and drive systemic improvements through insightful, blameless postmortem analysis
- Architect failure-resilient and self-healing infrastructure systems to ensure continuous operational stability and zero data loss
- Serve as the internal subject matter expert to influence software architecture decisions toward maximum scalability and performance
- Facilitate regular knowledge-sharing and training sessions to elevate technical standards and process predictability across the entire technology department
- Direct security initiatives and design secure networking strategies to maintain a high standard of protection for all client data and assets
Skills
- 4–7 years of professional experience building and managing resilient, modern infrastructure within a fast-paced environment
- Expert-level proficiency in managing and troubleshooting Linux-based servers across multiple distributions
- Advanced capability in developing modular, reusable infrastructure templates using tools such as Terraform and Ansible
- Proven success in managing containerized workloads at scale using Kubernetes and Helm
- Extensive experience configuring and optimizing high-performance database environments, specifically MySQL
- Demonstrated ability to build robust, secure CI/CD deployment pipelines that include automated rollback and quality gates
- Strong technical documentation skills, including the creation of architectural diagrams, detailed specifications, and operational playbooks
- Ability to lead cross-functional projects independently while mentoring junior engineers and driving team-wide initiatives
- Deep understanding of observability platforms such as New Relic, Datadog, or Prometheus to measure and improve system reliability
- Expertise in designing secure cloud networking strategies including firewalls, VPNs, and identity management best practices
- Advanced scripting and programming proficiency in Python or similar languages to automate complex operational workflows
- Strategic insight into infrastructure ROI and the ability to align technical roadmaps with broad business priorities
- Practical knowledge of disaster recovery planning and the execution of failure-resilient system designs
Benefits
- Health benefits
- Flexible paid time off
- Parental leave
- Fertility and adoption assistance
- 401(k)
- Educational reimbursement