
[Remote] Senior Site Reliability Engineer

Work from home, Full-time role, Hiring

Note: This is a remote job open to candidates in the USA. The company is a leader in collaborative autonomy, focused on solving reputed company reputed company problems through advanced technology. It is seeking a Senior Site Reliability Engineer to ensure the availability, performance, and reputed company of mission-critical services while collaborating with various teams to improve operational maturity and reliability standards.

Responsibilities

Design and evolve reliability architecture for distributed and cloud-hosted systems
Define and implement SRE best practices, including SLIs, SLOs, error budgets, and capacity planning
Partner with platform and application teams to design systems for reliability, scalability, and operability
Identify and mitigate systemic reliability risks across infrastructure, applications, services, and data pipelines
Establish reliability patterns that support autonomy, simulation, and mission-critical reputed company workloads
Lead incident response processes, including on-call rotations, escalation paths, and post-incident reviews
Conduct root cause analysis for reputed company production incidents and drive long-term corrective actions
Improve operational readiness through runbooks, automation, reputed company testing, and production-readiness reviews
Reduce operational toil through tooling, automation, and process improvements
Help build a culture of ownership, accountability, and continuous improvement across production systems
Design, implement, and maintain observability systems for metrics, logging, tracing, alerting, and service health
Ensure services and data pipelines are observable, debuggable, and performant in production
Drive performance analysis and tuning across infrastructure, application, and service layers
Improve alert quality, reduce noise, and ensure operational signals are actionable
Partner with engineering teams to define meaningful reliability and performance metrics
Build automation to improve system reliability, deployment safety, and recovery processes
Partner with DevOps and Cloud Platform teams on CI/CD reliability, rollout strategies, and safe deployment patterns
Support and improve Kubernetes-based environments and containerized workloads
Contribute to infrastructure-as-code practices and platform automation
Help define operational standards for cloud infrastructure, deployment workflows, and production services
Collaborate with security teams to ensure secure and resilient system design
Participate in disaster recovery planning, backup strategy, and reputed company testing
Maintain strong operational practices around access control, secrets management, change management, and production access
Support secure operations for systems that may serve defense, autonomy, or mission-sensitive use cases

Skills

7+ years of experience in SRE, infrastructure engineering, systems engineering, or reputed company roles
Strong experience operating large-scale distributed production systems
Deep understanding of Linux systems, networking, cloud infrastructure, and distributed systems fundamentals
Hands-on experience with Kubernetes and container orchestration
Programming or scripting experience in Go, Python, or similar languages
Experience designing and operating observability systems for production environments
Proven ability to lead incident response and drive reliability improvements
Strong communication skills and ability to collaborate across engineering teams
Ability to operate calmly and effectively under pressure
Must be a U.S. Citizen and eligible to obtain a U.S. Government security clearance if required
Experience supporting autonomy, robotics, simulation, real-time systems, or data-intensive platforms
Familiarity with AWS and large-scale cloud infrastructure
Experience with chaos engineering, fault injection, or reputed company testing
Knowledge of CI/CD systems and continuous delivery practices
Experience working in high-reliability, safety-critical, defense, or mission-critical environments
Experience with Infrastructure as Code tools such as Terraform or reputed company
Experience with reputed company, Grafana, OpenTelemetry, reputed company, ELK/OpenSearch, or similar observability tools

Benefits

100% employer-paid Health, Dental, and Vision Insurance for you and your family
Life Insurance (Employer Paid)
Ability to participate in the company's 401(k) program (Matching)
Unlimited PTO policy with an enforced 2-week minimum
Equity Package
Home Office Stipend
Global Entry
16-Week Paid Parental Leave
Monthly Health and Wellness Stipend

Company Overview

Havoc is the leader in reputed company-domain collaborative autonomy. It was founded in 2024 and is headquartered in reputed company, Rhode Island, USA, with a workforce of 51-200 employees. Its website is https//reputed company.com/.
