[Remote] Staff Site Reliability Engineer

Work from home, full-time role

Note: This is a remote job open to candidates in the USA. Bolt.new by StackBlitz is a company that specializes in next-gen, AI-powered app building technology. They are seeking a Staff Site Reliability Engineer to embed with product and platform teams, ensuring reliability is designed into projects from the start and influencing how the organization approaches reliability across multiple teams.

Responsibilities

Embed With Teams Early: Partner with development teams throughout the project lifecycle, from design and architecture reviews through launch readiness, bringing an SRE perspective before code is written, not after it breaks. Shepherd projects to completion with reliability designed in
Define Production-Readiness Standards: Establish and evolve the design reviews, launch checklists, and operational acceptance criteria that projects pass through, and own how teams adopt them across the org
Make Reliability Measurable: Define meaningful SLIs, SLOs, and error budgets in collaboration with product and engineering, and help teams use them to make real prioritization decisions
Build the Paved Roads: Create the frameworks, tooling, and golden paths across AWS, GCP, and Azure, with Terraform as the common backbone, that make the reliable way the easy way for every engineer
Cross-Team Leadership: Partner across engineering, product, and design to align reliability work with business objectives. Influence roadmaps, resolve technical disagreements, identify process and technical debt across the organization, and propose solutions that accelerate velocity for multiple teams. Mentor senior and mid-level engineers, raising the bar for operational excellence everywhere
Mature Our Incident Practice: Lead by influence on incident management and blameless postmortems, turning failure modes and operational signals into systematic, durable improvements
Represent Us Externally: Build relationships with our cloud and infrastructure provider teams to influence roadmaps and unlock early access to new capabilities, and represent StackBlitz in customer trust conversations and the broader reliability community
On-Call Rotation: Every SRE shares our on-call rotation, currently one week per month

Skills

Multi-Cloud Fluency: General fluency across AWS, GCP, and Azure matters more to us than deep specialization in any one, because we run across all three. Terraform is our common infrastructure-as-code layer everywhere
Our Stack: Comfort supporting and contributing to TypeScript (frontend and backend) and Ruby on Rails (backend) services. We're opinionated about our stack, and you'll work alongside it daily
SRE / Production Engineering Experience: Significant experience as an SRE, production/platform engineer, or software engineer with a deep reliability focus, including time operating at scale
Software Engineering Excellence: Strong software engineering fundamentals; you write production-quality code and can go deep with the teams you partner with, balancing immediate needs against long-term maintainability
Technical Leadership & Influence: A track record of changing how teams work, not just how systems run, leading across team boundaries without formal authority
Strategic Execution: Ability to take ambiguous, high-scope problems and drive them to completion with minimal oversight
Systems Thinking: Ability to identify process, communication, and technical debt across the organization and propose solutions that accelerate velocity for multiple teams
Data-Driven Leadership: Experience building measurement and evaluation frameworks, identifying patterns in operational data, and translating findings into organizational improvements
Strong verbal and written English communication skills are required, as this role involves frequent collaboration with team members, stakeholders, customers, and external audiences where English is the primary working language
Experience standing up or maturing an SRE practice at a growth-stage company
Background working as an embedded SRE or partnering closely with product teams
Experience designing chaos/resilience testing or progressive delivery practices

Company Overview

Bolt.new, by StackBlitz, is an AI development platform for building, running, editing, and deploying full-stack applications. It was founded in 2017 and is headquartered in San Francisco, California, USA, with a workforce of 11-50 employees. Its website is https://bolt.new.
