Senior Site Reliability Engineer (Remote USA)

Work from home Full-time role Hiring

OUR STORY

TechInsights is the information platform for the semiconductor industry. Regarded as the most trusted source of actionable, in-depth intelligence related to semiconductor innovation and surrounding markets, TechInsights’ content informs decision makers and professionals whose success depends on accurate knowledge of the semiconductor industry (past, present, or future). Over 650 companies and 150,000 users access the TechInsights Platform, the world’s largest vertically integrated collection of unmatched reverse engineering, teardown, and market analysis in the semiconductor industry. This collection includes detailed circuit analysis, imagery, semiconductor process flows, device teardowns, illustrations, costing and pricing information, forecasts, market analysis, and expert commentary. TechInsights’ customers include the most successful technology companies, who rely on TechInsights’ analysis to make informed business, design, and product decisions faster and with greater confidence. For more information, visit www.techinsights.com.

WHY WORK WITH US

  • Company-sponsored training and development opportunities
  • Comprehensive benefits package (health, dental, vision, wellness, 401(k) matching, annual fitness reimbursement)
  • Flexible vacation policy
  • Community involvement opportunities through charitable alliances: https://www.techinsights.com/community-involvement
  • Wellness resources and support
  • Inclusive environment that prioritizes diversity, equity, and accessibility
  • High-growth company driven by high performance
  • Expected salary range: $149,100 - $157,800 USD

THE OPPORTUNITY

TechInsights is building the reliability and AI operations foundation for its next chapter: an AI-first intelligence platform that runs the most demanding semiconductor intelligence workflows in the world. We're looking for a Senior Site Reliability Engineer who wants to own that foundation.

This is a senior individual contributor role at the technical leadership tier of our Site Reliability Engineering team. You'll own strategic reliability initiatives end-to-end: setting technical direction, defining SLOs and error budgets across our production platform, designing reliability patterns for the AI agent pipelines that power our platform's AI-first capabilities, and enabling our development and AI Engineering teams to build and ship with confidence.

What sets this role apart is its scope. You're not just keeping the lights on; you're building the observability, Internal Developer Platform (IDP), and service catalog that a fast-scaling AI platform needs from day one. You'll be the reliability voice in architectural decisions, the engineer who closes the loop between agent failure modes and platform resilience, and the mentor who builds the team's capability rather than their own indispensability.

If you have deep SRE experience and want to apply it to AI workloads (agent loop observability, blast radius management, LLM infrastructure reliability), this is the role where that expertise becomes a differentiator.

This is a remote role for candidates based in the United States.

WHAT YOU’LL DO

Platform Reliability & AI Operations

  • Own SLOs, SLIs, and error budgets for all production services; drive error budget discipline across engineering
  • Design reliability patterns for AI agent pipelines: LLM observability, tool-use tracking, failure detection, and graceful degradation
  • Architect for blast radius containment — agent failures must have bounded customer impact through isolation, circuit breaking, and rapid recovery
  • Mature our Canada Central/West active-active architecture toward 24-hour RTO with full regional failover
  • Lead incident response and post-incident reviews that produce durable fixes; maintain DR procedures through regular testing

Developer & AI Engineering Enablement

  • Serve as the primary reliability liaison to Software and AI Engineering, translating requirements into actionable standards
  • Partner with AI Engineering on compute provisioning, model serving, inference latency, and workload isolation
  • Own CI/CD pipeline strategy (Bitbucket Pipelines, GitHub Actions) — set standards, optimize deployment frequency, and ensure teams can ship confidently
  • Drive IDP adoption and enable teams on SRE practices: on-call readiness, SLO definition, runbook development, and self-service tooling
  • Represent reliability in architectural discussions; surface risk before it's committed to design

Observability, IDP & Service Catalog

  • Own the service catalog — a living inventory of all services, AI agents, dependencies, ownership, and SLOs
  • Operate Datadog as the single pane of glass for service health, infrastructure, and agentic pipeline telemetry
  • Extend observability to AI workloads: LLM latency, token consumption, agent completion rates, and pipeline throughput
  • Build golden path templates in Backstage and/or Atlassian Compass so teams ship reliably without routine SRE involvement
  • Apply AIOps in
