
[Remote] Senior Site Reliability Engineer

Work from home, full-time role

Note: This is a remote role open to candidates in the USA. You.com is building the AI Search Infrastructure that powers modern AI systems. As a Site Reliability Engineer, you will own parts of the reliability, observability, and incident response posture for You.com’s production services, ensuring uptime and developing tools for incident management.

Responsibilities

  • Instrument services end-to-end using OpenTelemetry metrics and structured logging to ensure every critical path is measurable
  • Develop and maintain SRE standards and patterns (instrumentation guidelines, incident playbooks, service templates) that engineering teams adopt by default in new and existing services
  • Build internal tooling and automation in Python, Bash and Terraform to improve deployment safety, reliability, and operational efficiency
  • Design and maintain actionable dashboards that surface real user impact, not vanity metrics, for service owners and leadership
  • Tune alerting rules continuously to maximize signal-to-noise ratio; tie alerts to SLO-based error-budget burn rates rather than arbitrary thresholds
  • Own reliability incident response end-to-end: detection, triage, communication, escalation, resolution, and stakeholder updates
  • Track and run blameless postmortems that focus on systemic contributing factors, not individual fault, producing actionable remediation items with owners and deadlines
  • Track remediation follow-through as a first-class metric. Ensure postmortem action items are completed, not just documented
  • Continuously improve MTTD and MTTR by feeding incident learnings back into monitoring, runbooks, and automation
  • Collaborate with Customer Success to ensure incident learnings are fed back into monitoring, runbooks, and automation
  • Define meaningful SLOs for all production services grounded in critical user journeys, historical performance data, and business requirements
  • Eliminate alert fatigue by auditing, categorizing, and deprecating noisy or non-actionable alerts on a regular cadence
  • Help manage incident management processes and playbooks

Skills

  • 2+ years of full-time experience in an SRE or similar role
  • 3+ years of experience working in AWS with EKS, and with GitHub, including GitHub Actions (GHA) and CI/CD
  • Strong hands-on experience with Git, Python, and Bash. Comfortable building production-grade automation and tooling
  • Experience establishing SRE practices across multiple teams (SLO definitions, alert hygiene, postmortem culture)
  • Built or maintained Prometheus-based monitoring with dashboards in Grafana
  • Demonstrated experience scoping and delivering infrastructure projects from proposal through production deployment
  • Demonstrated experience managing incidents and responding to service outages
  • Hands-on experience integrating AI with SRE efforts to improve reliability, development and velocity
  • Demonstrated track record of collaborating with teams to define SLOs, instrument services against measurable SLIs, and operationalize error-budget burn-rate alerting that teams use independently to balance risk and delivery speed

Benefits

  • Hubs in San Francisco and New York City offering regular in-person gatherings and co-working sessions
  • Flexible PTO with U.S. holidays observed and a week-long shutdown in December to rest and recharge*
  • A competitive health insurance plan that covers 100% for the policyholder and 75% for dependents*
  • 12 weeks of paid parental leave in the US*
  • 401k program, 3% match - vested immediately!*
  • $500 work-from-home stipend to be used within a year of your start date*
  • $600 technology stipend to support a portion of our hybrid/remote team's cell phone and internet expenses*
  • $1,200 per year Health & Wellness Allowance to support your personal goals*
  • *Certain perks and benefits are limited to full-time employees only

Company Overview

  • You.com is a personalized AI search engine that delivers customized recommendations and allows natural conversation with its AI chatbot. It was founded in 2020, and is headquartered in Palo Alto, California, USA, with a workforce of 51-200 employees. Its website is https://you.com.