[Remote] Senior Network Reliability Engineer

Work from home | Full-time role | Hiring

Note: This is a remote role open to candidates in the USA. Group 1001 is a consumer-centric, technology-driven family of insurance companies focused on delivering outstanding value and operational performance. They are seeking a Senior Network Reliability Engineer to build a Site Reliability Engineering practice with a network scope, applying SRE principles to enhance the firm's network platform and ensure reliability across multi-cloud environments.

Responsibilities

  • Treat reliability as an engineered property. Define SLOs and error budgets for the network platform — DNS resolution, edge availability, mesh ingress success, cross-region path health — and use them to gate changes, not just to color dashboards. Lead postmortems with a focus on permanent remediation, not pattern-recognition. Alert on symptoms users feel, not on causes that may or may not produce impact
  • Move network state into code. Use Terraform (or Pulumi), Ansible, and Python to replace CLI-driven configuration with declarative, version-controlled, peer-reviewed change running through Infra CI/CD. This applies equally to the edge tier (Cloudflare), security platforms (Zscaler ZIA/ZPA, ZTNA policies, next-gen firewalls), the cloud network fabric (Transit Gateway, Cloud WAN, VPCs, Route53, IPAM), and increasingly the Kubernetes and service-mesh layer
  • Build network policy as intent, not rule lists. Express what flows are permitted, what segments are isolated, what egress is inspected, what zones share DNS — and engineer the compilers that turn that intent into per-vendor configuration. Use Policy as Code (OPA/Rego, Sentinel, Cilium NetworkPolicy) to catch invariant violations at plan time, not apply time
  • Infrastructure as Code (IaC): Design, deploy, and manage network infrastructure using Terraform or Ansible, moving the firm away from manual configuration to a code-first approach
  • Engineer the cloud network platform. Operate and extend our multi-account AWS Landing Zone — Cloud WAN segmentation, Transit Gateway peering, IPAM-driven CIDR allocation, shared private DNS, cross-account telemetry pipelines. Build the platform abstractions that make a new account or service land correctly with policy and connectivity composed from declarative inputs
  • Extend platform thinking into the container tier. Kubernetes networking, service mesh (Istio, Linkerd, Consul Connect), eBPF-based observability and policy (Cilium, Hubble), and the integration points where mesh-level authz meets cloud-tier identity. Recognize that an "internal" service is one logical hop on a chain of policy enforcement points and engineer for that explicitly
  • Improve telemetry and observability with intent. Build alerts as structured payloads with runbook links, suspected blast radius, and dependency-aware suppression. Author both system-health dashboards for operators and end-user monitoring dashboards that reflect actual user experience. Use Grafana, Elastic, and OpenTelemetry where each fits
  • Mentor and grow the team. Provide technical guidance to junior engineers, foster a culture of learning, and work out loud across Platform Engineering so the patterns you build cross-pollinate to adjacent domains
  • Handle hardware when required. Provide maintenance and configuration support for routers, switches, and firewalls at data centers and offices when needed — bringing code-first practices to physical hardware where possible (templating, change validation, zero-touch provisioning) and direct hands-on competence where it isn't
  • Incident Response: Serve as an escalation point for network issues, some complex and some basic but not yet covered by runbooks. Troubleshoot with a focus on root cause analysis and permanent remediation, and with a documentation-first mindset
  • Reduce toil and hand off cleanly. Repetitive operational tasks are scoped engineering problems with measurable payoff. Author runbooks and SOPs that the NOC can execute confidently; package routine work for L1/L2 handoff so engineering interrupt drops over time. Coordinate across Data Platforms, NOC/SOC, and Cyber Security so reliability practices spread instead of staying siloed
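
As a rough illustration of the first responsibility (using SLOs and error budgets to gate changes), here is a minimal Python sketch. The function names, the 99.9% target, and the 25% freeze threshold are hypothetical choices made for this example and are not taken from the posting or from any Group 1001 tooling.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget for a measurement window.

    slo_target is the success-ratio objective (e.g. 0.999), total is the number
    of requests or probes observed, and failed is how many of them failed.
    The result is 1.0 when nothing is spent and negative when overspent.
    """
    if total <= 0:
        return 1.0  # no traffic observed, so no budget consumed
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        # A 100% SLO leaves no budget: any failure overspends it.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def change_allowed(slo_target: float, total: int, failed: int,
                   min_remaining: float = 0.25) -> bool:
    """Gate a change: allow it only if enough error budget is left.

    min_remaining is an assumed policy knob (freeze below 25% remaining).
    """
    return error_budget_remaining(slo_target, total, failed) >= min_remaining


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 DNS resolutions against a 99.9% SLO.
    for failures in (400, 900, 1200):
        remaining = error_budget_remaining(0.999, 1_000_000, failures)
        verdict = "allow" if change_allowed(0.999, 1_000_000, failures) else "freeze"
        print(f"failures={failures} remaining={remaining:.2f} -> {verdict}")
```

A check like this could run as a step in a change pipeline, so that a risky rollout is held back automatically while the budget is nearly spent.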

Skills

  • Deep understanding of TCP/IP, BGP, OSPF, VPNs, and SD-WAN architecture
  • Proven experience with Terraform (state management, modules) and Ansible (playbooks, roles) – or similar – in a production environment
  • Proficiency in Python for automation and API interaction, or similar
  • Hands-on experience with Cloudflare, Zscaler, and/or enterprise firewalls
  • Experience configuring monitoring tools (e.g., Datadog, Prometheus, Grafana) to create meaningful alerts and dashboards
  • Service mesh experience (Istio, Linkerd, Consul Connect, Cilium)
  • eBPF-based observability (Hubble, Pixie)
  • AWS multi-account landing zone tooling experience (AFT, Control Tower, or equivalent)
  • Policy as Code experience (OPA/Rego, Sentinel, Cilium NetworkPolicy)
  • A strong belief that a job isn't done until the documentation is written
  • A mindset that actively seeks to automate repetitive tasks
  • Willingness to handle physical hardware tasks when required while maintaining a software-centric engineering mindset

Benefits

  • Comprehensive health, dental, and vision insurance plan options
  • Basic and Supplemental Life Insurance
  • Short and Long-Term Disability
  • Employee Assistance Program
  • Wellness programs
  • 401(k) plan with company matching contributions

Company Overview

  • Group 1001 is a collective that empowers companies to create positive growth. Its insurance and annuities are easy to understand and accessible to all. The company was founded in 2013 and is headquartered in Zionsville, Indiana, USA, with a workforce of 1001-5000 employees. Its website is https://group1001.com/.

Company H1B Sponsorship

  • Group 1001 has a track record of offering H1B sponsorships, with 1 in 2021 and 1 in 2020. Please note that this does not guarantee sponsorship for this specific role.