[Remote] Principal Site Reliability Engineer
Note: This is a remote position open to candidates in the USA. DraftKings Inc. is a technology company at the forefront of innovation in sports betting and gaming. As a Principal Site Reliability Engineer, you will shape the long-term infrastructure strategy, driving architectural direction and ensuring the reliability and scalability of our platforms.
Responsibilities
- Define and execute the long-term strategy for our Kubernetes platform across Google Kubernetes Engine, Amazon Elastic Kubernetes Service, RKE2, and on-premise environments, ensuring reliability, scalability, and operational consistency
- Drive architectural decisions across critical infrastructure, including cluster lifecycle management, networking, identity and access management, observability, autoscaling, capacity planning, and cost optimization
- Lead large-scale platform initiatives across multiple engineering teams, establishing technical direction, engineering standards, and measurable outcomes that improve platform reliability and developer experience
- Establish and evolve reliability practices by defining service level objectives, service level indicators, and error budget frameworks that align platform performance with business priorities
- Build automation-first infrastructure through Infrastructure as Code, GitOps workflows, self-healing systems, and internal platform tooling that improve engineering velocity and reduce operational overhead
- Champion the responsible adoption of AI-powered engineering capabilities that improve operational efficiency, accelerate incident response, and enhance developer productivity
- Lead critical platform incidents, drive post-incident improvements, and strengthen platform resilience through automation, capacity planning, and operational excellence
- Mentor senior engineers, influence technical strategy across the organization, and elevate engineering excellence through architecture reviews, coaching, and technical leadership
Skills
- A Bachelor's Degree in Computer Science or a related technical field
- At least 8 years of experience designing, operating, and scaling distributed cloud and on-premise infrastructure, including at least 3 years operating at the Staff, Principal, or equivalent technical leadership level
- Proven experience leading large-scale infrastructure or platform initiatives that require cross-functional alignment and long-term technical ownership
- Deep expertise with Kubernetes, including cluster architecture, networking, storage, security, operators, lifecycle management, and large-scale production operations
- Extensive experience building and operating production infrastructure in AWS and Google Cloud Platform using Infrastructure as Code technologies such as Terraform, Pulumi, or similar tools
- Strong software development experience in Go, Python, or both, with expertise in GitOps, continuous integration and continuous delivery, observability, distributed systems, Linux, and reliability engineering principles
- Experience incorporating AI-powered tools into engineering workflows while applying sound judgment around reliability, security, and operational risk
- Exceptional communication and leadership skills with a proven ability to mentor engineers, influence technical strategy, and drive engineering excellence
- Experience working in regulated industries
- Experience in hybrid cloud environments
- Contributions to open-source projects
- Cloud certifications
Benefits
- Bonus
- Equity
- Benefits as applicable
- As a regulated gaming company, you may be required to obtain a gaming license issued by the appropriate state agency as a condition of employment. Don’t worry, we’ll guide you through the process if this is relevant to your role.