Site Reliability Engineering, Senior
Overview
Medallia is the pioneer and market leader in Experience Management. Our award-winning SaaS platform, Medallia Experience Cloud, leads the market in the management of experiences, insights, and actions for candidates, customers, employees, patients, and residents alike. We believe that every experience is a memory that can last a lifetime. Experiences shape the way people feel about a company. And they greatly influence how likely people are to advocate, contribute, and stay. At Medallia, we are committed to creating a world where organizations are loved by their customers and their employees. We empower exceptional people to create extraordinary experiences together. Bring your whole self.
The Role and Team
The Site Reliability Engineering organization at Medallia brings together the infrastructure and applications that power a highly reliable global SaaS platform. As a Senior Site Reliability Engineer, you will play a key role in designing, operating, and evolving the platforms and services that power Medallia's global production environment. You will work across engineering teams to improve reliability, scalability, performance, and operational maturity while driving automation and platform improvements at scale. This role is expected to provide technical leadership, influence engineering best practices, and help shape the future direction of our cloud-native infrastructure and operational strategy.
We are looking for engineers who think beyond day-to-day operations and continuously seek ways to increase engineering leverage. Successful candidates will act as force multipliers by building automation, self-service capabilities, platform solutions, and AI-assisted workflows that enable teams to operate more efficiently and reliably at scale.
Please note this role participates in a rotating on-call schedule supporting production systems and services.
Engineering Leverage
At Medallia, we believe great engineers amplify the impact of themselves and those around them. Successful Senior SREs build systems, platforms, standards, and automation that enable multiple teams to move faster, operate more reliably, and scale efficiently. As AI capabilities continue to evolve, Senior SREs are expected to evaluate, adopt, and promote AI-assisted engineering practices that improve productivity, accelerate delivery, and reduce operational burden across the organization.
This role is based remotely in Pune. Candidates for this position are required to reside within the Pune metropolitan area. Relocation support is not available at this time.
Responsibilities
Design, build, and operate highly available, scalable, and secure production platforms.
Partner with software engineering teams to improve application reliability, scalability, performance, and operational readiness.
Lead complex incident investigations, root cause analyses, and reliability improvement initiatives.
Design and implement automation, self-service capabilities, and platform solutions that reduce operational toil.
Leverage AI-assisted engineering tools and automation platforms to accelerate troubleshooting, improve productivity, and reduce operational overhead.
Identify opportunities to streamline operational processes through automation, AI-enabled workflows, and platform engineering practices.
Drive adoption of SRE principles, reliability standards, and operational best practices across engineering organizations.
Develop and maintain infrastructure-as-code, deployment automation, and operational tooling.
Support and improve CI/CD and GitOps-based deployment workflows.
Design observability strategies using monitoring, logging, tracing, and alerting platforms.
Participate in architecture reviews and provide guidance on scalability, resiliency, and operational excellence.
Mentor junior engineers and contribute to the technical growth of the broader engineering organization.
Act as a force multiplier by creating reusable solutions, self-service capabilities, and engineering standards that increase the effectiveness of multiple teams.
Drive adoption of AI-assisted engineering workflows and operational automation across the organization.
Drive engineering leverage initiatives that improve the productivity, reliability, and effectiveness of multiple engineering teams.
Influence the broader engineering organization through platform thinking, standardization, and operational simplification.
Qualifications
Minimum Qualifications
5+ years of experience leading reliability, platform engineering, infrastructure, or cloud operations initiatives in production environments.
Demonstrated experience operating and supporting large-scale production environments.
Demonstrated experience with Kubernetes and containerized workloads in production environments.
Demonstrated experience with cloud infrastructure platforms such as AWS, OCI, or GCP.
Demonstrated Linux systems administration and troubleshooting skills.
Demonstrated experience developing automation and tooling using Python, Go, Bash, or similar languages.
Demonstrated experience with infrastructure-as-code technologies such as Terraform.
Demonstrated experience designing and supporting CI/CD and GitOps workflows.
Demonstrated understanding of networking fundamentals including DNS, load balancing, TLS/SSL, routing, and service networking.
Demonstrated experience troubleshooting distributed systems and leading production incident response efforts.
Demonstrated track record of reducing operational complexity through automation, platform engineering, or process transformation initiatives.
Proven ability to influence technical decisions across teams and drive engineering improvements beyond direct ownership.
Ability to participate in an on-call rotation supporting production systems.
Professional working proficiency in written and spoken English.
Preferred Qualifications
Experience with GitOps platforms such as ArgoCD.
Experience operating multi-region or hybrid-cloud environments.
Experience with observability platforms such as Prometheus, Grafana, Loki, OpenTelemetry, or similar technologies.
Experience designing and operating platform engineering solutions and self-service infrastructure.
Experience supporting high-scale SaaS environments.
Understanding of release strategies such as canary, blue/green, progressive delivery, and feature flag-based deployments.
Experience with capacity planning, performance engineering, and resilience testing.
Familiarity with security, compliance, and regulatory requirements in production environments.
Experience using AI-assisted development, automation, or operational tooling to improve engineering productivity and service reliability.
Experience applying AI-assisted engineering workflows to improve productivity, reliability, or operational efficiency at scale.
Experience designing platform engineering solutions that enable self-service and increase engineering leverage.
Experience mentoring engineers and leading cross-functional technical initiatives.
Demonstrated passion for automation, process improvement, operational excellence, and engineering scalability.
Strong communication, collaboration, and stakeholder management skills.
What Success Looks Like
Successful Senior SREs at Medallia:
Continuously reduce operational toil through automation and platform improvements.
Improve service reliability through engineering-driven solutions rather than manual processes.
Build platforms and self-service capabilities that enable engineering teams to move faster while maintaining reliability and security.
Act as force multipliers for engineering teams through tooling, documentation, standards, and reusable solutions.
Leverage AI-assisted workflows to increase productivity and accelerate delivery without compromising reliability.
Raise the operational maturity of the systems and teams they support.
Influence technical decisions beyond their immediate area of ownership.
Leave behind systems and processes that scale without requiring proportional increases in operational effort.
Build capabilities that improve the productivity and effectiveness of entire engineering organizations.
Eliminate classes of operational problems rather than repeatedly solving individual instances.
At Medallia, we celebrate diversity and recognize the value it brings to our customers and employees. Medallia is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age (40 and over), disability, genetic information, veteran status or military service, or any other status protected by state or local law. Individuals with a disability who need an accommodation to apply should contact us at [email protected]. For information regarding how Medallia collects and uses personal information, please review our Privacy Policies.
Applications will be accepted for 30 days from the date this role was posted or until the role has been filled.