[Remote] Senior Site Reliability Engineer
Note: This is a remote position open to candidates in the USA. Oracle is seeking a Site Reliability Engineer to help build and operate reliable, scalable cloud-native platforms and services that support Oracle Health's next-generation healthcare technology initiatives. The role involves collaborating with various teams to design resilient systems, improve service health, and support modernization efforts across healthcare workflows and AI-driven capabilities.
Responsibilities
- Design, build, test, and operate reliable cloud infrastructure, platform capabilities, and services on Oracle Cloud Infrastructure and legacy deployment models
- Partner with software engineering teams to develop scalable, resilient services, APIs, integrations, and distributed systems
- Forecast capacity needs, analyze service trends, and take proactive steps to ensure systems can support current and future workloads
- Monitor service health, availability, latency, performance, and capacity using observability and reporting tools
- Participate in incident response, troubleshooting, root cause analysis, postmortems, and follow-up remediation
- Develop automation, scripts, and tooling to support provisioning, deployment, monitoring, metrics collection, mitigation, and remediation
- Support CI/CD, DevOps, infrastructure automation, and operational readiness practices
- Investigate and debug issues across applications, infrastructure, services, and dependencies to help teams meet service level objectives
- Identify performance bottlenecks and reliability risks, then recommend and implement improvements
- Collaborate with product managers, architects, engineers, security, operations, and customer teams to deliver secure, customer-focused healthcare solutions
- Support modernization efforts involving cloud-native architectures, healthcare interoperability, large-scale healthcare data platforms, and AI-enabled capabilities
- Communicate service health, operational risks, capacity concerns, and the potential impact of infrastructure, feature, or tooling changes
- Contribute to documentation, runbooks, incident records, operational standards, and knowledge sharing
- Participate in on-call rotations and operational support for production services
Skills
- Experience in building and operating reliable cloud infrastructure, platform capabilities, and services on Oracle Cloud Infrastructure and legacy deployment models
- Ability to partner with software engineering teams to develop scalable, resilient services, APIs, integrations, and distributed systems
- Experience in forecasting capacity needs, analyzing service trends, and taking proactive steps to ensure systems can support current and future workloads
- Proficiency in monitoring service health, availability, latency, performance, and capacity using observability and reporting tools
- Participation in incident response, troubleshooting, root cause analysis, postmortems, and follow-up remediation
- Development of automation, scripts, and tooling to support provisioning, deployment, monitoring, metrics collection, mitigation, and remediation
- Support for CI/CD, DevOps, infrastructure automation, and operational readiness practices
- Ability to investigate and debug issues across applications, infrastructure, services, and dependencies to help teams meet service level objectives
- Skill in identifying performance bottlenecks and reliability risks, then recommending and implementing improvements
- Collaboration with product managers, architects, engineers, security, operations, and customer teams to deliver secure, customer-focused healthcare solutions
- Support for modernization efforts involving cloud-native architectures, healthcare interoperability, large-scale healthcare data platforms, and AI-enabled capabilities
- Communication of service health, operational risks, capacity concerns, and the potential impact of infrastructure, feature, or tooling changes
- Contribution to documentation, runbooks, incident records, operational standards, and knowledge sharing
- Participation in on-call rotations and operational support for production services
Benefits
- Medical, dental, and vision insurance, including expert medical opinion
- Short-term and long-term disability
- Life insurance and AD&D
- Supplemental life insurance (Employee/Spouse/Child)
- Health care and dependent care Flexible Spending Accounts
- Pre-tax commuter and parking benefits
- 401(k) Savings and Investment Plan with company match
- Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
- 11 paid holidays
- Paid sick leave: 72 hours of paid sick leave granted on the date of hire and refreshed each calendar year. Unused balance carries over each year up to a maximum cap of 112 hours.
- Paid parental leave
- Adoption assistance
- Employee Stock Purchase Plan
- Financial planning and group legal
- Voluntary benefits including auto, homeowner and pet insurance