[Remote] Site Reliability Engineer (SRE) – Data Analytics & Observability
Note: This is a remote position open to candidates in the USA. Diverse Lynx is seeking a highly skilled Site Reliability Engineer (SRE) with a focus on Data Analytics, Observability, and Reporting to enhance enterprise production systems. The role involves applying SRE principles, developing operational dashboards, and integrating observability tools to improve system reliability and performance.
Responsibilities
- Apply SRE principles (SLIs, SLOs, error budgets) to improve system reliability
- Implement proactive monitoring, alerting, and self-healing capabilities
- Lead incident response, RCA, and postmortems
- Drive continuous improvement in availability, scalability, and resilience
- Design and deliver operational dashboards and reports using Power BI
- Leverage Splunk and Dynatrace to analyze logs, metrics, and traces
- Correlate data across platforms to identify trends, anomalies, and risk patterns
- Use SQL on Snowflake, Oracle, and MS SQL Server to query, transform, and analyze operational datasets
- Build data models and curated datasets to support reporting and analytics
- Translate operational data into actionable insights for engineering and leadership
- Administer and optimize Dynatrace (APM, Grail, DQL, synthetic monitoring)
- Create alerting strategies aligned to SLOs and business priorities
- Integrate observability tools with enterprise reporting and ITSM systems
- Develop and maintain Power BI dashboards, reports, and semantic models
- Integrate Power BI with Snowflake, Oracle, MS SQL Server, Splunk, and operational data sources
- Optimize query performance, data refresh, and dataset design
- Implement row-level security and governance controls
- Support enterprise reporting standards and governance
- Write and optimize SQL across Snowflake (advanced analytics, semi-structured data), Oracle (PL/SQL, performance tuning, indexing strategies), and MS SQL Server (T-SQL, stored procedures, query optimization)
- Perform cross-platform data analysis and reconciliation
- Support data modeling (views, marts, transformations) for analytics
- Troubleshoot data performance issues across heterogeneous platforms
- Partner with data engineering teams to improve data quality, lineage, and availability
- Develop automation using PowerShell (primary), Python, or REST APIs
- Build automation workflows for monitoring enhancements; incident enrichment; and data extraction, transformation, and reporting
- Create self-service tooling for operations teams
- Integrate automation with ServiceNow, schedulers, and observability tools
- Integrate monitoring with ServiceNow (incident, event, change management)
- Automate ticket creation, enrichment, and routing workflows
- Ensure alignment with ITIL best practices
- Support and optimize Managed File Transfer (MFT) platforms
- Monitor and troubleshoot file transfer failures, protocol issues, and throughput
- Manage and support enterprise schedulers: Control-M, Stonebranch, Redwood
- Analyze batch workflows, dependencies, and SLA adherence
Skills
- Bachelor's degree or equivalent experience
- 5+ years in SRE, DevOps, or Production Support
- Strong knowledge of SRE principles and reliability engineering practices
- Hands-on experience with Dynatrace (APM, DQL, observability)
- Hands-on experience with Splunk (search, SPL, dashboards)
- Hands-on experience with Power BI (data modeling, DAX, performance tuning)
- Hands-on experience with SQL across multiple platforms: Snowflake, Oracle, MS SQL Server
- Hands-on experience with PowerShell automation and scripting
- Hands-on experience with ServiceNow integration
- Experience with Snowflake data platform
- Experience with Oracle and SQL Server databases in enterprise environments
- Experience with MFT tools (Axway, Globalscape, JSCAPE, Boomi MFT)
- Experience with file transfer protocols (SFTP, FTPS, HTTPS, AS2)
- Experience with enterprise schedulers (Control-M, Stonebranch, Redwood)
- Knowledge of cloud and hybrid architectures
- Experience integrating Power BI with Snowflake, Oracle, and SQL Server
- Strong understanding of cross-platform data architecture and ETL/ELT patterns
- Familiarity with Dynatrace Davis AI and automation workflows
- Advanced Splunk data modeling and ingestion optimization
- Exposure to Chaos Engineering (e.g., Gremlin)
- Certifications: Dynatrace, Splunk, Snowflake, Microsoft (Power BI / SQL Server), Oracle, ITIL