[Remote] AI/ML Engineer (Site Reliability Engineering Focus) - Remote project
Note: This is a remote job open to candidates in the USA. Dice is seeking an experienced Machine Learning Engineer with a strong Site Reliability Engineering mindset to join their team. The role involves maintaining applications in Windows and Linux environments, managing on-premises servers, and working with Kubernetes clusters to ensure reliable deployment and monitoring of machine learning models.
Responsibilities
- Maintain and support machine learning applications running on Windows and Linux servers in on-premises environments
- Manage and troubleshoot Kubernetes clusters hosting ML workloads
- Collaborate with data scientists and engineers to deploy machine learning models reliably and efficiently
- Implement and maintain monitoring and alerting solutions using DataDog to ensure system health and performance
- Debug and resolve issues in production environments using Python and monitoring tools
- Automate operational tasks to improve system reliability and scalability
- Ensure best practices in security, performance, and availability for ML applications
- Document system architecture, deployment processes, and troubleshooting guides
Skills
- Proven experience working with Windows and Linux operating systems in production environments
- Hands-on experience managing on-premises servers, Kubernetes clusters, and Docker containers
- Strong proficiency in Python programming
- Solid understanding of machine learning concepts and workflows
- Experience with machine learning model deployment and lifecycle management
- Familiarity with monitoring and debugging tools, e.g. DataDog
- Ability to troubleshoot complex issues in distributed systems
- Experience with CI/CD pipelines for ML applications
- Familiarity with the AWS cloud platform
- Background in Site Reliability Engineering or DevOps practices
- Strong problem-solving skills and attention to detail
- Excellent communication and collaboration skills
- Familiarity with machine learning model development
Company Overview