[Remote] Senior ML Infrastructure Engineer - Embodied AI Scaling Foundations
Note: This is a remote position open to candidates in the USA. General Motors is redefining mobility by creating vehicles and experiences designed to enhance safety and connectivity. The company is seeking a Senior ML Infrastructure Engineer to build critical infrastructure for machine learning engineers working on autonomous driving models, with the aim of accelerating the machine learning development cycle and improving the performance of its vehicles.
Responsibilities
- Lead the design, implementation, and deployment of scalable platforms and tools that drive machine learning model training and evaluation workflows across GM
- Own complex technical projects end-to-end, making key architectural decisions and technical trade-offs. You will be a core contributor to team planning, design reviews, and code quality
- Take a holistic view of projects, considering their impact across multiple teams, and proactively drive technical prioritization. Collaborate closely with partner teams to ensure maximum benefit from the systems we build
- Help shape our team through technical interviewing with high, well-calibrated standards, and play an essential role in recruiting. Mentor and onboard junior engineers and interns, helping them grow their careers
Skills
- 3+ years of experience building large-scale distributed systems/applications or advanced ML applications
- Proven track record of building robust frameworks with high-quality, long-lasting APIs
- Deep understanding and practical experience with machine learning algorithms
- Expertise in building reliable, highly performant, and cost-efficient systems leveraging modern cloud infrastructure
- Hands-on experience with the entire ML development lifecycle and MLOps practices
- Demonstrated ability to collaborate effectively across multiple teams and organizations
- Proficiency working with containerization and orchestration technologies (Docker, Kubernetes)
- A strong passion for self-driving technology and its transformative potential
- Exceptional coding skills in Python or C++
- BS, MS, or PhD in Computer Science, Math, or equivalent practical experience
- Experience with distributed training methodologies
- A background in optimizing model training performance
- Experience scaling model training across large clusters of GPUs/CPUs or other accelerators
- Familiarity with deep learning frameworks such as PyTorch or TensorFlow
- A strong grasp of performance profiling and state-of-the-art training optimization algorithms, including their performance characteristics and effect on model convergence
- Experience with advanced build systems (Bazel, Buck, Blaze, or CMake)
Benefits
- Medical
- Dental
- Vision
- Health Savings Account
- Flexible Spending Accounts
- Retirement savings plan
- Sickness and accident benefits
- Life insurance
- Paid vacation & holidays
- Tuition assistance programs
- Employee assistance program
- GM vehicle discounts