[Remote] Lead Machine Learning Engineer, Inference & Performance

Work from home | Full-time role | Hiring

Note: This is a remote position open to candidates in the USA. Egen is a fast-growing, entrepreneurial company with a data-first mindset. In this role, you will own the full lifecycle of AI features, with a focus on optimizing performance and working with clients to design scalable AI architectures.

Responsibilities

Optimize Inference: Build and tune production LLM serving with vLLM and SGLang—maximizing throughput and minimizing latency through batching, paged attention, quantization, and KV-cache strategies
Profile & Accelerate Training: Instrument and profile training runs to find bottlenecks, then resolve them with the right attention implementations (e.g. FlashAttention) tuned to the underlying hardware (H200, GB200)
Engineer for the Hardware: Apply a working understanding of GPU architecture and attention internals to choose the right approach per accelerator, rather than relying on defaults
Serve at Scale: Deploy and operate multiple models within shared GPU clusters on GKE, with autoscaling, efficient bin-packing, and graceful handling of mixed workloads
Drive Efficiency: Own GPU utilization as a first-class metric—measure it, improve throughput-per-dollar, and continuously raise the ceiling on what our fleet can deliver
Collaborate & Consult: Work directly with clients to understand performance, latency, and cost requirements, and translate them into pragmatic serving and training architectures

Skills

Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
5+ years of experience in ML/AI engineering, with a meaningful portion focused on performance, infrastructure, or systems
Proven track record of deploying and optimizing models in a production environment
Demonstrated experience profiling and improving GPU utilization for training and/or inference
Mastery of Python and shell scripting; comfort reading and reasoning about lower-level (CUDA-adjacent) performance code is a strong plus
Hands-on experience with vLLM, SGLang, or comparable high-performance serving stacks
Solid grasp of GPU architecture, the fundamentals of LLM inference, and the attention mechanism—including where the bottlenecks live and how FlashAttention and similar techniques address them across hardware generations (H200, GB200)
Fluency with profiling tools to diagnose training and inference bottlenecks (compute-bound vs. memory-bound, kernel-level analysis)
Strong Kubernetes (GKE) experience—deploying and autoscaling multiple models on shared GPU clusters on Google Cloud
A strong software engineering foundation—you write clean, maintainable code, measure before optimizing, and understand the full SDLC
Ownership: You take pride in your work and see optimizations through from profile to production
Curiosity: Hardware and serving frameworks change fast; you are a lifelong learner who stays ahead of the curve
Rigor: You measure before you optimize and let data, not intuition, guide where you spend effort
Consultative Spirit: You enjoy interacting with clients and can translate technical complexity into business value
Ethics: You prioritize responsible AI development and data privacy
Experience with Classic Machine Learning (neural nets, training, tuning) is a strong plus
Knowledge of Data Engineering and SQL

Benefits

Comprehensive Health Insurance
Paid Leave (Vacation/PTO)
Paid Holidays
Sick Leave
Parental Leave
Bereavement Leave
401(k) Employer Match
Employee Referral Bonuses

Company Overview

Egen is a technology services company with leading capabilities in cloud, data engineering, analytics, AI, and platform engineering. Founded in 2000 and headquartered in Naperville, Illinois, USA, it has a workforce of 501-1000 employees. Its website is https://egen.ai.

Company H1B Sponsorship

Egen has a track record of offering H1B sponsorships, with 7 in 2026, 58 in 2025, 50 in 2024, 60 in 2023, 111 in 2022, 49 in 2021, 79 in 2020. Please note that this does not guarantee sponsorship for this specific role.
