← all jobs

HPC Solutions Engineer

Work from home Full-time role Hiring

Our Company: Hydra Host is a startup focused on building the first true marketplace for compute. We believe people should have the power to make their own decisions about which compute they use without sacrificing trust for ease of use. We offer an alternative to the cloud hyperscaler model by directly matching independent data centers and compute providers to end users. Furthermore, we have much higher diversity in providers and hardware configurations than anyone on the market. That makes it a tougher challenge, but a bigger learning and innovation opportunity. For this we are seeking a HPC Solutions Engineer to join our team and own this arena. Our Philosophy: Hydra Host is a people-centric, mission-driven, remote-friendly, experimental company. Results > titles. We believe in individual freedom, self-expression, and preserving an online culture of heterodoxy that has driven massive human flourishing. Job Summary: As a High Performance Compute Solutions Engineer on our team you will report directly to the Co-Founder & CTO and work collaboratively with other team members. You will be responsible for bespoke scoping and execution of high end compute resource orchestration for premium clients with the mission of configuring and maintaining large scale GPU clusters to best serve them. This includes installing, configuring, and monitoring dependencies required by customers and working with a wide variety of teams and workloads to ensure they can effectively leverage their large scale clusters. Key Responsibilities: Work with customers in technical discovery to help define requirements and deliverables for their use cases and help them to effectively utilize distributed GPU computing resources. Identify and recommend the best tools for each customer, while building boilerplate and reference implementations that can be reused for subsequent customers. Manage GPU clusters and coordinate with IT to ensure efficient operation of NVIDIA GPUs, utilizing technologies such as InfiniBand for high-speed networking. Oversee the deployment and maintenance of machine learning environments using virtual storage solutions and distributed computing (HPC) tools such as SLURM. Automate infrastructure provisioning and management using tools such as Ansible and Terraform. Collaborate with data scientists and engineers to ensure seamless integration of ML models into production environments. Conduct performance tuning and optimization of systems to maximize throughput and reduce latency. Stay current with the latest industry trends in machine learning technologies and HPC to ensure the use of best practices in infrastructure setup and model development. Document and maintain operational procedures and system configurations. Required Qualifications: 7+ years of high performance compute, distributed machine learning, GPU computing, and/or system architecture experience. Proficient in managing NVIDIA GPU environments and a familiarity with GPU computing frameworks and libraries. Strong experience with high-speed networking technologies, specifically InfiniBand. Experience with HPC job schedulers, preferably SLURM. Expertise in automating environment setup and maintenance using Ansible and Terraform. Demonstrated ability in deploying and managing virtual storage solutions. Strong coding skills in Python and familiarity with machine learning libraries and frameworks. Excellent problem-solving, communication, and teamwork skills. Personal Skills: A strong communicator who is empathetic, personable and agreeable Conscientious, meticulous, organized, and self-motivated, with the ability to define goals and prioritize workloads. An effective team player with the ability to take ownership with a results-oriented growth mindset Benefits: You will work with the most diverse hardware configurations and locations available to anyone in the industry as well as cutting edge GPU use cases This role is fully remote with a high accountability and high agency culture This role offers a competitive salary, equity and benefits This role offers flexible PTO Salary: Negotiable

More open positions

Clinical Application Specialist, APT (Hyderabad Location)

Work from home Full-time role

Area Sales Manager - Champagne Bourgogne

Work from home Full-time role

Epic Beaker Analysts

Work from home Full-time role

Staff Solutions Engineer (Japan)

Work from home Full-time role

Investment Banking Associate, AI Internship | Applied AI x Corporate Finance

Work from home Full-time role

Automation Application Specialist

Work from home Full-time role

Client Specialist (Account Manager)

Work from home Full-time role

Substation Principal Engineer - Remote

Work from home Full-time role

Manager-Technology Implementation Specialist – Global Medical Information Contact Centers

Work from home Full-time role

Cerchiamo Insegnanti di Arte per Ripetizioni Scuole Medie

Work from home Full-time role

Event Manager - Remote (6144)

Work from home Full-time role

External Hiring|Operations Customer Expert II|International|17072026

Work from home Full-time role

[Remote] Mechanical Engineer

Work from home Full-time role

[Remote] Account Manager (EST Only)

Work from home Full-time role

Strategic Operations Manager, Accounts

Work from home Full-time role

Customer Service Representative – Remote 12 pm – 9 pm EST Shift – Frontline Athlete Support Specialist at careerzynith

Work from home Full-time role

QA Lead - AI Consulting Company - Colombia

Work from home Full-time role

Senior Software Engineer, Windows/Desktop Applications - Colorado Springs, CO, USA

Work from home Full-time role

Remote Insurance Underwriter

Work from home Full-time role

Remote Customer Service Representative – Global Travel Support & Booking Specialist (Remote) – Join careerzynith’s High‑Impact Customer Experience Team

Work from home Full-time role

Social Services Care Manager

Work from home Full-time role