[Remote] Research Engineer - AI Systems
Note: This is a remote position open to candidates in the USA. Yotta Labs is building a next-generation multi-silicon AI cloud and runtime platform to power the world's most demanding AI workloads. They are seeking a highly motivated AI Systems Research Engineer specializing in Trainium and GPU kernels to optimize AI applications and improve performance on their platform.
Responsibilities
- Design and implement high-performance kernels for Attention, MoE, GEMM, collective communication, and quantization
- Optimize kernels for NVIDIA and AMD GPUs and AWS Trainium
- Develop custom operators and graph optimizations using the Neuron SDK, PyTorch/XLA, Torch Dynamo, and the Neuron Compiler
- Improve performance of vLLM, SGLang, TensorRT-LLM, and custom inference runtimes
- Design scalable distributed training and inference solutions across thousands of accelerators
- Contribute to open-source projects, publish technical findings, and engage with the developer community
Skills
- Proficiency in programming languages such as Python and C++
- Deep understanding of GPU architecture and performance optimization
- Experience with CUDA, Triton, ROCm/HIP, or AWS Neuron
- Strong understanding of AI frameworks (e.g., PyTorch, Dynamo, LMCache), model architectures, and profiling tools (e.g., Nsight, ROCm Profiler, or Neuron Profiler)
- Strong problem-solving skills and the ability to work in a collaborative, remote environment
- A background in computer science, engineering, or a related field is preferred
- Contributions to open-source AI infrastructure projects such as vLLM, SGLang, PyTorch, or Triton
- Experience with FlashAttention, PagedAttention, MoE, RLHF, or distributed AI systems
- Publications in top-tier conferences such as MLSys, OSDI, SOSP, NSDI, SC, HPCA, or ISCA
Benefits
- Competitive compensation with equity
- Flexible, remote work environment that values innovation and autonomy