[Remote] AI Research Engineer (Multi-Modal Reinforcement Learning) - 100% Remote Worldwide
Note: This is a remote position open to candidates in the USA. Tether.io is pioneering a global financial revolution through its innovative digital finance solutions. The company is seeking an AI Research Engineer to drive innovation in multi-modal reinforcement learning, focusing on optimizing decision-making and adaptive behavior across data modalities to improve AI performance on real-world challenges.
Responsibilities
- Conduct research on reinforcement learning algorithms for multimodal models, including diffusion-based approaches for image generation, autoregressive models for multimodal understanding, and unified frameworks that integrate multiple modalities
- Design and build reinforcement learning infrastructure that supports scalable, distributed training across multimodal systems while maintaining efficiency and reliability
- Develop and refine reward modeling strategies that improve training stability, align model behavior with desired outcomes, and mitigate reward hacking and related failure modes
- Create and curate multimodal simulation environments and datasets to support robust training, evaluation, and benchmarking of reinforcement learning systems
- Design and conduct rigorous benchmarking and evaluation protocols to measure model performance, track progress against baselines, and validate improvements across multimodal tasks
- Analyze and optimize policy performance across modalities by identifying bottlenecks in training, credit assignment, and cross-modal alignment
- Investigate and develop next-generation reinforcement learning paradigms that learn more effectively from environment feedback, with the goal of achieving state-of-the-art (SOTA) performance
- Publish research findings in top-tier conferences such as ICML, NeurIPS, ICLR, CVPR, ICCV, and ECCV
Skills
- A Master's degree in Computer Science or a related field is required
- Proven experience running large-scale reinforcement learning experiments in multimodal and vision-centric systems, including online RL settings, with demonstrated impact on domain-specific decision-making and measurable improvements in policy performance
- Deep understanding of reinforcement learning algorithms and optimization methods applied to vision and multimodal learning problems, with a focus on improving policy stability, exploration, and sample efficiency in complex, high-dimensional environments involving images, video, and other modalities
- Strong proficiency in PyTorch and deep learning frameworks for vision and multimodal AI, with hands-on experience building end-to-end RL pipelines covering simulation, training, evaluation, and deployment in production-grade systems
- Demonstrated ability to apply empirical research to solve core RL challenges in multimodal and vision tasks, such as sample inefficiency, exploration-exploitation tradeoffs, and training instability, along with experience designing robust evaluation frameworks and iterating on algorithmic improvements to advance agent performance
- Proven track record of research publications in top-tier conferences such as ICML, NeurIPS, ICLR, CVPR, ICCV, and ECCV
- A PhD in Machine Learning, NLP, Computer Vision, or a closely related discipline is preferred, along with a strong track record of AI research and publications in top-tier conferences
Company Overview