[Remote] AI Research Engineer (Multi-Modal & Vision) - 100% Remote Worldwide
Note: This is a remote position open to candidates in the USA. Tether.io is pioneering a global financial revolution through innovative digital solutions. The AI Research Engineer will focus on training and optimizing vision-language models, driving innovation across the full model development lifecycle, and working with a high-caliber team to push the boundaries of multimodal AI in production environments.
Responsibilities
- Conduct end-to-end research and engineering on vision-language models, covering training, evaluation, and optimization across the full model development lifecycle
- Design and implement post-training pipelines including supervised fine-tuning, knowledge distillation, and reinforcement learning from human feedback
- Develop and maintain high-quality multimodal datasets, including data curation, filtering, and balancing for domain-specific tasks
- Drive model efficiency and deployability, adapting models for resource-constrained environments using compression and optimization techniques
- Design and implement evaluation frameworks and benchmarks to measure model performance, robustness, and real-world task success
- Build and scale training workflows across distributed GPU infrastructure
- Identify and resolve bottlenecks in training pipelines to achieve state-of-the-art model quality on target benchmarks
- Contribute to and leverage open-source ecosystems including models, datasets, and tooling to accelerate development
- Stay current with the latest research in multimodal learning and vision-language systems, translating relevant findings into practical improvements
- Publish research findings in top-tier AI conferences and journals where applicable
Skills
- Degree in Computer Science, Machine Learning, or a related field
- Strong experience with multimodal post-training workflows including supervised fine-tuning, knowledge distillation, and reinforcement learning from feedback
- Hands-on experience with parameter-efficient fine-tuning and distributed training frameworks
- Demonstrated ability to build and improve vision-language models with measurable results on standard benchmarks or real-world tasks
- Experience adapting models for resource-constrained environments
- Proven open-source contributions in multimodal AI on GitHub or Hugging Face
- MS/PhD preferred
- Publications at top AI conferences (NeurIPS, ICML, ICLR, CVPR, ECCV, etc.)
Company Overview