[Remote] Senior ML Engineer - Auto Tagger
Note: This is a remote position open to candidates in the USA. Torc Robotics is a leader in autonomous driving technology, focused on developing software for automated trucks. The Senior ML Engineer - Auto Tagger will architect data pipelines, develop algorithms for scenario tagging, and collaborate with cross-functional teams to enhance autonomous vehicle data curation.
Responsibilities
- Architect and optimize distributed data pipelines to process massive multi-sensor logs (camera, LiDAR, radar, kinematics), automatically extracting and cataloging safety-critical and long-tail driving events
- Develop and tune both heuristic-based and ML-assisted algorithms (including exploring Vision-Language Models or semantic vector search) to automatically classify and describe complex environmental and behavioral scenarios
- Extract and format scenario data using the Pegasus layer standard (alongside open-source frameworks) to ensure semantic consistency and rigorous metadata integrity
- Manage the ingestion of tagged events into the observations database, enabling high-speed querying and retrieval for ML training, regression testing, and system validation
- Operate with broad autonomy to drive consensus across organizational boundaries
- Collaborate closely with downstream consumers in perception, simulation, and systems engineering to define what constitutes an 'interesting scenario' and operationalize a continuous data loop
- Guide, mentor, and elevate less-experienced engineers
- Lead design reviews, establish coding standards, and foster a culture of technical excellence and collaborative problem-solving
Skills
- BS or MS in Computer Science, Robotics, Engineering, or a STEM field, with 6+ years in data engineering, ML systems, or autonomous data curation
- Strong Python and SQL skills, with heavy experience processing massive time-series or unstructured datasets
- Hands-on machine learning and dataset curation experience, with a demonstrated history of implementing targeted datasets that measurably improve downstream model performance
- Hands-on experience using Databricks (or similar platforms) for large-scale analytics, interactive querying, and making massive vehicle datasets searchable
- Expertise in distributed compute frameworks (Ray, Spark, Beam) and cloud platforms (AWS, GCP, or Azure) for executing heavy data workloads
- Experience parsing complex data formats and applying scenario-description standards like Pegasus layers
- Exceptional ability to translate complex data engineering challenges into clear strategies for cross-functional stakeholders
- Proven track record of mentoring teams, driving system architecture, and defining engineering roadmaps
- Familiarity with foundation models, auto-labeling pipelines, or zero-shot classification for scenario extraction
- Experience with vLLM, SGLang, or similar frameworks for highly optimized, high-throughput model serving and inference
- Experience with semantic extraction and attribute mapping to help build out a robust semantic inference engine, moving beyond standard bounding-box object detection
- Familiarity with parsing robotics formats (ROS bags, MCAP) and optimizing high-performance columnar storage formats (Parquet, Arrow)
- Knowledge of how scenario data feeds into generative simulation workflows, neural rendering, or sensor fusion validation
- Experience building semantic retrieval systems or vector databases for automotive data
Benefits
- A competitive compensation package that includes a bonus component and stock options
- 100% paid medical, dental, and vision premiums for full-time employees
- 401(k) plan with a 6% employer match
- Flexibility in schedule and generous paid vacation (available immediately after start date)
- Company-wide holiday office closures
- AD&D and Life Insurance
Company Overview
Company H1B Sponsorship