[Remote] Clinical Data Manager (Senior)
Note: This is a remote position open to candidates in the USA. Bioptimus is building the first universal AI foundation model for biology to fuel breakthrough discoveries and accelerate innovation in biomedicine. They are seeking a Clinical Data Manager to bridge the gap between unstructured, real-world data and AI models, focusing on structuring clinical datasets and ensuring data quality.
Responsibilities
- Participate directly in technical conversations with external partners (hospitals, research institutions, CROs/CMOs). Dive into the details of diverse clinical data structures to understand how data is captured, stored, and extracted
- Translate ambiguous source data into harmonized, AI-ready assets
- Map and align diverse clinical data to industry-standard biomedical ontologies (e.g., SNOMED, ICD), with an emphasis on clinical oncology and immunology data
- Design, build, and maintain data dictionaries, schemas, and metadata models that align with STELA’s multimodal pipeline requirements, while ensuring integration with existing pipelines
- Establish, automate, and enforce data quality control (QC) and validation frameworks to check incoming partner data for integrity, completeness, and programmatic consistency
- Write production-grade Python code to automate data cleaning and harmonization tasks
- Practical understanding of how clinical data is generated in the real world (hospitals, trials, CROs). You understand the gaps between ideal protocols and messy clinical realities, and you know what red flags to look for in incoming data
- You know what questions to ask partners to get to the 'ground truth' of their data structures. Actively audit data to find missing variables, anomalies, and hidden biases
- Familiarity with cancer progression metrics (e.g., RECIST criteria, TNM staging, longitudinal treatment lines like immunotherapy vs. chemotherapy) so you can recognize what data is important
Skills
- Bachelor's or Master's degree in Life Sciences, Bioinformatics, Health Informatics, Computer Science, Statistics, or a related quantitative field. Equivalent practical industry experience is highly valued
- A few years (typically 3–5+) of hands-on experience in clinical data management or clinical data engineering within a CRO, CMO, pharma, or biotech environment. Proven track record of taking messy partner data and building reproducible, production-grade workflows
- High proficiency in Python and standard data science libraries (e.g., Pandas, NumPy) for data manipulation, cleaning, and validation
- Demonstrated commitment to code reproducibility, including strong experience with Git version control and building reusable data pipelines
- Familiarity with clinical data structures, electronic health records (EHR), case report forms (CRFs), and longitudinal clinical trial data
- Knowledge of standard clinical and biological ontologies, specifically those tailored to cancer/oncology and/or immunology datasets
- Ability to align on data delivery formats with partner clinical teams
- Comfort working in a fast-paced startup environment where data schemas evolve and ingest requirements must be defined from scratch
- Experience with cloud computing platforms (e.g., AWS, GCP)
- Experience working directly with multimodal datasets (e.g., matching clinical records with omics or digital pathology imaging)
- Understanding of CDISC standards (SDTM/ADaM) combined with a modern tech-stack approach (beyond legacy SAS programming)
- Experience building or optimizing ETL pipelines for large-scale biobanks or multinational clinical consortia
Benefits
- Competitive compensation, equity, and flexibility (remote options)
Company Overview