[Remote] Principal AI Platform Engineer
Note: This is a remote position open to candidates in the USA. Lynx delivers modular, open standards–based software that transforms how high-assurance, mission-critical edge systems are built, deployed, and maintained. Lynx is seeking a Principal AI Platform Engineer to own the AI platform as the engineering backbone for AI-assisted certification and engineering workflows, ensuring it is secure, stable, measurable, and extensible.
Responsibilities
- Define and enforce the platform standard for how AI tools use orchestration frameworks, prompt assets, tracing, and metadata
- Bring existing advanced tools into alignment with shared platform conventions while preserving important agentic or workflow-specific behavior
- Build and maintain Azure-based production infrastructure, including networking, identity, secrets, storage, database, monitoring, and deployment patterns
- Implement infrastructure as code and CI/CD for sandbox-to-production promotion
- Deepen LLMOps capabilities, including prompt versioning, golden datasets, automated evaluations, cost tracking, feedback loops, regression detection, and release controls
- Own secure integrations with CodeBeamer, GitHub, and event-driven APIs or webhooks
- Establish operational discipline through logging, alerting, rollback, test coverage, runbooks, rate limiting, and supportability
- Partner with engineering, IT, security, and compliance stakeholders to support auditable AI-assisted workflows
- Own and evolve the Platform AI to provide a standard, secure approach to accessing AI-assisted capabilities across the organization for certification workflows
- Mentor and coach other senior and intermediate engineers on the team, provide technical guidance, and conduct architectural reviews of trade-offs
- Help define the technical trajectory of the platform and AI tools
Skills
- 10+ years of relevant experience
- Bachelor's degree in an engineering-related discipline preferred
- Strong Python backend engineering and API integration experience
- Strong Azure platform experience, especially Container Apps, VNet/private endpoints, Entra ID, Managed Identity, Key Vault, PostgreSQL, ACR, and monitoring
- Hands-on experience with LLM application frameworks such as LangChain, LangGraph, or close equivalents
- Hands-on experience with LLM observability or evaluation tooling such as Langfuse or equivalent tracing and eval systems
- Experience building CI/CD and infrastructure as code with Terraform, Bicep, GitHub Actions, Azure DevOps, or comparable tools
- Experience securing internal platforms with RBAC, secrets management, service-to-service auth, webhook validation, rate limiting, and audit logging
- Ability to design reliable multi-step or agentic workflows, including retries, state handling, guardrails, and output validation
- Strong operational judgment around testing, rollback, monitoring, alerting, documentation, and runbooks
- Must be a US Citizen
- Experience in regulated, safety-critical, aerospace, defense, medical, or similarly controlled environments
- Familiarity with DO-178C-style traceability, auditability, formal review workflows, or human-in-the-loop approval requirements
- Experience integrating with CodeBeamer, GitHub Enterprise, Jira, or similar enterprise engineering systems
- Familiarity with C/C++ code analysis or test-generation workflows
- Experience with prompt governance, change control, and evaluation datasets
- Some comfort with internal-tool UI work such as React, though this should remain secondary to platform, backend, and infrastructure strength
Benefits
- Low-cost Medical / Dental / Vision coverage options
- 401(k) with generous employer match
- Responsible Paid Time Off + Paid Holidays
- Remote work opportunities based on role
- Employee Assistance Program (EAP)
- Career growth and professional development opportunities
Company Overview