[Remote] Staff AI Engineer
Note: This is a remote position open to candidates in the USA.
HubSync is building the AI platform for the tax and accounting profession. As a Staff AI Engineer, you will be a technical leader for the Halo platform, responsible for setting architectural direction and ensuring the reliability of the systems built for CPA firms.
Responsibilities
- Set architectural direction for how we build agentic systems
- Define the patterns and guardrails the rest of the AI engineering team builds on
- Be accountable for the reliability and trustworthiness of what we ship
- Design and build the hardest parts of the system
- Move fluidly across backend, data, infrastructure, and product
- Raise the engineering bar of the people and vendor teams around you
- Partner directly with product, AI, and firm-facing teams
- Set the technical direction for agentic workflow orchestration
- Define the orchestration patterns the team reuses
- Own the durable run-record and state layer that makes long-running agents auditable and resumable
- Set the architecture for accuracy, coverage, and the validation layer for document intelligence at scale
- Stand up the eval and observability discipline as a platform capability
- Make non-deterministic agent output trustworthy for professionals who cannot accept errors
- Optimize the trade-offs across document types, complexity levels, and client tiers during peak tax-season volume
Skills
- 8+ years building and shipping backend systems in production environments where uptime and correctness matter, including several years operating them
- A track record of leading the design of complex, distributed, or high-scale systems from architecture through deployment and ongoing operations, with enterprise-grade features that users depend on today
- Hands-on production experience with agent orchestration frameworks (LangGraph or equivalent) and long-running, stateful, multi-step agentic workflows. You will set the orchestration and state patterns the team builds on, so you need to have shipped this class of system, not just read about it
- Demonstrated technical leadership beyond your own commits: setting patterns and standards, driving cross-team or cross-functional initiatives, mentoring engineers, and influencing decisions across an organization
- Deep experience with relational databases (PostgreSQL or equivalent): schema design, query optimization, data modeling, and migrations
- Hands-on work with event-driven architectures: message queues, async processing, and distributed job execution
- Production experience with AWS (Lambda, SQS, S3, ECS) or an equivalent cloud platform
- Comfort reading and writing both TypeScript and Python, or clear evidence you pick up a second language fast
- Experience across the full software delivery lifecycle: design, implementation, testing, deployment, monitoring, and incident response
- Sound judgment on build versus buy, and the ability to make and defend architectural trade-offs under real time and cost constraints
- Depth in multi-agent coordination specifically: designing supervision, routing, and hand-off between multiple cooperating agents
- Familiarity with RAG architectures, vector databases, or document processing pipelines
- Experience with multi-tenant SaaS architecture: schema isolation, tenant-scoped data, and access control
- Background in document intelligence: OCR, structured extraction from PDFs, and form understanding
- Experience standing up evaluation, observability, or quality systems for ML or LLM products (offline and online eval, regression detection, cost and accuracy attribution)
- Experience working in a regulated or high-trust domain (tax, finance, legal, healthcare) where output correctness is non-negotiable
- Open-source contributions, technical writing, or other public evidence of engineering depth
Company Overview