Senior Infrastructure Engineer
About Somnia
The Somnia Foundation, backed by Improbable, specialises in building and operating high-performance decentralised technology. Its EVM L1, launched in 2025, has been benchmarked as the fastest on the planet, at over 1 million ERC-20 transactions per second. At Somnia you'll work alongside a talented team spanning the core blockchain and the novel protocols that leverage its capabilities:
- Somnia Agents: adds the ability for smart contracts to leverage the latest open-source LLMs and interact with the outside world via headless browsers, powered by on-chain deterministic LLM inference and a decentralised network of agent runners.
- Prophecy: an information oracle that combines parsing of public websites and public data sources into decentralised, auditable attestations about the real world, powering prediction markets, forecasting and new types of financial systems.
- DreamDEX: a high-performance DEX with a fully on-chain CLOB that leverages Somnia's core capabilities, targeting centralised exchange characteristics.
- Somnia L1: continued work on the high-performance EVM L1, including MEV resistance, privacy, automatic gas sponsorship and 'reactivity', where smart contracts can autonomously invoke themselves based on specific timing or event criteria.
About the Role
We're looking for a Senior Infrastructure Engineer to build and run Somnia's key backend services: the L1 and node fleet, the RPC and indexing layers, product backends, and the developer-facing services that teams depend on. This is an SRE-minded role: you make reliability measurable, rollouts safe, and infrastructure repeatable, so every team can move fast without breaking things. You treat infrastructure as a product: automate relentlessly, measure everything, and leverage AI to accelerate development, operations, and incident response.
Key Responsibilities
- Define and maintain SLOs, SLIs, and error budgets, plus the observability (metrics, logs, traces and alerts) that catches regressions before users do.
- Build repeatable, self-service infrastructure through infrastructure-as-code, CI/CD and golden paths so teams can provision, deploy and recover without reinventing the wheel.
- Own rollouts end-to-end: progressive delivery, canaries, safe migrations and clean rollbacks.
- Operate the systems behind Somnia's nodes, validators, RPC and indexing, tuning for performance and cost across regions.
- Lead incident response and on-call, run blameless postmortems, and continuously harden the platform.
- Partner with product and protocol teams to design and operate production-ready services. You'll rotate between embedding with engineering teams and building the shared platform, tooling and operational standards that underpin the wider organisation.
Requirements
Must Have
- Strong experience operating production infrastructure at scale (cloud and/or bare metal), with deep Linux fundamentals.
- Experience with infrastructure-as-code such as Terraform or Pulumi, alongside configuration management.
- Experience running containers and orchestration platforms (Docker, Kubernetes) in production.
- Strong programming skills, ideally in Go and/or TypeScript, for building automation and internal tooling.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry or equivalents).
- Experience operating and monitoring distributed systems, including capacity planning and performance tuning.
- Comfortable operating in high-stakes production environments and responding to incidents.
- Genuine interest in crypto and on-chain systems.
Nice to Have
- Experience operating blockchain node infrastructure (validators, RPC, archive nodes) for an L1/L2.
- Experience with high-performance networking, low-latency systems or load balancing at scale.
- Multi-region and geo-distributed deployments with failover strategies.
- Security and key management (HSMs, secrets management, hardening).
- EVM tooling and the wider Web3 infrastructure ecosystem.
What Success Looks Like
- Engineers can deploy safely and frequently with confidence.
- Platform reliability is measurable, with well-defined SLOs and continuously improving service health.
- Infrastructure is automated, repeatable and increasingly self-service.
- Incidents become less frequent, easier to diagnose and faster to resolve.
- Product teams spend more time shipping features and less time managing infrastructure.
How We Work
- Agent maximalists: We strongly believe that agentic tooling is a massive accelerant to velocity, well beyond vibe-coded prototypes: from equipping engineers with the latest tools through to agentic harnesses for security testing and for shipping fixes and small features end-to-end.
- Ownership and autonomy: A lean team that optimises for individual impact and velocity. Engineers own features end-to-end, from idea to production, and are trusted to make the call to ship.
- Outcome-oriented: We take on hard technical problems, but the tech is always a means to an end. What matters is what ships and the impact it has, not cleverness for its own sake.
- Best tool for the job: We solve interesting problems in the simplest way possible. Mostly TypeScript for full-stack apps, C++ for high-performance systems, Go for infrastructure services and Solidity for smart contracts.