[Remote] Senior Network Operations Manager
Note: This is a remote position open to candidates in the USA. Black Mountain Dynamics is seeking a Senior Network Operations Manager to oversee the reliability and performance of a global enterprise network. This role involves leading the operations team, managing incidents, and driving continuous improvement in network operations.
Responsibilities
- Lead the operations team: Manage, mentor, and develop the Tier 1 and 2 NOC and network engineering staff supporting the account; own onboarding, performance, and career development
- Own staffing models, on-call rotations, and shift scheduling to guarantee continuous coverage of a round-the-clock operation without single points of failure
- Serve as the senior escalation owner and operational decision-maker; hold the team accountable for SLA attainment, quality, and adherence to standards
- Forecast workload and headcount needs, and make the case for resourcing to both Black Mountain Dynamics and client leadership
- Own the availability and performance targets for the enterprise network (99.99%+ uptime), and be accountable for the metrics behind them
- Drive down Mean Time to Repair (MTTR) through better tooling, telemetry, escalation paths, and post-incident action tracking
- Define what “good” looks like for monitoring, alerting, and telemetry, and ensure the team can see and act on network health proactively
- Produce clear operational reporting (availability, incident trends, SLA performance, risk) for client and internal leadership
- Own the Network Acceptance Testing (NAT) framework, ensuring newly deployed infrastructure meets security, scalability, and observability standards before production sign-off
- Direct the hypercare phase following new site launches and major upgrades; ensure anomalies are stabilized and infrastructure is cleanly handed over to steady-state operations
- Oversee the authoring and review of high-risk Methods of Procedure (MOPs) for installing, staging, and upgrading firewalls, core switches, wireless access points, and UPS systems
- Ensure high-availability network operations across critical facilities (e.g., automated data centers, localized data-ingress hubs, and fleet maintenance facilities), accounting for power, cooling, and structured-cabling constraints
- Ensure high-performance pipelines optimized for massive data ingress/egress (such as local vehicle/fleet data offloading) run without network bottlenecks
- Own the response to P1/P0 disruptions: coordinate the technical bridge, drive rapid service restoration, and communicate business impact to stakeholders in real time
- Own the change-management process for the account; chair or represent operations in change review, ensuring risk assessments minimize production downtime
- Run the problem-management program: ensure Post-Incident Reviews (PIRs) are completed, chronic architectural weaknesses are identified, and permanent remediation is tracked to closure
- Champion the shift from legacy, manual configuration toward automated, template-driven architectures to improve consistency and MTTR
- Own the library of runbooks, configuration baselines, and troubleshooting playbooks that uplift the capability of Tier 1/2 NOC agents
- Act as the primary operational point of contact for the client’s IT operations leadership; translate complex network issues into clear, business-impact summaries
- Manage relationships with hardware vendors, carriers, and support partners, holding them to their SLAs and escalating effectively
Skills
- 8+ years in network engineering, enterprise deployment, or high-velocity network operations, including 3+ years in a formal people-management or team-lead capacity
- Proven track record leading NOC or network operations teams in a 24/7, high-availability environment
- Demonstrated ownership of network operations within critical infrastructure carrying high-availability requirements (99.99%+ uptime)
- Strong hands-on background configuring and troubleshooting multi-vendor network devices (e.g., Cisco, Juniper, Arista, Palo Alto, Fortinet) via CLI and cloud-managed controllers, sufficient to lead engineers credibly and make sound architectural calls
- Practical, ownership-level command of ITIL Incident, Change, and Problem management
- Excellent verbal and written communication; able to translate technical detail into business-impact narratives for cross-functional and client stakeholders
- Familiarity with network automation tooling (e.g., Python, Ansible, Terraform, NetBox, Jinja2) and how to apply it to deploy and audit infrastructure at scale
- Working knowledge of BGP peering, OSPF, EVPN-VXLAN, stateful firewall policy, and complex traffic engineering
- Experience operating in data center, fleet, mission-critical, or autonomous / high-technology environments
- Experience delivering operations as an embedded contractor or through an MSP relationship
- B.S. in Computer Engineering, Electrical Engineering, Computer Science, or equivalent practical experience. Certifications such as CCNP/CCIE, PCNSE, JNCIP, ITIL, or PMP are a strong plus
Company Overview