[Remote] Senior Artificial Intelligence Engineer
Note: This is a remote position open to candidates in the USA. BlueAlly is a leading provider of IT services and solutions, helping organizations conquer IT complexity across various domains. The company is seeking a Senior AI Engineer to design, build, and operate enterprise AI systems, lead workstreams independently, mentor junior engineers, and engage with clients to deliver production AI outcomes.
Responsibilities
- Lead end-to-end design, build, and operation of AI systems on AI Factory platforms (HPE PCAI, Dell AI Factory, Nutanix Enterprise AI, and adjacent ecosystem layers) across multiple client engagements
- Engineer and tune LLM inference serving stacks — primary depth in vLLM with breadth across the inference ecosystem — for client latency, throughput, and cost targets
- Tune inference performance through KV cache management, paged attention, batching strategies, and Dynamo-based disaggregated serving
- Architect and operate MLOps pipelines covering model lifecycle, registries, deployment, rollback, and observability
- Design and engineer RAG applications on top of vector databases — chunking strategies, retrieval tuning, reranking, citation handling, and context-window management
- Build and tune prompt-engineering patterns at production scale — system prompts, structured output, tool and function calling
- Design and maintain LLM evaluation harnesses — golden sets, regression suites, and online quality metrics
- Engineer high-performance storage and networking for AI workloads — parallel filesystems, object storage tiers, and high-throughput, low-latency RDMA fabrics
- Operate Kubernetes clusters underpinning AI workloads — namespaces, RBAC, resource quotas, network policies, storage classes, and ingress
- Build and maintain container images, registries, and CI/CD pipelines for AI/ML services
- Implement monitoring, alerting, logging, and capacity planning across the AI stack
- Harden environments to meet client security and compliance requirements
- Lead troubleshooting across bare metal, BIOS/firmware, OS, containers, GPUs, frameworks, and models
- Engage directly with client stakeholders — technical and executive — to communicate status, root cause, options, and recommendations
- Mentor less senior engineers and review their code; raise the technical bar of every engagement you join
- Author runbooks, reference architectures, and knowledge base content; lead client knowledge transfer and enablement sessions
- Participate in on-call rotation and incident response for production AI workloads
- Contribute reusable patterns, tooling, and reference designs back to the practice
Skills
- Experience: 7+ years of software, data, or infrastructure engineering, with 3+ years specifically working with modern AI/LLM systems
- Software engineering: Production-quality Python at engineering level — testing, code review, version control fluency, and shipping code that other engineers depend on
- Linux engineering: Deep production Linux experience, including system internals, performance tuning, and troubleshooting
- Containers: Deep proficiency with Docker — image build, registry management, runtime tuning, and container security
- Hardware fundamentals: Strong server-platform skills including CPU/GPU topologies, PCIe, BMC management, BIOS/firmware lifecycle, and physical-to-logical troubleshooting
- AI Factory platforms: Hands-on experience deploying and operating one or more of HPE PCAI, Dell AI Factory, or Nutanix Enterprise AI
- Inference stack — vLLM: Production experience deploying, tuning, and operating vLLM
- Inference stack breadth: Working knowledge of multiple inference and model-serving frameworks beyond vLLM, with the ability to choose and tune the right tool for each workload
- High-performance storage and networking: Hands-on experience with high-throughput, low-latency storage and network fabrics for AI workloads — including RDMA-class interconnects, parallel/object storage tiers, KV cache management, and Dynamo-style disaggregated serving
- MLOps: Practical experience operating MLOps tooling and patterns — model registries, deployment pipelines, GitOps, lineage, and rollback
- Vector databases and RAG: Hands-on experience deploying, tuning, and integrating vector databases and RAG pipelines, including the application-level engineering that sits on top of them
- Prompt engineering and tool use: Production experience designing system prompts, structured output, function calling, and tool-using LLM patterns
- Evaluation methodology: Demonstrated experience designing LLM evaluation harnesses — golden sets, regression suites, and quality/cost metrics
- Client-facing skills: Demonstrated ability to engage directly with client stakeholders — running working sessions, presenting recommendations, and translating technical detail for non-technical audiences
- Communication: Strong written and verbal communication — clear reference architectures, runbooks, and incident reports
- Mentorship: Track record of mentoring more junior engineers and raising team technical quality through code review and pairing
- Networking fundamentals: TCP/IP, DNS, load balancing, VLANs, and firewall administration
- Multi-client delivery: Comfort working across multiple concurrent client environments and managing competing priorities under SLA
- GPU operations: Experience with GPU drivers, CUDA toolchains, GPU partitioning (MIG/vGPU), and GPU-level monitoring
- NVIDIA AI Enterprise: Deployment and operations experience with the NVAIE software stack
- Ray: Familiarity with Ray for distributed training and inference scaling
- Kubernetes: Working knowledge of Kubernetes administration — Helm, ingress, RBAC, storage classes
- Identity and access: Integrating SSO and enterprise identity (LDAP, AD, OIDC/SAML), secrets management, tenant isolation
- Fine-tuning: Familiarity with LoRA/QLoRA/PEFT and supervised fine-tuning workflows
- Token economics: Experience optimizing inference cost — caching, prompt caching, model routing, and distillation
- MSP / multi-tenant operations: Service-provider experience including chargeback/showback and tenant isolation patterns
- Compliance frameworks: SOC 2, HIPAA, FedRAMP, FISMA, or CMMC environments
- Public cloud and hybrid: Working experience with one or more public clouds and hybrid architectures
- Infrastructure as Code: Terraform, Ansible, Helm, or similar