Information Technology 🏢 Full Time ⭐️ Verified

Lead AI Infrastructure Engineer - 2026 Readiness

Nexus Future Systems
San Francisco
Estimated Salary
USD 180,000 – USD 280,000
Live Update
1 July 2026
Deadline
1 Jul 2027

Job Description

We are Nexus Future Systems, a pioneering research organization dedicated to architecting the technological backbone for the Artificial General Intelligence (AGI) era. As we accelerate towards our 2026 roadmap, we are seeking a visionary Lead AI Infrastructure Engineer to design, deploy, and scale the high-performance computing environments required for next-generation models.

In this role, you will bridge the gap between cutting-edge AI research and robust, scalable engineering. You will be responsible for ensuring our infrastructure can handle exascale computing demands, optimizing deep learning workflows, and implementing resilience strategies for future-proof systems.

Why Join Us?

  • Work on the frontier of AI development.
  • Competitive equity package and salary.
  • Flexible remote-first culture with headquarters in San Francisco.

Responsibilities

  • Architect and manage scalable, multi-region Kubernetes clusters optimized for training large language models (LLMs).
  • Design high-throughput, low-latency inference pipelines for real-time AI applications.
  • Implement and enforce rigorous security and compliance protocols for sensitive AI data.
  • Collaborate with data scientists to optimize model training efficiency and reduce compute costs.
  • Plan for and integrate emerging technologies such as quantum-ready hardware interfaces and edge computing nodes.
  • Mentor a team of infrastructure engineers and define technical roadmaps for 2026 and beyond.

Qualifications

  • 8+ years of experience in backend engineering, DevOps, or Site Reliability Engineering.
  • Strong proficiency in Python, Go, or Rust, with deep knowledge of containerization (Docker, Kubernetes).
  • Extensive experience with cloud providers (AWS, GCP, or Azure) and serverless architectures.
  • Proven track record of managing GPU clusters and high-performance computing (HPC) environments.
  • Experience with observability tools (Prometheus, Grafana) and incident management (PagerDuty).
  • Familiarity with AI/ML frameworks (PyTorch, TensorFlow) and MLOps tools (MLflow, Kubeflow).

Required Skills

Kubernetes, Python, AWS, Machine Learning Infrastructure, Docker, DevOps, High-Performance Computing, MLOps, Cloud Architecture

Ready to Take This Challenge?

Make sure your resume is ready. Submit your application now before the deadline.
