Lead Site Reliability Engineer (GCP & Hybrid Cloud) Hybrid

Remote · USA · Full-time

About the position

Join Cisco’s Enterprise AI team, the core group enabling Generative AI-powered experiences across Cisco. Our mission is to build secure, scalable AI platforms that empower teams to safely develop, deploy, and operationalize AI-powered solutions. We operate at the intersection of applied AI, cloud infrastructure, and security, partnering across engineering, security, compliance, and product teams to bring trusted AI to life at enterprise scale.

We are a fast-growing, highly collaborative team of platform engineers, AI engineers, and data scientists who value technical depth, ownership, and pragmatic execution. What makes this team exciting is the opportunity to define how secure Generative AI is built and governed inside a global technology leader.

As a Lead SRE, you will own the architectural integrity of our hybrid cloud infrastructure, ensuring our GCP and on-premises Kubernetes environments are resilient and secure. You will set the standard for automation and reliability that enables our AI models to scale globally.

Responsibilities

  • Lead the architectural design of scalable hybrid-cloud environments, managing GCP and on-premises Kubernetes clusters with Anthos Service Mesh (ASM) and Istio.
  • Direct the implementation of Identity and Access Management (IAM) policies and GCP Quota management to ensure secure and cost-effective resource utilization.
  • Architect multi-region, load-balanced microservices with DDoS hardening, end-to-end encryption, and automated secrets management.
  • Design a comprehensive observability strategy using Elasticsearch and Kibana to provide proactive alerts on service performance and cost envelope management.
  • Partner with development leads to integrate "Security by Design" into the automation and AI agent lifecycle using Apigee for secure API management.

Requirements

  • Bachelor’s Degree in Computer Science, Engineering, or a related field.
  • 7+ years of experience in Cloud/On-prem Operations, SRE, or DevOps.
  • Expert-level proficiency with Terraform, Kubernetes (GKE & On-prem), and Docker.
  • Hands-on expertise with Anthos Service Mesh (ASM), Istio, and Apigee.
  • Deep understanding of IAM implementation and GCP Quota management.

Nice-to-haves

  • GCP Professional Cloud Security Engineer or Network Engineer certification.
  • Experience with the ELK stack (Elasticsearch/Kibana) for large-scale observability.
  • Strong financial acumen for cloud cost optimization and proactive budget alerting.
  • Experience managing complex traffic between cloud platforms and on-premises data centers.

