[Remote] Senior Site Reliability Engineer — Government & Sovereign Cloud

Remote · USA · Full-time

Note: This is a remote role open to candidates in the USA. Veeam Software, the Data and AI Trust Company, specializes in data resilience and security. The role involves building a global Site Reliability Engineering function for the Veeam Data Cloud (VDC), with a focus on government and sovereign cloud environments, while ensuring high availability and fault tolerance.

Responsibilities

  • Get up to speed on the full platform — all VDC workloads, dependencies, and risk areas. Much of this will happen through code, docs, and conversations rather than direct environment access
  • Work with SMEs across the org to fill knowledge gaps and build onboarding material for the team
  • Write and maintain runbooks, architecture docs, and operational guides
  • Design infrastructure for high availability and fault tolerance on Azure (including Azure Government)
  • Define SLIs, SLOs, and error budgets where none exist today
  • Run incident response and blameless postmortems. Turn incidents into improvements
  • Identify reliability risks across modern and legacy workloads and build practical remediation plans that work within compliance constraints
  • Close observability gaps — define instrumentation requirements and drive implementation
  • Set alerting, telemetry, and monitoring standards with partner teams
  • Build automation to reduce toil and support fleet management
  • Participate in on-call rotations
  • Work with IaC, CI/CD, deployment automation, and config management — including in air-gapped or compliance-restricted environments
  • Build and maintain testing, canary deployment, and release validation pipelines
  • Integrate chaos engineering and monitoring tools, adapting choices to meet regulatory requirements
  • Work across product, platform, security, legal, compliance, and operations teams
  • Own problems end-to-end — identify gaps, drive solutions, don't wait for direction
  • Mentor other engineers and help spread SRE practices across the org

Skills

  • 7+ years in Software Engineering, with 3+ years in SRE, Platform Engineering, or similar — across multi-service platforms, not just single-service environments
  • Experience with Government or Sovereign Cloud (e.g., Azure Government, AWS GovCloud)
  • Experience in regulated compliance environments — government (FedRAMP, CMMC, IL2/IL4/IL5), financial (PCI-DSS, SOX), or healthcare (HIPAA, HITRUST). You understand how compliance shapes architecture and operations
  • Strong experience building and running production services on cloud infrastructure (Azure preferred, including Azure Government)
  • Able to learn large, complex platforms quickly with limited guidance — comfortable building understanding from code, docs, and architecture artifacts when direct environment access is restricted
  • Can investigate systems independently and produce clear docs, risk assessments, and improvement plans
  • Comfortable working across teams — engineering, product, security, compliance, operations
  • Programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack)
  • Experience with IaC (Terraform, Terragrunt, Pulumi) and container orchestration (Kubernetes)
  • Experience with CI/CD and GitOps tooling — GitHub Actions, Azure DevOps, GitLab CI, ArgoCD, FluxCD, or Dagger
  • Solid grasp of distributed systems, networking, and cloud-native architecture
  • Clear written and verbal communication skills
  • Experience on B2B SaaS platforms in regulated or government markets
  • Background in chaos engineering, resilience testing, or performance/load testing
  • Have built an SRE or reliability function from scratch before
  • Experience across mixed environments — modern cloud-native and older legacy systems
  • Familiar with AI-first development workflows — using LLM-powered tools for infrastructure automation, code generation, and documentation

Benefits

  • Unlimited paid time off, 12 paid holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
  • Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
  • Medical, dental, and vision coverage starting on your first day
  • Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
  • 401(k) retirement plan with company matching contributions
  • Fertility, adoption, and surrogacy support through Maven
  • AirVet: 24/7 virtual veterinary care at no cost
  • Legal services, identity protection, and supplemental health insurance options
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting
