[Remote] DevOps Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. EPAM Systems is a major technology company specializing in infrastructure supporting AI research, and it is seeking a DevOps Engineer to help maintain production Kubernetes-based systems. The role focuses on site reliability engineering, observability, and SQL production support, ensuring system reliability and performance across an Azure Stack environment.

Responsibilities

  • Design, maintain and progressively improve observability solutions, including dashboards and visual reports built with Grafana or comparable monitoring tools
  • Set up, implement and oversee metrics, SLIs, SLOs and alerting approaches to guarantee reliability and transparency across production systems
  • Deliver business-hours operational support for Kubernetes-based production environments, involving initial troubleshooting, log review and metric-based investigations
  • Assist with SQL-based systems as part of production operations, contributing to issue examination and performance diagnostics
  • Examine incidents and system behavior to pinpoint root causes, take part in post-incident reviews and suggest enhancements for monitoring and reliability practices
  • Work hand in hand with engineering, platform and research teams to raise observability standards, refine operational processes and strengthen overall system stability
  • Contribute to documentation, knowledge-sharing activities and ongoing improvement initiatives within the team

Skills

  • At least 2 years of relevant hands-on professional experience
  • Demonstrated track record in Site Reliability Engineering (SRE), DevOps, Production Support or equivalent roles working with production systems
  • Practical exposure to observability and monitoring stacks including Grafana, Prometheus, Elastic Stack, Datadog or similar tools
  • Strong command of Linux systems, supported by solid troubleshooting and log analysis capabilities
  • Working experience supporting Kubernetes-based environments in production settings
  • Background in delivering SQL production support, including query troubleshooting and basic performance diagnostics
  • Confident scripting skills in Python, Bash or similar languages for automation and day-to-day operational activities
  • Capability to investigate incidents, determine underlying causes and drive continuous improvement efforts
  • Effective communication and teamwork skills for working successfully with distributed and cross-functional teams
  • Proficient English communication skills, both spoken and written, at a B2+ level or higher
  • Experience handling APIs and integration patterns to link services together and enable system interoperability
  • Knowledge of databases, covering administration, tuning and production-level support activities
  • Exposure to Infrastructure as Code development and maintenance for automating environment provisioning and configuration
  • Practical experience using Microsoft Azure to manage cloud resources and run production workloads

Benefits

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Company Overview

  • EPAM is a leading global product development and digital platform engineering services company built on its core engineering expertise. It was founded in 1993 and is headquartered in Newtown, Pennsylvania, USA, with a workforce of 10,001+ employees. Its website is https://www.epam.com.

Company H1B Sponsorship

  • EPAM Systems has a track record of offering H1B sponsorships, with 11 in 2026, 120 in 2025, 172 in 2024, 232 in 2023, 373 in 2022, 359 in 2021, 502 in 2020. Please note that this does not guarantee sponsorship for this specific role.
