[Remote] Machine Learning Infrastructure Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. TRM Labs is a company dedicated to building a safer world through AI-powered intelligence solutions. The Senior Software Engineer, ML Infrastructure will design and operate scalable GPU-backed infrastructure that supports TRM's AI systems, collaborating with teams across the company to ensure effective model deployment and optimization.

Responsibilities

  • Design and operate GPU cluster infrastructure: Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users
  • Optimize high-throughput inference: Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads
  • Enable distributed inference strategies: Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models
  • Implement model optimization and compilation workflows: Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost
  • Schedule heterogeneous workloads: Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand
  • Build observability into ML infrastructure: Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use that data to continuously improve performance and reliability
  • Partner across engineering teams: Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services

Skills

  • Bachelor's degree (or equivalent) in Computer Science or related field
  • 5+ years of experience building and operating distributed systems or infrastructure in production environments
  • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP)
  • Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost
  • Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum
  • Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems
  • Familiarity with distributed inference strategies including model parallelism and tensor parallelism
  • Experience working with Kubernetes or equivalent orchestration systems in cloud environments
  • Adaptable. Goals can change fast. You anticipate and react quickly
  • Autonomous. You own what you work on. You move fast and get things done
  • Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing
  • Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization
  • Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus
  • CUDA familiarity and experience debugging GPU-related issues are a plus

Company Overview

  • TRM Labs is a software company that offers blockchain analytics and transaction monitoring for financial institutions and governments. It was founded in 2018 and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is https://trmlabs.com.

Company H1B Sponsorship

  • TRM Labs has a track record of offering H1B sponsorships, with 2 in 2026, 1 in 2025, 4 in 2024, 3 in 2023, 3 in 2022, and 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.