[Remote] Machine Learning Infrastructure Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. TRM Labs is a company dedicated to building a safer world through AI-powered intelligence solutions. The Senior Software Engineer, ML Infrastructure will design and operate scalable GPU-backed infrastructure that supports TRM's AI systems, collaborating with infrastructure, ML, and product teams to ensure effective model deployment and optimization.

Responsibilities

Design and operate GPU cluster infrastructure
Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users
Optimize high-throughput inference
Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads
Enable distributed inference strategies
Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models
Implement model optimization and compilation workflows
Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost
Schedule heterogeneous workloads
Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand
Build observability into ML infrastructure
Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use data to continuously improve performance and reliability
Partner across engineering teams
Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services

Skills

Bachelor's degree (or equivalent) in Computer Science or related field
5+ years of experience building and operating distributed systems or infrastructure in production environments
Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP)
Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost
Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum
Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems
Familiarity with distributed inference strategies including model parallelism and tensor parallelism
Experience working with Kubernetes or equivalent orchestration systems in cloud environments
Adaptable. Goals can change fast. You anticipate and react quickly
Autonomous. You own what you work on. You move fast and get things done
Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing
Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization
Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus
CUDA familiarity and experience debugging GPU-related issues is a plus

Company Overview

TRM Labs is a software company that offers blockchain, transaction monitoring, and analytics to help financial institutions and governments. It was founded in 2018 and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is https://trmlabs.com.

Company H1B Sponsorship

TRM Labs has a track record of offering H1B sponsorships, with 2 in 2026, 1 in 2025, 4 in 2024, 3 in 2023, 3 in 2022, 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.
