[Remote] Principal Machine Learning Engineer, ML Platform

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. Shippo is dedicated to revolutionizing the shipping layer of the internet and aims to empower merchants through innovative logistics technology. The company is seeking a Principal Machine Learning Engineer to develop a production-grade ML platform that enhances model reliability and optimizes shipping logistics. The role involves setting technical strategy, driving architecture decisions, and collaborating with cross-functional teams to improve ML workflows.

Responsibilities

Set technical strategy and drive a multi-quarter roadmap for ML platform capabilities aligned to Shippo’s business priorities
Own cross-team architecture decisions, RFCs, and design reviews for ML lifecycle and inference
Raise the engineering bar through mentorship, production readiness standards, and reusable platform primitives
Be accountable for platform adoption, reliability, and cost-performance outcomes
Build and operate core ML platform components:
ML lifecycle foundation (experiment tracking, reproducibility, artifact management, model registry, versioning, and controlled promotion workflows using MLflow or equivalent)
Training and experimentation enablement (standardized environments, reusable pipelines/templates, evaluation harnesses, and repeatable workflows that let data scientists move from exploration to production with confidence)
Kubernetes-native model serving for real-time inference (safe rollout and rollback, autoscaling, reliability practices, and cost controls)
Batch inference and scoring pipelines (repeatable backfills, retraining triggers, consistent packaging between training and inference)
Observability for ML systems (service health metrics, alerting, and model-quality signals such as drift and data quality)
Developer experience (templates, reference implementations, documentation, and self-service workflows)
Evaluate and recommend inference frameworks and deployment patterns, and document tradeoffs for Shippo’s workloads
Identify and resolve performance bottlenecks across the inference stack (model runtime, compute utilization, networking, serialization, and autoscaling behavior)
Establish ML engineering standards across training, evaluation, testing, model packaging, CI/CD, production readiness, and incident response
Partner with Data Science teams to bridge research and production environments by creating repeatable frameworks, shared standards for code quality and reproducibility, and self-serve paths to deploy models safely
Collaborate with Data and Engineering teams to ensure the platform supports real workflows, drives adoption, and meets reliability expectations
Mentor engineers through design reviews, architecture guidance, and shared best practices across platform and ML development

Skills

15+ years of software engineering experience, including ownership of production systems (platform, infrastructure, or distributed systems)
4+ years owning ML systems end-to-end in production, including on-call and incident response, and making architecture decisions based on operational constraints (latency, throughput, availability, and cost)
Strong experience building and running services on Kubernetes, including deployments, autoscaling, and observability
Hands-on experience with ML lifecycle tooling such as MLflow or equivalent (tracking, registry, packaging, and promotion workflows)
Demonstrated ability to evaluate inference tradeoffs across batch and real-time serving, CPU versus GPU, latency and throughput, cost, and operational complexity
Demonstrated Principal-level technical leadership, including setting technical direction, driving cross-team alignment via RFCs/design reviews, and delivering multi-quarter roadmaps
Proven ownership of reliability and operational outcomes for production systems (SLOs, incident response, and measurable improvements in stability and performance)
Demonstrated ability to ship incrementally, prioritize production reliability over perfect solutions, and drive adoption through pragmatic platform design
Experience working with or evaluating managed ML platforms (Databricks, SageMaker, Vertex AI, or similar), with clear judgement on strengths, limitations, and build-vs-buy decisions
Databricks experience (useful, not required), including Databricks workflows and ML tooling integration
Experience with inference and serving frameworks
Experience with feature store patterns, online and offline consistency, and model evaluation at scale
Experience supporting optimization systems and decision engines in production
LLM or agent workflow experience, especially evaluation harnesses, deployment patterns, guardrails, and monitoring

Benefits

Healthcare coverage for medical, dental, and vision (90% covered by the company, including dependents). Pet coverage is also available!
Take-as-much-as-you-need vacation policy & flexible working hours
One week-long, company-wide winter slowdown
3 Volunteer Days Off (VTOs)
WFH stipend to set up your home office
Charity donation match up to $100
Dedicated programs, coaching, tools, and resources for your professional and career growth as well as an individual learning stipend for your personal and focused growth
Fun in-person team time through our Shippos Everywhere program, which includes regular team and company off-sites throughout the year as well as local Shippos gatherings
Equity
Medical, dental, vision and other benefits noted in our Shippos “package” section

Company Overview

Shippo is a shipping platform with tools for label creation, tracking, and carrier comparisons, saving time and cost. It was founded in 2013, and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is https://goshippo.com.

Company H1B Sponsorship

Shippo has a track record of offering H1B sponsorships, with 1 in 2026, 1 in 2025, 5 in 2024, 8 in 2023, 14 in 2022, 11 in 2021, 10 in 2020. Please note that this does not guarantee sponsorship for this specific role.
