Member of Technical Staff (Data Scientist, Evals)

Remote · USA · Full-time

Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.

Responsibilities

  • Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
  • Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
  • Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
  • Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
  • Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve Answer Quality

Qualifications

  • PhD or MS in a technical field, or equivalent experience
  • 4+ years of experience in data science or machine learning
  • Strong proficiency in Python and SQL (expected to write production-grade code)
  • Experience building within a modern cloud data stack, specifically AWS and Databricks
  • Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

Preferred Qualifications

  • 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
  • Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
  • A strong research background, with experience applying research methods to real-world ML problems
  • Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets
