[Hiring] Senior AI Scientist @You.com

Remote · USA · Full-time

You.com is an AI-powered search and productivity platform designed to empower users with personalized, efficient, and trustworthy search experiences. As a cutting-edge technology company, we combine advanced AI models with user-first principles to deliver tools that enhance discovery, creativity, and productivity. At You.com, we are on a mission to create the most helpful search engine in the world, one that prioritizes transparency, privacy, and user control.

We’re building a team of innovators, problem-solvers, and visionaries who are passionate about shaping the future of AI and technology. At You.com, you’ll have the opportunity to work on impactful projects, collaborate with some of the brightest minds in the industry, and grow your career in an environment that values creativity, diversity, and curiosity. If you’re ready to make a difference and help us revolutionize the way people search and work, we’d love to have you join us!

About the Role

We're hiring a Senior AI Scientist to lead the development of novel evals methodologies and customer-facing evaluation research. You'll own the full loop: from identifying gaps in how we evaluate AI quality, to inventing new evals approaches, to deploying them in customer engagements and competitive analyses. This role sits at the center of how we understand and improve our AI systems. You'll work directly with customers to understand their unique quality requirements, design evals that capture what matters, and create reusable evaluation frameworks that scale across our customer base. You'll also contribute to our evals research agenda, publishing work on evaluation methodologies for agents, RAG systems, and search-augmented AI. The ideal candidate brings both a researcher's rigor and a practitioner's pragmatism - comfortable writing papers on evals methodology and comfortable on sales calls explaining evaluation trade-offs to enterprise customers.

Responsibilities

  • Define and own what “good” means for search-augmented and agentic AI systems by designing evaluation frameworks that measure real-world quality, reliability, and user-relevant behavior beyond standard benchmarks.
  • Invent and validate novel evaluation methodologies for non-deterministic systems (LLMs, agents, RAG), including behavioral evals, long-tail and adversarial test sets, and task-specific metrics.
  • Develop rigorous statistical frameworks for model comparison, regression detection, and uncertainty estimation, ensuring evaluation results are defensible and decision-ready.
  • Build and maintain scalable evaluation systems—datasets, gold standards, eval harnesses, scoring pipelines, and analysis tooling—that can be reused across products and customers.
  • Lead customer-facing evaluation research, working directly with enterprise customers to translate domain-specific quality requirements into credible, actionable evals that support product decisions and sales outcomes.
  • Drive competitive evaluations and internal quality reviews, surfacing meaningful performance differences, trade-offs, and failure modes to inform product strategy and prioritization.
  • Partner with engineering and product teams to integrate evals into development loops, release gating, and ongoing quality monitoring.
  • Mentor and set standards for evaluation practice, reviewing eval designs, guiding other scientists, and shaping the long-term evals roadmap as systems become more agentic and complex.
  • End-to-End Project Leadership: Lead the development of new AI-driven projects, encompassing ideation, prototyping, research, infrastructure design, scalability, monitoring, and evaluation.
  • Rapid Iteration: Adapt quickly to user feedback and evolving requirements, ensuring continuous improvement in a fast-paced environment.

Qualifications

  • Strong grounding in applied ML and statistics, with experience evaluating non-deterministic AI systems (LLMs, agents, RAG, search).
  • Deep experience with AI evaluation, including metric design, gold dataset creation, head-to-head comparisons, slicing, and error analysis.
  • Statistical rigor in model comparison, using methods such as paired tests, bootstrap confidence intervals, and robustness analyses.
  • Proficiency in Python for evaluation and analysis, including building eval harnesses, data pipelines, scoring logic, and reproducible analysis workflows.
  • Ability to translate vague product or customer goals into measurable evaluation criteria, and to challenge metrics or conclusions that don’t reflect real quality.
  • Comfort engaging directly with customers and cross-functional stakeholders, explaining evaluation results, trade-offs, and limitations clearly.
  • Strong written and verbal communication, including documenting methodologies and contributing to external publications or talks.

Our salary bands are structured based on a combination of geographic tiers and internal leveling. Compensation is determined by multiple factors assessed during the interview process, with the final offer reflecting these considerations.

Salary Band: $200,000 - $260,000 USD

Company Perks:

  • Hubs in San Francisco and New York City offering regular in-person gatherings and co-working sessions
  • Flexible PTO with U.S. holidays observed and a week-long shutdown in December to rest and recharge*
  • A competitive health insurance plan that covers 100% for the policyholder and 75% for dependents*
  • 12 weeks of paid parental leave in the US*
  • 401k program, 3% match - vested immediately!*
  • $500 work-from-home stipend to be used within a year of your start date*
  • $1,200 per year Health & Wellness Allowance to support your personal goals*
  • The chance to collaborate with a team at the forefront of AI research

* Certain perks and benefits are limited to full-time employees only

You.com participates in E-Verify. We will provide the Social Security Administration (SSA) and, if necessary, the Department of Homeland Security (DHS) with information from each new employee’s Form I-9 to confirm work authorization. (English/Spanish: E-Verify Participation/Right to Work)

We are also an inclusive, equitable, and accessible workplace. Please let us know if you require accommodation for any portion of the recruitment and hiring process.
