Job Description
A builder mindset is the core of this role and where you'll spend most of your time. But we're a small team building a whole product, not a research lab. The best person here treats ML systems as their primary craft while staying willing to do whatever the product needs — thinking through the product itself, shipping backend or frontend code, untangling data pipelines. We're looking for someone energized by the breadth, not someone who wants to stay in their lane.
What you'll work on
Evaluation systems for AI features
Help build the eval backbone our AI features ship against: failure taxonomies, LLM-as-judge rubrics, golden datasets, calibration against human judgment. Learn what it takes to keep automated scores honest as models and prompts change. A feature with no eval has no quality floor.
Model routing & inference economics
Get hands-on with how we route work across models, balancing cost, quality, and latency per task. Help...
What you'll work on
Evaluation systems for AI features