Research Engineer, Evaluations

Permanent
Fully Remote

Only accepting applications from: United States

  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics
  • Build and maintain competitive benchmarking pipelines
  • Design and run systematic experiments to measure the impact of model changes
  • Onboard, curate, and maintain evaluation datasets
  • Create evaluation subsets to stress-test specific capabilities and edge cases
  • Define evaluation metrics for real-world performance
  • Translate qualitative customer feedback into quantifiable evaluation criteria
  • Work with customer-facing teams to understand pain points and convert them into research priorities
  • Maintain clean evaluation pipelines and clear documentation
  • Identify evaluation gaps proactively and propose solutions

Experience

  • ML fundamentals: Interpret results and debug issues without needing to train models from scratch
  • Strong Python skills: Write clean evaluation scripts, work with data pipelines, and be comfortable with SQL and cloud infrastructure
  • Metric intuition: Understand what makes a good evaluation metric and ensure statistical rigor
  • Voice agent stack familiarity: Understand how VAD, ASR, turn detection, LLM, and TTS systems interact
  • Tinkerer mentality: Preference for shipping and iterating quickly
  • Communication skills: Explain technical results, summarize findings, and translate customer feedback
  • Ownership mindset: Proactively fill evaluation gaps
  • Availability: Work hours that overlap with US Eastern Time by at least 3-4 hours

Salary and Perks

Pay range: $210K - $260K

About AssemblyAI

Industry-leading Speech AI models to automatically recognize and understand speech.
