Configure Auto-Annotation

How to select the right evaluation properties, set thresholds, and aggregate them into a high-quality evaluation flow.

This guide explains how to configure Deepchecks' auto-annotation for your LLM-based application. Auto-annotation evaluates and labels your interactions based on customizable criteria, helping you maintain quality standards at scale. You'll learn how to select the right properties for your use case, adjust evaluation thresholds, and build end-to-end annotation pipelines.

Whether you're building customer support chatbots, medical AI systems, or other LLM-based applications, this guide will help you create an annotation system that reduces manual overhead while maintaining consistent quality standards across your team.

This guide is divided into three practical sections to help you design, build, and test an accurate auto-annotation pipeline, followed by a technical section explaining how to configure the YAML itself:

📘 Auto-Annotation Scope

Auto-annotation is configured per interaction type and applies across all versions of an application. For example, an agent that performs both tool calling and report generation has two distinct interaction types, each with its own properties and auto-annotation pipeline.
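
To make the per-interaction-type structure concrete, here is a minimal sketch of what such a configuration could look like. The field and property names (`interaction_types`, `properties`, `threshold`, and the example properties) are hypothetical and shown only to convey the shape of the configuration; the actual schema is covered in the YAML section of this guide.

```yaml
# Hypothetical sketch — field names are illustrative, not the actual schema.
# Each interaction type carries its own properties and auto-annotation pipeline,
# and the configuration applies across all versions of the application.

interaction_types:
  tool_calling:
    properties:
      - name: Tool Call Accuracy
        threshold: 0.8      # interactions scoring below this are annotated as bad
      - name: Grounded in Context
        threshold: 0.7
  report_generation:
    properties:
      - name: Completeness
        threshold: 0.75
      - name: Avoided Answer
        threshold: 0.5
```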