
Configuring the Automatic Annotation

Now that you've successfully uploaded data to the Deepchecks system, the next step is to select the relevant evaluation properties for each interaction type and configure the Automatic Annotation YAML.

Summarization

Let's start by looking at the default dashboard for the Base version.

The dashboard above gives us a good picture of the summarization quality.

In this use case, the seller is interested in factually accurate and comprehensive summaries. Both qualities score high, as shown by Grounded In Context (avg. score 0.9) and Coverage (avg. score 0.89), yet the overall score, which is based on an expert review, is very low. The Deepchecks Root Cause Analysis (RCA) framework uses the user annotations to identify specific data segments where your application underperforms.

Let's try to understand the reason behind the bad annotations:

Looking at the data, we can see that the summaries are dry and contain too many technical details.

In our e-commerce use case, the retailer wants to sell products online, so the summaries must be attractive to the reader. To address this, we will create a Custom LLM Property to evaluate attractiveness and incorporate it into the auto annotation pipeline.

Adding an LLM Property

New LLM Property Content

Property Name: Attractiveness
Description: Evaluate the appealingness of a text to a customer.
System Message:

  1. Evaluate whether the language is engaging, captures the reader's interest, and maintains a persuasive, professional, and customer-focused tone.
  2. Assess if the summary effectively conveys key product features and benefits without unnecessary length, ensuring the language is simple, clear, and impactful.
  3. Check if the summary highlights aspects likely to influence purchasing decisions, such as unique features, key benefits, or practical applications, while avoiding overly technical details.
  4. Confirm that the writing style and formatting are tailored to the marketing context and meet the standards for online product descriptions.

Interaction Steps for Property: output

Recalculate LLM Property

By default, LLM properties are calculated only for data uploaded after they are defined, so a recalculation is needed to get results for data that is already in the system. After saving the property's definition, recalculate it with the following settings:
Versions: Select All (3)
Environment: Both

Now we can see the Attractiveness score for the Base version:


Automatic Annotation - YAML Configuration

In this example, we want to update the default YAML configuration to take into account the 3 metrics that are important to our client - Attractiveness, Grounded In Context and Coverage.

We will upload the following YAML and re-run the estimated annotation pipeline.

The main changes we made are:

  1. Attractiveness: added as a new property to the Good and Bad chains, with a threshold of <=2 for bad samples and >=3 for good samples.
  2. Conciseness and Instruction Fulfillment: removed.
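The threshold rule from the first change can be sketched in Python as follows. This only illustrates the logic and is not the Deepchecks YAML schema or API: the function name `annotate_by_attractiveness` and the label strings are hypothetical, and the sketch assumes that a missing score, or a score strictly between 2 and 3, leaves the sample unannotated. In the real configuration, Grounded In Context and Coverage also take part in the chains; they are left out here because their thresholds are not given above.

```python
from typing import Optional

BAD_MAX = 2   # from the text: Attractiveness <= 2 marks a bad sample
GOOD_MIN = 3  # from the text: Attractiveness >= 3 marks a good sample


def annotate_by_attractiveness(score: Optional[float]) -> Optional[str]:
    """Return 'bad', 'good', or None (hypothetical labels, not Deepchecks values)."""
    if score is None:
        # Assumption: a sample without a score is not annotated by this rule.
        return None
    if score <= BAD_MAX:
        return "bad"
    if score >= GOOD_MIN:
        return "good"
    # Assumption: scores strictly between the two thresholds stay unannotated.
    return None
```

For example, `annotate_by_attractiveness(2)` falls under the bad threshold, while `annotate_by_attractiveness(4)` falls under the good one.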

Read more about Customizing the Auto Annotation Configuration and how to upload the above configuration.

Feature Extraction

Remember: Before uploading the data to the system, we created a custom property that was pre-computed and made available as a column in the dataset.
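As a rough illustration of what pre-computing a custom property as a dataset column can look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the text does not say which property was pre-computed or how, so the property `filled_field_ratio`, the row layout, and the `output` key are hypothetical and are not part of the Deepchecks SDK.

```python
import json
from typing import Dict, List


def filled_field_ratio(output_json: str) -> float:
    """Hypothetical property: share of extracted fields that are non-empty."""
    try:
        fields = json.loads(output_json)
    except json.JSONDecodeError:
        # Assumption: unparseable output counts as having no filled fields.
        return 0.0
    if not isinstance(fields, dict) or not fields:
        return 0.0
    filled = sum(1 for value in fields.values() if value not in (None, "", [], {}))
    return filled / len(fields)


def add_property_column(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return copies of the rows with the pre-computed property added as a column."""
    return [{**row, "filled_field_ratio": filled_field_ratio(row["output"])} for row in rows]
```

The resulting rows carry the property value alongside the original data, so it can be uploaded together with the interactions instead of being calculated by the system.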

Feature Extraction comes out of the box with properties that validate the factuality, coverage, and structure of the output based on the user requirements described in the full prompt. In our use case, we can see that the base version achieves good results across all metrics and the overall score.

Even though the out-of-the-box automatic configuration works quite well here, we want to customize it further for our use case by taking into account the Percent of Valid Values and Content Type properties.

We will upload the following YAML and re-run the estimated annotation pipeline.

Read more about Customizing the Auto Annotation Configuration and how to upload the above configuration.