These docs are for v0.8.0; the latest docs are for v0.21.0.

Properties

What properties are in Deepchecks LLM Evaluation, what kinds of properties exist, and how they are used.

Properties are one-dimensional values that are calculated on each text sample. A property can be a simple text characteristic, such as the number of words in the text, or something more complex, such as identifying whether the text contains toxic language or whether a given summary captures the key points of the original article.

What are properties used for?

Properties measure various aspects of our LLM interactions that we may be interested in. They are used in the following ways:

  • Properties are used to create estimated annotations. By defining rules on the calculated properties, you can create a flow that automatically estimates the quality of each LLM interaction. For example, by default, summarization interactions with low Conciseness are deemed to be bad interactions.
  • Average values of calculated properties are shown in the Dashboard screen. You can then drill into a specific property and see interactions with extreme values, such as extremely irrelevant answers.
  • Properties are shown in the data page, where they can be used to sort and filter the viewed interactions. This is useful, for example, if you wish to see only the interactions with Toxicity > 0.5, and perhaps combine that filter with additional ones (e.g., a specific topic).
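
The rule-based estimation and filtering described above can be sketched in plain Python. This is only an illustration, not the Deepchecks implementation: the interaction records, the 0.3 Conciseness threshold, and the helper names `estimate_annotation` and `filter_interactions` are all assumptions made for this example.

```python
# Hypothetical interaction records with pre-computed property values.
interactions = [
    {"id": 1, "topic": "billing", "properties": {"Conciseness": 0.82, "Toxicity": 0.02}},
    {"id": 2, "topic": "billing", "properties": {"Conciseness": 0.15, "Toxicity": 0.71}},
    {"id": 3, "topic": "shipping", "properties": {"Conciseness": 0.64, "Toxicity": 0.93}},
]

# Assumed threshold; the actual default rule values are not stated in this page.
CONCISENESS_THRESHOLD = 0.3


def estimate_annotation(interaction):
    """Label an interaction "bad" when its Conciseness is below the assumed threshold."""
    conciseness = interaction["properties"]["Conciseness"]
    return "bad" if conciseness < CONCISENESS_THRESHOLD else "good"


def filter_interactions(items, min_toxicity, topic=None):
    """Keep interactions with Toxicity above min_toxicity, optionally within one topic."""
    return [
        item
        for item in items
        if item["properties"]["Toxicity"] > min_toxicity
        and (topic is None or item["topic"] == topic)
    ]


# Estimated annotation per interaction id.
estimated = {item["id"]: estimate_annotation(item) for item in interactions}

# Toxicity > 0.5 combined with a topic filter, as in the data page example.
toxic_billing = filter_interactions(interactions, 0.5, topic="billing")
```

Keeping the rule and the filter as separate functions mirrors the text: the same property values drive both the automatic annotation and the interactive filtering.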

What kinds of properties are there?

Deepchecks LLM Evaluation has three types of properties: Built-in, Custom, and LLM.

Built-in properties

You can read more about our Built-in properties in the dedicated section.

Custom properties

Custom properties are values that the user passes alongside the interaction fields, such as the LLM input and output. For example, you may want to know from what device a specific question was sent to your system. You can then define "Device" as a custom categorical property in the Custom Properties screen. If you then upload your data to the system as a csv, for example, and the csv contains a Device column, its values will automatically be added to the system.
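
As a rough sketch of what such a csv might look like, the snippet below reads a small file with a Device column and separates the custom property from the interaction fields. The `input` and `output` column names, the sample rows, and the `split_custom_properties` helper are assumptions for illustration; the exact csv schema expected by the system is not specified here.

```python
import csv
import io

# Hypothetical csv content; only the "Device" column comes from the text above.
csv_text = """input,output,Device
How do I reset my password?,Open Settings and choose Reset password.,mobile
What is the refund policy?,Refunds are available within 30 days.,desktop
"""

# Assumed names of the interaction fields in this example file.
INTERACTION_FIELDS = {"input", "output"}


def split_custom_properties(row):
    """Split one csv row into (interaction fields, custom properties)."""
    interaction = {key: value for key, value in row.items() if key in INTERACTION_FIELDS}
    custom = {key: value for key, value in row.items() if key not in INTERACTION_FIELDS}
    return interaction, custom


rows = list(csv.DictReader(io.StringIO(csv_text)))
parsed = [split_custom_properties(row) for row in rows]
```

Every column that is not an interaction field is treated as a custom property in this sketch, so adding another column to the file would surface it the same way as Device.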

LLM properties

LLM properties are properties evaluated by LLMs, and are used to assess the more nuanced qualities of your LLM interactions. They are calculated by asking an LLM to grade your interaction according to given steps, on a scale of 1 to 5 (higher is better).

When you create a task in Deepchecks LLM Evaluation, it is initialized by default with a set of LLM properties appropriate for your task type. You can then add new LLM properties by defining the steps the LLM will follow when grading your interaction, and the components of the interaction it will use for grading. For example, the default LLM property "Coherence", used for summarization tasks, is defined as follows:

Steps:

1. Describe whether there are sentences which are not fluent or do not follow a logical order.
2. A coherent summary should be well-structured and easy to read.

Components to use: output
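
One way to picture such a definition is as a small data structure holding a name, the grading steps, and the components to use. The sketch below is a hypothetical representation, not a Deepchecks class: `LLMPropertyDefinition`, `ALLOWED_COMPONENTS`, and the lowercase component names are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed identifiers for the three interaction components.
ALLOWED_COMPONENTS = ("input", "information_retrieval", "output")


@dataclass(frozen=True)
class LLMPropertyDefinition:
    """Hypothetical container for an LLM property: name, steps, and components."""

    name: str
    steps: tuple
    components: tuple

    def __post_init__(self):
        # Reject definitions that cannot be graded.
        if not self.steps:
            raise ValueError("an LLM property needs at least one step")
        unknown = [c for c in self.components if c not in ALLOWED_COMPONENTS]
        if unknown:
            raise ValueError(f"unknown components: {unknown}")

    def numbered_steps(self):
        """Return the steps as a numbered, newline-separated string."""
        return "\n".join(f"{i}. {step}" for i, step in enumerate(self.steps, start=1))


# The default "Coherence" property, with the steps quoted from the text above.
coherence = LLMPropertyDefinition(
    name="Coherence",
    steps=(
        "Describe whether there are sentences which are not fluent or do not follow a logical order.",
        "A coherent summary should be well-structured and easy to read.",
    ),
    components=("output",),
)
```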

LLM Properties templates

LLM properties are calculated by passing prompts to an LLM that contain information about your interaction, and the way you want it to grade your interaction (the "steps"). There are unique LLM templates used for each of the pre-defined application types, detailed in the sections below.

📘

Note - Interaction Components

You can select which of the interaction components (Input, Information Retrieval, Output) should be accessible to the LLM property.

Q&A

You are a harsh evaluator. Your task is to evaluate whether an answer meets the required standards. 

Your guidelines are:
<STEPS>

You should describe your reasoning in a few sentences and then give a final score for <LLM_PROPERTY_NAME> between 1 and 5, where 1 is the lowest and 5 is the highest.
Answer:
<OUTPUT>
  
Document:
<INFORMATION_RETRIEVAL>
  
Question:
<INPUT>

Summarization

You are a harsh evaluator. Your task is to evaluate whether a summary meets the required standards.

Your guidelines are:
<STEPS>

You should describe your reasoning in a few sentences and then give a final score for <LLM_PROPERTY_NAME> between 1 and 5, where 1 is the lowest and 5 is the highest.
Article:
<INPUT>
  
Summary:
<OUTPUT>
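
To make the placeholder mechanics concrete, here is a minimal sketch of filling the Summarization template above. It assumes plain textual substitution of the `<...>` placeholders; the `render_prompt` helper and the sample article and summary are hypothetical, and the way Deepchecks actually assembles the prompt may differ.

```python
import re

# The Summarization template, copied from the text above.
SUMMARIZATION_TEMPLATE = """You are a harsh evaluator. Your task is to evaluate whether a summary meets the required standards.

Your guidelines are:
<STEPS>

You should describe your reasoning in a few sentences and then give a final score for <LLM_PROPERTY_NAME> between 1 and 5, where 1 is the lowest and 5 is the highest.
Article:
<INPUT>

Summary:
<OUTPUT>"""

PLACEHOLDER = re.compile(r"<(STEPS|LLM_PROPERTY_NAME|INPUT|OUTPUT)>")


def render_prompt(template, property_name, steps, article, summary):
    """Fill the template in a single pass so user text is never re-scanned."""
    values = {
        "STEPS": "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1)),
        "LLM_PROPERTY_NAME": property_name,
        "INPUT": article,
        "OUTPUT": summary,
    }
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


prompt = render_prompt(
    SUMMARIZATION_TEMPLATE,
    property_name="Coherence",
    steps=[
        "Describe whether there are sentences which are not fluent or do not follow a logical order.",
        "A coherent summary should be well-structured and easy to read.",
    ],
    article="The city council voted on Tuesday to open a new bridge next spring.",
    summary="The city will open a new bridge next spring.",
)
```

A single-pass substitution is used instead of chained `str.replace` calls so that an article which happens to contain the text `<OUTPUT>` is not overwritten by the summary.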

Generation

You are a harsh evaluator. Your task is to evaluate whether an output generated for provided user instructions meets the required standards.

Your guidelines are:
<STEPS>

You should describe your reasoning in a few sentences and then give a final score for <LLM_PROPERTY_NAME> between 1 and 5, where 1 is the lowest and 5 is the highest.
Output:
<OUTPUT>
  
Context:
<INFORMATION_RETRIEVAL>

User Instructions:
<INPUT>

Chat

You are a harsh evaluator. Your task is to evaluate whether an output meets the required standards.

Your guidelines are:
<STEPS>

You should describe your reasoning in a few sentences and then give a final score for <LLM_PROPERTY_NAME> between 1 and 5, where 1 is the lowest and 5 is the highest.
Output:
<OUTPUT>

Input:
<INPUT>
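
All four templates ask the LLM for free-text reasoning followed by a final score between 1 and 5, so the numeric score has to be extracted from the reply. The following is a minimal sketch of one way to do that, assuming the reply mentions the word "score" shortly before the number; the `parse_score` helper and the reply format are assumptions, not the documented Deepchecks behavior.

```python
import re

# "score", up to 30 non-digit characters, then a single digit 1-5 that is
# not part of a longer number such as 45 or 4.5.
SCORE_PATTERN = re.compile(r"score\D{0,30}?([1-5])(?!\d|[.,]\d)", re.IGNORECASE)


def parse_score(reply):
    """Return the last score mentioned in the reply as an int, or None if absent."""
    matches = SCORE_PATTERN.findall(reply)
    return int(matches[-1]) if matches else None


reply = (
    "The summary is readable, but the second sentence is out of order. "
    "Final score: 4"
)
score = parse_score(reply)
```

Returning `None` rather than a default value keeps unparseable replies visible, so they can be retried or flagged instead of silently skewing the property average.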