📖
This Page Includes

Deepchecks' Interaction Types: 💡Q&A , 📚Summarization, 🎨Generation, 🎯Classification, ⛏️Feature Extraction, Tool Use 🦾 , Chat 💬 , Retrieval 🔍, 💭 Custom Interaction Type

A few of the key properties for each interaction type

The data structure to upload for each interaction type

📘 Note 📘: Deepchecks is compatible with all interaction types. Selecting your specific interaction type helps you begin with optimized default settings, which you can fully customize to your needs. If you're not sure which configuration would be the best starting point, you can choose "Other", create a new custom type or just consult with us!

Question Answering (RAG) 💡

An interaction type whose end goal is to answer a question provided by the end user. The interaction can contain multiple intermediate steps such as embeddings-based information retrieval in the classic RAG scheme or SQL query generation to retrieve relevant information.

Q&A Key Properties

Among the various Deepchecks' properties, we found the following to be especially useful for Q&A interactions: Grounded in Context, Retrieval Properties, Relevance, Correctness, and Completeness.

Data to upload for Q&A interactions

Input (str) - should contain the user-provided question.
Information Retrieval (List[str]) - retrieved information on which the answer should be based. Either documents in the case of RAG or data retrieved from the database.
(Optional) History (List[str]) - additional context relevant to the interaction, which wasn't retrieved from a knowledge base. For example: chat history.
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.
(Optional) Full Prompt (str) - the full prompt that was sent to the LLM.
Output (str) - the final response that was displayed to the user.

In cases where the question is asked after a chat between the user and the application, the chat history should be included as part of the input.

See GVHD Q&A Example for a complete RAG interaction quickstart.

Summarization 📚

An interaction type designed to condense text into either a free-form summary or a specified format. Examples of such interactions are meeting-summary bots and bots tasked with summarizing a legal case or a financial report.

Summarization Key Properties

Among the various Deepchecks' properties, we found the following to be especially useful for summarization interactions: Coverage, Grounded in Context, and Conciseness (a Prompt Property).

Data to upload for Summarization interactions

Input (str) - The text to be summarized.
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.
(Optional) Full Prompt (str) - the full prompt that was sent to the LLM.
Output (str) - the generated summary.

Generation 🎨

Interactions that utilize data to perform a creative task of some sort. This is the most diverse interaction type and can include several different kinds of interactions. To name a few specific examples:

Report Generation - receives information about a company or real estate asset, and generates a report in a specific requested format.
Content Generation - receives details about a subject and generates a web page. For example, generate an "About Us" page on a company's website.
Docstring Generation - receives a code snippet describing a class or a function and generates a high-quality docstring for the input code.

Generation Key Properties

Since Generation use-cases tend to be very diverse, the go-to property among the various Deepchecks' properties should be Instruction Fulfilment. In addition, it is likely that different interactions could significantly benefit from additional properties so it is highly recommended to check out the full catalog.

Data to upload for Generation interactions

Input (str) - The user's input to the interaction, without additional application instruction.
(Optional) Information Retrieval (List[str]) - retrieved information on which the output should rely.
(Optional) History (List[str]) - additional context relevant to the interaction, which wasn't retrieved from a knowledge base. For example: chat history.
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.
Full Prompt (str) - the full prompt that was sent to the LLM.
Output (str) - the interaction's output.

Classification 🎯

Includes both multi-class as well as multi-label classification interactions in which the LLM performs the classification.

Classification Key Properties

For classification interactions Deepchecks' properties, are mainly useful for segmenting and analyzing different aspects of the user input. Specifically interesting properties for this include Sentiment, Fluency, Formality, and in relevant cases also Content Type. In classification interactions, the main tools used for auto annotation are similarity-based annotation and Deepchecks Evaluator.

Data to upload for Classification interactions

Input (str) - The input to be classified.
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.
Full Prompt (str) - the full prompt that was sent to the LLM.
Output (str) - the classification. In the case of multi-label classification should be formatted as Class 1, Class 2, ...

Feature Extraction ⛏️

Interactions where specific pieces of information are extracted from a text by an LLM. The most common example is where data is extracted from free text into a JSON file of a predefined schema.

Feature Extraction Key Properties

There are 3 key properties for this use-case:

Extraction Groundedness - assesses whether the extracted output is faithful to the input data. A high score indicates that the output is faithful to the input and doesn't contain hallucinations..
Extraction Coverage - assesses whether the extracted output contains all the information required by the prompt that is provided in the input. A high score indicates that a high ratio of the required information was correctly extracted.
Structural Validity - assesses whether the extracted output fits the structure specified in the full prompt prompt. A high score indicates that the output's structure adheres to the specifications in the prompt.

Data to upload for Feature Extraction interactions

Input (str) - The text from which data is to be extracted.
Full Prompt (str) - The instructions for the extraction task. May contains free text, an expected structure or both.
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.
Output (str) - The extracted data.

Tool Use 🦾

Interactions where an AI agent or AI workflow (we will call both "agent" from now on for the sake of brevity) utilize an external tool in order to perform a task. Tools can retrieve information, perform DB actions or call other agents. This is Deepchecks' main interaction-type for evaluating the performance of an agent.

Tool Use Key Properties

There are 3 key properties for this use-case:

Tool Calling - Assesses whether the tool selected by the agent is appropriate given its planning and the user request, and whether the tool call is correctly formatted.
Planning Efficiency - Assesses whether the agent's planning is efficient, effective and goal oriented.
Tool Completeness - Assess how relevant is the information provided by the tool for addressing the user request.

Data to upload for Tool Use interactions

Input (str): The user’s input.
Full Prompt (str): The list of available tools and any other relevant instructions provided to the agent.
History (str): A list of past actions taken by the agent, including the thoughts, actions and tool responses.
Output (str): The agent's thought process or planning.
Action (str): The action itself—specifically, tool invocation.
Tool Response (str): Information returned from the tool.
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.

Chat 💬

Scenarios where a user engages in an ongoing, multi-turn conversation with an AI assistant. In these interactions, it is important to assess how effectively the assistant follows both system and user-provided instructions throughout the dialogue, as well as the overall satisfaction experienced by the user.

Chat Key Properties

There are 3 key properties for this use-case:

Instruction Fulfillment - Assesses how accurately the output adheres to the specified system instructions in a multi-turn settings.
Intent Fulfillment - Assesses how accurately the output follows the instructions provided by the user in the current turn or in the conversation history if they are still relevant.
User Satisfaction - Assesses the user's satisfaction with the assistant's response based on its last message.

Data to upload for Chat interactions

Input (str): The user’s last input.
Full Prompt (str): The system instructions provided to the assistant.
History (str): The conversation history, including both user and assistant messages, and also tool invocations in case the assistant is a tool-user.
Output (str): The assistant's last message.
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.

Retrieval 🔍

Scenarios where a user wants to evaluate the retrieval process within a RAG (Retrieval-Augmented Generation) pipeline. In these interactions, the focus is on the information retrieval aspect, where each document is classified as either relevant (gold or platinum) or irrelevant with respect to the input. The evaluation emphasizes how well the retrieved documents match the query in both content and ranking order.

Retrieval Properties

There are 3 key properties for this use-case:

Retrieval Coverage - Assesses how well the retrieved documents cover the information needed to fully answer the input.
Normalized DCG (nDCG) - Evaluates the ranking quality by considering the positions of relevant (gold/platinum) documents, giving higher weight to those appearing earlier in the list.
Retrieval Precision - Measures the proportion of relevant (gold/platinum) documents among all retrieved documents.

Data to upload for Retrieval interactions

Input (str) - The user’s question or query.
Information Retrieval (List[str]) - The documents or data retrieved by the pipeline (e.g., from a database or vector store).
(Optional) Additional Steps - In case your interaction has additional intermediate steps they should be logged under the steps mechanism. See more details here.

Custom Interaction Type 💭

In additional to the ability to edit the existing interaction types, you can also create a new interaction type from scratch and customize it for your needs.

Key Properties

By default, the custom interaction type contains only a basic selection of general purpose properties and similarity based annotation flow. Upon creation you will need to add the properties that are relevant for your use case and customize them in the Properties screen.

Supported Use Cases

📖
This Page Includes

Question Answering (RAG) 💡

Q&A Key Properties

Data to upload for Q&A interactions

Summarization 📚

Summarization Key Properties

Data to upload for Summarization interactions

Generation 🎨

Generation Key Properties

Data to upload for Generation interactions

Classification 🎯

Classification Key Properties

Data to upload for Classification interactions

Feature Extraction ⛏️

Feature Extraction Key Properties

Data to upload for Feature Extraction interactions

Tool Use 🦾

Tool Use Key Properties

Data to upload for Tool Use interactions

Chat 💬

Chat Key Properties

Data to upload for Chat interactions

Retrieval 🔍

Retrieval Properties

Data to upload for Retrieval interactions

Custom Interaction Type 💭

Key Properties

📖This Page Includes

Question Answering (RAG) 💡

Q&A Key Properties

Data to upload for Q&A interactions

Summarization 📚

Summarization Key Properties

Data to upload for Summarization interactions

Generation 🎨

Generation Key Properties

Data to upload for Generation interactions

Classification 🎯

Classification Key Properties

Data to upload for Classification interactions

Feature Extraction ⛏️

Feature Extraction Key Properties

Data to upload for Feature Extraction interactions

Tool Use 🦾

Tool Use Key Properties

Data to upload for Tool Use interactions

Chat 💬

Chat Key Properties

Data to upload for Chat interactions

Retrieval 🔍

Retrieval Properties

Data to upload for Retrieval interactions

Custom Interaction Type 💭

Key Properties

📖
This Page Includes