
Hierarchy & Data Upload Format

How to structure your data to gain the most value from using Deepchecks for evaluation.

This page gives you a good understanding of Deepchecks' structure and how to prepare your data for upload to Deepchecks. It covers both:

  • Data Hierarchy - Our structure of Applications, Environments, and Versions, and the interactions inside them.
  • Data Upload Format - How to structure the different data fields (and metadata) so Deepchecks can process them.

Data Hierarchy: Concepts

Application 💬

  • The application is the highest hierarchy within your organization's workspace.
  • It represents a full task (end to end use case) and can encompass various Interaction Types, each tailored to specific parts of the workflow or evaluation.

Note: Applications are standalone. Therefore, if you'd like to compare scores across applications, we recommend uploading the data to a single application and separating it by versions.

Version 🔎

Each iteration of your pipeline. They may differ in base model, prompts, data processing, etc.

Environment ♻️

Relates to the stage in your lifecycle:

  • Evaluation - Typically for benchmarking the performance, comparing between multiple versions, iterating on a version to improve and fix problems, checking new application candidates, CI/CD, etc. Usually, the samples in the evaluation set would be the same for all versions of an application.
  • Production - For deployed applications, usually after choosing the best version among a few and after initial configuration of scores and properties, to enable efficient monitoring over time.
  • Pentesting (where applicable) - Separate environment for safety-related evaluation.
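To make the hierarchy concrete, here is a rough sketch of how one application's data might be organized, using plain Python data for illustration only (the names are made up):

```python
# Illustrative only: one Application, two Versions, and interactions grouped
# per Environment. Names and structure are made up for this example.
hierarchy = {
    "application": "support-bot",                     # the end-to-end use case
    "versions": {
        "v1-baseline": {
            "Evaluation": ["<interaction>", "..."],   # shared benchmark samples
            "Production": ["<interaction>", "..."],   # live traffic after deployment
        },
        "v2-new-prompt": {
            "Evaluation": ["<interaction>", "..."],   # same eval samples, new version
        },
    },
}
```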

Interaction

The data itself. It's the minimal annotatable unit. It can include input, output, and additional steps on the way. Read more about the interaction's structure below. A flow from an initial input to the final output can contain multiple interactions that should be grouped under a session.

Session

A session groups related interactions within the same flow, such as a conversation or a series of tasks in a workflow. Identified by a session_id, it provides a structure for organizing and analyzing interconnected interactions in a flow.

Note: If your flow contains a single interaction, the session is not relevant and the session_id field should be left empty during data upload.
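For example, a two-turn conversation can be logged as two interactions that share a session_id, while a single-step flow omits it entirely. A minimal sketch using plain dictionaries (field names follow the upload format described below; values are made up):

```python
# Two interactions from the same conversation, grouped by a shared session_id.
multi_turn = [
    {
        "user_interaction_id": "conv-42-turn-1",
        "session_id": "conv-42",
        "input": "What's your refund policy?",
        "output": "Refunds are available within 30 days of purchase.",
    },
    {
        "user_interaction_id": "conv-42-turn-2",
        "session_id": "conv-42",
        "input": "And for digital products?",
        "output": "Digital products can be refunded within 14 days.",
    },
]

# A single-interaction flow: session_id is simply left empty / omitted.
single_step = {
    "user_interaction_id": "qa-1001",
    "input": "Summarize the attached report.",
    "output": "The report covers Q3 revenue growth and key risks.",
}
```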

Interaction Type 🧩

Interaction Types define the logical nature of an interaction, allowing the categorization and evaluation of interactions based on their type. Examples of predefined types include Q&A, Summarization, Generation, Classification and Other.

Interaction Types are essential for:

  • Associating relevant properties with interactions. Properties are defined at the Interaction Type level, allowing flexibility in evaluation.
  • Defining auto-annotation YAML configurations for each interaction type, streamlining annotation processes.
  • Grouping interactions with similar structures for consistent evaluation and comparison.

Note: Each application can contain multiple interaction types, while one interaction always belongs to one interaction type. See here for additional information about the default interaction types.
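For example, a single application can mix interaction types by tagging each interaction with the interaction_type metadata field (plain-dictionary sketch; values are made up):

```python
# One application, two interaction types: each interaction carries its type,
# so properties and auto-annotation configs can be applied per type.
interactions = [
    {
        "user_interaction_id": "qa-1",
        "interaction_type": "Q&A",
        "input": "Who wrote 'Dune'?",
        "output": "Frank Herbert.",
    },
    {
        "user_interaction_id": "sum-1",
        "interaction_type": "Summarization",
        "input": "<long article text>",
        "output": "<two-sentence summary>",
    },
]
```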

Data Upload Format: Structure

🎯

Uploading Interactions - Target Location

Make sure you've chosen the desired Application, Version & Environment when uploading data to Deepchecks, whether you're uploading data from a file via the system's UI or with Deepchecks' SDK.

Deepchecks only has access to data that was explicitly sent to it for evaluation purposes, in the supported data format (following the structure explained below). Structuring the data correctly is essential so that the evaluation yields correct results.
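The snippet below is only a rough sketch of this idea; upload_interactions is a stand-in function, not the actual Deepchecks SDK API, whose real class and method names may differ:

```python
# Stand-in sketch only: the real upload happens via the Deepchecks UI (file
# upload) or the Deepchecks SDK. The point illustrated here is that every
# upload targets a specific Application, Version, and Environment.
def upload_interactions(app_name, version_name, environment, interactions):
    assert environment in {"Evaluation", "Production", "Pentesting"}
    print(f"Uploading {len(interactions)} interactions to "
          f"{app_name} / {version_name} / {environment}")

upload_interactions(
    app_name="support-bot",         # Application: the end-to-end use case
    version_name="v2-new-prompt",   # Version: this pipeline iteration
    environment="Evaluation",       # Environment: Evaluation / Production / Pentesting
    interactions=[{"input": "Hi", "output": "Hello!"}],
)
```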

Interactions Overview

📘

Interactions Data Fields

Every interaction has two kinds of data fields that Deepchecks utilizes:

  • Data Content Fields: They contain the actual text from the interaction and are the ones used for the property calculations (with respect to the specific field names and purposes).
  • Metadata Fields are everything else: They are used for annotations, enabling version comparison, displaying latency and cost, and much more. All metadata fields are optional.

Each field has its own significance, and almost all of them are optional (though omitting them may limit which features are available). Read more about them below:

Interaction Metadata Fields

  • user_interaction_id (str) - Must be unique within a single version. Used for identifying interactions when updating annotations, and for identifying the same interaction across different versions.
  • session_id (str) - The identifier for the session associated with this interaction. In case your use case only contains a single interaction per flow, ignore this field on data upload.
  • started_at (timestamp) - Timestamp for the interaction's start. Times are displayed in UTC.
  • finished_at (timestamp) - Timestamp for the interaction's end. The delta from started_at is used to calculate latency.
  • interaction_type (str) - Specifying the type of interaction (e.g., Q&A, Summarization). Helps group and evaluate interactions of similar types within an application.

    Note: If interaction_type is not provided, the application kind will be used as a default.

  • user_annotation (str) - is the pipeline's output good enough for this interaction? Possible values: Good, Bad, or Unknown.
  • user_annotation_reason (str) - textual reasoning for annotation.
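Putting these together, the metadata of a single interaction might look like this (values are illustrative; timestamps in UTC):

```python
from datetime import datetime, timezone

# Illustrative metadata for one interaction (all values are made up).
interaction_metadata = {
    "user_interaction_id": "ticket-981-turn-1",  # unique within the version
    "session_id": "ticket-981",                  # groups turns of the same flow
    "started_at": datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc),
    "finished_at": datetime(2024, 5, 1, 12, 30, 2, tzinfo=timezone.utc),  # latency: 2s
    "interaction_type": "Q&A",                   # defaults to the application kind if omitted
    "user_annotation": "Good",                   # Good / Bad / Unknown
    "user_annotation_reason": "Accurate and concise answer.",
}
```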

👍

All the above fields are optional.

Interaction Data Fields

  • input - input for this interaction.
  • information_retrieval - data retrieved as context for the LLM in this interaction. Can be provided as separate documents (see Separate Documents).
  • history - additional context relevant to the interaction, which wasn't retrieved from a knowledge base. For example: chat history.
  • Additional Steps - In case your application has additional intermediate steps, they should be logged under the steps mechanism. See more details here.
  • full_prompt - the prompt sent to the LLM used in this interaction. For user display, not used for properties.
  • output - output of this interaction.
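As an illustration, the data content fields of a retrieval-augmented Q&A interaction could be structured like this (values are made up; intermediate steps would be logged separately via the steps mechanism):

```python
# Illustrative data content fields for a single interaction.
# In practice, "input" and "output" are the ones you will almost always provide.
interaction_data = {
    "input": "What is the warranty period for the X200 model?",
    "information_retrieval": [                  # retrieved context, as separate documents
        "X200 datasheet: ... warranty period of 24 months ...",
        "Support FAQ: warranty claims require proof of purchase.",
    ],
    "history": [                                # prior context not from a knowledge base
        "User: Hi, I have a question about my X200.",
        "Assistant: Sure, how can I help?",
    ],
    "full_prompt": "System: Answer using the provided context...",  # for display only, not used for properties
    "output": "The X200 comes with a 24-month warranty.",
}
```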

👍

All fields are optional, except "input" and "output": every interaction should have at least one of them, and usually both of them.

🤓

See Supported Use Cases for further details about the specific content of the data fields for each use case.

The meaning of the above fields varies slightly depending on whether the task is Q&A, Summarization, Generation, Classification, or Other.