Hierarchy & Data Upload Format

How to structure your data to gain the most value from Deepchecks for evaluation.

This page gives you a solid understanding of Deepchecks' structure and of how to prepare your data for upload to Deepchecks. It covers both:

  • Data Hierarchy - our structure of Applications, Environments, and Versions, and the interactions inside them.
  • Data Upload Format - how to structure the different data fields (and metadata) so Deepchecks can process them.

Data Hierarchy: Concepts

Application 💬

  • The application is the highest hierarchy within your organization's workspace.
  • It represents a full task (an end-to-end use case) and may include multiple steps, or represent a chat scenario.
  • All customizations, such as the selected Properties (built-in, custom, or LLM) and the Annotation YAML, are set at the application level.

Note: Applications are standalone. Therefore, if you'd like to compare scores across applications, we recommend uploading the data to the same application and separating it by versions. Alternatively (e.g., for different tasks), you can keep separate applications and keep all identifiers except the application name identical (version names, user_interaction_id's). That allows easy comparison when retrieving the values with our SDK, as sketched below.
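
For illustration, here is a minimal sketch of that identifier convention. `log_interaction` is a hypothetical stand-in, not the actual Deepchecks SDK call, and the application and version names are made up:

```python
# Sketch only: `log_interaction` is a hypothetical stand-in for the real
# SDK upload call. The point is the convention: identical version names
# and user_interaction_id's across applications make results easy to join.
def log_interaction(**fields):
    print(f"uploading: {fields['app_name']} / {fields['version_name']} / "
          f"{fields['user_interaction_id']}")

SHARED_IDS = ["q-001", "q-002"]   # same user_interaction_id's in every application
VERSION = "v1"                    # same version name in every application

for app_name in ["summarization-app", "qa-app"]:  # two separate applications
    for uid in SHARED_IDS:
        log_interaction(
            app_name=app_name,
            version_name=VERSION,       # identical across applications
            user_interaction_id=uid,    # identical across applications
            input=f"input for {uid}",
            output=f"output produced by {app_name}",
        )
```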

Environment ♻️

Relates to the stage in your lifecycle:

  • Evaluation - Typically for benchmarking the performance, comparing between multiple versions, iterating on a version to improve and fix problems, checking new application candidates, CI/CD, etc. Usually, the samples in the evaluation set would be the same for all versions of an application.
  • Production - for deployed applications, usually after choosing the best version amongst a few and after the initial configuration of scores and properties, to enable efficient monitoring over time.
  • Pentesting (where applicable) - Separate environment for safety-related evaluation.

Version 🔎

A version is a single iteration of your pipeline. Versions may differ in base model, prompts, data processing, etc.

Interaction

The data itself: the minimal annotatable unit. An interaction can include input, output, information retrieval, and additional steps along the way. Read more about the interaction's structure below.

Data Upload Format: Structure

🎯

Uploading Interactions - Target Location

Make sure that you've chosen the desired Application, Version & Environment when uploading data to Deepchecks, whether you're uploading data from a file via the system's UI or with Deepchecks' SDK.

Deepchecks only has access to data that was explicitly sent to it for evaluation purposes, in the supported data format (following the structure explained below). Structuring the data correctly is essential for the evaluation to yield correct results.
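
As a minimal sketch, the three coordinates every upload targets can be pictured like this. `DeepchecksStub` is a stand-in class, not the real SDK client; the actual initialization call and parameter names may differ, so consult the SDK reference:

```python
# Sketch only: `DeepchecksStub` stands in for the real SDK client.
# It illustrates the target location (Application / Version / Environment)
# that must be selected before any upload.
class DeepchecksStub:
    def init(self, app_name: str, version_name: str, env_type: str) -> None:
        print(f"target: app={app_name}, version={version_name}, env={env_type}")

client = DeepchecksStub()
client.init(
    app_name="my-qa-app",   # Application: the end-to-end use case
    version_name="v1",      # Version: this iteration of the pipeline
    env_type="EVAL",        # Environment: evaluation / production / pentesting
)
```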

Interactions Overview

📘

Interactions Data Fields

Every interaction has two types of data fields that Deepchecks utilizes:

  • Data Content Fields: contain the actual text from the interaction and are used for the property calculations (each field according to its specific name and purpose).
  • Metadata Fields: everything else. They are used for annotations, enabling version comparison, displaying latency and cost, and much more.

Each field has its own significance, and almost all of them are optional (though omitting them may affect which features are accessible). Read more about them below:

Interaction Metadata Fields

  • user_interaction_id (str) - must be unique within a single version. Used for identifying interactions when updating annotations, and for identifying the same interaction across different versions.
  • started_at (timestamp) - timestamp for the interaction's start. Timestamps are displayed in UTC.
  • finished_at (timestamp) - timestamp for the interaction's end. The delta from started_at is used for calculating latency.
  • user_annotation (AnnotationType or None) - is the pipeline's output good enough for this interaction? Possible values: AnnotationType.Good/Bad/Unknown or None.
  • user_annotation_reason (str) - textual reasoning for the annotation.

👍

All the above fields are optional.
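
Putting these metadata fields together, one interaction's metadata might look like the following sketch. The values are illustrative, and the plain string for user_annotation stands in for the SDK's AnnotationType enum:

```python
from datetime import datetime, timezone

# Illustrative metadata for a single interaction; every field is optional.
interaction_metadata = {
    # Unique within a version; also links the same interaction across versions
    "user_interaction_id": "chat-42",
    "started_at": datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc),
    "finished_at": datetime(2024, 1, 15, 12, 0, 3, tzinfo=timezone.utc),  # latency: 3s
    "user_annotation": "good",  # stands in for AnnotationType.Good/Bad/Unknown or None
    "user_annotation_reason": "Answer was accurate and well grounded.",
}
```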

Interaction Data Fields

  • input - input for this interaction.
  • information_retrieval - data retrieved as context for the LLM in this interaction. Can be split into separate documents; see Separate Documents.
  • [coming up] history - additional context relevant to the interaction, which wasn't retrieved from a knowledge base. For example: chat history.
  • Additional Steps - if your application has additional intermediate steps, they should be logged using the steps mechanism. See more details here.
  • full_prompt - the prompt sent to the LLM in this interaction. For user display only; not used for properties.
  • output - output of this interaction.

👍

All fields are optional, except "input" and "output": every interaction should have at least one of them, and usually both of them.
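
Combining the data content fields, a single Q&A-style interaction might look like the following sketch. All values are made up for illustration:

```python
# Illustrative data content fields for one Q&A-style interaction.
interaction_data = {
    "input": "What is the refund policy for annual plans?",
    "information_retrieval": [  # retrieved context, as separate documents
        "Refunds for annual plans are prorated after 30 days.",
        "Monthly plans can be cancelled at any time.",
    ],
    "full_prompt": (  # for user display only; not used for properties
        "Answer using the context below.\n"
        "Context: <retrieved documents>\n"
        "Question: What is the refund policy for annual plans?"
    ),
    "output": "Annual plans are refunded on a prorated basis after the first 30 days.",
}
```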

🤓

See Supported Applications for further details about the specific content of the data fields for each use case.

The meaning of the above fields varies slightly depending on whether the task is Q&A, Summarization, Generation, Classification, or Other.