Sending Evaluation Data via Deepchecks SDK

To add interactions to an evaluation set, follow the steps below. They work both for a version name that already exists in the system and for a new version (and version name):

  1. Read the interactions into your Python code (the details depend on how you saved them):
import pandas as pd
interactions_df = pd.read_csv('new_interactions_to_add.csv')
  1. Set up the Deepchecks Python client:
from deepchecks_llm_client.client import dc_client
from deepchecks_llm_client.data_types import EnvType, LogInteractionType

# Initialize the Deepchecks LLM Evaluation client
dc_client.init(host="", api_token="Your Deepchecks API Token Here (get it from within the app)",
               # Set app_name and version_name to match your use case
               app_name="GVHD-demo", version_name="baseline",
               env_type=EnvType.EVAL, auto_collect=False)
  1. Convert each DataFrame row into an interaction object, keeping only the columns that need to be sent:
# Mandatory columns
mandatory_columns = ['input', 'output']
# Optional columns (highly recommended to include any of these whenever possible)
optional_columns = ['user_interaction_id', 'information_retrieval', 'annotation', 'full_prompt']
# Custom properties
custom_properties_columns = ['My Custom Property']

interactions = []
for index, row in interactions_df.iterrows():
  # Prepare arguments for the LogInteractionType object
  interaction_args = {
    'input': row['input'],
    'output': row['output'],
    'custom_props': {key: row[key] for key in custom_properties_columns if key in row and not pd.isnull(row[key])}
  }

  # Add optional arguments if the column is present and the value is not NaN
  for column in optional_columns:
    if column in row and not pd.isnull(row[column]):
      interaction_args[column] = row[column]

  # Create a LogInteractionType object from the prepared arguments and collect it
  interaction = LogInteractionType(**interaction_args)
  interactions.append(interaction)

  1. Now log the interactions:


  1. For existing versions, this only works when the 'user_interaction_id' values don't already exist in the evaluation set. If some of them may already exist, it's recommended, for each evaluation version, to fetch the evaluation data as a DataFrame via the SDK (using the 'get_data' function) and then check that each id is absent from the 'user_interaction_id' column.
  2. If there is data from multiple versions (with different outputs for each version, etc.), a similar workflow can log the new interactions to multiple versions in parallel. Advanced users can adapt the approach from the 'Cloning Interactions from Production into the Evaluation Set' section above.
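
The duplicate-id check described in the first note can be sketched with plain pandas. This is a minimal sketch, not part of the SDK: it assumes the existing evaluation data has already been fetched into a DataFrame (the hypothetical `existing_df` below stands in for the result of 'get_data') and that it contains a 'user_interaction_id' column. The helper name `drop_existing_interactions` is illustrative only.

```python
import pandas as pd

def drop_existing_interactions(new_df, existing_df, id_column='user_interaction_id'):
    """Return the rows of new_df whose id is not already present in existing_df."""
    existing_ids = set(existing_df[id_column].dropna())
    # Rows without an id are kept (assumption: a missing id cannot clash with an existing one)
    is_duplicate = new_df[id_column].notna() & new_df[id_column].isin(existing_ids)
    return new_df[~is_duplicate].reset_index(drop=True)

# Stand-in data; in practice existing_df would come from the SDK
existing_df = pd.DataFrame({'user_interaction_id': ['a1', 'a2']})
new_df = pd.DataFrame({
    'user_interaction_id': ['a2', 'a3', None],
    'input': ['q1', 'q2', 'q3'],
    'output': ['r1', 'r2', 'r3'],
})
filtered_df = drop_existing_interactions(new_df, existing_df)
```

The filtered DataFrame can then be passed through the same row-to-interaction loop as in the steps above, so only interactions with new ids are sent.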