Excluding Undesired Interactions from the Evaluation Set

When evaluating LLMs with Deepchecks, the evaluation set should represent the interactions we expect or actually receive in production. Part of this is adding interactions to the evaluation set (as described in other sections). Equally important is removing interactions that shouldn't be in the evaluation set, or that no longer belong there. This can happen for any of the following reasons:

  1. The sample was added without careful review and is only now being reviewed carefully
  2. We now have more data or more representative data from the production use case
  3. Several interactions are similar to one another, or the dataset has grown too large to manage, so we are removing some of them

We'll show how to do this using two methods: the UI and the SDK.

Excluding Undesired Interactions via the UI

To exclude (remove) undesired interactions from the evaluation set using the UI, follow these steps:

  1. Go to the data screen.

  2. Choose the Application and Version (see top right corner) you would like to edit.

  3. Check the boxes to the left of the interactions you would like to remove. Note that these may be easier to find using the filtering and sorting capabilities.

  4. Click the trash can icon at the top, which appears alongside the "Select All" option once some interactions are selected. Then confirm the deletion in the pop-up window.

Note: This deletes the interactions only from the version that is currently being edited. To delete interactions that appear in multiple versions, repeat this action for each of those versions or, for a more streamlined experience, use the SDK option described below.

Excluding Undesired Interactions via the SDK

To exclude (remove) undesired interactions from the evaluation set using the SDK, follow these steps:

  1. Create a list of the user_interaction_ids of the samples you'd like to delete. One way to do this is to follow steps 1-3 in the "Excluding Undesired Interactions via the UI" section above, click the "download" icon, and keep only the user_interaction_id column. Example Python code (after downloading):
import pandas as pd

# Load the file downloaded from the data screen
interactions = pd.read_csv('v1.1-EVAL.csv')
# Keep only the IDs of the interactions to delete
user_interaction_ids_list = interactions['user_interaction_id'].tolist()
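
If the downloaded file might contain blank or repeated IDs, it can help to clean the list before passing it on. The following is a minimal sketch, assuming only that the export has the `user_interaction_id` column used above. The helper name `extract_interaction_ids`, the `input` column, and the inline CSV (a stand-in for the downloaded file) are hypothetical and not part of the Deepchecks SDK:

```python
import io

import pandas as pd


def extract_interaction_ids(csv_source):
    """Return unique, non-empty user_interaction_id values in file order.

    `csv_source` is a path or a file-like object. Illustrative helper only.
    """
    # Read IDs as strings so numeric-looking IDs are not converted to numbers.
    interactions = pd.read_csv(csv_source, dtype={"user_interaction_id": str})
    ids = interactions["user_interaction_id"].dropna().str.strip()
    ids = ids[ids != ""]
    # drop_duplicates keeps the first occurrence, so file order is preserved.
    return ids.drop_duplicates().tolist()


# Stand-in for the downloaded file (e.g. 'v1.1-EVAL.csv'); contents are made up.
sample_export = io.StringIO(
    "user_interaction_id,input\n"
    "qa-medical-gvhd-63,first question\n"
    "qa-medical-gvhd-37,second question\n"
    "qa-medical-gvhd-63,first question again\n"
    ",row without an id\n"
)
user_interaction_ids_list = extract_interaction_ids(sample_export)
```

With a real export, pass the file path instead of the in-memory sample.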

Alternatively, you can build the list manually from the IDs shown in the system. When you click on a specific interaction, its ID appears at the top of the pop-up window, where the text can be selected and copied.

We can then add a line of code containing the pasted interaction IDs:

user_interaction_ids_list = ['qa-medical-gvhd-63', 'qa-medical-gvhd-37']
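
Hand-pasted IDs are easy to mistype, so one option is to compare them against the IDs from a downloaded export before deleting anything. This is a rough sketch in plain Python. The helper `find_unknown_ids` is hypothetical, and the ID lists are made-up examples (the last pasted ID contains a deliberate typo):

```python
def find_unknown_ids(pasted_ids, exported_ids):
    """Return the pasted IDs that do not appear among the exported IDs, in order."""
    known = set(exported_ids)
    return [interaction_id for interaction_id in pasted_ids if interaction_id not in known]


# Made-up example values; in practice exported_ids would come from the downloaded file.
exported_ids = ['qa-medical-gvhd-63', 'qa-medical-gvhd-37', 'qa-medical-gvhd-12']
pasted_ids = ['qa-medical-gvhd-63', 'qa-medical-gvhd-73']  # second ID is a typo

unknown_ids = find_unknown_ids(pasted_ids, exported_ids)
if unknown_ids:
    print(f"IDs not found in the export: {unknown_ids}")
```
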
  1. To delete interactions via the SDK, we'll first initialize the Deepchecks Python client:

    from deepchecks_llm_client.client import dc_client
    from deepchecks_llm_client.data_types import EnvType
    
    # Initialize the Deepchecks LLM Evaluation client
    dc_client.init(host="https://app.llm.deepchecks.com",
                   api_token="Your Deepchecks API Token Here (get it from within the app)",
                   # Replace app_name and version_name with the values for your use case
                   app_name="GVHD-demo", version_name="baseline",
                   env_type=EnvType.EVAL, auto_collect=False)
    
  2. If you've already initialized dc_client in your Python code, you don't need to do it again. You can simply override the current app and version names:

    dc_client.app_name = 'GVHD-demo'
    dc_client.version_name = 'baseline'
    
  3. To delete the samples, we need to iterate over the versions of the app in which these samples should no longer appear. Usually this is all versions, but in some cases it may be a selected group of versions or just one. Here, we'll manually create a list of the versions we'd like to delete the samples from:

    version_list = ['baseline', 'v2-IR']
    

Then, for each of the versions, we'll delete the samples:

for version in version_list:
    dc_client.version_name = version
    dc_client.delete_interactions(user_interaction_ids_list)
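
Note that the loop above leaves `dc_client.version_name` set to the last version in the list. The sketch below wraps the same loop in a helper that restores the original version afterwards and records any per-version errors instead of stopping at the first one. The helper `delete_from_versions` and the `FakeClient` stand-in are hypothetical. The only assumption about the client is that it exposes `version_name` and `delete_interactions` as used above, and whether the real client raises an exception on failure is also an assumption:

```python
def delete_from_versions(client, interaction_ids, versions):
    """Delete the given interactions from each version; return errors keyed by version.

    `client` is any object with a `version_name` attribute and a
    `delete_interactions(ids)` method, as used with dc_client in the loop above.
    """
    original_version = client.version_name
    failures = {}
    try:
        for version in versions:
            client.version_name = version
            try:
                client.delete_interactions(interaction_ids)
            except Exception as error:  # continue so one failing version does not stop the rest
                failures[version] = error
    finally:
        # Restore the version that was active before the loop started.
        client.version_name = original_version
    return failures


class FakeClient:
    """Stand-in used only to exercise the helper without contacting the service."""

    def __init__(self, version_name):
        self.version_name = version_name
        self.deleted = []

    def delete_interactions(self, interaction_ids):
        if self.version_name == "missing-version":
            raise ValueError("unknown version")
        self.deleted.append((self.version_name, list(interaction_ids)))


client = FakeClient("baseline")
failures = delete_from_versions(
    client, ['qa-medical-gvhd-63'], ['baseline', 'missing-version', 'v2-IR']
)
```

With the real client, the call would look like `delete_from_versions(dc_client, user_interaction_ids_list, version_list)`.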

SDK Conclusion & Snippet

For convenience, here is a single snippet containing all of the steps for deleting multiple interactions from multiple versions of the app (where two alternatives were shown above, it uses one of them):

from deepchecks_llm_client.client import dc_client
from deepchecks_llm_client.data_types import EnvType

# Initialize the Deepchecks LLM Evaluation client
dc_client.init(host="https://app.llm.deepchecks.com",
               api_token="Your Deepchecks API Token Here (get it from within the app)",
               # Replace app_name and version_name with the values for your use case
               app_name="GVHD-demo", version_name="baseline",
               env_type=EnvType.EVAL, auto_collect=False)

user_interaction_ids_list = ['qa-medical-gvhd-63', 'qa-medical-gvhd-37']
version_list = ['baseline', 'v2-IR']

for version in version_list:
    dc_client.version_name = version
    dc_client.delete_interactions(user_interaction_ids_list)

Good luck!