SDK Quickstart

Use the Deepchecks LLM Evaluation Python SDK to send data to the system

Intro

The Deepchecks LLM Evaluation SDK is a Python package built on top of the Deepchecks LLM Evaluation REST API. To install it, simply run pip install deepchecks-llm-client

The SDK allows you to upload data to the system. That can be done automatically using code instrumentation for OpenAI calls or manually using explicit SDK function calls. For more info check out the SDK Reference section.

In addition, the SDK can be used to annotate the logged interactions and to download them enriched with various Deepchecks-computed enrichments, such as topics, properties and estimated annotations.

Interaction is a single call of the LLM pipeline, consisting of:

  • user_interaction_id: Must be unique within a single version. Used for identifying interactions when updating annotations, and for identifying the same interaction across different versions.
  • input: (mandatory) The input to the LLM pipeline.
  • information_retrieval: Data retrieved as context for the LLM in this interaction.
  • full_prompt: The full prompt to the LLM used in this interaction.
  • output: (mandatory) The pipeline output returned to the user.
  • annotation: Was the pipeline response good enough? (Good/Bad/Unknown)
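
For reference, here is a minimal sketch of a single interaction carrying all of the fields above (the values are placeholders):

from deepchecks_llm_client.data_types import AnnotationType, LogInteractionType

# A single interaction populated with all of the fields listed above (placeholder values)
interaction = LogInteractionType(
    user_interaction_id="question-001",                  # unique within a version
    input="What is your refund policy?",                 # mandatory
    information_retrieval=["Refunds are accepted within 30 days of purchase."],
    full_prompt="system prompt + retrieved context + user question",
    output="You can request a refund within 30 days.",   # mandatory
    annotation=AnnotationType.GOOD,                      # Good / Bad / Unknown
)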

πŸ“Œ

Uploading data directly from the UI

Notice - you can also upload data to the system using CSV/XLSX format directly from the UI

SDK Minimal Example

Before diving in, let's outline the most basic example for using the SDK to log an interaction to the system.

from deepchecks_llm_client.client import dc_client
from deepchecks_llm_client.data_types import (EnvType, AnnotationType, LogInteractionType,
                                              ApplicationType)

# Initiate the Deepchecks LLM Evaluation client
dc_client.init(host="https://app.llm.deepchecks.com", api_token="Fill API Key Here",
               app_name="DemoApp", app_type=ApplicationType.QA, version_name="v1",
               env_type=EnvType.EVAL, auto_collect=False)

# Log two interactions to the system
dc_client.log_batch_interactions(
        interactions=[
            LogInteractionType(
                input="my user input1",
                output="my model output1",
                annotation=AnnotationType.GOOD,
            ),
            LogInteractionType(
                input="my user input2",
                output="my model output2",
                annotation=AnnotationType.BAD,
            ),
        ]
    )

The two basic steps are initializing the client and then logging samples to the system. The client, dc_client, needs to be initialized only once within your code and can then be used throughout your application to log interactions.

To better understand what an application, a version, an environment type and an interaction are, please refer to the UI Quickstart.

Setup

Install the python SDK

pip install deepchecks-llm-client

Generating an API Key

To generate an API key, log in to the Deepchecks LLM Evaluation system and create a new key under Configuration -> API Key. Pass this key to the SDK as the api_token argument.

πŸ“˜

app_name, app_type and version_name

Initializing the SDK using dc_client.init for an application or version name that doesn't already exist will create the given application and / or version in the system.

If the application doesn't already exist, you must also explicitly set the app_type argument to define the type of the new application that will be created.

Initializing Deepchecks Client

Before any information can be tracked using the Deepchecks LLM Eval client, you must first initialize the dc_client singleton object.

dc_client.init(host="https://app.llm.deepchecks.com", api_token="Fill API Key Here",
               app_name="DemoApp", app_type=ApplicationType.QA, version_name="v1",
               env_type=EnvType.EVAL, auto_collect=False)

πŸ“Œ

Managing Error Handling

By default, any issue encountered while using the SDK to log data will result in an error being raised. Since the SDK may be integrated into production environments, there is an option to suppress these exceptions and print error/warning logs instead. This can be done by setting the silent_mode flag to True. The log level used when silent_mode is True can be set using the log_level argument.
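
For example, a minimal sketch of initializing the client so that SDK errors are written to the log instead of being raised (the value passed to log_level is an assumption here; a standard Python logging level is used):

import logging

from deepchecks_llm_client.client import dc_client
from deepchecks_llm_client.data_types import ApplicationType, EnvType

# Sketch: suppress SDK exceptions in a production setting and emit warning-level logs instead
# (the log_level value below is assumed to be a standard Python logging level)
dc_client.init(host="https://app.llm.deepchecks.com", api_token="Fill API Key Here",
               app_name="DemoApp", app_type=ApplicationType.QA, version_name="v1",
               env_type=EnvType.PROD, auto_collect=False,
               silent_mode=True, log_level=logging.WARNING)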

Logging and Downloading Data

Uploading a New Golden Set Version

The Golden Set is the set of interactions you use to regularly evaluate your LLM pipeline and to compare its performance across different versions. For that reason, the inputs of the golden set interactions will often be identical between versions, to enable an "apples to apples" comparison. The golden set will usually contain a relatively small but diverse set of samples representative of the interactions you've encountered in production.

You will probably start out by collecting these inputs, along with the outputs your system generated for them. If you have these inputs and outputs in a DataFrame, you can upload them using the log_batch_interactions command.

dc_client.log_batch_interactions(
        interactions=[
            LogInteractionType(
                input=row["input"],
                output=row["output"],
                user_interaction_id=row['id']
            ) for idx, row in df.iterrows()
        ]
)

This will upload the data to the application, version and environment defined in your dc_client.init command, and will initiate calculation of the various components provided by the Deepchecks LLM Eval system, such as properties, similarity scores and estimated annotations.
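
The snippet above assumes a DataFrame with input, output and id columns collected from your own pipeline; for illustration, a minimal sketch of constructing such a frame (the column names are an assumption of this example, not part of the SDK):

import pandas as pd

# Hypothetical golden-set inputs and the outputs your pipeline produced for them
df = pd.DataFrame([
    {"id": "q-001", "input": "What is your refund policy?",
     "output": "You can request a refund within 30 days."},
    {"id": "q-002", "input": "How do I reset my password?",
     "output": "Use the 'Forgot password' link on the login page."},
])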

Downloading Data

Once the calculations have completed, you can download the data "enriched" by the various Deepchecks components.

# This will download the basics - interaction components and annotations
dc_df = dc_client.get_data(EnvType.EVAL, version_name="v1")

# To download components computed by the system, such as properties and similarity scores,
# request them by setting the relevant arguments to True
dc_df = dc_client.get_data(EnvType.EVAL, version_name="v1",
                           return_output_props=True, return_llm_props=True,
                           return_similarities=True)

Uploading Further Golden Set Versions

When you've made some changes to your LLM pipeline, you can quickly use the golden set downloaded from the system to generate the outputs for this new version and upload them to the system as well.

golden_set_inputs = dc_df['input']

v2_outputs = v2_llm_pipeline(golden_set_inputs) # Replace this with the code for running your updated LLM pipeline

dc_df['output'] = v2_outputs

dc_client.version_name = 'v2' # Set dc_client to upload to a new version, named v2
dc_client.log_batch_interactions(
        interactions=[
            LogInteractionType(
                input=row["input"],
                output=row["output"],
                user_interaction_id=row['id']
            ) for idx, row in dc_df.iterrows()
        ]
)
  • The dc_client object is a singleton, and can be used throughout your code after being initialized once.
  • To change the application, version or environment type to which you wish to log your interactions, set the app_name, version_name and env_type attributes of the dc_client object. These settings will apply until they are changed again.

Uploading Production Data

A further use case is logging data from your production environment. This can be done inside your production code in the following way:

dc_client.version_name = current_version # Current production pipeline version
dc_client.env_type = EnvType.PROD # set logging to prod env

## Your Production code here
sample_output = your_llm_pipeline(sample_input)
## /End Production code

dc_client.log_interaction(input=sample_input,
                          output=sample_output)

Advanced Logging Options

Updating Annotations and Custom Properties

You can update an annotation and the value of any custom properties:

# user_interactions_id_2 is the user_interaction_id of a previously logged interaction
dc_client.update_interaction(user_interactions_id_2, version_name="v2",
                             annotation=AnnotationType.GOOD,
                             annotation_reason=None,
                             custom_props={'My Custom Property': 1.5}
                            )

Note: Custom Properties must be set using the properties configuration screen before you can log their values using the SDK.

πŸ“˜

User Interaction ID

In the SDK/API you might see user_interaction_id. This is your way to set a unique identifier for your "inputs", so that the same "input" across versions will get the same user_interaction_id.

If you maintain such an id in your system, please add it when you upload data. You will be able to search by this id from the UI / REST API. This is very helpful in cases where you have feedback on a particular interaction and want to observe that interaction in Deepchecks LLM Eval.

Notice that user_interaction_id must be unique in the context of a single version!
If you do not set it, Deepchecks will generate a globally unique UUID and fill it in for you.
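
For illustration, a short sketch (with placeholder values) of logging the same input under the same user_interaction_id in two versions, so that the interaction can be matched across them:

# Log the same input with the same user_interaction_id under two versions (placeholder values)
dc_client.version_name = "v1"
dc_client.log_interaction(input="What is your refund policy?",
                          output="v1 answer",
                          user_interaction_id="refund-question-001")

dc_client.version_name = "v2"
dc_client.log_interaction(input="What is your refund policy?",
                          output="v2 answer",
                          user_interaction_id="refund-question-001")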

Steps

In many cases LLM systems are composed of chains of various steps - multiple LLM calls, RAG systems, queries made to some DB and so on. In addition to the key fields of an interaction (input, output, information_retrieval), you may log an arbitrary number of steps, each containing an input and an output.

import uuid
from datetime import datetime

from deepchecks_llm_client.data_types import AnnotationType, Step, StepType

dc_client.log_interaction(input="my user input",
                          output="my model output",
                          full_prompt="system part: my user input",
                          annotation=AnnotationType.BAD,
                          user_interaction_id=str(uuid.uuid4()),
                          started_at=datetime(2024, 10, 31, 15, 1, 0).astimezone(),
                          finished_at=datetime.utcnow().astimezone(),
                          steps=[
                            Step(
                              name="Information Retrieval",
                              type=StepType.INFORMATION_RETRIEVAL,
                              attributes={"embeddings": "ada-02"},
                              input="my user input",
                              output="This is a relevant document for this input"),
                            Step(
                              name="Information Retrieval",
                              type=StepType.LLM,
                              attributes={'model': 'gpt-3.5-turbo'},
                              input="Full prompt with my user input + the retrieved document",
                              output="my model output"),
                          ]
)

Additional Interaction Data

Interactions have many more fields that can be set when logging. Logging these fields will make them viewable in the system (e.g. latency, which is calculated if start and finish times were logged).

# Log a batch of LLM calls to Deepchecks server
import uuid
from datetime import datetime

user_interactions_id_1 = str(uuid.uuid4())
user_interactions_id_2 = str(uuid.uuid4())
dc_client.log_batch_interactions(
        interactions=[
            LogInteractionType(
                input="my user input2",
                output="my model output2",
                information_retrieval=["my information retrieval2"],
                full_prompt="system part: my user input2",
                annotation=AnnotationType.BAD,
                annotation_reason='Output is not correctly grounded in information retrieval result',
                user_interaction_id=user_interactions_id_2,
                started_at=datetime(2024, 10, 31, 15, 1, 0).astimezone(),
                finished_at=datetime.utcnow().astimezone(),
                custom_props={'My Custom Property': 2}
            ),
            LogInteractionType(
                input="my user input1",
                output="my model output1",
                information_retrieval=["my information retrieval - first document",
                                       "my information retrieval - second document"],
                full_prompt="system part: my user input1",
                annotation=AnnotationType.GOOD,
                user_interaction_id=user_interactions_id_1,
                started_at=datetime(2024, 10, 31, 15, 1, 0).astimezone(),
                finished_at=datetime.utcnow().astimezone(),
                custom_props={'My Custom Property': 1}
            )
        ]
)

OpenAI Call Instrumentation


🚧

Supported OpenAI python SDK versions

For instrumentation we support openai 0.x.x versions only. If you use openai 1.0.0 and above, please use our explicit SDK API calls to log your interactions.

Deepchecks LLM Client has the option to automatically log calls made using the openai package. You can instrument your OpenAI calls in the following way:

import openai

from deepchecks_llm_client.client import dc_client
from deepchecks_llm_client.data_types import Tag

dc_client.init(host=DEEPCHECKS_LLM_HOST,
               api_token=DEEPCHECKS_LLM_API_KEY,
               app_name=DEEPCHECKS_APP_NAME,
               version_name=version,
               env_type=env_type,
               auto_collect=True,
               silent_mode=False,
               )

dc_client.set_tags({Tag.INPUT: "my input",
                    Tag.INFORMATION_RETRIEVAL: "documents to include in prompt"})

openai.api_key = "Fill OpenAI API Key Here"

chat_completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  temperature=0.7,
  messages=[
    {"role": "system",
     "content": "my system prompt"},
    {"role": "user", "content": "my user prompt"}
  ]
)
  • The set_tags function is used to define the special fields - user input and information retrieval. The output and full prompt are logged directly from the OpenAI call.
  • Make sure to set auto_collect=True when initializing the dc_client singleton object for use in OpenAI instrumentation.

Full Use Case

Let's now outline how you would upload a complete version to the system.

# In this code snippet we demonstrate how to upload Evaluation data (Golden Set)
# to Deepchecks' LLM Evaluation using our python SDK

import uuid
from datetime import datetime
import pandas as pd

from deepchecks_llm_client.client import dc_client
from deepchecks_llm_client.data_types import ApplicationType, EnvType, AnnotationType, LogInteractionType, Step, StepType

# This is deepchecks's service url
DEEPCHECKS_LLM_HOST = "https://app.llm.deepchecks.com"

# Login to deepchecks' service and generate new API Key (Configuration -> API Key) and place it here
DEEPCHECKS_LLM_API_KEY = "Your API Key"

# Use "Update Data" in deepchecks' service, to create a new application name and place it here
# This application must be exist, deepchecks' SDK cannot function without pre-defined application
# to work with
DEEPCHECKS_APP_NAME = "DemoApp"

# download data and read as csv
df = pd.read_csv('https://figshare.com/ndownloader/files/44077487')


# Init SDK's client
# Please notice - when using Deepchecks' SDK in an environment where exceptions must not stop
# execution, please set the silent_mode argument to True
dc_client.init(host=DEEPCHECKS_LLM_HOST, api_token=DEEPCHECKS_LLM_API_KEY,
               app_name=DEEPCHECKS_APP_NAME, app_type=ApplicationType.QA, version_name="0.0.1",
               env_type=EnvType.EVAL, auto_collect=False, silent_mode=False)

# Log a batch of 97 LLM calls to Deepchecks server, some with user annotations and some without
dc_client.log_batch_interactions(
        interactions=[
            LogInteractionType(
                input=row["input"],
                output=row["output"],
                information_retrieval=row["information_retrieval"],
                user_interaction_id=idx,
                annotation=AnnotationType.UNKNOWN if pd.isnull(row["annotation"]) else (
                    AnnotationType.GOOD if row["annotation"] == 'Good' else AnnotationType.BAD),
            ) for idx, row in df.iterrows()
        ]
)

print(f"Created version 0.0.1 in deepchecks server")

# Add another version equivalent to the first, this time logging the individual steps.
def log_eval_interactions(df):
    interactions = []
    for index in range(len(df)):
        interaction = df.iloc[index]
        steps = []
        # Append the data retriever (information retrieval) step
        steps.append(Step(
            name='Data Retriever',
            type=StepType.INFORMATION_RETRIEVAL,
            started_at=datetime.now().astimezone(),
            input=str(interaction['input']),
            output=str(interaction['information_retrieval']),
            finished_at=datetime.now().astimezone(),
            attributes={'model': 'gpt-3.5-turbo'}
        ))

        # Append the full prompt and response step
        steps.append(Step(
            name='LLM',
            type=StepType.LLM,
            started_at=datetime.now().astimezone(),
            input=str(interaction['full_prompt']),
            output=str(interaction['output']),
            finished_at=datetime.now().astimezone(),
            attributes={'model': 'gpt-3.5-turbo'}))

        interaction_to_log = LogInteractionType(
            input=steps[0].input,
            output=interaction['output'],
            user_interaction_id=index,
            steps=steps,
        )
        # Annotate the current interaction if annotation is provided
        if not pd.isna(interaction['annotation']):
            interaction_to_log.annotation = AnnotationType.GOOD if interaction['annotation'] == 'Good'\
                else AnnotationType.BAD
        else:
            interaction_to_log.annotation = AnnotationType.UNKNOWN

        interactions.append(interaction_to_log)

    dc_client.log_batch_interactions(interactions)


dc_client.version_name = '0.0.2'
log_eval_interactions(df)

print(f"Created version 0.0.2 in deepchecks server")

You can now access the version created in the system at https://app.llm.deepchecks.com/?appName=DemoApp&versionName=0.0.1&env=EVAL, or by searching for the new application name within the "Applications" selection. After a short while, properties and estimated annotations will be calculated (see the Properties Guide and UI Quickstart for more information about these).

SDK Reference

For a comprehensive list of available functionality, please check out the full SDK reference:

Python SDK Reference


What’s Next

Now that you have data in the system, head over to the dashboard to observe the insights Deepchecks has to offer.