Data Upload

The functions and know-how needed to upload data to the Deepchecks platform.

The data hierarchy in Deepchecks starts with applications, each representing a use case. Each application contains one or more versions, and the data in each version belongs either to the Evaluation set (which should be common to all versions) or to the Production set. Therefore, when uploading data, we need to specify the application, version, and environment to which the data should be uploaded.

dc_client.log_batch_interactions(
        app_name="DemoApp", version_name="v1", env_type=EnvType.EVAL,...

Batch Upload

Used either for uploading evaluation set data from the research (or staging) environment or as part of a periodic job to upload the collected production data.

Deepchecks represents data samples as LogInteractionType objects, so before a batch upload we need to convert our data into that form. In the example below, the data is first converted into a list of LogInteractionType objects and then uploaded to the evaluation set of version 'v1'.

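from deepchecks_llm_client.data_types import LogInteractionType

# Convert each row of the DataFrame df into a LogInteractionType object
interactions = [LogInteractionType(input=row["input"], output=row["output"],
                                   user_interaction_id=row["id"]) for _, row in df.iterrows()]

# Upload the list to the evaluation set of version 'v1'
dc_client.log_batch_interactions(
        app_name="DemoApp", version_name="v1", env_type=EnvType.EVAL, interactions=interactions)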
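When the upload runs as a periodic job over a large amount of collected production data, it may be convenient to send the interactions in several smaller calls. The sketch below shows one way to do that. The helpers chunked and upload_in_chunks are hypothetical and not part of the Deepchecks SDK, and the default chunk size of 500 is an arbitrary assumption rather than a documented limit.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def upload_in_chunks(items: Sequence[T],
                     upload: Callable[[List[T]], None],
                     size: int = 500) -> int:
    """Call `upload` once per chunk and return the number of items sent.

    `upload` is any callable that accepts a list of interactions; the
    chunk size of 500 is an assumption, not a documented platform limit.
    """
    sent = 0
    for chunk in chunked(items, size):
        upload(chunk)
        sent += len(chunk)
    return sent
```

With the client from the example above, the upload callable could be a small wrapper such as lambda chunk: dc_client.log_batch_interactions(app_name="DemoApp", version_name="v1", env_type=EnvType.EVAL, interactions=chunk).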

Stream Upload

Most commonly used in inference server integration. Deepchecks allows fractional logging of an interaction as it flows through the inference pipeline. Stay tuned for code snippets showing how to upload the interaction data fields gradually throughout the inference process.

Multi-Step Application

For applications that contain more than a single input and output, we will want to log and evaluate each logical step that was executed. In Deepchecks, these intermediate steps are logged under the steps argument in an interaction.

from deepchecks_llm_client.data_types import LogInteractionType, Step, StepType

# Each Step records one logical stage of the pipeline
intermediate_steps = [
    Step(
        name="Router",
        input="user input 1",
        output="Go to Question Answering agent"),
    Step(
        name="PII_Removal",
        type=StepType.PII_REMOVAL,
        attributes={'model': 'gpt-4o-mini'},
        input="user input 1",
        output="anonymized user input"),
    Step(
        name="Question Answering",
        type=StepType.LLM,
        attributes={'model': 'gpt-4o-mini'},
        input="anonymized user input",
        output="model answer 1"),
]

# Attach the steps to the interaction through the steps argument
interaction_w_steps = LogInteractionType(user_interaction_id="id-1",
                                         input="user input 1",
                                         output="model answer 1",
                                         steps=intermediate_steps)

Stay tuned for info about how to log the steps one after the other in stream upload (using the steps_to_add argument).

Custom Properties and Timestamps

Custom properties let you supply additional information about a sample to the Deepchecks platform. They are usually either metrics designed to be part of the Automatic Annotations, or metadata used both for logging and for Root Cause Analysis (RCA).

Timestamps are provided via the started_at and finished_at parameters. They are used to calculate the application's latency and to mark the time the samples were processed for production monitoring.

from datetime import datetime

from deepchecks_llm_client.data_types import LogInteractionType

interaction = LogInteractionType(user_interaction_id="id-1",
                                 input="user input 1",
                                 output="model answer 1",
                                 started_at='2024-09-01T23:59:59',
                                 finished_at=datetime.now().astimezone(),
                                 custom_properties={"My Numeric Property": 1.5,
                                                    "My Categorical Property": "USA"}
                                )
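Because latency is derived from started_at and finished_at, it can help to sanity-check the two values before logging an interaction. The sketch below is an illustration only: latency_seconds is a hypothetical helper, not part of the Deepchecks SDK, and it assumes both timestamps are timezone-aware datetime objects so that the subtraction is well defined. How the platform itself interprets naive timestamps is not covered here.

```python
from datetime import datetime, timedelta, timezone


def latency_seconds(started_at: datetime, finished_at: datetime) -> float:
    """Return the elapsed time between two timezone-aware timestamps, in seconds."""
    if started_at.tzinfo is None or finished_at.tzinfo is None:
        raise ValueError("both timestamps must be timezone-aware")
    elapsed = finished_at - started_at
    if elapsed < timedelta(0):
        raise ValueError("finished_at is earlier than started_at")
    return elapsed.total_seconds()


# Example values chosen for illustration
started = datetime(2024, 9, 1, 23, 59, 59, tzinfo=timezone.utc)
finished = started + timedelta(seconds=1.5)
print(latency_seconds(started, finished))  # 1.5
```

A check like this catches swapped or mixed naive and aware timestamps on the client side, before they affect the latency shown for the sample.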

Updating Existing Interactions

In some cases, auxiliary information arrives after the interaction has already been logged on the Deepchecks platform. Common examples include annotations obtained later through direct user feedback or from an internal quality assurance agent, as well as custom property values.

Deepchecks does not permit updates to the data fields, as these changes can impact the property calculation process. If such modifications are necessary, you can delete the interaction and then re-upload it.

dc_client.update_interaction(
    app_name="DemoApp",
    version_name="v1",
    user_interaction_id="id-1",
    annotation="Bad", 
    annotation_reason=None,
    custom_props={"My Custom Property": 1.5}
)
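Since only auxiliary fields can be changed in place, a small client-side guard can catch attempts to modify data fields before a request is sent. This is a sketch under assumptions: build_update_kwargs and UPDATABLE_FIELDS are hypothetical names, and the set of updatable fields is taken only from the arguments shown in the update_interaction call above, so it may be incomplete.

```python
from typing import Any, Dict

# Assumption: taken from the arguments in the update_interaction example above.
UPDATABLE_FIELDS = {"annotation", "annotation_reason", "custom_props"}


def build_update_kwargs(changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `changes` if every key is an updatable field.

    Raises ValueError for any other key (for example 'input' or 'output'),
    because data fields require deleting and re-uploading the interaction.
    """
    rejected = sorted(set(changes) - UPDATABLE_FIELDS)
    if rejected:
        raise ValueError(
            f"fields {rejected} cannot be updated in place; "
            "delete the interaction and re-upload it instead"
        )
    return dict(changes)
```

The returned dictionary could then be unpacked into the call, for example dc_client.update_interaction(app_name="DemoApp", version_name="v1", user_interaction_id="id-1", **kwargs).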

📌

Uploading data directly from the UI

Note: you can also upload data to the platform in CSV or XLSX format directly from the UI.