
Python SDK Integration

Send data from your custom LLM pipeline to Deepchecks using the Python SDK - batch uploads for evaluation, streaming for production, and everything in between.

The Python SDK gives you full control over what data you send to Deepchecks and when. Use it when:

  • You have a custom LLM pipeline that does not use a supported framework (LangGraph, CrewAI, Google ADK, LangChain)
  • You want fine-grained control over exactly which fields are sent and when
  • You are setting up a production integration that logs interactions in real time as they flow through your pipeline
  • You are building a CI/CD workflow that uploads evaluation sets and checks results programmatically

Using a supported framework? Auto-Instrumentation is simpler and captures more data automatically - including raw framework attributes that Deepchecks parses into structured fields.

Building an agentic pipeline with hierarchical spans? If your data has a parent-child span structure (agents, tools, sub-agents), use Upload Agentic Data instead of this page. The SDK's log_batch_interactions and log_interaction methods are designed for flat, single-level interactions.

Running on AWS SageMaker? The data-upload methods on this page work identically, but SDK initialization and authentication differ (SigV4 signing, different env vars). See Using the Python SDK on SageMaker before running the code snippets below.


Installation and setup

pip install deepchecks-llm-client

Initialize the client and create an application (if you have not already):

from deepchecks_llm_client.client import DeepchecksLLMClient
from deepchecks_llm_client.data_types import EnvType, ApplicationType

dc_client = DeepchecksLLMClient(api_token="your-api-key")

# Create an application (safe to re-run - skips if it already exists)
dc_client.create_application("My App", app_type=ApplicationType.QA)
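Rather than hard-coding the API key, you can read it from the environment at startup. A minimal sketch, assuming you export the token under a variable such as `DEEPCHECKS_LLM_API_TOKEN` (an illustrative name, not one the SDK reads automatically):

```python
import os

def load_api_token(env_var: str = "DEEPCHECKS_LLM_API_TOKEN") -> str:
    """Return the API token from the environment, failing loudly if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} before initializing the Deepchecks client")
    return token

# Usage (assumes the variable is exported in your shell):
# dc_client = DeepchecksLLMClient(api_token=load_api_token())
```

This keeps credentials out of source control and lets CI and local environments supply different tokens.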

SageMaker users: Authentication works differently on SageMaker - the SDK uses AWS SigV4 signing and reads configuration from Partner AI App environment variables. See Using the Python SDK on SageMaker for the initialization code. Everything else on this page applies as-is.


Uploading data

Batch upload

Batch upload is the most common way to send data to Deepchecks. Use it for uploading evaluation sets, periodic production data dumps, or any scenario where you have a collection of interactions ready to send at once.

Deepchecks represents data samples as LogInteraction objects. Organize your data into a list of these objects and upload them in one call.

Here is an example that reads from a pandas DataFrame (df) with input, output, and id columns:

from deepchecks_llm_client.data_types import LogInteraction

interactions = [
    LogInteraction(
        input=row["input"],
        output=row["output"],
        user_interaction_id=row["id"],
        interaction_type="Q&A",
        session_id="session-1",
    )
    for _, row in df.iterrows()
]

dc_client.log_batch_interactions(
    app_name="My App",
    version_name="v1",
    env_type=EnvType.EVAL,
    interactions=interactions,
)

Before any data can be uploaded, an application with the appropriate name must exist. You can create one via the SDK (as shown above) or from the Manage Applications screen in the UI.
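For large evaluation sets, it can help to split the interactions into fixed-size batches before uploading rather than sending one very large call. A minimal sketch; the chunk size of 500 is an arbitrary choice for illustration, not a documented limit of `log_batch_interactions`:

```python
from typing import Iterator, List

def chunked(items: List, size: int = 500) -> Iterator[List]:
    """Yield successive fixed-size slices of a list of interactions."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# for batch in chunked(interactions, size=500):
#     dc_client.log_batch_interactions(
#         app_name="My App",
#         version_name="v1",
#         env_type=EnvType.EVAL,
#         interactions=batch,
#     )
```

Smaller batches also make it easier to retry a single failed upload without resending everything.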

Stream upload (production real-time logging)

Stream upload lets you log an interaction progressively as it moves through your inference pipeline - rather than waiting until the entire pipeline finishes. This is useful in production, where input, retrieval results, and output are generated at different times. The interaction stays hidden from the UI until you mark it as completed, at which point Deepchecks begins evaluation.

from deepchecks_llm_client.data_types import LogInteraction, UserValueProperty

# Step 1: Log the initial input (not yet completed)
dc_client.log_interaction(
    app_name="My App",
    version_name="v1",
    env_type=EnvType.PROD,
    interaction=LogInteraction(
        user_interaction_id="id-1",
        input="My Input",
        history=["Hi", "Hello! How can I assist you today?"],
        user_value_properties=[UserValueProperty("User Region", "US")],
        is_completed=False,
        interaction_type="Q&A",
        session_id="session-1",
    ),
)

# Step 2: Add retrieval results as they become available
dc_client.update_interaction(
    app_name="My App",
    version_name="v1",
    user_interaction_id="id-1",
    information_retrieval=["First doc", "Second doc"],
    user_value_properties=[UserValueProperty("# of Documents", 2)],
)

# Step 3: Add the output and mark as completed
dc_client.update_interaction(
    app_name="My App",
    version_name="v1",
    user_interaction_id="id-1",
    output="My Output",
    is_completed=True,
)

Once marked as completed, Deepchecks begins calculating properties and running the automatic annotation pipeline on that interaction.


Interaction fields

LogInteraction accepts the following fields. See Data Fields Reference for full descriptions of each field and what it enables.

Content fields (used for property calculation):

| Field | Required | Description |
|---|---|---|
| input | Required | The user's input or question. Without it, most properties cannot run. |
| output | Required | The pipeline's response. Without it, most properties cannot run. |
| full_prompt | Recommended | The complete prompt sent to the LLM. Required for the Instruction Following and Instruction Fulfillment properties. |
| information_retrieval | Optional | Retrieved context documents (e.g., RAG results) |
| history | Optional | Chat history or additional conversation context |
| expected_output | Optional | Ground truth reference output |

Metadata fields (used for organization, metrics, and evaluation):

| Field | Required | Description |
|---|---|---|
| user_interaction_id | Optional | Unique identifier for this interaction. Auto-generated if not provided, but providing your own ID is recommended when you need to match interactions across versions for comparison, or update them after upload. |
| session_id | Optional | Groups interactions into a session. Auto-generated if not provided. |
| interaction_type | Optional | Type of interaction (e.g., Q&A, Summarization) |
| started_at | Optional | Start timestamp (ISO 8601 or Unix epoch) |
| finished_at | Optional | End timestamp |
| model | Optional | Model name (e.g., gpt-4o) |
| model_provider | Optional | Model provider (e.g., openai) |
| input_tokens | Optional | Number of prompt tokens |
| output_tokens | Optional | Number of completion tokens |
| tokens | Optional | Total token count |
| annotation | Optional | Human annotation: Good, Bad, or Unknown |
| annotation_reason | Optional | Textual reason for the annotation |
| user_value_properties | Optional | List of UserValueProperty objects for custom metrics |
| steps | Optional | Intermediate processing steps (see below) |
| is_completed | Optional | Whether the interaction is complete (default: True) |

Tip: Including started_at, finished_at, model, input_tokens, and output_tokens is strongly recommended - these enable latency tracking, cost calculation, and a complete observability picture.


Enriching interactions

Adding timestamps and model info

Timestamps and model fields enable system metrics (latency, token usage, cost):

from deepchecks_llm_client.data_types import LogInteraction

interaction = LogInteraction(
    user_interaction_id="id-1",
    input="user input 1",
    output="model answer 1",
    started_at="2025-01-01T23:59:59",
    finished_at="2025-01-02T00:00:01",
    model="gpt-4o",
    model_provider="openai",
    input_tokens=150,
    output_tokens=80,
    interaction_type="Q&A",
)

Adding user-value properties

User-value properties are your way of supplying additional context about an interaction - numeric (e.g., a custom similarity score) or categorical (e.g., a topic label from your own classifier). They can be used in automatic annotations, filtering, and root cause analysis.

from deepchecks_llm_client.data_types import LogInteraction, UserValueProperty

interaction = LogInteraction(
    user_interaction_id="id-1",
    input="user input 1",
    output="model answer 1",
    interaction_type="Q&A",
    user_value_properties=[
        UserValueProperty("My Numeric Property", 1.5),
        UserValueProperty("My Categorical Property", "Low quality",
                          reason="Bad grammar"),
    ],
)
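Numeric user-value properties are often computed by your own code just before logging. A toy example, assuming you want to track how long answers are relative to questions (the property name and metric are illustrative, not anything Deepchecks defines):

```python
def output_input_length_ratio(input_text: str, output_text: str) -> float:
    """A toy custom metric: output length relative to input length."""
    return round(len(output_text) / max(len(input_text), 1), 2)

# Attach it when building the interaction:
# UserValueProperty("Length Ratio", output_input_length_ratio(user_input, answer))
```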

Including manual annotations

In addition to the automatic annotations Deepchecks calculates, you can include your own human judgments - from internal QA reviews or direct user feedback:

interaction = LogInteraction(
    user_interaction_id="id-1",
    input="user input 1",
    output="model answer 1",
    annotation="Good",
    annotation_reason="Accurate and complete answer",
)

Possible values: Good, Bad, or Unknown. These appear as filled badges in the UI alongside the estimated annotations Deepchecks calculates automatically.

Intermediate steps (optional)

If your pipeline has intermediate processing stages you want to capture alongside the main input/output, you can log them as steps. Steps are optional and specialized - use them when you have additional processing data (like query rephrasing, PII removal, or routing decisions) that is useful for debugging or analysis. They can also be used to calculate properties.

from deepchecks_llm_client.data_types import LogInteraction, Step

interaction = LogInteraction(
    user_interaction_id="id-1",
    input="user input 1",
    output="model answer 1",
    steps=[
        Step(name="Router", value="Go to Question Answering agent"),
        Step(name="PII_Removal", value="anonymized user input"),
    ],
    interaction_type="Q&A",
)

Steps can also be added progressively via update_interaction - useful when each step completes at a different time:

dc_client.update_interaction(
    app_name="My App",
    version_name="v1",
    user_interaction_id="id-1",
    steps=[Step(name="Router", value="Go to Question Answering agent")],
)

Updating and retrieving data

Updating existing interactions

In some cases, additional information arrives after an interaction has already been logged as completed - for example, annotations from user feedback or user-value properties computed asynchronously:

dc_client.update_interaction(
    app_name="My App",
    version_name="v1",
    user_interaction_id="id-1",
    annotation="Bad",
    annotation_reason="Hallucinated the return policy details",
    user_value_properties=[UserValueProperty("User Feedback Score", 2)],
)

Note: Deepchecks does not permit updates to data content fields (input, output, information_retrieval, etc.) on completed interactions, as these changes can affect property calculations. If you need to change content fields, delete the interaction and re-upload it.

Downloading enriched data

You can retrieve your data back from Deepchecks with all computed properties, topics, and annotations included. This is useful for CI/CD pipelines, custom analysis, or exporting results:

# Basic download
data = dc_client.get_data(
    app_name="My App",
    version_name="v1",
    env_type=EnvType.EVAL,
)

# Wait for all property calculations to finish before returning
# (useful in CI/CD where you need final scores before proceeding)
data = dc_client.get_data_if_calculations_completed(
    app_name="My App",
    version_name="v1",
    env_type=EnvType.EVAL,
)

Both functions return a pandas DataFrame containing your original data plus all computed metrics. The second variant blocks until all properties (including LLM-based ones that run asynchronously) have finished calculating.
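In a CI/CD pipeline, the returned DataFrame can be used as a quality gate. A minimal sketch; the column name `"Grounded in Context"` is a stand-in for whatever property columns your application actually produces, and the sample frame stands in for the result of `get_data_if_calculations_completed`:

```python
import pandas as pd

def gate_on_property(data: pd.DataFrame, column: str, threshold: float) -> bool:
    """Return True when the mean of a numeric property column meets the threshold."""
    return float(data[column].mean()) >= threshold

# Stand-in for the DataFrame returned by the SDK:
sample = pd.DataFrame({"Grounded in Context": [0.9, 0.8, 0.95]})
passed = gate_on_property(sample, "Grounded in Context", threshold=0.8)
```

Failing the build when `passed` is False turns the evaluation set into a regression check on every release.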