This guide explains how to integrate Deepchecks LLM Evaluation with your OpenAI models to monitor and analyze their performance. Deepchecks provides tools for evaluating LLM-based applications, including:

  • Data logging and enrichment: Capture interactions with your OpenAI models, including inputs, outputs, and annotations. Deepchecks automatically enriches this data with insights such as topics, properties, and estimated annotations.
  • Performance comparison: Compare different versions of your LLM pipeline side-by-side to track improvements and identify regressions.
  • Golden set testing: Evaluate your models on a curated set of examples to ensure consistent performance across versions.
  • Production monitoring: Monitor your models in production to detect issues and ensure they are performing as expected.


Before you begin, ensure you have the following:

  • A Deepchecks LLM Evaluation account.
  • An OpenAI API key.
  • A Python environment with the deepchecks-llm-client package installed (pip install deepchecks-llm-client) and the openai package.
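Before initializing the client, it can help to fail fast when a prerequisite is missing. The following is a minimal sketch, not part of the Deepchecks client: it assumes you keep your Deepchecks key in an environment variable named DEEPCHECKS_API_KEY (a hypothetical name chosen for this example) and your OpenAI key in OPENAI_API_KEY, the variable the openai package reads by default.

```python
import importlib.util
import os

# DEEPCHECKS_API_KEY is a hypothetical name chosen for this sketch;
# OPENAI_API_KEY is the variable the openai package reads by default.
REQUIRED_ENV_VARS = ("DEEPCHECKS_API_KEY", "OPENAI_API_KEY")
REQUIRED_PACKAGES = ("deepchecks_llm_client", "openai")


def missing_env_vars(env=None, names=REQUIRED_ENV_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the top-level packages that cannot be imported."""
    # find_spec returns None when a top-level package is not installed
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]
```

Calling missing_env_vars() + missing_packages() at startup returns an empty list when everything is in place, so you can stop with a clear message instead of failing later inside a client call.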

Integration Steps

  1. Initialize Deepchecks Client:

from deepchecks_llm_client.client import dc_client
from deepchecks_llm_client.data_types import EnvType

dc_client.init(api_token="YOUR_API_KEY", app_name="YOUR_APP_NAME",
               version_name="YOUR_VERSION_NAME",
               env_type=EnvType.EVAL,  # Change to EnvType.PROD for production monitoring
               auto_collect=True)  # Enable automatic OpenAI call instrumentation

Replace the placeholders with your actual API key, application name, and version name.
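Rather than hardcoding the placeholders, you might read them from the environment. This is a sketch under stated assumptions: the DEEPCHECKS_* variable names are hypothetical, and the Env enum is a local stand-in for EnvType so the example runs without the client installed.

```python
import os
from dataclasses import dataclass
from enum import Enum


class Env(Enum):
    """Local stand-in for EnvType so the sketch runs without the client."""
    EVAL = "EVAL"
    PROD = "PROD"


@dataclass(frozen=True)
class DeepchecksSettings:
    api_token: str
    app_name: str
    version_name: str
    env_type: Env


def load_settings(env=None):
    """Build settings from hypothetical DEEPCHECKS_* environment variables."""
    env = os.environ if env is None else env
    # Default to evaluation; switch to production only when explicitly requested
    mode = env.get("DEEPCHECKS_ENV", "eval").strip().lower()
    env_type = Env.PROD if mode == "prod" else Env.EVAL
    return DeepchecksSettings(
        api_token=env["DEEPCHECKS_API_KEY"],  # KeyError if unset: fail fast
        app_name=env["DEEPCHECKS_APP_NAME"],
        version_name=env["DEEPCHECKS_VERSION_NAME"],
        env_type=env_type,
    )
```

The resulting fields can be passed to the client initialization in place of the placeholder strings. Defaulting to EVAL means a missing or misspelled DEEPCHECKS_ENV value never sends evaluation traffic to the production environment.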

  2. Instrument OpenAI Calls:

Deepchecks can automatically capture interactions made through the openai package, as long as you set auto_collect=True when initializing the client. You can also add tags to provide additional context:

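import openai
from deepchecks_llm_client.data_types import Tag

dc_client.set_tags({
    Tag.INPUT: "my user input",
    Tag.INFORMATION_RETRIEVAL: "documents used for prompt",
})

# Make your OpenAI calls as usual; they are captured automatically
response = openai.Completion.create(...)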
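To make the idea of automatic collection concrete, here is a generic sketch of how call instrumentation can work: a wrapper records each call's arguments, output, and the tags active at the time. It illustrates the pattern only and is not Deepchecks' actual implementation; set_active_tags, instrument, and fake_completion are hypothetical names, and fake_completion stands in for a real OpenAI call.

```python
import functools

captured = []       # interactions recorded so far (stand-in for an upload queue)
_active_tags = {}   # tags attached to calls made from now on


def set_active_tags(tags):
    """Replace the tags that will be attached to subsequent calls."""
    _active_tags.clear()
    _active_tags.update(tags)


def instrument(func):
    """Wrap an LLM call so its arguments, output and active tags are recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        output = func(*args, **kwargs)
        captured.append({
            "function": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "output": output,
            "tags": dict(_active_tags),  # snapshot, not a live reference
        })
        return output
    return wrapper


@instrument
def fake_completion(prompt, temperature=0.0):
    """Placeholder for a real OpenAI call."""
    return prompt.upper()
```

Application code keeps calling fake_completion exactly as before, which is what lets this style of collection work without changing your calls. Snapshotting the tags with dict(...) means a later set_active_tags call does not rewrite interactions that were already recorded.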

  3. Log Interactions Manually (Optional):

If you need more control or are not using the openai package directly, you can manually log interactions:

from deepchecks_llm_client.data_types import LogInteractionType, AnnotationType

dc_client.log_interaction(
    input="user input",
    output="model output",
    annotation=AnnotationType.GOOD,  # Optional annotation
    # ... other interaction data
)

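When logging manually, you build the interaction yourself around whatever model call you make. The sketch below shows one way to do that; InteractionRecord, Annotation, and run_and_record are hypothetical helpers (Annotation is a local stand-in for AnnotationType), and the resulting fields would be passed to the logging call shown above.

```python
import uuid
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Callable, Optional


class Annotation(Enum):
    """Local stand-in for AnnotationType; values are illustrative."""
    GOOD = "good"
    BAD = "bad"


@dataclass
class InteractionRecord:
    input: str
    output: str
    annotation: Optional[Annotation] = None  # annotations are optional
    interaction_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_payload(self):
        """Plain-dict view with the enum converted to its string value."""
        data = asdict(self)
        data["annotation"] = self.annotation.value if self.annotation is not None else None
        return data


def run_and_record(model_fn: Callable[[str], str], user_input: str,
                   annotation: Optional[Annotation] = None) -> InteractionRecord:
    """Call any model function (not necessarily OpenAI) and capture the exchange."""
    output = model_fn(user_input)
    return InteractionRecord(input=user_input, output=output, annotation=annotation)
```

Generating an ID per interaction is a design choice of this sketch: it gives you a handle for updating the annotation later without relying on the input text.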
  4. View Insights in the Deepchecks Dashboard:
    Once you've logged interactions, open the Deepchecks LLM Evaluation dashboard to analyze your model's performance. There you can explore the enriched data (topics, properties, and estimated annotations), compare versions side by side, and monitor production data.
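The dashboard handles version comparison for you. As a rough, self-contained illustration of the underlying idea, the sketch below compares the share of interactions annotated "good" in a baseline version and a candidate version. The function names and the 0.02 tolerance are assumptions for this example, and the tolerance is a fixed threshold, not a statistical test.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def good_rate(annotations: Iterable[Optional[str]]) -> float:
    """Share of annotated interactions marked 'good'; unannotated ones are skipped."""
    counts = Counter(a for a in annotations if a is not None)
    total = sum(counts.values())
    return counts["good"] / total if total else 0.0


def compare_versions(baseline, candidate, tolerance=0.02) -> Tuple[float, str]:
    """Compare a candidate version's good rate against a baseline version's."""
    delta = good_rate(candidate) - good_rate(baseline)
    if delta < -tolerance:
        return delta, "regression"
    if delta > tolerance:
        return delta, "improvement"
    return delta, "within tolerance"
```

Unannotated interactions are skipped rather than counted as failures; that is a choice made for this sketch, and using estimated annotations to fill the gaps would be another reasonable option.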

Advanced Options

Deepchecks offers several advanced features for fine-grained control and analysis:

  • Updating Annotations and Custom Properties: You can update annotations and custom properties for logged interactions.
  • Logging Steps: For complex LLM pipelines, you can log individual steps with their inputs and outputs.
  • Additional Interaction Data: Log additional data like timestamps, user IDs, and custom properties for richer analysis.
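For multi-step pipelines, it helps to record each step's input, output, and timing alongside interaction-level data such as a user ID and custom properties. The sketch below collects that information locally; PipelineTrace and Step are hypothetical structures, not client classes, and you would still send the collected data through the client's own logging calls.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Step:
    name: str
    input: Any
    output: Any = None
    duration_s: float = 0.0


@dataclass
class PipelineTrace:
    user_id: Optional[str] = None
    started_at: float = field(default_factory=time.time)  # Unix timestamp
    custom_properties: Dict[str, Any] = field(default_factory=dict)
    steps: List[Step] = field(default_factory=list)

    @contextmanager
    def step(self, name, step_input):
        """Time one pipeline step; set .output on the yielded Step inside the block."""
        record = Step(name=name, input=step_input)
        start = time.perf_counter()
        try:
            yield record
        finally:
            # Recorded even if the step raises, so failures stay visible
            record.duration_s = time.perf_counter() - start
            self.steps.append(record)


# Example: a two-step retrieval-augmented pipeline
trace = PipelineTrace(user_id="user-123", custom_properties={"channel": "web"})
with trace.step("retrieve", "What is a golden set?") as s:
    s.output = ["doc-1", "doc-2"]
with trace.step("generate", "prompt built from doc-1 and doc-2") as s:
    s.output = "A golden set is a curated set of examples."
```

Using a context manager keeps the timing logic out of each step's code and guarantees the step is appended even when it raises.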