OpenAI

This guide outlines how to integrate Deepchecks LLM Evaluation with your OpenAI models to monitor and analyze their performance. Deepchecks provides comprehensive tools for evaluating LLM-based applications, including:

Data logging and enrichment: Capture interactions with your OpenAI models, including inputs, outputs, and annotations. Deepchecks automatically enriches this data with valuable insights like topics, properties, and estimated annotations.
Performance comparison: Compare different versions of your LLM pipeline side-by-side to track improvements and identify regressions.
Golden set testing: Evaluate your models on a curated set of examples to ensure consistent performance across versions.
Production monitoring: Monitor your models in production to detect issues and ensure they are performing as expected.

Prerequisites

Before you begin, ensure you have the following:

A Deepchecks LLM Evaluation account.
An OpenAI API key.
Python environment with the deepchecks-llm-client package installed (pip install deepchecks-llm-client).

Integration Steps

Initialize Deepchecks Client:

from deepchecks_llm_client.client import DeepchecksLLMClient  

dc_client = DeepchecksLLMClient(
  api_token="YOUR_API_KEY"
)

Replace the placeholders with your actual API key, application name, and version name.

Instrument OpenAI Calls:

Make your call to OpenAI:

# Make your OpenAI calls as usual
response = openai.Completion.create(...)

Log Interactions:

from deepchecks_llm_client.data_types import LogInteraction, AnnotationType, EnvType

dc_client.log_interaction(
  app_name="YOUR APP NAME",
  version_name="YOUR VERSION NUMBER",
  env_type=EnvType.EVAL,
  interaction=LogInteraction(
    input="user input",
    output="model output",
    annotation=AnnotationType.GOOD  # Optional annotation
  )
)

View Insights in Deepchecks Dashboard:
Once you've logged interactions, head over to the Deepchecks LLM Evaluation dashboard to analyze your model's performance. You can explore various insights, compare versions, and monitor production data.

Advanced Options

Deepchecks offers several advanced features for fine-grained control and analysis:

Updating Annotations and Custom Properties: You can update annotations and custom properties for logged interactions.
Logging Steps: For complex LLM pipelines, you can log individual steps with their inputs and outputs.
Additional Interaction Data: Log additional data like timestamps, user IDs, and custom properties for richer analysis.