
This guide outlines how to integrate Deepchecks LLM Evaluation with your Anthropic models to monitor and analyze their performance.

Prerequisites

Before you begin, ensure you have the following:

  • A Deepchecks LLM Evaluation account.
  • An Anthropic API key.
  • A Python environment with the deepchecks-llm-client and anthropic packages installed (pip install deepchecks-llm-client anthropic).

Integration Steps

  1. Initialize Deepchecks Client:
from deepchecks_llm_client.client import DeepchecksLLMClient

# Create the Deepchecks client with your Deepchecks API token
# (this is separate from your Anthropic API key)
dc_client = DeepchecksLLMClient(
    api_token="YOUR_API_KEY"
)
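
If you prefer not to hard-code the token, you can read it from an environment variable instead. This is a minimal sketch, and the variable name DEEPCHECKS_API_TOKEN is only an example, not something the library requires:

import os

from deepchecks_llm_client.client import DeepchecksLLMClient

# Read the Deepchecks API token from the environment (example variable name)
dc_client = DeepchecksLLMClient(
    api_token=os.environ["DEEPCHECKS_API_TOKEN"]
)
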
  2. Log Interactions with Anthropic Models:

Here's an example of how to log interactions with Anthropic's Claude model:

from deepchecks_llm_client.data_types import AnnotationType, EnvType
from anthropic import Anthropic

# Configure Anthropic client
client = Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")

def log_anthropic_interaction(user_input):
    # Make prediction using Anthropic model
    message = client.messages.create(
        max_tokens=1024,
        messages=[{"role": "user", "content": user_input}],
        model="claude-3-opus-20240229",
    )
    # Claude returns a list of content blocks; take the text of the first block
    prediction = message.content[0].text

    # Log interaction to Deepchecks
    dc_client.log_interaction(
        app_name="YOUR APP NAME",
        version_name="YOUR VERSION NAME",
        env_type=EnvType.EVAL,
        input=user_input,
        output=prediction,
        annotation=AnnotationType.UNKNOWN  # Add annotation if available
    )

# Example usage
user_input = "Write a short story about a robot who wants to become human."
log_anthropic_interaction(user_input)

This code snippet demonstrates how to:

  • Use the anthropic library to interact with Anthropic models.
  • Send the user input to the model and capture its response.
  • Log the interaction data (input, output) to Deepchecks using the log_interaction method.
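
When you already have a quality judgment for an interaction (for example, from a human reviewer or an automated check), you can pass it at logging time instead of AnnotationType.UNKNOWN. The snippet below is a sketch that reuses the client and dc_client objects created above and assumes the AnnotationType enum also exposes a GOOD value:

user_input = "Summarize the plot of Hamlet in two sentences."

# client and dc_client are the Anthropic and Deepchecks clients created earlier
message = client.messages.create(
    max_tokens=1024,
    messages=[{"role": "user", "content": user_input}],
    model="claude-3-opus-20240229",
)

dc_client.log_interaction(
    app_name="YOUR APP NAME",
    version_name="YOUR VERSION NAME",
    env_type=EnvType.EVAL,
    input=user_input,
    output=message.content[0].text,
    annotation=AnnotationType.GOOD  # assumed enum value for interactions judged as good
)
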
  3. View Insights in Deepchecks Dashboard:
    Once you've logged interactions, head over to the Deepchecks LLM Evaluation dashboard to analyze your model's performance. You can explore various insights, compare versions, and monitor production data.

Advanced Options

Deepchecks offers several advanced features for fine-grained control and analysis (an illustrative sketch follows the list below):

  • Updating Annotations and Custom Properties: You can update annotations and custom properties for logged interactions.
  • Logging Steps: For complex LLM pipelines, you can log individual steps with their inputs and outputs.
  • Additional Interaction Data: Log additional data like timestamps, user IDs, and custom properties for richer analysis.
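
As a rough illustration of richer logging, the sketch below adds timestamps, a stable interaction ID, and custom properties to the call from the integration steps. The keyword arguments user_interaction_id, started_at, finished_at, and custom_props are assumptions made for this sketch rather than a verbatim API reference; check the Deepchecks SDK documentation for the exact parameter names, and for the dedicated data types and client methods used for step logging and annotation updates.

from datetime import datetime, timezone

user_input = "Explain retrieval-augmented generation in one paragraph."

started = datetime.now(timezone.utc)
message = client.messages.create(
    max_tokens=1024,
    messages=[{"role": "user", "content": user_input}],
    model="claude-3-opus-20240229",
)
finished = datetime.now(timezone.utc)

# The keyword arguments below, beyond those used in the integration steps,
# are assumptions for illustration; verify them against the SDK reference.
dc_client.log_interaction(
    app_name="YOUR APP NAME",
    version_name="YOUR VERSION NAME",
    env_type=EnvType.EVAL,
    input=user_input,
    output=message.content[0].text,
    annotation=AnnotationType.UNKNOWN,
    user_interaction_id="interaction-0001",  # assumed: stable ID so the record can be updated later
    started_at=started,                      # assumed: request start time
    finished_at=finished,                    # assumed: request end time
    custom_props={"user_id": "user-42"},     # assumed: arbitrary key/value properties
)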