Integration Overview

Connect your real LLM pipeline data to Deepchecks. Choose the integration path that fits your stack - framework auto-instrumentation, the Python SDK, or CSV upload.

If you have already run through the SDK Quickstart or UI Quickstart, you have seen Deepchecks evaluate sample data. This section shows you how to connect your real pipeline data so you can start evaluating and monitoring your actual application.

Deepchecks only has access to data that you explicitly send to it. How you send that data depends on your stack and how much automation you want.


Choose your integration path

There are three main ways to get data into Deepchecks. Pick the one that matches your setup:

Path A: Auto-instrumentation (recommended for supported frameworks)

If you are using LangGraph, CrewAI, Google ADK, or LangChain, this is the fastest and most complete path. Add a few lines of setup code and Deepchecks automatically captures every trace, span, tool call, and LLM invocation - including system metrics like latency, token usage, and cost.

Best for: Teams using supported agentic or LLM frameworks who want full tracing with minimal effort.

Go to Auto-Instrumentation (Frameworks)
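
As a rough illustration only, auto-instrumentation amounts to initializing Deepchecks with your API key and application details before running your pipeline. The import path and function name below are placeholders, not the actual SDK surface - see Auto-Instrumentation (Frameworks) for the real setup code.

    import os

    # Hypothetical names: the exact import path and setup call are documented in
    # Auto-Instrumentation (Frameworks); only the general shape is sketched here.
    from deepchecks_llm_client import auto_instrument  # placeholder import

    auto_instrument(
        api_key=os.environ["DEEPCHECKS_API_KEY"],  # never hardcode the key
        app_name="my-assistant",     # the Application this data belongs to
        version_name="v1",           # the Version of the pipeline producing it
        environment="Production",    # Evaluation or Production
    )

    # From here on, LangGraph / CrewAI / Google ADK / LangChain executions are
    # traced automatically: spans, tool calls, LLM invocations, latency, tokens, cost.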

Path B: Python SDK

If you have a custom LLM pipeline - or you want fine-grained control over exactly what data you send - use the Python SDK directly. You can upload data in batch (for evaluation sets) or stream it in real time (for production).

Best for: Custom pipelines, batch evaluation workflows, production integrations, or any setup where you want explicit control.

Go to Python SDK Integration
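
For orientation, a batch upload through the SDK looks roughly like the sketch below. The client class and method names are assumptions standing in for the real API - Python SDK Integration documents the exact calls for both batch and streaming upload.

    import os

    # Hypothetical client and method names - Python SDK Integration documents the
    # actual SDK surface; this sketch only shows what information you supply.
    from deepchecks_llm_client import DeepchecksClient  # placeholder import

    client = DeepchecksClient(api_key=os.environ["DEEPCHECKS_API_KEY"])

    # Every upload names the Application, Version, and Environment it belongs to.
    client.log_interaction(
        app_name="my-assistant",
        version_name="v2-rag",
        environment="Evaluation",   # benchmark data; use "Production" for live traffic
        input="What is our refund policy?",
        output="Refunds are available within 30 days of purchase.",
    )

    # The same call can be made per request inside your production service to
    # stream interactions in real time instead of uploading them in batch.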

Path C: Upload via CSV

If you have data in a spreadsheet or exported from another system, you can upload a CSV file directly from the Deepchecks UI. No code required.

Best for: One-time uploads, quick exploration, or teams that process data outside of Python.

Go to Upload via CSV
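
As a purely illustrative layout (the actual required and optional column names are listed in Upload via CSV), a minimal file pairs each input your application received with the output it produced:

    input,output
    "What is our refund policy?","Refunds are available within 30 days of purchase."
    "How do I reset my password?","Use the Forgot password link on the sign-in page."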


Before you start

Whichever path you choose, you will need:

  1. A Deepchecks account - sign up here if you do not have one yet
  2. An API key - generate one from your profile icon > API Key tab in the Deepchecks app
  3. An application - create one in the Deepchecks UI (Manage Applications) or via the SDK

When uploading data, you always specify three things:

  • Application - which LLM task this data belongs to
  • Version - which implementation of the pipeline generated this data
  • Environment - Evaluation (for benchmarking) or Production (for live traffic)

These concepts are explained in detail in Key Concepts.

Tip: Store your API key as an environment variable (DEEPCHECKS_API_KEY) rather than hardcoding it in your scripts.
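
For example, set the variable in your shell, CI secrets, or deployment config, and read it from the environment in your code:

    import os

    # Read the key from the environment instead of hardcoding it in scripts.
    api_key = os.getenv("DEEPCHECKS_API_KEY")
    if api_key is None:
        raise RuntimeError("Set the DEEPCHECKS_API_KEY environment variable")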


If you are building agents

Agentic workflows have a specific data structure - traces composed of hierarchical spans (agent calls, tool calls, LLM calls, etc.). There are two ways to get this data into Deepchecks:

  • Supported frameworks (LangGraph, CrewAI, Google ADK, LangChain) - use Auto-Instrumentation and the hierarchy is captured automatically. This is the recommended path.
  • Custom agent frameworks - use the SDK's span-based upload to manually define parent-child relationships, as sketched below. See Upload Agentic Data.
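
To make the span hierarchy concrete, a custom-framework upload generally sends each span with its own ID plus its parent's ID, so Deepchecks can rebuild the tree. The field names and structure below are illustrative only - Upload Agentic Data defines the real schema and upload call.

    import uuid

    # Illustrative span records - not the actual Deepchecks schema. Each child
    # points at its parent, which is how the trace hierarchy is reconstructed.
    trace_id = str(uuid.uuid4())
    root_id = str(uuid.uuid4())

    spans = [
        {   # Root span: the user-facing request handled by the agent
            "span_id": root_id,
            "parent_span_id": None,
            "type": "Root",
            "input": "Summarize yesterday's sales report",
        },
        {   # Tool call nested under the root
            "span_id": str(uuid.uuid4()),
            "parent_span_id": root_id,
            "type": "Tool",
            "input": "fetch_report(date='yesterday')",
        },
        {   # LLM call nested under the root
            "span_id": str(uuid.uuid4()),
            "parent_span_id": root_id,
            "type": "LLM",
            "output": "Sales rose 4% compared with the previous day...",
        },
    ]

    # Each span would be uploaded through the SDK together with trace_id, so all
    # three are grouped into a single trace in the Sessions view.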

What happens after you upload data

Once your data arrives in Deepchecks - regardless of how you sent it - the platform automatically:

  1. Maps each interaction to the correct interaction type (Root, Agent, Chain, Tool, LLM, Retrieval) - based on Deepchecks' research and parsing logic.
  2. Computes system metrics - latency, token usage, and cost per interaction, aggregated per session (when timestamps and token data are available).
  3. Groups spans into the trace hierarchy so you can inspect full executions in the Sessions view.
  4. Calculates property scores on every interaction - quality metrics like Grounded in Context, Avoided Answer, and Fluency.
  5. Runs the automatic annotation pipeline to label each interaction as Good, Bad, or Unknown.

You can then explore results in the Deepchecks UI - see Navigating the UI for a guide to every screen.

To understand how each of these steps works and how to use the evaluation tools to improve your application, continue to the Core Features section - starting with Properties (the quality scores that drive everything else) and Automatic Annotations (how those scores become Good/Bad labels).