Data Fields Reference
Complete reference for every data field Deepchecks supports - content fields, metadata fields, and span fields - with descriptions and usage guidance.
This page is a complete reference for every field you can include when sending data to Deepchecks - whether via the Python SDK, CSV upload, or auto-instrumentation. Use it to understand what each field does and what evaluation or observability features it enables.
Using auto-instrumentation? If you are sending data via framework integrations (LangGraph, CrewAI, Google ADK, LangChain), most fields are filled in automatically from parsed span attributes. This reference is most useful when uploading data via the SDK or CSV, where you control exactly which fields to include.
Interaction content fields
Content fields contain the actual text from the interaction and are used for property calculations. Which fields are relevant depends on your interaction type.
| Field | Required | Type | Description | What it enables |
|---|---|---|---|---|
| input | Required | str | The input to the LLM pipeline - the user's question, request, or source text | Required for most properties. Without it, Deepchecks cannot evaluate the interaction. |
| output | Required | str | The pipeline's response or generated content | Required for most properties. Without it, Deepchecks cannot evaluate the interaction. |
| full_prompt | Optional | str | The complete prompt sent to the LLM, including system instructions | Required for LLM-based properties. Enables Instruction Following and Instruction Fulfillment. Displayed in the interaction detail view. |
| information_retrieval | Optional | list of str | Documents or data retrieved as context for the LLM (e.g., RAG results, database query results) | Enables retrieval-based properties: Grounded in Context, Retrieval Relevance, Retrieval Coverage |
| history | Optional | list of str | Additional context relevant to the interaction that was not retrieved from a knowledge base - for example, chat conversation history | Enables history-aware evaluation in Chat and multi-turn interaction types |
| expected_output | Optional | str | A reference output serving as ground truth - typically created by a human annotator or deemed high-quality through review | Enables similarity-based evaluation and reference comparison |
| steps | Optional | list of Step | Intermediate processing steps within the interaction (e.g., query rephrasing, routing, PII removal). Can be used in property calculations. | Displayed in the interaction detail view. Provides visibility into intermediate processing. See Python SDK Integration. |
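To make the table concrete, here is a sketch of the content fields for a single Q&A/RAG interaction, expressed as a plain Python dictionary. Field names come from the table above; the values and the payload shape are illustrative only - consult the Python SDK Integration page for the actual call signatures.

```python
# Illustrative content fields for one RAG interaction.
# Only input and output are required; the rest unlock more properties.
interaction = {
    "input": "What is our refund policy for digital goods?",
    "output": "Digital goods can be refunded within 14 days of purchase.",
    "full_prompt": (
        "You are a support assistant. Answer only from the provided context.\n"
        "Question: What is our refund policy for digital goods?"
    ),
    "information_retrieval": [
        "Refund policy v3: digital goods are refundable within 14 days.",
        "Physical goods are refundable within 30 days.",
    ],
    "expected_output": "Refunds for digital goods are available for 14 days.",
}
```

Including information_retrieval here is what enables the retrieval-based properties (Grounded in Context, Retrieval Relevance, Retrieval Coverage) for this interaction.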
Interaction metadata fields
Metadata fields are used for organization, filtering, annotations, system metrics, and version comparison.
| Field | Required | Type | Description | What it enables |
|---|---|---|---|---|
| user_interaction_id | Optional | str | Unique identifier for the interaction within a version. Auto-generated if not provided. Providing your own stable ID is recommended when you need to match interactions across versions or update them after upload. | Matching interactions across versions for comparison. Updating annotations or properties after upload. |
| session_id | Optional | str | Groups related interactions into a session (e.g., a conversation, a multi-step workflow). Auto-generated if not provided. | Session-level evaluation, Sessions view in the UI, grouping multi-turn conversations |
| interaction_type | Optional | str | The type of interaction (e.g., Q&A, Summarization, Agent). Defaults to the application's default type if not provided. | Determines which built-in properties are enabled, which auto-annotation rules apply, and how the interaction is grouped in the UI |
| started_at | Optional | timestamp | When the interaction started. Accepts ISO 8601 format (e.g., 2025-01-01T00:00:01+00:00) or Unix epoch (e.g., 1742742893). | Latency calculation (with finished_at), production monitoring timeline |
| finished_at | Optional | timestamp | When the interaction ended. Same format as started_at. | Latency calculation, production monitoring |
| model | Optional | str | The model name used for the LLM call (e.g., gpt-4o, claude-3-5-sonnet). | Cost tracking (combined with model pricing configuration), filtering by model |
| model_provider | Optional | str | The model provider (e.g., openai, anthropic). | Cost tracking, filtering by provider |
| input_tokens | Optional | int | Number of prompt tokens consumed by the LLM. | Token usage metrics, cost calculation |
| output_tokens | Optional | int | Number of completion tokens generated by the LLM. | Token usage metrics, cost calculation |
| tokens | Optional | int | Total token count. Auto-calculated from input_tokens + output_tokens if not provided. | Token usage metrics |
| annotation | Optional | str | Human annotation label: Good, Bad, or Unknown | Appears as a filled badge in the UI. Takes precedence over estimated annotations. |
| annotation_reason | Optional | str | Textual reasoning for the annotation | Displayed alongside the annotation for context |
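The sketch below assembles the metadata fields for an interaction and mimics the tokens default described above (total = input_tokens + output_tokens when tokens is not supplied). This is a local illustration of the field semantics, not SDK code; Deepchecks applies the same default server-side.

```python
metadata = {
    "user_interaction_id": "order-flow-0042",   # stable ID, for cross-version matching
    "session_id": "session-7",
    "interaction_type": "Q&A",
    "started_at": "2025-01-01T00:00:01+00:00",  # ISO 8601, as accepted by the field
    "finished_at": "2025-01-01T00:00:03+00:00",
    "model": "gpt-4o",
    "model_provider": "openai",
    "input_tokens": 350,
    "output_tokens": 120,
    "annotation": "Good",                       # one of: Good, Bad, Unknown
    "annotation_reason": "Answer matched the policy document.",
}

# tokens defaults to input_tokens + output_tokens when not provided
if "tokens" not in metadata:
    metadata["tokens"] = metadata["input_tokens"] + metadata["output_tokens"]
```

Providing your own user_interaction_id (rather than relying on the auto-generated one) is what lets you update the annotation or match this interaction against the same ID in another version later.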
Span-specific fields
These fields are used when uploading hierarchical agentic data via the Span class and log_spans function. See Upload Agentic Data for full details.
Required span fields
| Field | Type | Description |
|---|---|---|
| span_id | str | Unique identifier for the span |
| span_name | str | Descriptive name of the operation |
| trace_id | str | Identifier grouping spans into a single trace |
| span_kind | SpanKind | Type of span: CHAIN, AGENT, TOOL, LLM, RETRIEVAL |
| parent_id | str or None | span_id of the parent span (None for the root span) |
| started_at | float | Start timestamp (Unix epoch) |
| finished_at | float | End timestamp (Unix epoch) |
| input | str | Data passed into the operation. Required for property calculation - without it, Deepchecks cannot evaluate this span. |
| output | str | Data returned by the operation. Required for property calculation - without it, Deepchecks cannot evaluate this span. |
| full_prompt | str | The complete prompt sent to the LLM. Required for LLM spans - enables Instruction Following and Instruction Fulfillment properties. |
Optional span fields
| Field | Type | Description |
|---|---|---|
| expected_output | str | Expected result for evaluation |
| information_retrieval | list | Retrieved documents or data |
| model | str | Model name (e.g., gpt-4o) |
| model_provider | str | Model provider (e.g., openai) |
| input_tokens | int | Number of prompt tokens |
| output_tokens | int | Number of completion tokens |
| tokens | int | Total tokens (auto-calculated from input + output if not provided) |
| status_code | str | Execution status (e.g., OK, ERROR) |
| status_description | str | Additional context about the status |
| graph_parent_name | str | Logical parent span name (used for graph visualization) |
| session_id | str | Groups related traces into a session |
| metadata | dict | Custom key-value properties. Raw attributes stored here are parsed by Deepchecks into structured fields. |
| steps | list of Step | Intermediate processing steps within the span |
| user_value_properties | list of UserValueProperty | Custom numeric or categorical properties |
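The hierarchy these fields describe can be sketched as a root agent span with a child LLM span. For readability the spans are written as plain dictionaries mirroring the fields above; the SDK's Span class takes equivalent arguments, so treat this as a field-by-field illustration rather than runnable upload code (see Upload Agentic Data for the real API).

```python
import time

now = time.time()

# Root span of the trace: parent_id is None.
root_span = {
    "span_id": "span-1",
    "span_name": "support-agent",
    "trace_id": "trace-abc",
    "span_kind": "AGENT",
    "parent_id": None,
    "started_at": now,
    "finished_at": now + 2.5,
    "input": "What is our refund policy?",
    "output": "Digital goods can be refunded within 14 days.",
}

# Child LLM span: parent_id points at the root span's span_id.
llm_span = {
    "span_id": "span-2",
    "span_name": "answer-generation",
    "trace_id": "trace-abc",
    "span_kind": "LLM",
    "parent_id": "span-1",
    "started_at": now + 0.3,
    "finished_at": now + 2.1,
    "input": "Context plus the user question",
    "output": "Digital goods can be refunded within 14 days.",
    "full_prompt": "You are a support assistant...",  # required for LLM spans
    "model": "gpt-4o",
    "input_tokens": 300,
    "output_tokens": 90,
    "status_code": "OK",
}
```

Note that both spans share the same trace_id - that is what groups them into one trace - while parent_id encodes the tree structure within it.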
System metrics
System metrics are computed from metadata fields and provide operational observability alongside quality evaluation:
| Metric | Source | Description |
|---|---|---|
| Latency | started_at + finished_at | Time to process the interaction |
| Input tokens | input_tokens field or auto-instrumentation | Number of prompt tokens consumed |
| Output tokens | output_tokens field or auto-instrumentation | Number of completion tokens generated |
| Total tokens | tokens field or sum of input + output | Total token consumption |
| Cost | Token counts + model pricing configuration | Monetary cost per interaction (configure model pricing in Workspace Settings). Broken into input_cost and output_cost per span. |
| Run status | status_code field or auto-instrumentation | Whether the span executed successfully |
In agentic pipelines, system metrics are aggregated across child spans - so you can see total token usage and cost for an entire agent trace, not just individual LLM calls.
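That aggregation can be illustrated locally: trace-level latency comes from the earliest start and latest finish across spans, token totals are summed, and cost applies per-token pricing to the totals. The pricing rates below are placeholders, not real model prices - actual rates come from your model pricing configuration in Workspace Settings.

```python
# Two spans from one trace (timestamps in Unix epoch seconds).
spans = [
    {"started_at": 100.0, "finished_at": 102.5, "input_tokens": 300, "output_tokens": 90},
    {"started_at": 102.5, "finished_at": 103.0, "input_tokens": 50,  "output_tokens": 20},
]

# Aggregate token usage across the whole trace.
total_input = sum(s["input_tokens"] for s in spans)
total_output = sum(s["output_tokens"] for s in spans)

# Trace latency: latest finish minus earliest start.
latency = max(s["finished_at"] for s in spans) - min(s["started_at"] for s in spans)

# Cost = token counts x configured per-token prices (placeholder rates).
price_per_input_token, price_per_output_token = 2.5e-6, 1.0e-5
cost = total_input * price_per_input_token + total_output * price_per_output_token
```

This mirrors the split the table describes: input_cost and output_cost are the two terms of the cost sum, reported per span and rolled up per trace.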
When using auto-instrumentation, most system metrics are captured automatically from parsed framework attributes. For manual uploads via the SDK or CSV, include timestamps, model info, and token counts to get a complete observability picture.
Field requirements by use case
Not every interaction type needs every field. Here is a quick guide to what matters most for common use cases:
| Use case | Essential fields | Recommended fields |
|---|---|---|
| Q&A / RAG | input, output, information_retrieval | full_prompt, expected_output, timestamps, token fields |
| Summarization | input, output | full_prompt, timestamps, token fields |
| Generation | input, output | full_prompt, information_retrieval, timestamps |
| Classification | input, output | expected_output, full_prompt |
| Chat | input, output, history | full_prompt, session_id, timestamps |
| Agentic | span_id, trace_id, span_kind, parent_id, timestamps, input, output, full_prompt (LLM spans) | model, input_tokens, output_tokens, status_code |
See Supported Use Cases for a full breakdown of interaction types and their properties.