
Dataset Management

Overview

Datasets are curated collections of test samples used to systematically evaluate your LLM application's performance. Unlike production interactions that reflect real user behavior, datasets provide controlled, reproducible test scenarios that help you measure quality improvements across versions, catch regressions, and validate changes before deployment. Each dataset contains input samples (and optionally expected outputs) that you can run against your application to generate evaluation scores.

What are Datasets?

A dataset is a named collection of test samples within an application. There are two types of datasets: single-turn datasets and multi-turn datasets.

Each sample in a single-turn dataset consists of:

  • Input (required) - The test input to send to your application (can be a string, JSON object, or array)
  • Reference Output (optional) - Expected or reference output for comparison
  • Metadata (optional) - Additional context like test category, difficulty level, or sample tags

Each sample in a multi-turn dataset consists of:

  • Input:
    • Task (required) - The instruction or goal the AI agent should accomplish during the conversation
    • Persona (optional) - The identity or role the simulated user takes on when interacting with the agent
    • Context (optional) - Background information or scenario details that shape how the conversation unfolds (e.g., "the user already tried resetting their password twice")
  • Reference Output (optional) - Expected or reference output for comparison
  • Metadata (optional) - Additional context like test category, difficulty level, or sample tags
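Based on the field list above, a multi-turn sample might look like the following sketch. The key names (`task`, `persona`, `context`) are assumptions inferred from the field descriptions, not a confirmed schema; check the SDK reference for the exact shape.

```python
# Hypothetical multi-turn dataset sample mirroring the field list above.
# Key names are assumptions; consult the SDK reference for the exact schema.
multi_turn_sample = {
    "input": {
        "task": "Help the user recover access to their account",       # required
        "persona": "A frustrated customer with limited tech skills",   # optional
        "context": "The user already tried resetting their password twice",  # optional
    },
    "output": {"expected": "Agent verifies identity, then escalates"},  # optional
    "sample_metadata": {"category": "account-recovery", "difficulty": "hard"},  # optional
}
```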

Datasets serve multiple purposes: regression testing across versions, benchmarking performance improvements, evaluating model changes, and validating prompt modifications before production rollout.

Creating Datasets

Via SDK (Single-Turn Dataset Example)

from deepchecks_llm_client import DeepchecksLLMClient

client = DeepchecksLLMClient(api_token="your-token", host="your-host")

# Create a new dataset
dataset = client.create_dataset(
    app_name="my-app",
    dataset_name="regression-tests-v1"
)

# Add samples
samples = [
    {
        "input": {"prompt": "What is machine learning?"},
        "output": {"expected": "ML is..."},
        "sample_metadata": {"category": "definitions"}
    },
    {
        "input": {"prompt": "Explain neural networks"},
        "output": {"expected": "Neural networks are..."},
        "sample_metadata": {"category": "concepts"}
    }
]

client.add_dataset_samples(
    app_name="my-app",
    dataset_name="regression-tests-v1",
    samples=samples
)

Via UI

  1. Navigate to your application's Datasets page
  2. Click Create Dataset and choose single/multi-turn
  3. Provide a descriptive dataset name
  4. Add samples manually, upload a CSV, or use AI generation (see AI Data Generation)

Managing Dataset Samples

Viewing Samples

The dataset details page displays all samples within a dataset. Each row shows:

  • Input preview
  • Output preview (if provided)
  • Metadata tags (if provided)
  • Actions (edit, delete)

Adding Samples

Add samples manually, either individually or in batches:

Single Sample (UI):

  1. Open the dataset
  2. Click Add Sample at the bottom of the screen
  3. Enter input (required) and output (optional)
  4. Add metadata as key-value pairs (optional)
  5. Save

Batch Upload (SDK):

# Add up to 55,000 samples per API call
client.add_dataset_samples(app_name, dataset_name, samples_list)
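Because each call accepts a bounded number of samples, larger collections can be split into chunks before upload. A minimal sketch, assuming `client` is an authenticated `DeepchecksLLMClient` and using the per-call limit stated above:

```python
def upload_in_chunks(client, app_name, dataset_name, samples, chunk_size=55_000):
    """Upload samples in batches that stay within the per-call limit."""
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        client.add_dataset_samples(app_name, dataset_name, samples=chunk)
```

Lowering `chunk_size` can also help keep individual request payloads small when samples are large.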

In addition, clicking the Add Samples button in the top-right corner of the screen lets you add samples to the dataset via CSV/JSON upload or via AI generation.

Editing Samples

Update existing samples by clicking the edit icon:

  • Modify input or output content
  • Update metadata tags
  • Changes are saved immediately