Dataset Management
Overview
Datasets are curated collections of test samples used to systematically evaluate your LLM application's performance. Unlike production interactions that reflect real user behavior, datasets provide controlled, reproducible test scenarios that help you measure quality improvements across versions, catch regressions, and validate changes before deployment. Each dataset contains input samples (and optionally expected outputs) that you can run against your application to generate evaluation scores.
What are Datasets?
A dataset is a named collection of test samples within an application. There are two types of datasets: single-turn and multi-turn.
Each sample in a single-turn dataset consists of:
- Input (required) - The test input to send to your application (can be a string, JSON object, or array)
- Reference Output (optional) - Expected or reference output for comparison
- Metadata (optional) - Additional context like test category, difficulty level, or sample tags
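The field rules above can be checked locally before any upload. The following is a minimal sketch, not part of the SDK: `validate_single_turn_sample` is a hypothetical helper, and it assumes the `input`, `output`, and `sample_metadata` keys used in the SDK example later on this page.

```python
# Hypothetical local check; this helper is not provided by deepchecks_llm_client.
ALLOWED_INPUT_TYPES = (str, dict, list)  # string, JSON object, or array


def validate_single_turn_sample(sample: dict) -> list:
    """Return a list of problems found in one sample (empty list means OK)."""
    problems = []
    value = sample.get("input")
    # Input is the only required field.
    if value is None or value == "":
        problems.append("missing required 'input'")
    elif not isinstance(value, ALLOWED_INPUT_TYPES):
        problems.append("'input' must be a string, JSON object, or array")
    # Assumption: metadata, when present, is a key-value mapping.
    metadata = sample.get("sample_metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("'sample_metadata' must be a dict")
    return problems


if __name__ == "__main__":
    print(validate_single_turn_sample({"input": {"prompt": "What is machine learning?"}}))
    print(validate_single_turn_sample({"output": {"expected": "ML is..."}}))
```

Running a check like this over a whole list before uploading makes it easier to spot a malformed sample early, rather than partway through a large batch.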
Each sample in a multi-turn dataset consists of:
- Input:
  - Task (required) - The instruction or goal the AI agent should accomplish during the conversation
  - Persona (optional) - The identity or role the simulated user takes on when interacting with the agent
  - Context (optional) - Background information or scenario details that shape how the conversation unfolds (e.g., "the user already tried resetting their password twice")
- Reference Output (optional) - Expected or reference output for comparison
- Metadata (optional) - Additional context like test category, difficulty level, or sample tags
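To make the multi-turn structure concrete, here is a rough sketch of how such a sample could be modeled in Python. The `MultiTurnSample` class and the dictionary keys it emits (`task`, `persona`, `context`, `reference_output`, `metadata`) are illustrative assumptions; this page does not state the SDK's actual multi-turn field names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MultiTurnSample:
    """Hypothetical model of a multi-turn sample; key names are assumptions."""
    task: str                               # required
    persona: Optional[str] = None           # optional
    context: Optional[str] = None           # optional
    reference_output: Optional[str] = None  # optional
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Task is the only required field, so reject blank values early.
        if not self.task.strip():
            raise ValueError("task is required and cannot be empty")

    def to_dict(self) -> dict:
        # Group task, persona, and context under "input"; omit unset optionals.
        input_part = {"task": self.task}
        if self.persona is not None:
            input_part["persona"] = self.persona
        if self.context is not None:
            input_part["context"] = self.context
        result = {"input": input_part}
        if self.reference_output is not None:
            result["reference_output"] = self.reference_output
        if self.metadata:
            result["metadata"] = self.metadata
        return result


if __name__ == "__main__":
    sample = MultiTurnSample(
        task="Help the user regain access to their account",
        context="the user already tried resetting their password twice",
    )
    print(sample.to_dict())
```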
Datasets serve multiple purposes: regression testing across versions, benchmarking performance improvements, evaluating model changes, and validating prompt modifications before production rollout.
Creating Datasets
Via SDK (Single-Turn Dataset Example)
from deepchecks_llm_client import DeepchecksLLMClient

client = DeepchecksLLMClient(api_token="your-token", host="your-host")

# Create a new dataset
dataset = client.create_dataset(
    app_name="my-app",
    dataset_name="regression-tests-v1"
)

# Add samples
samples = [
    {
        "input": {"prompt": "What is machine learning?"},
        "output": {"expected": "ML is..."},
        "sample_metadata": {"category": "definitions"}
    },
    {
        "input": {"prompt": "Explain neural networks"},
        "output": {"expected": "Neural networks are..."},
        "sample_metadata": {"category": "concepts"}
    }
]

client.add_dataset_samples(
    app_name="my-app",
    dataset_name="regression-tests-v1",
    samples=samples
)
Via UI
- Navigate to your application's Datasets page
- Click Create Dataset and choose single-turn or multi-turn
- Provide a descriptive dataset name
- Add samples manually, upload a CSV, or use AI generation (see AI Data Generation)
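If test cases already live in a spreadsheet, they can also be converted into sample dictionaries and sent through the SDK route instead of the UI upload. The sketch below is one way to do that conversion. The CSV column names (`input`, `output`, `metadata`) and the JSON-encoded metadata column are assumptions about your own file, not the layout the UI upload expects, which this page does not specify.

```python
import csv
import io
import json


def csv_to_samples(csv_text: str) -> list:
    """Convert CSV text into sample dicts shaped like the SDK example.

    Assumed columns (hypothetical): input, output, metadata (JSON string).
    """
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sample = {"input": row["input"]}
        # Output and metadata are optional, so skip empty cells.
        if row.get("output"):
            sample["output"] = row["output"]
        if row.get("metadata"):
            sample["sample_metadata"] = json.loads(row["metadata"])
        samples.append(sample)
    return samples


if __name__ == "__main__":
    text = (
        "input,output,metadata\n"
        '"What is machine learning?","ML is...","{""category"": ""definitions""}"\n'
        "Explain neural networks,,\n"
    )
    for sample in csv_to_samples(text):
        print(sample)
```

The resulting list can then be passed as the `samples` argument of `client.add_dataset_samples`, as in the SDK example above.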
Managing Dataset Samples
Viewing Samples
The dataset details page displays all samples within a dataset. Each row shows:
- Input preview
- Output preview (if provided)
- Metadata tags (if provided)
- Actions (edit, delete)
Adding Samples
Add samples manually, either individually or in batches:
Single Sample (UI):
- Open the dataset
- Click Add Sample at the bottom of the screen
- Enter input (required) and output (optional)
- Add metadata as key-value pairs (optional)
- Save
Batch Upload (SDK):
# Add up to 55000 samples per API call
client.add_dataset_samples(app_name, dataset_name, samples_list)
In addition, clicking the "Add Samples" button in the top-right corner of the screen lets you add samples to the dataset via CSV/JSON upload or via AI generation.
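When a sample list is larger than the per-call limit quoted in the Batch Upload snippet, one option is to split it into chunks and issue one call per chunk. This is a minimal sketch under that assumption: `chunked` and `upload_in_batches` are hypothetical helpers, and only `add_dataset_samples` with its `app_name`, `dataset_name`, and `samples` arguments comes from this page.

```python
from typing import Iterator

MAX_SAMPLES_PER_CALL = 55000  # per-call limit quoted in the Batch Upload snippet


def chunked(samples: list, size: int = MAX_SAMPLES_PER_CALL) -> Iterator[list]:
    """Yield consecutive slices of `samples`, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def upload_in_batches(client, app_name: str, dataset_name: str,
                      samples: list, size: int = MAX_SAMPLES_PER_CALL) -> int:
    """Upload samples one chunk at a time and return the number of API calls made."""
    calls = 0
    for batch in chunked(samples, size):
        client.add_dataset_samples(
            app_name=app_name,
            dataset_name=dataset_name,
            samples=batch,
        )
        calls += 1
    return calls
```

With a real client this might be called as `upload_in_batches(client, "my-app", "regression-tests-v1", samples)`. The sketch does no retrying, so a failed call partway through leaves the earlier chunks already uploaded.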
Editing Samples
Update existing samples by clicking the edit icon:
- Modify input or output content
- Update metadata tags
- Changes are saved immediately