v0.42.0
Build & Maintain a High-Quality Evaluation Set
Building Your Initial Evaluation Set
Measuring Evaluation Set Quality
When and How to Update Your Evaluation Set