0.30.0 Release Notes

We’re excited to introduce several powerful improvements to data visibility, evaluation control, and LLM-based analysis. This release brings new pages, enhanced customization for properties, and more intuitive session-level insights—designed to help you streamline your evaluation workflows and better understand your pipeline’s performance.

Deepchecks LLM Evaluation 0.30.0 Release:

📁 New Data Pages: Sessions & Storage
📝 Edit LLM-Based Properties
🧩 Must/Optional Fields in Prompt Properties
📊 High-Level Property Insights
🧠 Session Topics per Environment

What's New and Improved?

New Data Pages: Sessions & Storage

We’ve added a dedicated Sessions page under each version. This view allows quick inspection and comparison of all evaluated sessions, including key metadata: session ID, number of interactions, initial user input, total latency, token usage, session annotation, and interaction types. It's a fast and informative way to analyze session-level data in your version.
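To make the session-level columns concrete, here is a minimal sketch of how such a summary row could be derived from a list of interactions. It is not the Deepchecks implementation: the `Interaction` class, its field names, and the choice to sum latency and tokens across interactions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical shape of an interaction; field names are illustrative only.
    session_id: str
    user_input: str
    latency_seconds: float
    tokens: int
    interaction_type: str


def summarize_sessions(interactions):
    """Group interactions by session and build one summary row per session.

    Assumes `interactions` is in chronological order, so the first interaction
    seen for a session supplies its initial user input.
    """
    summaries = {}
    for item in interactions:
        if item.session_id not in summaries:
            summaries[item.session_id] = {
                "session_id": item.session_id,
                "num_interactions": 0,
                "initial_user_input": item.user_input,
                "total_latency_seconds": 0.0,
                "total_tokens": 0,
                "interaction_types": [],
            }
        row = summaries[item.session_id]
        row["num_interactions"] += 1
        # Assumption: session totals are plain sums over the interactions.
        row["total_latency_seconds"] += item.latency_seconds
        row["total_tokens"] += item.tokens
        if item.interaction_type not in row["interaction_types"]:
            row["interaction_types"].append(item.interaction_type)
    return list(summaries.values())
```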

The Storage page provides visibility into unevaluated sessions—available only for the production environment. Sessions are stored here if they were not selected for evaluation based on your configured sampling ratio. This page allows basic filtering, session-level inspection, and the ability to send selected sessions to evaluation, either individually or in bulk. Learn more about sampling here.
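The relationship between the sampling ratio, evaluation, and storage can be sketched as follows. This is an illustrative model only: the function names, the per-session random draw, and the `seed` parameter are assumptions, not the product's actual sampling mechanism.

```python
import random


def route_sessions(session_ids, sampling_ratio, seed=0):
    """Split production sessions into (evaluated, storage) lists.

    Assumption: each session is selected independently with probability
    `sampling_ratio`; sessions that are not selected go to storage.
    """
    if not 0.0 <= sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    evaluated, storage = [], []
    for session_id in session_ids:
        if rng.random() < sampling_ratio:
            evaluated.append(session_id)
        else:
            storage.append(session_id)
    return evaluated, storage


def send_to_evaluation(evaluated, storage, selected_ids):
    """Move the selected stored sessions (one or many) to evaluation."""
    selected = set(selected_ids)
    moved = [sid for sid in storage if sid in selected]
    remaining = [sid for sid in storage if sid not in selected]
    return evaluated + moved, remaining
```

With a ratio of 1.0 every session is evaluated and storage stays empty; with 0.0 every session lands in storage until it is sent to evaluation manually.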

Edit LLM-Based Properties

It’s now possible to edit existing LLM-based properties directly (instead of creating a copy). For prompt properties (numerical or categorical), you can update the prompt content and instructions, as well as the steps and description. For built-in LLM properties, editing focuses on adjusting guidelines, so you can better align our prebuilt properties with your specific use cases. See full details in the Property Guide.

Example of Editing a Categorical Prompt Property

Must/Optional Fields in Prompt Properties

Prompt property creation is now more flexible with Must/Optional field configuration. When defining fields for your prompt logic, you can now mark each as:

Must – the field must exist in the interaction for the property to be calculated.
Optional – used if present, but doesn’t block evaluation if missing.

This helps reduce unnecessary N/As and improves robustness.
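The Must/Optional rule can be sketched in a few lines. This is a hedged illustration rather than the product's code: the function name, the `"must"`/`"optional"` mode strings, the example field names, and the decision to treat `None` or an empty string as "missing" are all assumptions.

```python
def resolve_prompt_fields(interaction, field_modes):
    """Collect the fields to pass to a prompt property.

    Returns a dict of available field values, or None when a Must field is
    missing, in which case the property would be N/A for this interaction.
    """
    resolved = {}
    for name, mode in field_modes.items():
        if mode not in ("must", "optional"):
            raise ValueError(f"unknown mode for field {name!r}: {mode!r}")
        value = interaction.get(name)
        # Assumption: None and empty strings both count as a missing field.
        if value is None or value == "":
            if mode == "must":
                return None  # a missing Must field blocks the calculation
            continue  # a missing Optional field is simply skipped
        resolved[name] = value
    return resolved
```

For example, with `{"input": "must", "output": "must", "history": "optional"}`, an interaction without `history` is still evaluated using only `input` and `output`, while an interaction without `output` yields `None`.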

The Dropdown Enables Choosing Must/Optional for Each Data Field

High-Level Property Insights

We’ve added a new root cause analysis (RCA) capability called Analyze Property Failures, which provides LLM-generated summaries of how your properties are performing across the version. It gives a quick, high-level view of where each property fails at the interaction and session levels, helping you prioritize which areas of the version to improve. Read more here.

"Text Quality" Property Failure Analysis

Session Topics per Environment

We now support topic assignment at the session level, scoped by environment. Each interaction is still tagged with its own topic (available via the SDK), while the session-level topic reflects the session as a whole. Keeping topics separate between the evaluation and production environments makes it possible to detect new or unexpected topics that appear only in production.
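As a rough illustration of why environment-scoped topics are useful, the sketch below finds topics that occur only in production. The input format (pairs of environment name and session topic) and the environment labels `"evaluation"` and `"production"` are assumptions for this example, not an SDK data structure.

```python
def production_only_topics(session_topics):
    """Return the sorted topics seen in production but never in evaluation.

    `session_topics` is an iterable of (environment, topic) pairs, one pair
    per session (a hypothetical format chosen for this sketch).
    """
    pairs = list(session_topics)
    evaluation = {topic for env, topic in pairs if env == "evaluation"}
    production = {topic for env, topic in pairs if env == "production"}
    # Set difference: topics present in production and absent from evaluation.
    return sorted(production - evaluation)
```

Any topic this returns is a candidate for a new evaluation set, since it represents traffic the evaluation environment has not covered.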