Root Cause Analysis (RCA)

Imagine this scenario: You've just been alerted by your customer-success team or Deepchecks monitoring that your mission-critical deployed model is behaving erratically. Or perhaps you're planning to thoroughly examine a version candidate before its production release.

In either case, Deepchecks provides a comprehensive suite of tools for gaining deep insights into your application's performance:

Property Explainability

Found in the single sample view, this tool gives you a granular understanding of your application's mistakes. Depending on the property you click, you'll see either an in-depth explanation of the reasoning behind the scoring process or the chunk of the output where the property score is lowest or highest.
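To make the idea concrete, here is a minimal sketch of how a "lowest-scoring chunk" could be surfaced. It is not the Deepchecks implementation; `split_into_chunks`, `weakest_chunk`, and the dummy scorer are hypothetical stand-ins for the per-chunk property scores the UI shows.

```python
# Conceptual sketch only: find the output chunk with the lowest property score.
def split_into_chunks(text: str, size: int = 200) -> list[str]:
    """Naive fixed-size chunking; the actual chunking strategy may differ."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def weakest_chunk(output: str, score_fn) -> tuple[str, float]:
    """Return the chunk where the property score is lowest."""
    scored = [(chunk, score_fn(chunk)) for chunk in split_into_chunks(output)]
    return min(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    # Dummy scorer (hypothetical): penalize hedging words.
    dummy_score = lambda chunk: 1.0 - chunk.lower().count("maybe") * 0.2
    chunk, score = weakest_chunk("The answer is 42. Maybe. Maybe not.", dummy_score)
    print(f"Lowest-scoring chunk ({score:.2f}): {chunk!r}")
```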

Annotation Breakdown

Located in the overview screen, this feature offers a quick snapshot of the distribution of your application's errors and their prevalence.

In the Production environment, you can compare the score breakdown across two different time ranges to identify trends, detect potential drifts, and better understand changes in your model's behavior. This comparison helps highlight which properties may be contributing to performance issues—or, conversely, driving improvements—allowing for faster root-cause analysis and more informed decision-making.
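As an illustration of what such a comparison involves, the following pandas sketch contrasts mean property scores between a baseline and a current time range. The data frame and column names are hypothetical, not a Deepchecks export format.

```python
# Illustrative sketch: compare a property-score breakdown across two time ranges.
import pandas as pd

# Hypothetical interaction-level property scores.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-05-01", "2024-05-02", "2024-06-01", "2024-06-02"]),
    "property": ["Grounded in Context", "Fluency",
                 "Grounded in Context", "Fluency"],
    "score": [0.92, 0.88, 0.71, 0.87],
})

baseline = df[df["timestamp"] < "2024-06-01"]
current = df[df["timestamp"] >= "2024-06-01"]

comparison = pd.DataFrame({
    "baseline_mean": baseline.groupby("property")["score"].mean(),
    "current_mean": current.groupby("property")["score"].mean(),
})
comparison["delta"] = comparison["current_mean"] - comparison["baseline_mean"]

# Properties whose mean score dropped noticeably are drift candidates.
print(comparison.sort_values("delta"))
```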

Clicking a property or annotation reason takes you to the interactions screen with relevant filters applied, showing only interactions affected by that reason and carrying that annotation.

Weak Segments

This automated analysis uses Deepchecks-generated features to identify specific data segments where your application underperforms. These features encompass various aspects of the input, retrieved information, and output, including text length, topics, and more.
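A toy version of this idea, assuming interactions are tabulated alongside their generated features and scores (the columns and the 0.15 threshold below are illustrative, not Deepchecks' actual algorithm):

```python
# Conceptual sketch of a weak-segment search: bucket interactions by
# generated features and flag segments scoring well below the overall mean.
import pandas as pd

df = pd.DataFrame({
    "input_length": [40, 500, 520, 35, 480, 60],
    "topic": ["billing", "billing", "tech", "tech", "billing", "tech"],
    "score": [0.9, 0.55, 0.5, 0.85, 0.6, 0.88],
})

overall = df["score"].mean()
df["length_bucket"] = pd.cut(df["input_length"], bins=[0, 100, 1000],
                             labels=["short", "long"])

for feature in ["length_bucket", "topic"]:
    means = df.groupby(feature, observed=True)["score"].mean()
    weak = means[means < overall - 0.15]  # threshold is arbitrary here
    for segment, mean in weak.items():
        print(f"Weak segment: {feature}={segment} "
              f"(mean score {mean:.2f} vs overall {overall:.2f})")
```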

Recommendations

This automated analysis leverages Deepchecks-generated properties to detect common pitfalls in LLM-based applications and suggest solutions, based on the property scores. For instance, if both the PII Risk and Grounded in Context properties score high, it may indicate that your retrieved data contains personal information that your model isn't filtering out during output generation. In such cases, we might recommend including specific instructions in your prompt or implementing a post-processing detection mechanism. Another common example involves hallucination detection and mitigation.
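The logic behind such a recommendation can be pictured as a simple rule over property scores. This sketch is illustrative only; the thresholds and the `recommend` helper are hypothetical, not part of the Deepchecks API.

```python
# Hedged sketch of a rule-based recommendation, mirroring the PII example above.
def recommend(scores: dict[str, float]) -> list[str]:
    """Map combinations of property scores to suggested mitigations."""
    recs = []
    # Illustrative rule: highly grounded output plus high PII risk suggests
    # personal data is being copied from the retrieved context.
    if scores.get("PII Risk", 0) > 0.8 and scores.get("Grounded in Context", 0) > 0.8:
        recs.append(
            "Retrieved context likely contains PII that is copied into the "
            "output: add a prompt instruction to redact personal data, or "
            "add a post-processing PII filter."
        )
    return recs

print(recommend({"PII Risk": 0.9, "Grounded in Context": 0.95}))
```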



Failure Mode Analysis 🔍

Deepchecks provides an LLM-powered analysis tool that helps you quickly identify and understand the main failure patterns behind low-scoring interactions. The analysis groups related issues into high-level, meaningful categories and presents each category with concrete examples taken from the actual failing cases.

For LLM-based properties (with full reasoning), the analysis also includes an estimated root cause for each category, offering insight into potential model weaknesses. For Deepchecks built-in GPU properties (e.g., when only highlighted text is available), the tool still clusters content patterns and presents representative examples.

You can further refine the analysis by providing custom guidelines to the analysis agent. Guidelines may include assumptions, suspected failure modes, or specific areas of concern you want the summary to focus on. This helps tailor the output to your evaluation needs and domain context.
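Deepchecks performs this grouping with an LLM agent, but the general idea of clustering failing interactions can be sketched with ordinary text clustering. The example texts and the TF-IDF/KMeans choice below are illustrative stand-ins, not the actual analysis pipeline.

```python
# Conceptual analog of failure-mode grouping: cluster low-scoring outputs
# by text similarity and show a representative example per cluster.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

failing_outputs = [
    "Sorry, I cannot find that in the docs.",
    "I could not locate the answer in the documentation.",
    "The refund is 30 days, I think, possibly 60.",
    "Refund window might be 30 or maybe 90 days.",
]

vectors = TfidfVectorizer().fit_transform(failing_outputs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for cluster in sorted(set(labels)):
    examples = [t for t, l in zip(failing_outputs, labels) if l == cluster]
    print(f"Failure mode {cluster}: {examples[0]!r} (+{len(examples) - 1} more)")
```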

Property-Level Failure Mode Analysis

Property-level analysis focuses on a single evaluation property within a specific version. This allows you to investigate why the model failed for that specific property, surfacing patterns such as recurring phrasing issues, incorrect reasoning steps, or frequent violations of that property’s rules.

This is especially helpful when you want to zoom in on a particular metric: for example, understanding why the model consistently violates an instruction-following property or misclassifies certain examples.

Version-Level Failure Mode Analysis

In addition to property-level insights, Deepchecks now supports generating a comprehensive, **version-level failure mode analysis report**. This report aggregates failures across all interaction types and all selected evaluation properties within the version, creating a unified view of the dominant failure modes in the entire system.

The version-level summary is ideal for identifying cross-cutting issues, prioritizing model improvements, and spotting regressions that may not surface clearly in a single property.

To generate it, open the Version page and select “Analyze version failures” (this option appears on the Summary chart when the Overview screen is filtered to "Sessions"). The tool will compile examples from the relevant properties and produce a consolidated, structured summary of your model’s main shortcomings for that version.


📘

Filtering the overview screen

**Note:** You can filter the Overview screen using several options. At the top of the page, you’ll find the global filters: Application, Version, and Time Range.

In addition, the page itself includes local filters, such as Interaction Type and Span Filters (e.g., filter by span name, run status code, etc.).