
Compare Between Versions

In the E-Commerce Summarization application, the goal is to compare prompts and identify one that optimally balances two key properties: Attractiveness and Grounded in Context. Additionally, we assess other relevant Deepchecks properties for summarization: Coverage and Text Quality.

Version Screen

The Deepchecks Versions screen provides an overview of key metrics and their differences across three versions.

The first insight about the different prompts is visible here: there is an inherent tradeoff between the three metrics that matter most for our use case, namely Attractiveness, Grounded in Context and Coverage. In this scenario, the user selected the Balanced version, which achieves satisfactory scores on all of the key metrics even though it is not the top performer on any of them.
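One way to make "balanced" concrete is to pick the version whose weakest key metric is the highest (a maximin rule). The sketch below is only an illustration of that idea and is not how Deepchecks computes or ranks anything. The scores are invented for the example, "Version C" is a placeholder name for the third version, and all metrics are assumed to be normalized to the 0-1 range.

```python
# Hypothetical average scores per version, normalized to 0-1.
# These numbers are made up for illustration; they are NOT read from the Versions screen.
versions = {
    "Attractive": {"attractiveness": 0.95, "grounded_in_context": 0.60, "coverage": 0.70},
    "Version C": {"attractiveness": 0.55, "grounded_in_context": 0.97, "coverage": 0.85},
    "Balanced": {"attractiveness": 0.80, "grounded_in_context": 0.90, "coverage": 0.80},
}


def pick_balanced(scores_by_version):
    """Return the version whose lowest metric score is the highest (maximin)."""
    return max(scores_by_version, key=lambda name: min(scores_by_version[name].values()))


def top_performers(scores_by_version):
    """Return, for each metric, the name of the version with the best score."""
    metrics = next(iter(scores_by_version.values())).keys()
    return {
        metric: max(scores_by_version, key=lambda name: scores_by_version[name][metric])
        for metric in metrics
    }


print(pick_balanced(versions))
print(top_performers(versions))
```

With these invented numbers, the rule selects Balanced even though a different version leads on every individual metric, which mirrors the tradeoff described above.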

Note

User preferences are reflected in the estimated score calculated by the auto-annotation YAML: the selected version received a significantly higher score.

Comparing Interactions Between Versions

After gaining a high-level understanding of the version differences, the user might wish to compare two specific versions.

To perform the comparison, select the versions named Attractive and Balanced, then click on "Compare 2 Versions."

Next, we can examine how the two versions, Attractive (right) and Balanced (left), differ across samples:

Remember: We created an auto-annotation YAML that helps us estimate the 'Good' and 'Bad' annotations.

In the sample above, the Balanced version (left) was evaluated as good based on its property scores: Coverage (0.88), Text Quality (5), Attractiveness (5) and Grounded In Context (0.99). The Attractive version was evaluated as bad because of its low Grounded In Context score (0.5).
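The logic behind this kind of good/bad estimate can be sketched as a set of minimum-score rules per property. The snippet below is a simplified illustration, not the Deepchecks auto-annotation YAML schema or its actual algorithm: the thresholds in `MIN_SCORES` and the function `estimate_annotation` are hypothetical. The property scores are the ones reported for the sample above; for the Attractive version only the Grounded In Context score is given, so it is the only one included.

```python
# Hypothetical minimum scores per property. A real auto-annotation
# configuration defines its own conditions and thresholds.
MIN_SCORES = {
    "Coverage": 0.7,
    "Text Quality": 3,
    "Attractiveness": 3,
    "Grounded In Context": 0.7,
}


def estimate_annotation(scores, min_scores=MIN_SCORES):
    """Return ("good" or "bad", list of properties that fell below their threshold).

    Properties missing from `scores` are skipped rather than treated as failures.
    """
    failing = [
        name
        for name, threshold in min_scores.items()
        if name in scores and scores[name] < threshold
    ]
    return ("bad" if failing else "good"), failing


# Scores reported for the sample in the text.
balanced = {
    "Coverage": 0.88,
    "Text Quality": 5,
    "Attractiveness": 5,
    "Grounded In Context": 0.99,
}
attractive = {"Grounded In Context": 0.5}  # the only score the text reports

print(estimate_annotation(balanced))    # ('good', [])
print(estimate_annotation(attractive))  # ('bad', ['Grounded In Context'])
```

Returning the failing properties alongside the label keeps the estimate explainable: it shows which property caused a 'Bad' annotation.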

Marking the difference between the two versions and reading the Consistency explanation shows that the Attractive (right) version contains a hallucination:

"It also features DTS X: Ultra technology for immersive gaming audio. Support: This fact is not explicitly mentioned in the input."
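Outside the UI, a similar "mark the difference" step can be approximated with Python's standard `difflib` module by comparing the two outputs sentence by sentence. This is a rough sketch and not the comparison Deepchecks performs. The helper names are hypothetical, and the two summaries are placeholder text, apart from the flagged sentence quoted above.

```python
import difflib
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def added_sentences(reference, candidate):
    """Return sentences that appear in `candidate` but have no match in `reference`."""
    ref = split_sentences(reference)
    cand = split_sentences(candidate)
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(cand[j1:j2])
    return added


# Placeholder summaries; only the DTS X sentence comes from the example above.
balanced_output = (
    "A placeholder sentence shared by both summaries. "
    "Another shared placeholder sentence."
)
attractive_output = (
    "A placeholder sentence shared by both summaries. "
    "It also features DTS X: Ultra technology for immersive gaming audio. "
    "Another shared placeholder sentence."
)

for sentence in added_sentences(balanced_output, attractive_output):
    print("Only in Attractive:", sentence)
```

Sentences that appear in only one version are good candidates to check against the input context, since unsupported claims tend to show up there.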