
LLM Properties

LLM properties use a generic LLM-as-a-judge framework to evaluate subtle qualities of your application. The LLM assesses each interaction against specified criteria and assigns a score from 1 to 5, where a higher score indicates better performance.
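
As a rough illustration of the LLM-as-a-judge pattern described above, the sketch below builds a judging prompt and parses a 1 to 5 score from the reply. It is a generic example, not Deepchecks' internal implementation: the function names, the prompt wording, and the `Score: N` reply format are all assumptions made for illustration.

```python
import re


def build_judge_prompt(criteria: str, user_input: str, output: str) -> str:
    """Assemble a prompt asking a judge LLM to score one interaction (hypothetical wording)."""
    return (
        "You are evaluating an interaction against the criteria below.\n"
        f"Criteria: {criteria}\n\n"
        f"Input: {user_input}\n"
        f"Output: {output}\n\n"
        "Explain your reasoning, then finish with a line of the form 'Score: N', "
        "where N is an integer from 1 to 5."
    )


def parse_score(judge_reply: str) -> int:
    """Extract the last 'Score: N' value and check that it lies in the 1-5 range."""
    matches = re.findall(r"Score:\s*(\d+)", judge_reply)
    if not matches:
        raise ValueError("no score found in judge reply")
    score = int(matches[-1])
    if not 1 <= score <= 5:
        raise ValueError(f"score {score} is outside the 1-5 range")
    return score
```

The prompt string would be sent to whichever LLM acts as the judge; validating the parsed score guards against replies that drift from the requested format.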

When you create a task in Deepchecks LLM Evaluation, it is automatically set up with a predefined set of properties suited to your task type. You can customize this set by adding new LLM properties, describing the evaluation steps the LLM should follow, and selecting the interaction components to be assessed. For example, the default LLM property "Completeness," used in Q&A tasks, is defined as follows:

📘 Recalculate Properties

When you define a new property, it is automatically applied to all new interactions sent to the system. To calculate LLM properties for existing interactions, use the "Recalculate" button. Recalculating interactions that already have values overwrites those values. Note: recalculation counts toward "LLM Tokens Processed" in system usage.

Few Shot Prompting

You can improve the evaluation of new interactions by providing a CSV file of annotated examples (using the bottom-left button in the above screenshot). These examples should demonstrate exemplary reasoning and scoring. The CSV is used for few-shot prompting, a technique that guides the LLM toward consistent formatting and reasoning.
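
To make the idea of few-shot prompting concrete, here is a minimal sketch of how annotated examples could be rendered into a text prefix for a judge prompt. How Deepchecks actually formats the examples is not documented here; the `format_few_shot` helper, the layout, and the sample interaction are hypothetical.

```python
def format_few_shot(examples: list[dict], fields: list[str]) -> str:
    """Render annotated examples as a text prefix for a judge prompt (illustrative layout)."""
    blocks = []
    for ex in examples:
        # One "Name: value" line per input field, then the annotation.
        lines = [f"{name}: {ex[name]}" for name in fields]
        lines.append(f"Reason: {ex['Reason']}")
        lines.append(f"Score: {ex['Score']}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# A made-up annotated example, not taken from the product.
examples = [
    {
        "Input": "What is the capital of France, and what is its population?",
        "Output": "The capital of France is Paris.",
        "Reason": "Names the capital but does not address the population question.",
        "Score": 2,
    },
]
prefix = format_few_shot(examples, fields=["Input", "Output"])
```

Placing such a prefix before the interaction to be judged shows the LLM both the expected output format and the level of reasoning behind each score.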

The CSV file should include:

  1. One column for each field in the User Input Structure ("Input" and "Output" in the screenshot above).
  2. Reason column: This free-text column should capture the reasoning or explanation for the provided score.
  3. Score column: A numeric score between 1 and 5 that reflects the quality of the interaction.
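
Before uploading, it can help to check a file against the three requirements above. The following is a sketch using Python's standard `csv` module; the exact header names (`Reason`, `Score`) and their casing are assumptions, the `validate_few_shot_csv` helper is hypothetical, and the sample row is invented for illustration.

```python
import csv
import io


def validate_few_shot_csv(text: str, input_fields: list[str]) -> list[dict]:
    """Parse CSV text and check required columns, non-empty reasons, and 1-5 scores."""
    reader = csv.DictReader(io.StringIO(text))
    required = [*input_fields, "Reason", "Score"]  # assumed header names
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for n, row in enumerate(reader, start=1):  # n counts data rows, not file lines
        score = float(row["Score"] or "nan")  # an absent score becomes NaN and fails below
        if not 1 <= score <= 5:
            raise ValueError(f"data row {n}: score is missing or outside 1-5")
        if not (row["Reason"] or "").strip():
            raise ValueError(f"data row {n}: empty Reason")
        rows.append(row)
    return rows


# Invented sample content; fields containing commas must be quoted.
sample = (
    "Input,Output,Reason,Score\n"
    '"How do I reset my password, and how long is the link valid?",'
    '"Click Forgot Password on the login page.",'
    '"Covers the reset step but not the link validity.",3\n'
)
rows = validate_few_shot_csv(sample, ["Input", "Output"])
```

The function returns the parsed rows on success and raises `ValueError` with the offending row number otherwise.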

For instance, here's a CSV with three few-shot exemplars that fits the above "Completeness" example: