Use Custom Properties and LLM Properties
Using Custom Categorical Properties
A powerful way to monitor the performance of your LLM-based application is to categorize its interactions by criteria you choose. This approach helps identify which types of input questions your application excels at and which categories might be underperforming.
Custom Properties in Deepchecks
Custom Properties are user-defined. To display a Custom Property in the Deepchecks system, users must upload interactions with the specified Custom Property value. Interactions lacking a value in the Custom Property field will be automatically assigned an N/A value.
In our GVHD use case, we introduced a categorical property named “Question Type,” which classifies input questions into the following five categories:
- General GVHD Question
- Treatments & Medications
- Emergency / Urgent
- Appointments & Services
- Other
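As a rough illustration of the N/A behavior described above, the sketch below prepares interaction records before upload. It is plain Python and does not call the Deepchecks SDK; the record layout, the `custom_props` key, and the `with_question_type` helper are assumptions made for this example, and the sample inputs are invented.

```python
from collections import Counter

# Property name taken from the GVHD use case above.
PROPERTY_NAME = "Question Type"
NA_VALUE = "N/A"


def with_question_type(interaction: dict) -> dict:
    """Return a copy of the record whose custom property is always populated.

    Hypothetical helper: it mirrors the documented behavior where an
    interaction without a value for the property is shown as N/A.
    """
    props = dict(interaction.get("custom_props", {}))
    if props.get(PROPERTY_NAME) is None:
        props[PROPERTY_NAME] = NA_VALUE
    return {**interaction, "custom_props": props}


# Invented sample records; the record shape is an assumption.
interactions = [
    {"input": "What is GVHD?",
     "custom_props": {PROPERTY_NAME: "General GVHD Question"}},
    {"input": "How do I book a follow-up visit?",
     "custom_props": {PROPERTY_NAME: "Appointments & Services"}},
    {"input": "Hello"},  # no category supplied
]

normalized = [with_question_type(i) for i in interactions]
counts = Counter(i["custom_props"][PROPERTY_NAME] for i in normalized)
print(counts)
```

Counting the categories before upload is a quick way to spot how many interactions would end up in the N/A bucket.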
To filter by a specific category on the Data page:
- Select the category within the property view on the Overview page and click the “See Interactions” button.
- Alternatively, use the “Filter Categories” option in the property filters section on the Data page, and select the desired categories to be shown.
Results in version v2_improved_IR: Switching between versions with a categorical property filter applied allows for effective comparison between versions. You’ll observe changes in relative annotation scores and property scores for these specific interaction subgroups. The larger information pool in version v2_improved_IR contributes to improved responses, particularly for questions in the “Emergency / Urgent” and “Appointments & Services” categories, which version v1_gpt3.5 struggled with.
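The same per-category comparison can be approximated offline, for example on exported interaction data. The sketch below is only an illustration: the row layout (`version`, `question_type`, `score`) and both helper functions are assumptions, and the scores are placeholder values, not results from the GVHD dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_category(rows, version):
    """Average the score of one version's rows within each category."""
    buckets = defaultdict(list)
    for row in rows:
        if row["version"] == version:
            buckets[row["question_type"]].append(row["score"])
    return {category: mean(scores) for category, scores in buckets.items()}


def score_delta(rows, baseline, candidate):
    """Per-category change in mean score, for categories present in both versions."""
    base = mean_score_by_category(rows, baseline)
    cand = mean_score_by_category(rows, candidate)
    return {category: cand[category] - base[category]
            for category in base.keys() & cand.keys()}


# Placeholder rows for illustration only.
rows = [
    {"version": "v1_gpt3.5", "question_type": "Emergency / Urgent", "score": 1},
    {"version": "v1_gpt3.5", "question_type": "Emergency / Urgent", "score": 2},
    {"version": "v1_gpt3.5", "question_type": "Other", "score": 3},
    {"version": "v2_improved_IR", "question_type": "Emergency / Urgent", "score": 4},
    {"version": "v2_improved_IR", "question_type": "Emergency / Urgent", "score": 5},
]

delta = score_delta(rows, "v1_gpt3.5", "v2_improved_IR")
print(delta)
```

Categories that appear in only one version are left out of the delta, since there is nothing to compare them against.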
Using LLM Properties
Another valuable method for evaluating your version’s performance is to add LLM Properties to it. Refer to our LLM Properties documentation page for more information.
For instance, in our GVHD use case, we observed a recurring output pattern where many answers begin with phrases like “Based on the provided context …”. Other interactions contain similar phrases that imply the output was generated by an LLM, using a given context to answer the input question.
This pattern may not be desirable in a Q&A type application.
To highlight this pattern, we can create a Custom LLM Property as follows:
New LLM Property Content
Property Name: Based on context
Description: Highlight outputs mentioning it is "based on the context" it was provided with.
System Message:
- Read the entire output answer and analyze if it implies that it was based on a given context. If so, give it a high score. Otherwise, give it a low score.
- For example, outputs containing "Based on the provided context..", "Based solely on the context..", "The provided context.." or any similar phrase to these should get a high score.
Interaction Steps for Property: output
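Before relying on the LLM-scored property, it can help to estimate how common the pattern is with a simple keyword check. The sketch below is a stand-in heuristic, not how Deepchecks computes LLM Properties: the regular expression and the `mentions_context` helper are assumptions for this example, and the pattern covers only the phrases quoted in the System Message above plus close variants.

```python
import re

# Matches phrases such as "Based on the provided context",
# "Based solely on the context" and "The provided context".
CONTEXT_PHRASE = re.compile(
    r"\b(?:based\s+(?:solely\s+)?on\s+the\s+(?:provided\s+|given\s+)?context"
    r"|the\s+provided\s+context)\b",
    re.IGNORECASE,
)


def mentions_context(output: str) -> bool:
    """Return True if the output text refers to the context it was given."""
    return CONTEXT_PHRASE.search(output) is not None


# Invented sample outputs for illustration.
samples = [
    "Based on the provided context, the clinic is open on weekdays.",
    "Based solely on the context, no dosage is specified.",
    "The provided context does not mention this medication.",
    "Please contact your care team right away.",
]

flags = [mentions_context(text) for text in samples]
print(flags)
```

A keyword check like this will miss paraphrases that the LLM Property is meant to catch, so it is best treated as a quick sanity check on the property's scores.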
Recalculate LLM Property
After saving the property's definition, recalculate it with the following definitions (by default LLM properties are calculated only for data that is uploaded after they're defined, so recalculation is needed to get their results for the data that is already in the system):
Versions: Select All (2)
Environment: Both
After a brief calculation period, the new property will appear on the Overview page. When sorting or filtering by this property, you’ll see that outputs implying they are using a given context in their answer indeed receive higher scores.
To explore another relevant LLM Property for our GVHD use case, return to the Properties page and add the built-in LLM property from the “Document Relevancy” template. As stated in its description, this property evaluates “How relevant the retrieved documents (information_retrieval) are to the user input”.
Applying this property to our GVHD application reveals significant score differences between the two versions, highlighting the gaps in the information pool each version retrieves from.
You can see another example of LLM Custom Property usage on the Monitor Production Data page.