Additional Features
These features are an important component of the solution and should be used after an initial experience with Deepchecks.
AI-Assisted Annotations 🖌️
Deepchecks allow in-system annotations within the data page and in the sample view by clicking and editing the suggested estimated annotation. Unlike the estimated annotation, manual annotations are used as a source of truth for advanced features such as Deepchecks Evaluator[link] and can be tracked separately from estimated annotations.
AI-Assisted Annotations can be done by utilizing the properties and auto annotations in the system. These can be incorporated both when annotating with an external tool (by downloading the data, or consuming it via SDK), or within the Deepchecks UI. To learn more about this flow see AI-Assisted Annotations.
Evaluation Set Creation & Management 🌲
To effectively evaluate a version before releasing it to production or to compare it with other versions it is crucial to have a high quality and representative evaluation set. Deepchecks can assist in this process via two main functionalities:
Interaction Generation
The interaction generation is an LLM-based process that utilizes provided context (either documents or webpages*) as well as information about the application to generate diverse high-quality inputs that can be appended to the evaluation set.
*Deepchecks scraping process also looks at additional webpages under the domain. For example, supplying the https://deepchecks.com/ URL will automatically parse all web pages on the website.
Copy Production Samples
Once you have identified types of interactions from the production environment that are unrepresented in your evaluation set, you can copy them to your evaluation dataset. This will ensure that your next rounds of testing will show a more realistic score. You can also download these examples into a CSV, for further analysis.
For more information on how to use this functionality see Evaluation Dataset Management.
Pentest 🛑
LLMs are able to tackle an amazing range of textual tasks, but this capability comes with a price - the end users are often able to input free text to your system that will end up directly as part of the prompt sent to your LLM. There are several known possible "attacks", or vulnerabilities, of LLM-based systems. The Pentest environment is built to test your system for a wide range of known vulnerabilities. Many of these are based on the Garak OSS package. This testing is done by providing you with a set of malicious inputs which you can then run through your LLM pipeline. Deepchecks will then automatically test and score your LLM's app resilience to these attack attempts, by looking at the corresponding outputs and identifying if the pipeline has been able to block the attack.
For more information on how to use this functionality see Pentesting Your LLM-Based App.
Translation 🔤
Deepchecks support a wide range of languages including most of the European and South Asia languages using a translation module. As such, data is stored in Deepchecks in its original form alongside an English translation which is used by models that were trained on English data. We found that for multi-national companies the translation itself can be valuable and as such we made it available in the Deepchecks system.
A partial list of the languages Deepchecks clients use includes: Spanish, German, French, Dutch, Hebrew, Japanese, Mandarin, and Malay.
Topic Detection 🫧
Deepchecks use a multi-step approach to cluster interactions based on topics. The topics are useful to better understand your data, especially when it comes to the production environment, and are included in all of Deepchecks' RCA features.
In the first step, interactions are clustered via semantic similarity and then the clusters are filtered and assigned a topic via a LLM-based process. Some samples that do not fit in any of the created topic categories are assigned to the "Other" category. The topic categories are initialized once 50 unique samples are available in the application and can be recalculated once a significant amount of new data is available.
Custom Topics
Deepchecks offers the option to override the default topic detection by uploading your own topics. You can simply add a column named "topic" to your CSV file and specify your custom topic for each interaction. You can also add a "topic" argument when defining each interaction via SDK.
Updated about 1 month ago