
Hard Sample Mining for Fine-Tuning

Use Deepchecks to gain a fine-grained understanding of where an LLM-based app is underperforming and, for the fine-tuning use case, download the "hard samples" to use in the next version.

In most cases, the currently deployed version of an LLM app isn’t the final one and is expected to be updated or improved in the future. Improvements can come from updating prompts, updating code in the pipeline, replacing the LLM entirely, or fine-tuning the LLM based on recent results.

For many of these use cases, but particularly for fine-tuning the LLM based on recent results, it’s important to gather examples of interactions in which the LLM app has underperformed. A more advanced version of this is to obtain the examples of underperformance segmented by several explainable properties (e.g. buckets of text length, language, with/without code, etc.). Streamlining this “hard sample mining” step can significantly shorten each improvement cycle, leading to quicker iterations.
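
To make the idea of segmenting by explainable properties concrete, here is a minimal Python sketch. It is not Deepchecks code: the record fields (`input`, `annotation`), the bucket edges, and the code-detection heuristic are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical interaction records; the field names are illustrative only.
interactions = [
    {"input": "Summarize this paragraph.", "annotation": "good"},
    {"input": "Fix this: def f(x): return x +", "annotation": "bad"},
    {"input": "Explain the refund policy. " * 5, "annotation": "bad"},
    {"input": "What is your name?", "annotation": "bad"},
]

def length_bucket(text, edges=(50, 200)):
    """Return 'short', 'medium', or 'long' based on character count."""
    n = len(text)
    if n < edges[0]:
        return "short"
    if n < edges[1]:
        return "medium"
    return "long"

def has_code(text):
    """Crude heuristic (an assumption): look for a few common code markers."""
    markers = ("```", "def ", "import ", "#include", "function ")
    return any(m in text for m in markers)

def segment_hard_samples(records):
    """Group underperforming records by (length bucket, contains code)."""
    segments = defaultdict(list)
    for record in records:
        if record["annotation"] != "bad":
            continue  # keep only interactions marked as underperforming
        key = (length_bucket(record["input"]), has_code(record["input"]))
        segments[key].append(record)
    return dict(segments)

segments = segment_hard_samples(interactions)
for key, group in sorted(segments.items()):
    print(key, len(group))
```

Each key in `segments` identifies one explainable slice of the hard samples, which makes it easier to see whether failures cluster in, say, short inputs that contain code.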

Filtering Out "Hard Samples" Using Deepchecks

There are several ways to do this in the Deepchecks system. We’ll demonstrate one of them, using the following steps:

  1. Go to the Data screen, and filter only the “thumbs down” and “estimated thumbs down” interactions.
  2. Review several of these interactions, while sorting by properties, and determine which properties are interesting to filter by.
  3. Create and activate property filters based on the initial analysis, so that only a specific segment of the “hard interactions” is displayed.
  4. Click on “select all” and download the interactions (which can then be added to the relevant training dataset, or just indexed for exploration).

This is what the flow looks like within the system:

Filtering out "hard samples" within Deepchecks. Note that in this example, both the "thumbs down" and "estimated thumbs down" filters are activated, along with a couple of other filters.
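
Once the interactions from step 4 are downloaded, they usually need reshaping before they can join a training dataset. The sketch below assumes the download is a CSV with `input` and `output` columns; that layout and the `prompt`/`rejected` key names are hypothetical, so adjust them to the actual file. Because these are underperforming interactions, the original output is stored under `rejected` rather than as a training target, leaving room for a reviewer to supply a corrected answer.

```python
import csv
import io
import json

def csv_to_jsonl(csv_file, jsonl_file, input_col="input", output_col="output"):
    """Write one JSON object per CSV row and return the number of rows written."""
    reader = csv.DictReader(csv_file)
    count = 0
    for row in reader:
        record = {"prompt": row[input_col], "rejected": row[output_col]}
        jsonl_file.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count

# In-memory data standing in for the downloaded file (assumed column names).
sample_csv = io.StringIO(
    "input,output,annotation\n"
    "What is 2+2?,5,bad\n"
    "Capital of France?,Lyon,bad\n"
)
out = io.StringIO()
n = csv_to_jsonl(sample_csv, out)
print(n)
print(out.getvalue(), end="")
```

With real files, the same function can be called with handles from `open(path, newline="", encoding="utf-8")` for the CSV and `open(path, "w", encoding="utf-8")` for the JSONL output.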

Notes:

  1. When this is done within the production tab, a designated flow enables adding “hard interactions” to the golden set.
  2. This workflow can be extended to further simplify fine-tuning, especially in cases where there is a strategic need to update the fine-tuning process in a few clicks.
  3. This flow is also supported via the SDK.