Release Notes

0.18.0 Release Notes

by Shir Chorev

This version includes an improved understanding of your version's performance with root cause analysis, added visibility into the system's usage, and more features, stability and performance improvements that are part of our 0.18.0 release.

Deepchecks LLM Evaluation 0.18.0 Release

  • 💡 Version Insights Enhancements
  • 🔎 Score Reasoning Breakdown
  • 📶 Usage Plan Visibility
  • 🦸‍♀️ Improvement to PII Property
  • ⚖️⚖️ Versions Page Updates
  • 📈 Production Overtime View Improvements

What’s New and Improved?

  • Version Insights Enhancements

    • Explainable insights with analysis and actionable suggestions. Insights are based on property values. They can be seen in the "Overview" screen per version, and in the "Versions" screen for application-wide insights with a link to the relevant version.

  • Score Reasoning Breakdown

    • Score breakdown according to annotation reason. Click "Show breakdown" next to the Score on the Dashboard.

  • Usage Plan Visibility

    • Your stats and limits (Applications, Users, Processed data tokens) are now displayed in the "Usage" tab in the workspace settings.

  • Improvement to PII Property

    • Combined several detection mechanisms to widen coverage and improve recall and precision on a wide set of benchmarks.
  • Versions Page Updates

    • Added high-level data per version: insights on specific versions, and a widget showing the most recent version, the best performing one, etc.

  • Production Overtime View Improvements

    • The default view now loads the most recent weeks of data.

0.17.0 Release Notes

by Shir Chorev

This version includes a new overtime view of property values and scores for production data, and exciting property-related improvements, along with more features, stability and performance improvements that are part of our 0.17.0 release.

Deepchecks LLM Evaluation 0.17.0 Release

  • 📉 Overtime Production View for Monitoring
  • 💬 New Property: Information Density
  • 🏃🏼‍♀️‍➡️ Ability to Rerun Annotation Pipeline on Multiple Versions in Application
  • 🔤 OpenAI Support for LLM Properties
  • 🥑 Improvements to Relevance and Grounded in Context Properties

What’s New and Improved?

  • Overtime Production View for Monitoring

    • In Production Environment, annotation scores and property scores are displayed over time

    • Timestamps are taken from the "started_at" field of each interaction. If no timestamp was given, the upload time is used as the interaction time (see the sketch after this list).

  • New Property: Information Density

    • Information density is a score between 0 and 1, measuring the ratio of statements that convey information (e.g. facts, suggestions) out of all statements in the output. Read more about it in the Information Density Property documentation.
    • It helps find places where the outputs aren't actually useful, whether the desired information is missing (e.g. the answer is incomplete), avoided, or stated so generally that it doesn't directly address the user's request.
  • Rerun Annotation Pipeline on Multiple Versions

    • After uploading a new customized "Auto-Annotation YAML" in the Annotation Config screen, the annotation can now be conveniently rerun on all versions and environments in the application, with the "Run Annotation Pipeline" button on the top right.

  • OpenAI Support for LLM Properties

    • LLM Properties can now run using OpenAI (vs. Azure OpenAI). This setting is organization-wide and can be enabled by request.
  • Improvements to Relevance and Grounded in Context Properties

    • Improved recall and precision on a wide set of benchmarks. The Grounded in Context score is now also more accurate when the grounding information is distributed across different documents.
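
For the overtime production view above, here is a minimal sketch of supplying the "started_at" timestamp when building an interaction record. The dictionary form follows the interaction schema shown in the 0.9.0 REST notes further below; all field values are purely illustrative:

from datetime import datetime, timezone

# Sketch of a single production interaction record. "started_at" determines where the
# interaction appears on the overtime view; if it is omitted, the upload time is used.
interaction = {
    "user_interaction_id": "prod-2024-06-01-0001",  # illustrative id
    "input": "How do I reset my password?",
    "output": "Go to Settings > Security and click 'Reset password'.",
    "started_at": datetime(2024, 6, 1, 9, 30, tzinfo=timezone.utc).isoformat(),
}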

0.16.0 Release Notes

by Shir Chorev

This version includes improvements to the Grounded in Context model, explainability for the Grounded in Context and Retrieval Relevance properties, GPT-4o support in the backend, English translation visibility, along with more features, stability and performance improvements that are part of our 0.16.0 release.

Deepchecks LLM Evaluation 0.16.0 Release

  • 🤫 Updated Model for the Grounded in Context Property
  • 🧮 Properties Explainability
  • 💬 Updated LLM Reasoning View
  • 🔡 Translation to English is Viewable
  • 🚦 GPT-4o Supported in LLM Properties

What’s New and Improved?

  • Updated Model for the Grounded in Context Property

    • Improved recall and precision on a wide set of benchmarks, enabling accurate identification of problems.
  • Properties Explainability

    • Upon click, Grounded in Context displays the least grounded part of the output, and similarly Retrieval Relevance marks the most relevant part of the retrieved information.

  • Updated LLM Reasoning View

    • LLM reasoning is now viewable upon click

  • Translation to English is Viewable

    • For applications where translation is enabled, the translated English version can be viewed.
  • GPT-4o Supported in LLM Properties

    • LLM Properties can now run using GPT-4o. This setting is organization wide, and can be enabled by request.

0.15.0 Release Notes

by Shir Chorev

This version includes SDK usage improvements, a monitoring flow with Datadog integration, avoided answer property updates, along with more features, stability and performance improvements that are part of our 0.15.0 release.

Deepchecks LLM Evaluation 0.15.0 Release

  • 🖥️ SDK Updates for Ease of Use
  • 🤫 Improved Avoided Answer Property
  • 👀 Datadog Monitoring Integration
  • 🧮 Robust Tiers and Usage Tracking

What’s New and Improved?

  • SDK Updates - important note for SDK use of deepchecks-llm-client>=0.15.0

    • The SDK client (previously dc_client) is now created by instantiating DeepchecksLLMClient.
    • All existing functions, such as log_batch_interactions, now require the app_name, version_name and env_type arguments to enable better control of logged data (see the sketch after this list).
  • Avoided Answer Property

    • New and improved algorithmic implementation for the avoided answer property.
    • It achieves an F1 score above 0.98 on a variety of datasets (with a threshold of 0.5).
  • Datadog Monitoring Integration

    • See full documentation here: Datadog Integration

    • Deepchecks now supports seamless integration with Datadog, making it easy to track metrics over time, view configurable dashboards, and receive alerts. This enables Datadog users to have the relevant logs, dashboards and alerts viewable and configurable all in one place.

  • Robust Tiers and Usage Tracking

    • Monthly usage is now aggregated to allow automatic limits and billing, according to the product tier.
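
To illustrate the SDK updates above, here is a minimal sketch of the new client flow. The import paths, the DeepchecksLLMClient constructor parameters, and the LogInteractionType object are assumptions based on these notes and on the dc_client.init example in the 0.9.0 notes below; consult the SDK Quickstart for the authoritative signatures:

# Import paths and object names below are assumptions - see the SDK Quickstart.
from deepchecks_llm_client.client import DeepchecksLLMClient
from deepchecks_llm_client.data_types import EnvType, LogInteractionType

# The client is now created by instantiating DeepchecksLLMClient (previously dc_client).
client = DeepchecksLLMClient(
    host=DEEPCHECKS_LLM_HOST,          # same connection details as the old init
    api_token=DEEPCHECKS_LLM_API_KEY,
)

# app_name, version_name and env_type are now passed explicitly on each call.
client.log_batch_interactions(
    app_name="my-app",                 # illustrative names
    version_name="v1",
    env_type=EnvType.PROD,
    interactions=[
        LogInteractionType(
            user_interaction_id="eval-0001",
            input="What is the refund policy?",
            output="You can request a refund within 30 days.",
        )
    ],
)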

0.14.0 Release Notes

by Shir Chorev

This version includes improvements to working with properties (an improved properties screen, multiple columns in the data screen, an improved Retrieval Relevance property), output text difference highlighting, visibility into the tokens used, and more features, stability and performance improvements that are part of our 0.14.0 release.

Deepchecks LLM Evaluation 0.14.0 Release

  • 🧮 New Properties Screen
  • 🔢 Multiple Columns in Data Screen
  • 👻 Improved Retrieval Relevance Property
  • 🧑‍🤝‍🧑 Text Difference Highlighting
  • 🛤️ Usage Visibility

What's New and Improved?

  • New Properties Screen

    • Following your feedback, the properties experience has been improved to enable conveniently searching and viewing all properties, no matter their type.

    • Starred properties are the ones viewable in the Overview screen. Remove or add them to the Overview screen from the properties page itself or from the Overview page.

    • Custom properties now support underscores in the property name.

  • Multiple Columns in Data Screen

    • You can now choose multiple property (or similarity) columns to display simultaneously in the data page, allowing you to conveniently view multiple values for the interactions.

  • Improved Retrieval Relevance

    • The “Retrieval Relevance” property is now being calculated using an improved method, leading to better detection of cases in which irrelevant information has been retrieved as part of RAG systems.
  • Text Difference Highlighting

    • You can now turn “on” the toggle in the output view when comparing interactions, to highlight the differences between the outputs of the same interaction across two different versions.

  • Usage Visibility

    • Token usage in the system is now tracked and displayed in the new “Usage” screens in the workspace settings. The token tracking mechanism will replace the existing limits on the number of interactions uploaded or on calculating LLM properties.

0.13.0 Release Notes

by Shir Chorev

This version includes enhancements to the version comparison, additional similarity metrics (ROUGE and BLEU), expansions to the insights mechanism, and more features, stability and performance improvements that are part of our 0.13.0 release.

Deepchecks LLM Evaluation 0.13.0 Release

  • 👀 Similarity additions: added ROUGE and BLEU and allow sorting by similarity
  • 👭 Version comparison improvements: versions metadata, updated versions screen and comparison dialogue
  • 💡 Expanded insights mechanism
  • 📂 Application can now be created with SDK

What's New and Improved

  • Similarity Additions

    • ROUGE and BLEU metrics are now calculated between the outputs of every two similar interactions across versions (marked by having the same user_interaction_id), in addition to the existing Deepchecks similarity.

    • In the Data screen and in the Versions screen, the Similarity column can now be used for sorting.

  • Version Comparison Improvements

    • New design for the Versions screen, allowing you to alternate between environments or to expand a single version on click to see all of its environments. Versions can now be compared across multiple properties.

    • Version metadata can be added when creating or editing a version. The version description is viewable upon hover in the comparison screen; other fields are viewable upon opening the edit mode.

    • When choosing to see different interactions across versions, interactions can now be browsed together (when scrolling and exploring different interaction sections).

  • Expanded Insights Mechanism

    • Insights for weak segments detection now run on all characteristics, including: topics, custom properties, and LLM properties, in addition to the built-in properties as before.
  • Create Application via SDK

    • Either as part of the init of the Deepchecks Client (see the relevant section in the SDK Quickstart), or with the create_application function in the SDK (see the sketch below).
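
For the SDK application creation mentioned above, a rough sketch assuming an already-initialized client; the exact parameters of create_application are an assumption, so refer to the SDK Quickstart:

# Assuming dc_client was initialized as shown in the 0.9.0 notes below
# (dc_client.init(host=..., api_token=..., ...)).
# The parameter name below is illustrative - see the SDK Quickstart for the exact signature.
dc_client.create_application(app_name="my-new-app")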

0.12.0 Release Notes

by Shir Chorev

This page includes updates from our 0.12.0 Release, which includes new features, stability and performance improvements.

Deepchecks LLM Evaluation 0.12.0 Release

  • 🦠 New: Pentest Environment for detecting vulnerabilities that your LLM app is prone to
  • 🔠 Improved Support for Non-English Use Cases
  • 🗒️ Docs Additions: E2E Use case (GVHD), and new demonstrations for many evaluation features
  • 🪞 "Golden Set" Environment Renamed to "Evaluation" in the UI

🚧 SDK Breaking Changes

The API for setting the version and environment the SDK points to has changed.

Previously as a method:

dc_client.env_type(EnvType.PROD)
dc_client.version_name('v1')

Now, updating the class member:

dc_client.env_type = EnvType.PROD
dc_client.version_name = 'v1'

What's New and Improved

  • Pentest Environment
    • A dedicated environment for testing your system against known attack types. Enable it in the "Workspace Settings" to check it out.
    • Includes Pentesting data for running on your app, which should then be uploaded to Deepchecks to get an evaluation of your app's resilience to different types of attacks.
    • For more info, check out: Pentesting Your LLM-Based App
  • Improved Support for Non-English Use Cases
    • Deepchecks now includes built-in support for additional languages.
    • Reach out to us to have it enabled for your organization.
  • Docs Additions
    • An E2E use case (GVHD), and new demonstrations for many evaluation features

0.11.0 Release Notes

by Shir Chorev

This page includes updates from our 0.11.0 Release, which includes new features, stability and performance improvements.

Deepchecks LLM Evaluation 0.11.0 Release

  • 💬 LLM Properties Enhancements: Property Bank & Multi-step LLM Properties
  • ✅ Email Upon Completion of Data Processing
  • 🎨 Updated Designs: Workspace Settings & Properties
  • 🤼 Version Comparison Flow
  • 🏛️ Classification Use Case Support

What's New and Improved

  • LLM Properties Enhancements

    • LLM Properties "Bank" - enables starting from a built-in LLM property template, alongside starting from scratch and building one of your own.

    • LLM Properties can now receive any interaction step. If you have more steps than "Input", "Information Retrieval", and "Output", these steps can be logged to the interaction via our SDK and then used to feed the LLM Properties (a sketch of logging a custom step appears after this list). See the "Interaction Steps for Property" phase inside the LLM Properties Definition.

  • Email upon completion of data processing after upload

    • Email notifications can now be configured in the "Workspace Settings" notifications tab.

    • Selected emails will receive a notification after data upload is completed - whether uploaded by CSV or by email, per configured application.

  • Updated Designs for Workspace Settings and Properties Screens

    • The "Built-in" Properties can now be disabled, such that they won't be viewable 👁️ anywhere across the app, to help you stay focused on the properties that matter.
  • Version Comparison Flow

    • Versions can now be selected in the Versions screen, enabling a deeper comparison.

    • The comparison enables:

      • Comparing multiple properties across versions.

      • Pinpointing identical Interactions (same user_interaction_id) which differ most between the versions: Different property scores, lowest similarity scores, different annotations.

  • Classification Use Case Support

    • New application type, for classification:

    • Automatically identifies and parses the interaction "Output" as a class, and enables using in-system capabilities (properties, RCA, property based auto-annotation, etc.) for evaluating your LLM Classification app.
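
To give a sense of the custom interaction steps mentioned under "LLM Properties Enhancements" above, here is a hedged sketch of an interaction carrying a step beyond "Input", "Information Retrieval" and "Output". The step fields follow the interaction schema shown in the 0.9.0 REST notes below; the step name and values are illustrative assumptions, and the exact SDK objects for logging may differ:

# Illustrative interaction payload with one custom step; field names follow the
# "steps" entries of the interaction schema (name, type, input, output, timestamps).
interaction = {
    "user_interaction_id": "chat-0042",
    "input": "Summarize the attached report.",
    "output": "The report covers Q1 revenue growth of 12%.",
    "steps": [
        {
            "name": "Rewrite Query",   # custom step name, beyond the built-in steps
            "type": "LLM",
            "input": "Summarize the attached report.",
            "output": "Produce a concise summary of the Q1 financial report.",
            "started_at": "2024-01-18T07:48:25.156Z",
            "finished_at": "2024-01-18T07:48:25.900Z",
        }
    ],
}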

0.10.0 Release Notes

by Shir Chorev

This page includes updates from our 0.10.0 Release, which includes new features, stability and performance improvements.

Deepchecks LLM Evaluation 0.10.0 Release

  • 👀 Comparative View Per Interaction
  • 📈 Deepchecks Evaluator - Auto-annotation component that learns from past annotations.
  • 🧑‍🤝‍🧑 Similarity Scores Are Now Widely Available
  • 📊 Property Distribution View in Overview Screen
  • ✍️ Application and Version Can Now Be Renamed
  • 🧖‍♂️ PII Risk Property
  • ⏳ System Quotas Visibility

What's New and Improved

  • Comparative View Per Interaction

    • The ability to see, side by side, the same interaction (same user_interaction_id) across two or three different versions.
    • This view can be accessed via the "Compare to Other Version" button in the interaction view, or by choosing "View Similar Interaction" when an interaction was auto-annotated based on similarity.
  • Deepchecks Evaluator

    • Deepchecks' LLM-based evaluator offers high-quality annotations by learning from user annotated samples and generalizing that knowledge for new instances. The more user annotated samples across more versions, the better it performs.
    • It is now available by default as part of the auto annotation, and its priority within the different components (properties and similarity) can be customized.
  • Similarity Score

    • Similarity scores are now viewable in the comparative view, and also included when downloading data from the system.
    • Enables re-running the auto-annotation pipeline more efficiently and tuning similarity accordingly.
  • Property Distributions View

    • Distributions of numeric property values are now shown when selecting properties in the overview screen.

  • Application and Version Can Now Be Renamed

    • The "Manage Applications" screen and the "Versions" screen now enable renaming, for easier application and version management.
  • PII Risk Property

    • A new property was added to the system's built-in properties, identifying the risk of PII leakage in the application's output.
  • System Quotas Visibility

    • A notification is now viewable when exceeding system limitations.
    • Limits are set on: the number of LLM property recalculations, the number of interactions per application, and the number of enabled LLM properties.

0.9.0 Release Notes

by Shir Chorev

This page includes updates from our 0.9.0 Release, which includes new features, stability and performance improvements.

Deepchecks LLM Evaluation 0.9.0 Release

  • 🏃‍♀️ Expanded LLM Properties Support: Test Run, See Reasoning
  • 💾 Estimated Annotations Saved
  • ☸️ New Applications Management Screen

What's New and Improved

  • Expanded LLM Properties Support

    • LLM Properties can now be tested and results observed on chosen interactions before saving them to run.

    • The score and reasoning of the interactions can be viewed during the test run, and also in the full interaction view in the "Data" screen.

  • Estimated Annotations Are Saved

    • They can be viewed and reverted to upon hovering over the user's annotation.

  • Applications Management Screen

    • A list of all applications can now be accessed via the sidebar, providing a central place to see each application's processing status and info log, and to add or delete applications.

🚧 Breaking Changes (REST API)

Changes in setting verbosity in SDK

The SDK init param changed from verbose to silent_mode. The default is False, so exceptions will be thrown on errors; if you do not want exceptions, set it to True.
dc_client.init(host=DEEPCHECKS_LLM_HOST, api_token=DEEPCHECKS_LLM_API_KEY, app_name=DEEPCHECKS_APP_NAME, version_name=version, env_type=env_type, silent_mode=False)

REST API changes in the /interaction endpoint.

  • Body params change:
    • data: InteractionCreationSchema -> List[InteractionCreationSchema]
    • env_type was excluded from data and moved to be top level
    • app_name was excluded from data and moved to be top level
    • version_name was excluded from data and moved to be top level
  • Before:
    {
      "user_interaction_id": "string",
      "full_prompt": "string",
      "information_retrieval": "string",
      "input": "string",
      "output": "string",
      "annotation": "good",
      "annotation_reason": "string",
      "app_name": "string",
      "version_name": "string",
      "env_type": "PROD",
      "raw_json_data": {},
      "steps": [
        {
          "name": "string",
          "annotation": "good",
          "type": "LLM",
          "attributes": {},
          "started_at": "2024-01-18T07:48:25.156Z",
          "finished_at": "2024-01-18T07:48:25.156Z",
          "input": "string",
          "output": "string",
          "error": "string"
        }
      ],
      "custom_props": {},
      "started_at": "2024-01-18T07:48:25.156Z",
      "finished_at": "2024-01-18T07:48:25.156Z"
    }
  • After:
    {
      "app_name": "string",
      "version_name": "string",
      "env_type": "PROD",
      "interactions": [
        {
          "user_interaction_id": "string",
          "full_prompt": "string",
          "information_retrieval": "string",
          "input": "string",
          "output": "string",
          "annotation": "good",
          "annotation_reason": "string",
          "raw_json_data": {},
          "steps": [
            {
              "name": "string",
              "annotation": "good",
              "type": "LLM",
              "attributes": {},
              "started_at": "2024-01-18T07:50:26.707Z",
              "finished_at": "2024-01-18T07:50:26.707Z",
              "input": "string",
              "output": "string",
              "error": "string"
            }
          ],
          "custom_props": {},
          "started_at": "2024-01-18T07:50:26.707Z",
          "finished_at": "2024-01-18T07:50:26.707Z"
        }
      ]
    }
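
For reference, a minimal sketch of posting a batch with the new body shape. The base URL, the exact path, and the authorization header format are assumptions that may differ per deployment, and only a small subset of the schema fields is shown:

import requests

DEEPCHECKS_LLM_API_KEY = "<your API key>"

payload = {
    "app_name": "my-app",            # illustrative values
    "version_name": "v1",
    "env_type": "PROD",
    "interactions": [
        {
            "user_interaction_id": "chat-001",
            "input": "What is the refund policy?",
            "output": "You can request a refund within 30 days.",
            "started_at": "2024-01-18T07:50:26.707Z",
            "finished_at": "2024-01-18T07:50:27.120Z",
        }
    ],
}

# Host, full base path, and auth header scheme are assumptions - consult the API Reference
# for your deployment; the path follows the /interaction endpoint described above.
response = requests.post(
    "https://<your-deepchecks-host>/interaction",
    json=payload,
    headers={"Authorization": f"Bearer {DEEPCHECKS_LLM_API_KEY}"},
)
response.raise_for_status()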