over 1 year ago

0.25.0 Release Notes

by Shir Chorev

This version includes new user roles, updated design for the expected output data, and metadata information for the automatic annotation pipeline, along with more features, stability and performance improvements, that are part of our 0.25.0 release.

Deepchecks LLM Evaluation 0.25.0 Release

🎨 New Expected Output Design
⏳ Estimated Annotations Configuration - Metadata & Download
🫵 User Roles

What’s New and Improved?

Expected Output Design
- When "expected_output"s are logged for an interaction, they are now conveniently available alongside the original output, allowing easy comparison, highlighting and evaluating alongside the Expected Output Similarity property.
Estimated Annotations Configuration Updates
- The interaction type auto annotation configuration now allows:
  - Seeing when the auto-annotation YAML was last uploaded and by whom.
  - Downloading the current or default (preset) configuration for that interaction type.
User roles
- Deepchecks now supports different user roles. The following are the three preset roles:
  - Viewers - can view the applications and data inside the Deepchecks system
  - Members - can upload data, update the properties and evaluation configurations
  - Admins - full control, including inviting and removing users from organization, and organization deletion

over 1 year ago

0.24.0 Release Notes

by Shir Chorev

This version includes support for expected outputs (comparison to ground truth) and customization of interaction types for evaluation, along with more features, stability and performance improvements, that are part of our 0.24.0 release.

Deepchecks LLM Evaluation 0.24.0 Release

✅ Support for Expected Outputs for Evaluation Data Comparison
🥗 Custom Interaction Types & Configuration

What’s New and Improved?

Support for Expected Outputs for Evaluation Data Comparison
- You can now send an expected_output field, allowing you to log your ground truths alongside your outputs
- Expected Output Similarity Property - Deepchecks built in property for assessing the accuracy of your output in comparison to the ground truth. 5 is highly accurate and 1 is the opposite. This is used for identifying wrong outputs in the auto-annotation configuration. Read more about this property here.
Custom Interaction Types & Configuration
- Update to the Interaction Types screen, including the auto-annotation configuration which is now available here.
- You can now define your custom interaction types alongside the Deepchecks preset ones. Choose an icon, name, the desired properties and your auto-annotation configuration and you’re ready to go.
- When defining a new interaction type you can either start from scratch, or use as a template any of the interaction types that you already have defined in your current app.

🚧

Note: SDK Breaking Changes

All calls to log_batch_interactions, are now done using the LogInteraction object, which is a renaming of the previous LogInteractionType object.

Previosly:

dc_client.log_batch_interactions(
	app_name="app", version_name="version", env_type=EnvType.EVAL,
  interactions=LogInteractionType(
  	input="input",
    output="output",
    user_interaction_id="id",
    interaction_type="Q&A",
    session_id="session-id"  
  )
)

now:

dc_client.log_batch_interactions(
	app_name="app", version_name="version", env_type=EnvType.EVAL,
  interactions=LogInteraction(
  	input="input",
    output="output",
    user_interaction_id="id",
    interaction_type="Q&A",
    session_id="session-id"  
  )
)

over 1 year ago

0.23.0 Release Notes

by Shir Chorev

This version introduces the concept of Sessions, enabling better organization and analysis of interactions across complex workflows such as agents. This capability is now fully integrated across the platform, including SDK support for managing and interacting with session-level data. The Sessions concept, along with additional improvements, stability updates, and performance enhancements, is part of our 0.23.0 release.

Deepchecks LLM Evaluation 0.23.0 Release

🧮 New Sessions Layer, and SDK Enhancements Supporting it
🔡 Data Screen Content Search
⛏️ Feature Extraction Interaction Type

What’s New and Improved?

New Sessions Layer for evaluating and viewing multi-phase and agentic workflows
- Sessions introduce a new hierarchy for organizing interactions, allowing users to logically group related activities, such as conversations or tasks split into multiple steps.
- When opening an interaction on the data screen, you can see all interactions associated with the same session id
- More info about SDK adaptations below
Data Screen Content Search
- Interactions can now be searched for in Data screen based on interaction content and not only IDs.
- This is selectable using the search filters in the Data screen.
Feature Extraction Interaction Type
- Feature Extraction is an interaction type dedicated to cases where information is extracted from a text into a predefined format (e.g. a JSON schema). This interaction type also presents three new properties that excel in evaluating an LLM's performance in an extraction task.

Session SDK/API Enhancements

Session Support inLogInteraction

Introduced the optional session_id parameter in the LogInteraction class, enabling developers to assign custom session identifiers to group related interactions.

If session_id is omitted, the system generates a unique session ID automatically.

from deepchecks_llm_client.data_types import LogInteraction
from datetime import datetime

single_sample = LogInteraction(
    user_interaction_id="id-1",
    input="my user input1",
    output="my model output1",
    started_at="2024-09-01T23:59:59",
    finished_at=datetime.now().astimezone(),
    annotation="Good",  # Either Good, Bad, Unknown, or None
    interaction_type="Generation",  # Optional. Defaults to the application's default type if not provided.
    session_id="session-1",  # Optional. Groups related interactions; auto-generated if not provided.
)

Session Support in Stream Upload

Added support for session_id in stream upload via the log_interaction method, facilitating real-time tracking of interactions within sessions.

dc_client.log_interaction(
    app_name="DemoApp",
    version_name="v1",
    env_type=EnvType.EVAL,
    user_interaction_id="id-1",
    input="My Input",
    session_id="session-1",
    is_completed=False,
)

Session-Based Filtering inget_data
- Enhanced the get_data method to include filtering by session_ids, providing greater flexibility in retrieving session-specific data.
```
dc_client.get_data(
    app_name="MyAppName",
    version_name="MyVersionName",
    environment=EnvType.EVAL,
    session_ids=["session-1", "session-2"],
)
```

over 1 year ago

0.22.0 Release Notes

by Shir Chorev

This version adds support for multi-step workflows, by allowing different types of interactions within a single application. Properties and annotations now run on the Interaction Type level. This, alongside additional improvements such as to the Grounded in Context property, UI simplifications, stability improvements and performance enhancements, are part of our 0.22.0 release.

Deepchecks LLM Evaluation 0.22.0 Release

🚀 Enhanced Support for Complex Applications
- 🧩 New Interaction Types Layer
- 🔄 SDK Updates
☝️ Improved Grounded in Context Property
🟣 Simplified Versions and Auto-annotation Screen

What’s New and Improved?

Enhanced Support for Complex Applications - Interaction Types
- Applications now natively support multi-phase workflows.
- Interaction types allow specifying a distinct type for each phase in the application, allowing to adapt the properties and evaluation for that logical phase. Supported predefined types include Q&A, Summarization, Generation, Classification, and Other.
- For more details about configuring the Properties and annotation on the Interaction Type level, see Properties and Auto-Annotation YAML Configuration.
SDK/API Updates
- The app_type parameter now determines the default interaction type for all interactions within an application. This provides a more intuitive setup and ensures consistent property evaluation.
```
# Example usage
dc_client.create_application(APP_NAME,
                             app_type=ApplicationType.QA)
```
- The new LogInteraction class introduces support for the optional interaction_type parameter, allowing you to specify the type of interaction directly when logging.
  Note:While LogInteractionType is still supported for backward compatibility, we recommend transitioning to LogInteraction as LogInteractionType will be deprecated in future versions.
```
from deepchecks_llm_client.data_types import LogInteraction

single_sample = LogInteraction(
    user_interaction_id="id-1",
    input="my user input1",
    output="my model output1",
    started_at="2024-09-01T23:59:59",
    finished_at=datetime.now().astimezone(),
    annotation="Good",  # Either Good, Bad, Unknown, or None
    interaction_type="Generation"  # Optional. Defaults to the application's default type if not provided.
)
```
- Interaction types can now be specified in SDK methods designed for creating or retrieving interactions. Methods for logging interactions, such as log_interaction and log_batch_interactions, now allow assigning interaction types during creation. Similarly, data retrieval methods like get_data and data_iterator support an interaction_types array, enabling filtering and retrieval based on specific interaction types. For more, see Deepchecks' SDK.

over 1 year ago

0.21.0 Release Notes

by Shir Chorev

This version aligns capabilities and versions across Deepchecks Multi-tenant SaaS alongside SageMaker Partner AI Apps, towards the AWS re:Invent launch.

over 1 year ago

0.20.0 Release Notes

by Shir Chorev

This version includes new history field, enhancements to llm properties and improved explainability highlighting, along with more features, demos, stability and performance improvements, that are part of our 0.20.0 release.

Deepchecks LLM Evaluation 0.20.0 Release

💬 New “History” field
🏦 LLM properties bank enhancements
🟣 Multiple line highlighting for property explainability
🍿 Use case demos: Classification and Guardrails
📩 Data logging: partial interaction logging, steps download and upload

What’s New and Improved?

New special field: History
- For supplying previous historical context, such as chat history. Relevant properties will use the “History” field as additional context for checking property values.
LLM properties bank enhancements
- Added new prompts and improved prompt performance. Includes unifying the “Completeness” prompt template into one (Non-Q&A use cases have the “coverage” built-in property for uncovering issues such as a non-complete summary).
Multiple line highlighting for explainability
- Now properties such as “Grounded in Context”, “PII”, can display more than one area attributing to the highest/lowest scores, allowing efficient RCA
New demos
- Classification Demo for working with deepchecks on a classification use casue
- Guardrails for configuring guardrails in production
Data logging
- An interaction can now be gradually logged, in separate parts, useful for example for production flows: Stream Upload Documentation.
- Interaction steps can now be downloaded and uploaded via csv and SDK

over 1 year ago

0.19.0 Release Notes

by Shir Chorev

This version includes expanded explainability for properties, multi-category property support, multi-label classification support, and enhancements to the documentation, along with more features, stability and performance improvements, that are part of our 0.19.0 release.

Deepchecks LLM Evaluation 0.19.0 Release

🌈 Highlighting of properties for explainability
🎡 Multi-label support for properties and classification use cases
🗒️ Docs Additions: Data Hierarchy and SDK Guide
➕ Updates to Auto-annotation flow and to Steps upload

What’s New and Improved?

Highlighting of properties for explainability
- Explainability highlighting for more properties: PII, Information Density, Coverage
Multi-label support for properties and classification use cases
- Classification Demo: Movie Genre
Docs Additions
- Hierarchy & Data Upload Format
- SDK Quickstart
Updates to Auto-annotation flow and to Steps upload
- While new recalculation is in progress, previously estimated annotations will be changed to state “Pending”, and then overridden by new estimate
- Information retrieval support going forward: only as a designated field, and not as an information retrieval “step” ⚠️

almost 2 years ago

0.18.0 Release Notes

by Shir Chorev

This version includes improved understanding of your version’s performance with root cause analysis, added visibility the system’s usage, along with more features, stability and performance improvements, that are part of our 0.18.0 release.

Deepchecks LLM Evaluation 0.18.0 Release

💡 Version Insights Enhancements
🔎 Score Reasoning Breakdown
📶 Usage Plan Visibility
🦸‍♀️ Improvement to PII Property
⚖️⚖️ Versions Page Updates
📈 Production Overtime View Improvements

What’s New and Improved?

Version Insights Enhancements
- Explainable insights with analysis, and actionable suggestion. Insights are based on property values. They Can be seen in the "Overview" screen per version, and in "Versions" screen to see application-wide insights with link to the relevant version.
Score Reasoning Breakdown
- According to Annotation Reason. Click the “Show breakdown” next to the Score on the Dashboard.
Usage Plan Visibility
- Displaying your stats and limits (Applications, Users, Processed data tokens) in the "Usage Tab" in the workspace settings
Improvement to PII Property
- Combined several mechanisms to widen detection and improve detection recall and precision on a wide set of benchmarks
Versions Page Updates
- Added high level data on version: insights on specific versions, and widget showing the most recent version, best performing one, etc.
Production Overtime View Improvements
- Default view now loads time ranges of most recent weeks of data

almost 2 years ago

0.17.0 Release Notes

by Shir Chorev

This version which includes a new overtime view for property values and score for production data, and exciting property related improvements, along with more features, stability and performance improvements, that are part of our 0.17.0 release.

Deepchecks LLM Evaluation 0.17.0 Release

📉 Overtime Production View for Monitoring
💬 New Property: Information Density
🏃🏼‍♀️‍➡️ Ability to Rerun Annotation Pipeline on Multiple Versions in Application
🔤 OpenAI Support for LLM Properties
🥑 Improvements to Relevance and Grounded in Context Properties

What’s New and Improved?

Overtime Production View for Monitoring
- In Production Environment, annotation scores and property scores are displayed over time
- Timestamps are taken from the "started_at" field for each interaction. If no timestamp was give, current time of upload will be considered as interaction time.
New Property: Information Density
- Information density is a score between 0 and 1, measuring the ratio of statements that convey information (e.g. facts, suggestions), out of all statements in the output. Read more about it in the Information Density Property documentation.
- It helps finding places where the outputs aren’t actually useful, whether if the desired information is missing (e.g. the answer is not complete), avoided, or very general without directly addressing the user’s desire.
Rerun Annotation Pipeline on Multiple Versions
- After uploading a new customized “Auto-Annotation YAML” in the Annotation Config screen, the annotation can now be conveniently rerun on all versions and environment in application, with the “Run Annotation Pipeline” button on the top right.
OpenAI Support for LLM Properties
- LLM Properties can now run using OpenAI (vs. Azure OpenAI). This setting is organization wide, and can be enabled by request.
Improvements to Relevance and Grounded in Context Properties
- Improved recall and precision on a wide set of benchmarks. Improving accuracy of Grounded in Context score also when grounding information is distributed between different documents.

almost 2 years ago

0.16.0 Release Notes

by Shir Chorev

This version includes improvements to the grounded in context model, explainability for grounded in context and retrieval relevancy properties, supporting GPT-4o in the backend, english translation visibility, along with more features, stability and performance improvements, that are part of our 0.16.0 release.

Deepchecks LLM Evaluation 0.16.0 Release

🤫 Updated Model for the Grounded in Context Property
🧮 Properties Explainability
💬 Updated LLM Reasoning View
🔡 Translation to English is Viewable
🚦 GPT-4o Supported in LLM Properties

What’s New and Improved?

Updated Model for the Grounded in Context Property
- Improved recall and precision on a wide set of benchmarks, enabling accurate identification of problems.
Properties Explainability
- Upon click, Grounded in Context displays the least grounded part of output, and similarly Retrieval Relevance marks the most relevant part of the information retrieval.
Updated LLM Reasoning View
- LLM reasoning is now viewable upon click
Translation to English is Viewable
- For applications where translation is enabled, the translated english version can be viewed
GPT-4o Supported in LLM Properties
- LLM Properties can now run using GPT-4o. This setting is organization wide, and can be enabled by request.