Uploading the Data
Follow the steps below to upload the data in one of two ways: with the Python SDK or through the Deepchecks UI.
We'll evaluate three base agent configurations plus multiple model variants to understand both architectural and model-specific performance differences.
Financial Agent Versions
Each version represents a different approach to agent design, allowing us to isolate the impact of specific improvements:
- Basic Configuration: Simple prompts with limited tools (basic price data, portfolio information, company news)
- Improved Prompts: Enhanced, context-aware prompts and tool descriptions while maintaining the same limited toolset as Basic Configuration
- Full Toolset: Combines improved prompts with expanded tools including analyst recommendations, technical indicators, advanced portfolio management, and comprehensive market data
We'll also test additional model variants using the Full Toolset configuration with different LLMs to optimize cost versus quality.
To evaluate both "Tool Use" and "Generation" interactions, each CSV contains the following columns:
Tool Use
Note: The session_id connects interactions within the same session.
session_id | user_interaction_id | interaction_type | input | output | full_prompt | history | tool_response | action | started_at | finished_at |
---|---|---|---|---|---|---|---|---|---|---|
Used to connect different interactions within the same session | Must be unique within a single version. | The type of the interaction - in this case, "Tool Use". | (mandatory) The user’s input. | The agent's thought process or planning. | The list of available tools and any other relevant instructions provided to the agent. | A list of past actions taken by the agent, including the thoughts, actions and tool responses. | Information returned from the tool. | The action itself—specifically, tool invocation. | The timestamp indicating when the interaction began. | The timestamp indicating when the interaction ended. |
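As a rough illustration of the Tool Use layout, the sketch below writes one row to CSV text using only Python's standard library. The column names come from the table above; the helper name `tool_use_rows_to_csv`, all row values, the ISO 8601 timestamp format, and the plain-string encoding of `history` are assumptions made for the example, not formats confirmed by Deepchecks.

```python
import csv
import io

# Column names copied from the Tool Use table above.
TOOL_USE_COLUMNS = [
    "session_id", "user_interaction_id", "interaction_type", "input", "output",
    "full_prompt", "history", "tool_response", "action", "started_at", "finished_at",
]


def tool_use_rows_to_csv(rows):
    """Serialize a list of dicts to CSV text with the Tool Use header.

    Hypothetical helper: missing keys become empty cells, and unknown
    keys raise ValueError (csv.DictWriter's default behavior).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TOOL_USE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example row; none of these values come from the demo data.
example_row = {
    "session_id": "session-001",
    "user_interaction_id": "tool-use-001",
    "interaction_type": "Tool Use",
    "input": "How is my portfolio doing today?",
    "output": "I should fetch the user's portfolio before answering.",
    "full_prompt": "You can call these tools: get_portfolio, get_price.",
    "history": "",  # no earlier actions in this session
    "tool_response": "{\"holdings\": []}",
    "action": "get_portfolio()",
    "started_at": "2024-01-01T10:00:00",   # assumed ISO 8601 format
    "finished_at": "2024-01-01T10:00:02",  # assumed ISO 8601 format
}

csv_text = tool_use_rows_to_csv([example_row])
print(csv_text)
```

Using `csv.DictWriter` rather than string concatenation means values containing commas or quotes are escaped automatically.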
Generation
Note: The session_id connects interactions within the same session.
session_id | user_interaction_id | interaction_type | input | output | full_prompt | information_retrieval | started_at | finished_at |
---|---|---|---|---|---|---|---|---|
Used to connect different interactions within the same session | Must be unique within a single version. | The type of the interaction - in this case, "Generation". | (mandatory) The user’s input. | (mandatory) The response generated for the user. | Relevant instructions that guided the response generation. | The actions previously taken by the agent, along with the corresponding responses returned by the tools. | The timestamp indicating when the interaction began. | The timestamp indicating when the interaction ended. |
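Before uploading, it can help to check a Generation CSV against the rules stated in the table: all columns present, `input` and `output` filled in, and `user_interaction_id` unique within the version. The sketch below is one way to do that with the standard library; the function name `validate_generation_csv` and the sample rows are hypothetical, and the checks only mirror the table, not any validation Deepchecks performs itself.

```python
import csv
import io

# Column names copied from the Generation table above.
GENERATION_COLUMNS = [
    "session_id", "user_interaction_id", "interaction_type", "input", "output",
    "full_prompt", "information_retrieval", "started_at", "finished_at",
]
# Fields the table marks as "(mandatory)".
MANDATORY_FIELDS = ("input", "output")


def validate_generation_csv(csv_text):
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in GENERATION_COLUMNS if col not in header]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    seen_ids = set()
    for row_no, row in enumerate(reader, start=1):
        for field in MANDATORY_FIELDS:
            if not (row[field] or "").strip():
                problems.append(f"row {row_no}: empty mandatory field '{field}'")
        uid = row["user_interaction_id"]
        if uid in seen_ids:
            problems.append(f"row {row_no}: duplicate user_interaction_id '{uid}'")
        seen_ids.add(uid)
    return problems


# Made-up sample data for illustration only.
header_line = ",".join(GENERATION_COLUMNS)
good_csv = header_line + "\n" + "s1,g1,Generation,Question one,Answer one,,,,\n"
bad_csv = (
    header_line + "\n"
    + "s1,g1,Generation,Question one,Answer one,,,,\n"
    + "s1,g1,Generation,Question two,,,,,\n"  # empty output, repeated id
)

print(validate_generation_csv(good_csv))
print(validate_generation_csv(bad_csv))
```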
Upload Options
Option 1: Use Deepchecks' Python SDK
Open the Demo Notebook via Colab or Download it Locally
Click the badge below to open the notebook in Google Colab, or click here to download it. Set your API token (see below), and you're ready to go!
Get your API key from the user menu in the app:
If running locally, we recommend installing the Deepchecks client SDK in a Python virtual environment.
Option 2: Use the Deepchecks UI
Click here to Download the Demo Data
The download contains the nine demo datasets used in this example.
Setup steps:
- Click "Create New Application" on the Applications page.
- Name it "Investment_Agent" and set the default Interaction Type to "Tool Use".
- Click the "Upload Data" button in the bottom-left corner of the screen.
- Create a new version called "Basic Configuration" and upload the relevant CSV file.
- Repeat the process for the "Improved Prompts" and "Full Toolset" versions.
By assigning the same Session ID to related interactions, you can view the agent's complete workflow, including initial planning, tool execution, and final response, all in a unified session view.
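To make the session idea concrete, the sketch below groups interactions by `session_id` and orders each group by `started_at`, which roughly mirrors what a unified session view needs. It is a local illustration only: the function name `group_by_session`, the sample interactions, and the ISO 8601 timestamps are assumptions, and it does not reflect how the Deepchecks app builds its view internally.

```python
from collections import defaultdict
from datetime import datetime


def group_by_session(interactions):
    """Group interaction dicts by session_id, each group sorted by start time."""
    sessions = defaultdict(list)
    for interaction in interactions:
        sessions[interaction["session_id"]].append(interaction)
    for items in sessions.values():
        # Assumes started_at is an ISO 8601 string without a timezone suffix.
        items.sort(key=lambda item: datetime.fromisoformat(item["started_at"]))
    return dict(sessions)


# Made-up interactions, deliberately listed out of order.
interactions = [
    {"session_id": "s1", "user_interaction_id": "gen-1",
     "interaction_type": "Generation", "started_at": "2024-01-01T10:00:05"},
    {"session_id": "s1", "user_interaction_id": "tool-1",
     "interaction_type": "Tool Use", "started_at": "2024-01-01T10:00:00"},
    {"session_id": "s2", "user_interaction_id": "tool-2",
     "interaction_type": "Tool Use", "started_at": "2024-01-01T11:00:00"},
]

sessions = group_by_session(interactions)
for session_id, items in sessions.items():
    steps = [f"{item['interaction_type']} ({item['user_interaction_id']})" for item in items]
    print(session_id, "->", ", ".join(steps))
```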

Uploading Production Data
- Select the "Full Toolset" version
- Switch the environment from "evaluation" to "production"
- Upload the production dataset

Success! The Investment Agent application is now in the Deepchecks app.
- Some properties take a few minutes to calculate, so some of the data, such as properties and estimated annotations, will be updated over time.
- When processing is finished, you'll see a ✅ Completed Processing Status on the Applications page.