Uploading the Data

We'll evaluate three base agent configurations plus multiple model variants to understand both architectural and model-specific performance differences.

Financial Agent Versions

Each version represents a different approach to agent design, allowing us to isolate the impact of specific improvements:

Basic Configuration: Simple prompts with limited tools (basic price data, portfolio information, company news)
Improved Prompts: Enhanced, context-aware prompts and tool descriptions while maintaining the same limited toolset as Basic Configuration
Full Toolset: Combines improved prompts with expanded tools including analyst recommendations, technical indicators, advanced portfolio management, and comprehensive market data

We'll also test additional model variants using the Full Toolset configuration with different LLMs to optimize cost versus quality.

To evaluate both the "Tool Use" and "Generation" interaction types, each CSV must contain the following columns:

Tool Use

Note: The session_id connects interactions within the same session.

session_id: Used to connect different interactions within the same session.
user_interaction_id: Must be unique within a single version.
interaction_type: The type of the interaction - in this case, "Tool Use".
input: (mandatory) The user's input.
output: The agent's thought process or planning.
full_prompt: The list of available tools and any other relevant instructions provided to the agent.
history: A list of past actions taken by the agent, including the thoughts, actions and tool responses.
tool_response: Information returned from the tool.
action: The action itself, specifically the tool invocation.
started_at: The timestamp indicating when the interaction began.
finished_at: The timestamp indicating when the interaction ended.
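
As a rough illustration of the Tool Use layout, the sketch below writes a single row with Python's standard csv module. Every value in it (the IDs, the tool call, the ISO 8601 timestamps, and the choice to store history as a JSON string) is a made-up assumption rather than something taken from the demo datasets, so check the demo CSVs for the exact serialization Deepchecks expects.

```python
import csv
import io
import json

# Column names as listed in the Tool Use section.
TOOL_USE_COLUMNS = [
    "session_id", "user_interaction_id", "interaction_type", "input", "output",
    "full_prompt", "history", "tool_response", "action", "started_at", "finished_at",
]


def tool_use_csv(rows):
    """Serialize a list of row dicts to CSV text with the Tool Use header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TOOL_USE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example row; all values are invented for illustration.
example_row = {
    "session_id": "session-001",
    "user_interaction_id": "session-001-step-1",
    "interaction_type": "Tool Use",
    "input": "How is AAPL doing today?",
    "output": "I should look up the latest price for AAPL.",
    "full_prompt": "You are a financial agent. Available tools: get_price, get_news.",
    "history": json.dumps([]),  # assumption: history stored as a JSON string
    "tool_response": "AAPL: 189.50 USD",
    "action": "get_price(symbol='AAPL')",
    "started_at": "2024-01-01T10:00:00",  # assumption: ISO 8601 timestamps
    "finished_at": "2024-01-01T10:00:02",
}

csv_text = tool_use_csv([example_row])
print(csv_text)
```

Writing through `csv.DictWriter` keeps the column order fixed and quotes any field that contains commas or line breaks, which is common for prompts and tool responses.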

Generation

Note: The session_id connects interactions within the same session.

session_id: Used to connect different interactions within the same session.
user_interaction_id: Must be unique within a single version.
interaction_type: The type of the interaction - in this case, "Generation".
input: (mandatory) The user's input.
output: (mandatory) The response generated for the user.
full_prompt: Relevant instructions that guided the response generation.
information_retrieval: The actions previously taken by the agent, along with the corresponding responses returned by the tools.
started_at: The timestamp indicating when the interaction began.
finished_at: The timestamp indicating when the interaction ended.
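
Before uploading, it can help to check a Generation CSV against the rules stated here: all columns present, input and output filled in, and user_interaction_id unique within the file for a version. The following is a minimal local sketch of such a check, not part of the Deepchecks SDK; the function name and the sample rows are hypothetical.

```python
import csv
import io

# Column names as listed in the Generation section.
GENERATION_COLUMNS = [
    "session_id", "user_interaction_id", "interaction_type", "input", "output",
    "full_prompt", "information_retrieval", "started_at", "finished_at",
]
MANDATORY_COLUMNS = ("input", "output")


def validate_generation_csv(csv_text):
    """Return a list of problems found; an empty list means no issue was detected."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in GENERATION_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in MANDATORY_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: '{column}' is empty")
        uid = row["user_interaction_id"]
        if uid in seen_ids:
            problems.append(f"line {line_no}: duplicate user_interaction_id '{uid}'")
        seen_ids.add(uid)
    return problems


# Invented sample: the second row has an empty output and reuses an ID.
SAMPLE = """session_id,user_interaction_id,interaction_type,input,output,full_prompt,information_retrieval,started_at,finished_at
s-1,u-1,Generation,What is my exposure to tech?,About 40% of the portfolio.,,,2024-01-01T10:00:05,2024-01-01T10:00:07
s-1,u-1,Generation,Should I rebalance?,,,,2024-01-01T10:01:00,2024-01-01T10:01:02
"""

for problem in validate_generation_csv(SAMPLE):
    print(problem)
```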

Upload Options

Option 1: Use Deepchecks' Python SDK

📘
Open the Demo Notebook via Colab or Download it Locally
Click the badge below to open the notebook in Google Colab, or click here to download it. Set your API token (see below), and you're ready to go!

Get your API key from the user menu in the app.

If running locally, we recommend installing the Deepchecks client SDK in a Python virtual environment.

Option 2: Use the Deepchecks UI

📘
Click here to Download the Demo Data
The download contains the nine demo datasets used in this example.

Setup steps:

Click "Create New Application" on the Applications page.
Name it "Investment_Agent" and set the default Interaction Type to "Tool Use".
Click the "Upload Data" button in the bottom-left corner of the screen.
Create a new version called "Basic Configuration" and upload the relevant CSV file.
Repeat the process for the "Improved Prompts" and "Full Toolset" versions.

By assigning the same Session ID to related interactions, you can view the agent's complete workflow, including initial planning, tool execution, and final response, all in a unified session view.
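
To make the role of the Session ID concrete, here is a small local sketch that groups interaction rows by session_id and orders each session by started_at. It only illustrates the idea and is not how the Deepchecks app builds its session view; the sample rows are invented, and the sort assumes ISO 8601 timestamp strings, which order correctly as plain text.

```python
from collections import defaultdict


def group_by_session(interactions):
    """Group interaction dicts by session_id, ordered by start time within each session."""
    sessions = defaultdict(list)
    for interaction in interactions:
        sessions[interaction["session_id"]].append(interaction)
    for steps in sessions.values():
        # Assumption: ISO 8601 strings, so lexicographic order is chronological.
        steps.sort(key=lambda step: step["started_at"])
    return dict(sessions)


# Invented rows: one session with two tool calls and a final response,
# plus an unrelated second session.
interactions = [
    {"session_id": "s-1", "interaction_type": "Generation", "started_at": "2024-01-01T10:00:05"},
    {"session_id": "s-1", "interaction_type": "Tool Use", "started_at": "2024-01-01T10:00:00"},
    {"session_id": "s-2", "interaction_type": "Tool Use", "started_at": "2024-01-01T11:00:00"},
    {"session_id": "s-1", "interaction_type": "Tool Use", "started_at": "2024-01-01T10:00:02"},
]

for session_id, steps in group_by_session(interactions).items():
    print(session_id, [step["interaction_type"] for step in steps])
```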

Uploading Production Data

Select the "Full Toolset" version
Switch environment from "evaluation" to "production"
Upload the production dataset

👍
Success! The Investment Agent Application is now in the Deepchecks App

Some properties take a few minutes to calculate, so parts of the data, such as properties and estimated annotations, will be updated over time.

When processing is finished, you'll see a ✅ Completed Processing Status on the Applications page.
