Agents Demo: Investment Agent Data

Evaluating and debugging an Agent application, step by step

Jump right in by Creating an Application and Uploading the Data

Use Case Background

This demo shows how to evaluate and monitor a multi-tool financial AI agent using Deepchecks. Our example features a CrewAI-powered investment advisor that handles complex, multi-step financial workflows with real financial tools.

Agent Capabilities:

  • Market Data: Stock prices, financial fundamentals, technical analysis, and historical data
  • Research: Company news, company information, and analyst recommendations
  • Portfolio Management: Trading execution and portfolio information retrieval
  • Currency Services: Real-time currency conversion

Interaction Types

The agent processes user queries through two distinct interaction types:

  1. Tool Use - Evaluates how the agent plans and executes tool calls to gather information
  2. Generation - Evaluates the quality of the final response the agent returns to the user

Interaction Inputs and Outputs

Tool Use:

  • Interaction Input - User's query
  • Interaction Output - Agent's thought process
  • Action - The tool call the agent makes
  • Tool Response - The response from calling the tool

Generation:

  • Interaction Input - User's query
  • Interaction Output - Agent's final response
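
To make the two field layouts above concrete, here is a minimal sketch of how one user query might be recorded as a Tool Use interaction followed by a Generation interaction. The `Interaction` class, its field names, the `validate` helper, and the `get_stock_price` tool name are all hypothetical and chosen for illustration; this is not the Deepchecks SDK or the demo's actual data schema, and the payloads are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """One logged agent step (hypothetical record, not the Deepchecks SDK)."""
    interaction_type: str                 # "Tool Use" or "Generation"
    user_input: str                       # Interaction Input: the user's query
    output: str                           # thought process (Tool Use) or final response (Generation)
    action: Optional[str] = None          # the tool call; Tool Use only
    tool_response: Optional[str] = None   # what the tool returned; Tool Use only


def validate(interaction: Interaction) -> None:
    """Check that the populated fields match the interaction type."""
    has_tool_fields = (
        interaction.action is not None and interaction.tool_response is not None
    )
    has_any_tool_field = (
        interaction.action is not None or interaction.tool_response is not None
    )
    if interaction.interaction_type == "Tool Use":
        if not has_tool_fields:
            raise ValueError("Tool Use needs both action and tool_response")
    elif interaction.interaction_type == "Generation":
        if has_any_tool_field:
            raise ValueError("Generation carries no action or tool_response")
    else:
        raise ValueError(f"Unknown interaction type: {interaction.interaction_type}")


# Both interactions share the same user query; values are placeholders.
query = "What is the latest price of AAPL?"
session = [
    Interaction(
        interaction_type="Tool Use",
        user_input=query,
        output="I should look up the latest AAPL quote before answering.",
        action="get_stock_price(ticker='AAPL')",   # hypothetical tool name
        tool_response="<price payload from the tool>",
    ),
    Interaction(
        interaction_type="Generation",
        user_input=query,
        output="<final answer shown to the user>",
    ),
]

for item in session:
    validate(item)
```

Keeping both types in one record with optional tool fields is only one possible design; separate classes per interaction type would work equally well.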

Demo Structure

This tutorial walks you through a complete agent evaluation workflow:

  1. Upload Your Data - Learn about different agent configurations and data formats
  2. Analyze Performance - Understand sessions, tool use, and generation metrics
  3. Root Cause Analysis - Add custom properties and refine evaluation criteria
  4. Compare Versions - Optimize agent architecture and model selection
  5. Monitor Production - Track real-world performance over time

Let's begin by setting up your data and understanding the different agent configurations we'll be evaluating.