Multi-Agent Demo: Content Creator Crew
Evaluating and debugging a Multi-Agent application, step by step.
The Use Case
This demo evaluates a CrewAI-powered multi-agent content creation workflow that writes a blog post from a given topic, target audience, and optional context. We use Deepchecks to evaluate the workflow, identify performance issues, and trace them to their root causes.
The Workflow
The system consists of three sequential agents that collaborate to create blog posts.
Each agent has specific tools and capabilities to perform its role.
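To make the hand-off between the three sequential agents concrete, here is a minimal, stdlib-only sketch of the pattern: each agent consumes the previous agent's output and produces its own. The agent roles (researcher, writer, editor) and the `run` callables are illustrative stand-ins, not the demo's actual configuration; the real workflow wires these steps up through CrewAI agents, tasks, and tools.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    """Illustrative stand-in for a crew member; roles below are assumed."""
    role: str
    run: Callable[[str], str]  # takes the previous agent's output, returns its own

def run_crew(agents: List[Agent], topic: str) -> str:
    """Pass the work product through each agent in sequence."""
    output = topic
    for agent in agents:
        output = agent.run(output)
    return output

crew = [
    Agent("Researcher", lambda s: f"research notes on: {s}"),
    Agent("Writer", lambda s: f"draft based on [{s}]"),
    Agent("Editor", lambda s: f"polished post from [{s}]"),
]

print(run_crew(crew, "vector databases"))
# → polished post from [draft based on [research notes on: vector databases]]
```

Because each step only sees its predecessor's output, a weak link anywhere in the chain degrades the final post, which is why per-step tracing matters.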
The Problem
The agentic workflow produces generic content despite having enhancement capabilities.
We use Deepchecks to trace the multi-agent workflow and identify why and where our workflow underperforms.
This transforms a black-box system into an observable, debuggable process.
Evaluation Workflow
This demo follows a systematic evaluation process:
- Log Evaluation Data: Create the crew configuration, run test cases through the baseline workflow, and log the traces to Deepchecks.
- Analyze the Base Version: Examine the baseline to identify performance issues through automated properties and auto-annotation.
- Root Cause Analysis: Use the Score Breakdown, Property Failure Analysis, and Insights to understand the issue and fix it.
- Compare Versions: Test the new configuration and verify that the issue is resolved, then try multiple models to balance result quality with cost.
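The final step, balancing quality against cost, can be sketched as a simple ranking of candidate configurations by annotation score per dollar. The version names, good-annotation rates, and per-run prices below are placeholders for illustration only, not measurements from the demo:

```python
# Hypothetical results for three workflow versions; all numbers are placeholders.
versions = [
    {"name": "baseline", "good_rate": 0.62, "cost_per_run_usd": 0.048},
    {"name": "fixed-large-model", "good_rate": 0.91, "cost_per_run_usd": 0.052},
    {"name": "fixed-small-model", "good_rate": 0.84, "cost_per_run_usd": 0.006},
]

def quality_per_dollar(v: dict) -> float:
    """Rank versions by good-annotation rate per unit cost (higher is better)."""
    return v["good_rate"] / v["cost_per_run_usd"]

for v in sorted(versions, key=quality_per_dollar, reverse=True):
    print(f'{v["name"]}: good={v["good_rate"]:.0%}, ${v["cost_per_run_usd"]:.3f}/run')
```

With numbers like these, a cheaper model that fixes the root cause can beat the baseline on both quality and cost, which is the trade-off the version comparison is meant to surface.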