Root Cause Analysis: Investigating Performance Issues

Now that you've explored the Overview, let's dive deeper into understanding why certain interactions are failing. We'll use Deepchecks' Root Cause Analysis features to systematically investigate performance issues and implement targeted improvements.

Investigating Instruction Fulfillment Issues

On the Generation overview page for the Basic Configuration version, you can see the property scores for your agent. Some properties, such as "Instruction Fulfillment" (2.9) and "Avoided Answer" (0.22), indicate areas that need attention.

To begin our root cause analysis, click the "Analyze Property Failures" button in the top right corner of the Generation Properties view.

Analyzing Instruction Fulfillment Failures

Press Analyze Property Failures

Choose Instruction Fulfillment from the dropdown

Click the Generate Summary button

The system will analyze all failed interactions and categorize them into common failure patterns. For Instruction Fulfillment, you'll see a detailed breakdown:

For more comprehensive information on root cause analysis techniques, see our complete Root Cause Analysis guide.

Investigating the "Avoided Answer" Problem

The Avoided Answer score of 0.22 is concerning because it suggests the agent frequently declines to respond to questions. Let's investigate this pattern.

Navigate to the Data Tab
Add the Avoided Answer property column
Sort the data by Avoided Answer
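
If you prefer to inspect the same pattern outside the UI, the sort-and-review step above can be sketched in a few lines of Python. This is a minimal sketch, not a Deepchecks API: the rows are made-up sample data, and the field names (`id`, `input`, `avoided_answer`) and the 0.5 cutoff are assumptions chosen for illustration.

```python
# Made-up sample rows standing in for exported interactions.
# Field names and scores are illustrative assumptions, not real data.
interactions = [
    {"id": "a1", "input": "What is AAPL's current price?", "avoided_answer": 0.02},
    {"id": "a2", "input": "Will it rain in Boston tomorrow?", "avoided_answer": 0.97},
    {"id": "a3", "input": "Should I buy Bitcoin?", "avoided_answer": 0.91},
    {"id": "a4", "input": "Summarize recent news on MSFT.", "avoided_answer": 0.10},
]


def top_avoided(rows, threshold=0.5):
    """Return rows scoring at or above threshold, highest score first."""
    flagged = [r for r in rows if r["avoided_answer"] >= threshold]
    return sorted(flagged, key=lambda r: r["avoided_answer"], reverse=True)


# Print the most strongly avoided interactions for manual review.
for row in top_avoided(interactions):
    print(f'{row["avoided_answer"]:.2f}  {row["input"]}')
```

Reading the highest-scoring rows first makes it easier to spot whether avoided answers cluster around out-of-scope topics.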

What we discover: A few avoided answers involve topics outside the agent's scope - cryptocurrency questions, tax advice, weather queries, and general knowledge requests that can't be addressed with available financial tools.

Automatic Annotation Configuration

Now we'll create a custom property and update our annotation logic to better handle these different query types.

Adding a Custom Categorical Prompt Property

Note: Make sure the Generation interaction type is selected in the top right.

Create a categorical property to distinguish relevant from irrelevant queries:

Property Configuration:

Name: "Relevant Topic"
Description:

"Classifies user queries as relevant to financial investment advisory services or irrelevant topics outside the agent's scope"

Categories:
- "Relevant": Financial queries addressable with available tools
- "Irrelevant": Weather, general knowledge, taxes, crypto, non-financial topics
Guidelines:

"Classify the query as 'relevant' only if it can be directly addressed using the following tools: stock price, stock financial fundamentals, company news, company info, analyst recommendations, portfolio management, or currency conversion. Questions about market hours, time zones, taxes, crypto currency, and tariffs or general knowledge should be classified as 'irrelevant'."

Set the property to evaluate Input only and test with sample interactions.
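
When testing with sample interactions, it can help to write down the label you expect for each query before looking at what the property returns. The sketch below is one possible way to do that in Python. The sample queries, the expected labels (derived by hand from the guidelines above), and the `mismatches` helper are all hypothetical; the property itself is evaluated by Deepchecks, not by this code.

```python
# Hand-labeled sample queries, following the guidelines above.
# Both the queries and the helper below are illustrative assumptions.
SAMPLES = [
    ("What is the current price of AAPL?", "Relevant"),
    ("Convert 500 USD to EUR.", "Relevant"),
    ("What are analysts recommending for NVDA?", "Relevant"),
    ("How should I report capital gains on my taxes?", "Irrelevant"),
    ("Is Bitcoin a good buy right now?", "Irrelevant"),
    ("When does the NYSE open?", "Irrelevant"),
]


def mismatches(samples, predicted):
    """Return (query, expected, got) for each sample whose predicted label differs."""
    return [
        (query, expected, predicted.get(query))
        for query, expected in samples
        if predicted.get(query) != expected
    ]


# Labels copied by hand from the property results; here one is deliberately wrong.
predicted = {query: label for query, label in SAMPLES}
predicted["When does the NYSE open?"] = "Relevant"

for query, expected, got in mismatches(SAMPLES, predicted):
    print(f"expected {expected}, got {got}: {query}")
```

Any mismatch is a hint that the guidelines need tightening, for example by naming the misclassified topic explicitly.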

Recalculate Property

After saving the property definition, recalculate it with the settings below. By default, a property is calculated only for data uploaded after it was defined, so recalculation is required to get results for data already in the system.
Versions: Select All
Environment: Both

The new property now appears on the Generation overview page:

Automatic Annotation - YAML Configuration

Modify the YAML configuration to account for your new "Relevant Topic" property:

New logic: Interactions with Avoided Answer scores above 0.3 that are classified as "Irrelevant" will be annotated as good interactions, since avoiding irrelevant questions is desired behavior.

Upload the following YAML to recalculate annotations.

Read more about Customizing the Auto Annotation Configuration and how to upload the above configuration.
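
To make the new annotation logic concrete, the rule can be sketched as a small Python function. This is only an illustration of the decision described above, not the Deepchecks YAML schema; the function name, argument names, and the `None` return for "no decision from this rule" are assumptions.

```python
from typing import Optional

AVOIDED_ANSWER_THRESHOLD = 0.3  # threshold stated in the rule above


def annotate(avoided_answer: float, relevant_topic: str) -> Optional[str]:
    """Apply the single rule described in the text.

    An interaction that avoids answering (score strictly above the threshold)
    and whose query is classified "Irrelevant" is annotated as good.
    Anything else returns None, meaning this rule does not decide and the
    remaining rules in the configuration apply.
    """
    if avoided_answer > AVOIDED_ANSWER_THRESHOLD and relevant_topic == "Irrelevant":
        return "good"
    return None


# An avoided weather question is desired behavior.
print(annotate(0.95, "Irrelevant"))
# An avoided financial question is left to the other rules.
print(annotate(0.95, "Relevant"))
```

Keeping the rule this narrow means it only changes the annotation of interactions that both avoid an answer and fall outside the agent's scope.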

Results:

Important Discovery: While we successfully identified and properly classified avoided answer patterns, the overall score improvement was minimal. This reveals that the primary performance bottlenecks lie with Instruction Fulfillment on relevant financial queries rather than inappropriate handling of irrelevant questions.

Leveraging Automatic Insights

The Insights panel provides automated analysis of performance patterns, identifying the most critical issues affecting your pipeline.

Tool Use:

The analysis reveals inefficient planning affecting 19 interactions and irrelevant tool responses in 18 interactions, indicating the agent struggles with systematic task execution and information quality.

Generation:

These insights reveal that the agent's primary weakness isn't in understanding user queries, but in gathering sufficient information to provide comprehensive responses.

Systematic Improvement Strategy:

These insights reveal a clear progression of needed improvements:

Enhanced Prompts: First, we'll improve the agent's planning and reasoning through better prompt engineering that encourages systematic tool usage, comprehensive data collection, and explicit reasoning about information sufficiency.
Expanded Capabilities: Next, we'll enhance the agent's data gathering capabilities by expanding the toolset with more comprehensive financial tools, better retrieval mechanisms, and improved information processing capabilities.

You'll see these systematic improvements implemented and validated in the Compare Between Versions section, where Improved Prompts addresses the planning and reasoning issues, while Full Toolset tackles the information retrieval and capability limitations.

Armed with these insights about the real performance bottlenecks, you're ready to:

Compare Between Versions to see how improved prompts and expanded toolsets address instruction fulfillment issues
Monitor Production Data with these refined evaluation criteria to track meaningful improvements