Agent Properties
Deepchecks provides built-in properties tailored for agentic workflows, giving you structured, research-backed ways to evaluate how effectively agents plan, act, and use tools.
Introduction
Evaluating agentic workflows requires a different lens than evaluating raw LLM outputs.
Agents don’t just generate text: they plan, decide which tools to use, and chain multiple steps together toward a goal.
To capture this complexity, Deepchecks provides built-in properties designed specifically for Agent and Tool interactions.
These properties help answer questions such as:
- Did the agent create and follow an effective plan?
- Did the tools used actually provide the coverage needed for the goal?
- Did each tool response fully satisfy its intended purpose?
The following properties are available out of the box:
- Plan Efficiency (Agent interaction type)
- Tool Coverage (Agent interaction type)
- Tool Completeness (Tool interaction type)
Agent-specific vs. general and custom properties
This page highlights properties built for evaluating agent workflows, but many of Deepchecks’ general built-in properties can also be applied to agents. In addition, just as in other use cases, you can define custom properties tailored to your own goals and evaluation needs. See the full list of built-in properties here, and details for creating prompt properties here.
Agent Use Case Properties
Plan Efficiency
Interaction type: Agent
This property assigns a 1-5 score measuring how well an agent’s execution aligns with its stated plan. A high score indicates that the agent built a clear and effective plan, carried it out correctly, and adapted intelligently when needed, while lower scores highlight skipped steps, contradictions, or unresolved requests. For example, “All planned steps executed and evidenced with tool outputs [4],[6]. User request for a complete, detailed report was fully addressed. No omissions, fabrications, or procedural errors.” would receive a high score.

Example of a Plan Efficiency score and reasoning on an Agent span
Tool Coverage
Interaction type: Agent
This property measures, on a 1-5 scale, how well the set of tool responses produced by the agent covers the overall goal. It reflects whether the evidence gathered is sufficient to fulfill the agent’s main query: a high score means all relevant aspects were addressed with appropriate tool use, while lower scores signal partial or missing coverage. For instance, “All requested analysis units - products, competitors, market trends, and actionable marketing insights - are fully covered with current, multi-source evidence.” would receive a high score.

Example of a Tool Coverage score and reasoning on an Agent span
Using a span's children's data fields to calculate properties
Some Deepchecks built-in properties for complex AI pipelines use not only a span’s own data fields but also those of its child spans. For example, Plan Efficiency and Tool Coverage both evaluate spans that are descendants of the Agent span to calculate their scores.
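To make the idea of reading descendant spans concrete, here is a minimal sketch of walking a span tree and collecting the outputs of every Tool span under an Agent span. The `Span` class, its fields, and the helper names are hypothetical and exist only for this illustration; they are not the Deepchecks data model or API, and the way Deepchecks actually gathers child data is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    """Hypothetical span record for illustration; not the Deepchecks data model."""
    name: str
    kind: str  # e.g. "Agent", "LLM", "Tool"
    output: str = ""
    children: List["Span"] = field(default_factory=list)


def descendants(span: Span) -> Iterator[Span]:
    """Yield every span below `span`, depth-first, in the order they appear."""
    for child in span.children:
        yield child
        yield from descendants(child)


def tool_outputs(agent_span: Span) -> List[str]:
    """Collect the outputs of all Tool spans anywhere under an Agent span."""
    return [s.output for s in descendants(agent_span) if s.kind == "Tool"]


# A small, made-up trace: one Tool span is nested under an LLM span,
# another is a direct child of the Agent span.
agent = Span("research_agent", "Agent", children=[
    Span("plan", "LLM", children=[
        Span("search", "Tool", output="3 competitor pages"),
    ]),
    Span("fetch_trends", "Tool", output="market trend table"),
])

print(tool_outputs(agent))  # ['3 competitor pages', 'market trend table']
```

The point of the sketch is only that an Agent-level property can see evidence produced several levels down the tree, not just the Agent span's own input and output.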
Tool Completeness
Interaction type: Tool
This property assigns a 1-5 score indicating how fully a single tool’s output fulfills its intended purpose. Strong completeness means the tool produced a usable, correct, and thorough response to its invocation, while lower scores indicate partial, irrelevant, or erroneous results. For example, for the query “Retrieve the USD→EUR mid-market exchange rate for 2025-08-25”, the response { "base": "USD", "quote": "EUR", "rate": 0.9132, "timestamp": "2025-08-25T12:00:00Z" } would score 5, since it fully satisfies the request.

Example of a Tool Completeness score and reasoning on a Tool span
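As a rough illustration of what "fully satisfies the request" means for the exchange-rate response above, the following sketch runs a few deterministic checks on it. This is not how Deepchecks computes Tool Completeness (the scoring method is not described here, and the property returns a 1-5 score rather than a list of problems). The required field set and the function name `find_gaps` are assumptions made for this example.

```python
import json
from typing import List

# Assumed schema for this example only.
REQUIRED_FIELDS = {"base", "quote", "rate", "timestamp"}


def find_gaps(raw_response: str, base: str, quote: str, date: str) -> List[str]:
    """Return a list of obvious problems; an empty list means none were found."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]

    problems = []
    absent = REQUIRED_FIELDS - data.keys()
    if absent:
        problems.append(f"missing fields: {sorted(absent)}")
    if data.get("base") != base or data.get("quote") != quote:
        problems.append("currency pair does not match the request")
    rate = data.get("rate")
    if not isinstance(rate, (int, float)) or rate <= 0:
        problems.append("rate is missing or not a positive number")
    if not str(data.get("timestamp", "")).startswith(date):
        problems.append("timestamp is not on the requested date")
    return problems


response = (
    '{ "base": "USD", "quote": "EUR", "rate": 0.9132, '
    '"timestamp": "2025-08-25T12:00:00Z" }'
)
print(find_gaps(response, "USD", "EUR", "2025-08-25"))  # []
```

A response such as `{"base": "USD", "quote": "EUR"}` would return several problems here, which corresponds to the kind of partial result the property description says should score lower.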
Using a span's sibling's data fields to calculate properties
Some Deepchecks properties use sibling spans to add context during evaluation. For instance, Tool Completeness references the preceding LLM span to capture the intended tool description, even when it isn’t explicitly passed into the tool input, ensuring the response is judged against the correct request.
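The sibling lookup described above can be pictured as finding the closest LLM span that precedes a Tool span among the children of the same parent. The sketch below assumes siblings are stored as an ordered list; the `Span` class and `preceding_llm_sibling` are hypothetical names for illustration, not part of Deepchecks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical span record for illustration; not the Deepchecks data model."""
    name: str
    kind: str  # e.g. "LLM" or "Tool"
    text: str = ""


def preceding_llm_sibling(siblings: List[Span], tool_index: int) -> Optional[Span]:
    """Return the closest LLM span before siblings[tool_index], or None."""
    for span in reversed(siblings[:tool_index]):
        if span.kind == "LLM":
            return span
    return None


# Made-up sibling spans in execution order.
siblings = [
    Span("decide", "LLM", "Call fx_lookup to get the USD to EUR mid-market rate"),
    Span("fx_lookup", "Tool", '{"base": "USD", "quote": "EUR", "rate": 0.9132}'),
    Span("summarize", "LLM", "Send the rate to the user with send_report"),
    Span("send_report", "Tool", "ok"),
]

context = preceding_llm_sibling(siblings, 1)
print(context.name if context else None)  # decide
```

In this picture, the text of the `decide` span supplies the intent that the `fx_lookup` output is judged against, even though the tool input itself carries no description.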