Selecting the Right Properties

Properties are the backbone of Deepchecks' evaluation system, enabling automated scoring across different interaction types. Each type comes with a default set of built-in properties chosen by Deepchecks to reflect common evaluation needs. However, these defaults may not always match your specific goals.

To build the right evaluation pipeline, start by defining your quality standards and business objectives. Then select the properties that best align with them. You can customize your setup by adding or removing properties on the Properties page. Deepchecks provides a rich catalog of built-in options and supports user-defined and prompt-specific properties—allowing you to tailor evaluations to your app's exact needs.

Beyond their role in automated annotation, properties can also help track performance trends or assist in Root Cause Analysis (RCA). Useful examples include Input Sentiment or Text Length.

Examples

Customer Support Q&A BOT

Consider a customer support Q&A chatbot that answers user questions through a RAG (Retrieval-Augmented Generation) process. In this use case, you may prioritize concise responses following a predefined structure over providing users with comprehensive information about their queries.

For such a use case, we would recommend removing the Retrieval Utilization property and adding Instruction Fulfillment as an evaluation property to better align the auto annotation pipeline with your quality objectives.

Medical Reports Summarization

Consider a medical AI system that generates summaries of patient reports for healthcare professionals. In this use case, accuracy and appropriate medical terminology are crucial, as healthcare providers need precise, professionally written summaries that maintain clinical accuracy while being concise.

For such a use case, in addition to classic summarization metrics such as Coverage and Grounded in Context, users may also want to add a custom property that evaluates the appropriate use of medical terminology when needed, rather than avoiding technical language. This would be implemented as a custom prompt property to ensure clinical accuracy and professional standards.