
Datadog Integration

Datadog is a leading APM solution. In this guide we'll cover how you can use our dedicated server-to-server integration with Datadog to monitor your LLM application.

If your organization uses Datadog, we warmly recommend this integration: it lets you take advantage of Deepchecks' estimated annotations, properties and more to monitor the quality of your LLM application in production.

How to enable the integration

Configuring Datadog API Key in Deepchecks

In Datadog, under Organization Settings, create a new, dedicated API key for the integration.

Using the Deepchecks REST API, enable the Datadog integration and pass it the Datadog API key (see "Getting Started with our REST API" and the "Set Datadog User Config" endpoint reference).
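A minimal Python sketch of that call, using only the standard library. The endpoint path, the default HTTP method, the Authorization header format and the payload field names are hypothetical placeholders, not the documented Deepchecks API. Take the real values from the "Set Datadog User Config" reference before sending anything.

```python
import json
import urllib.request


def build_datadog_config_request(base_url: str, endpoint_path: str,
                                 deepchecks_api_token: str, datadog_api_key: str,
                                 method: str = "PUT") -> urllib.request.Request:
    """Build (but do not send) a request that enables the Datadog integration.

    Hypothetical: the endpoint path, the default method, the auth header
    format and the payload field names are placeholders, not the documented API.
    """
    payload = {
        "enabled": True,             # hypothetical field name
        "api_key": datadog_api_key,  # hypothetical field name
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint_path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {deepchecks_api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method=method,
    )


# Usage sketch (placeholder values; urlopen actually sends the request):
# req = build_datadog_config_request("https://app.llm.deepchecks.com",
#                                    "<set-datadog-user-config-path>",
#                                    DEEPCHECKS_TOKEN, DATADOG_API_KEY)
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The function only builds the request, so you can inspect the URL and body before sending it.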

Use the Get Config API to make sure the integration is enabled in both system_config and user_config. If system_config is disabled, contact your Deepchecks account manager to have it enabled for your organization.
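Once you have the Get Config response, the check can be scripted. This sketch assumes the parsed response exposes "system_config" and "user_config" objects, each with an "enabled" flag. That shape is hypothetical, so adjust the keys to the actual response schema.

```python
from typing import Any, List, Mapping


def missing_datadog_configs(config: Mapping[str, Any]) -> List[str]:
    """List the config sections in which the Datadog integration is not enabled.

    `config` is the parsed Get Config response; the
    {"system_config": {"enabled": ...}, "user_config": {...}} shape is hypothetical.
    """
    missing = []
    for section_name in ("system_config", "user_config"):
        section = config.get(section_name) or {}
        if not section.get("enabled", False):
            missing.append(section_name)
    return missing


def next_step(config: Mapping[str, Any]) -> str:
    """Turn the check into the action this guide recommends."""
    missing = missing_datadog_configs(config)
    if not missing:
        return "Datadog integration is enabled."
    if "system_config" in missing:
        return "system_config is disabled: contact your Deepchecks account manager."
    return "user_config is disabled: enable it with the Set Datadog User Config API."
```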

Upload production data to Deepchecks

To test the integration, you can upload data from a CSV file in the UI. Make sure to upload it into the "Production" environment.

For an ongoing production integration, use our Python SDK or work directly with our REST API. Deepchecks sends an interaction's data to Datadog only after its estimated annotation has been computed. Make sure estimated annotations have completed for the data you uploaded before moving on to the next step.
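If you script the upload, a simple polling loop can wait for annotations to finish. count_pending is a hypothetical callable that you would implement with the Python SDK or the REST API. The sketch does not assume any specific SDK function.

```python
import time
from typing import Callable


def wait_for_estimated_annotations(
    count_pending: Callable[[], int],
    timeout_s: float = 600.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no uploaded interaction is still waiting for an estimated annotation.

    `count_pending` (hypothetical, user-supplied) returns how many uploaded
    interactions have no estimated annotation yet. Returns True when that
    count reaches 0 before the timeout, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if count_pending() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The sleep and clock parameters are injectable, so the loop can be tested without real waiting.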

Verifying that the data reached Datadog

Deepchecks sends the data to Datadog as log entries. You then choose which log fields to use as measures or facets, and build monitors and dashboard widgets on top of them.

Go to "Logs" in Datadog and make sure the Deepchecks data arrived successfully.
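You can also check from a script with Datadog's Logs Search API (v2), which returns the latest matching log events. This sketch assumes the Deepchecks logs carry source:deepchecks_llm, the default of the "source" template variable in the dashboard JSON on this page. The search requires a Datadog application key in addition to an API key. The api.datadoghq.com host assumes the US1 site, so change the site parameter if you use another one.

```python
import json
import urllib.request


def build_log_search_request(api_key: str, app_key: str,
                             query: str = "source:deepchecks_llm",
                             time_from: str = "now-15m",
                             limit: int = 10,
                             site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) a Logs Search API v2 request for recent Deepchecks logs."""
    body = {
        "filter": {"query": query, "from": time_from, "to": "now"},
        "page": {"limit": limit},
        "sort": "-timestamp",  # newest events first
    }
    return urllib.request.Request(
        url=f"https://api.{site}/api/v2/logs/events/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def count_log_events(response_body: bytes) -> int:
    """Count the log events returned in a Logs Search API response body."""
    return len(json.loads(response_body).get("data", []))


# Usage sketch (sends the request; needs real keys):
# with urllib.request.urlopen(build_log_search_request(DD_API_KEY, DD_APP_KEY)) as resp:
#     print(count_log_events(resp.read()), "Deepchecks log events in the last 15 minutes")
```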

Configuring Log attributes as measures and facets

Go over the log attributes and convert numeric attributes into measures and string attributes into facets.
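To decide which attributes become measures and which become facets, you can flatten a sample log entry into attribute paths and apply the rule above. The sample entry is hypothetical. Its attribute names come from the queries in the dashboard JSON on this page, but the values and exact nesting of real Deepchecks logs may differ. Treating booleans as facets is a choice made in this sketch.

```python
from collections.abc import Mapping
from typing import Any, Dict

# Hypothetical sample entry: attribute names come from the dashboard queries,
# but the values and the exact nesting of real Deepchecks logs may differ.
SAMPLE_LOG = {
    "deepchecks_interaction": {
        "estimated_annotation": "good",
        "context": {
            "application_name": "my-app",
            "application_version_name": "v1",
            "env_type": "PROD",
        },
        "output_properties": {"output_grounded_in_context": 0.87},
    }
}


def classify_log_attributes(log: Mapping) -> Dict[str, str]:
    """Map every leaf attribute path ("@a.b.c") to "measure" (numbers) or "facet" (strings)."""
    result: Dict[str, str] = {}

    def walk(node: Any, path: str) -> None:
        if isinstance(node, Mapping):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, bool):
            # bool is a subclass of int in Python; treat flags as facets, not measures.
            result["@" + path] = "facet"
        elif isinstance(node, (int, float)):
            result["@" + path] = "measure"
        elif isinstance(node, str):
            result["@" + path] = "facet"
        # Lists and nulls are skipped; review them manually in Datadog.

    walk(log, "")
    return result


if __name__ == "__main__":
    for attr_path, kind in classify_log_attributes(SAMPLE_LOG).items():
        print(f"{kind:8} {attr_path}")
```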

Import a Dashboard into Datadog

To use our pre-configured dashboard for Deepchecks, copy the following JSON, save it locally, and import it into Datadog:

{"title":"Monitor LLM","description":"[[suggested_dashboards]]","widgets":[{"id":4873129968030932,"definition":{"type":"image","url":"https://files.readme.io/9400bc7-logo-dc-llm.svg","sizing":"contain","margin":"md","has_background":false,"has_border":false,"vertical_align":"center","horizontal_align":"center"},"layout":{"x":0,"y":0,"width":6,"height":3}},{"id":6633413186649232,"definition":{"title":"Number of Interactions Per Application/Version","title_size":"16","title_align":"left","requests":[{"response_format":"scalar","formulas":[{"formula":"query1"}],"queries":[{"data_source":"logs","name":"query1","indexes":["*"],"compute":{"aggregation":"count"},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"$application $version $environment $source"},"storage":"hot"}],"style":{"palette":"datadog16"},"sort":{"count":10,"order_by":[{"type":"formula","index":0,"order":"desc"}]}}],"type":"sunburst","legend":{"type":"automatic"}},"layout":{"x":6,"y":0,"width":3,"height":3}},{"id":3419482205985922,"definition":{"title":"Interactions Count by Estimated Annotations","title_size":"16","title_align":"left","requests":[{"response_format":"scalar","formulas":[{"formula":"query1"}],"queries":[{"data_source":"logs","name":"query1","indexes":["*"],"compute":{"aggregation":"count"},"group_by":[{"facet":"@deepchecks_interaction.estimated_annotation","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"$application $version $environment $source"},"storage":"hot"}],"style":{"palette":"datadog16"},"sort":{"count":10,"order_by":[{"type":"formula","index":0,"order":"desc"}]}}],"type":"sunburst","legend":{"type":"automatic"}},"layout":{"x":9,"y":0,"width":3,"height":3}},{"id":8336015370009554,"definition":{"title":"Good Estimated Annotation Ratio By Application/Version","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"alias":"good ratio","formula":"query1 / query2"}],"queries":[{"data_source":"logs","name":"query1","indexes":["*"],"compute":{"aggregation":"count","interval":600000},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"@deepchecks_interaction.estimated_annotation:good $application $version $environment $source"},"storage":"hot"},{"data_source":"logs","name":"query2","indexes":["*"],"compute":{"aggregation":"count","interval":600000},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"@deepchecks_interaction.estimated_annotation:(good OR bad) $application $version $environment $source"},"storage":"hot"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":0,"y":3,"width":6,"height":3}},{"id":3925972544507286,"definition":{"title":"Bad Grounded In Context Ratio By Application/Version (\"under the threshold\" / \"all\")","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"alias":"bad grounded in context ratio","formula":"query1 / query2"}],"queries":[{"data_source":"logs","name":"query1","indexes":["*"],"compute":{"aggregation":"count","interval":600000},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"@deepchecks_interaction.output_properties.output_grounded_in_context:[0.0 TO 0.4] $application $version $environment $source"},"storage":"hot"},{"data_source":"logs","name":"query2","indexes":["*"],"compute":{"aggregation":"count","interval":600000},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"$application $version $environment $source"},"storage":"hot"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":6,"y":3,"width":6,"height":3}},{"id":8506330137618792,"definition":{"title":"Unknown Estimated Annotation Ratio By Application/Version","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"alias":"unknown ratio","formula":"query3 / query4"}],"queries":[{"data_source":"logs","name":"query3","indexes":["*"],"compute":{"aggregation":"count","interval":600000},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"@deepchecks_interaction.estimated_annotation:unknown $application $version $environment $source"},"storage":"hot"},{"data_source":"logs","name":"query4","indexes":["*"],"compute":{"aggregation":"count","interval":600000},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"$application $version $environment $source"},"storage":"hot"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":0,"y":6,"width":6,"height":3}},{"id":1648519253537176,"definition":{"title":"Number of Interactions By Application/Version","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"logs","name":"query1","indexes":["*"],"compute":{"aggregation":"count","interval":600000},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"$application $version $environment $source"},"storage":"hot"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":6,"y":6,"width":6,"height":3}}],"template_variables":[{"name":"source","prefix":"source","available_values":[],"default":"deepchecks_llm"},{"name":"application","prefix":"@deepchecks_interaction.context.application_name","available_values":[],"default":"*"},{"name":"version","prefix":"@deepchecks_interaction.context.application_version_name","available_values":[],"default":"*"},{"name":"environment","prefix":"@deepchecks_interaction.context.env_type","available_values":[],"default":"*"}],"layout_type":"ordered","notify_list":[],"template_variable_presets":[{"name":"Deepchecks PROD data","template_variables":[]}],"reflow_type":"fixed"}
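Instead of importing through the UI, you can create the dashboard with Datadog's Dashboards API (POST /api/v1/dashboard). The minimal sketch below assumes you saved the JSON above to a local file. It parses the file first, which catches a broken copy-paste early. Sending the request requires a Datadog API key and application key, and the default host assumes the US1 site (datadoghq.com).

```python
import json
import urllib.request

# Keys the Datadog Dashboards API requires in a dashboard definition.
REQUIRED_KEYS = ("title", "layout_type", "widgets")


def parse_dashboard(text: str) -> dict:
    """Parse the saved dashboard JSON and check that the required keys exist."""
    dashboard = json.loads(text)  # raises json.JSONDecodeError if the copy is broken
    missing = [key for key in REQUIRED_KEYS if key not in dashboard]
    if missing:
        raise ValueError(f"dashboard JSON is missing keys: {missing}")
    return dashboard


def build_dashboard_request(dashboard: dict, api_key: str, app_key: str,
                            site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) a Dashboards API request that creates the dashboard."""
    return urllib.request.Request(
        url=f"https://api.{site}/api/v1/dashboard",
        data=json.dumps(dashboard).encode("utf-8"),
        headers={
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage sketch (hypothetical file name; urlopen sends the request to Datadog):
# from pathlib import Path
# dashboard = parse_dashboard(Path("deepchecks_dashboard.json").read_text(encoding="utf-8"))
# with urllib.request.urlopen(build_dashboard_request(dashboard, DD_API_KEY, DD_APP_KEY)) as resp:
#     print(json.loads(resp.read())["id"])
```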

Import Example Monitor into Datadog

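This monitor alerts you when, for any application/version, fewer than 20% of the samples that received an estimated annotation got a "Good" estimated annotation over the last 5 minutes. It also warns when the ratio drops below 40%. To define it, copy the following JSON and import it into Datadog. Before importing, replace the Datadog dashboard link and the @slack-test-datadog-dc-integration handle in the "message" field with your own dashboard URL and notification channel: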

{"id":146458791,"name":"Good Estimated Annotation ratio alert","type":"log alert","query":"formula(\"query / query1\").last(\"5m\") < 0.2","message":"To investigate in Deepchecks:\nhttps://app.llm.deepchecks.com?appName={{urlencode \"[@deepchecks_interaction.context.application_name].name\"}}&versionName={{urlencode \"[@deepchecks_interaction.context.application_version_name].name\"}}\n\nDeepchecks' dashboard in Datadog:\nhttps://app.datadoghq.com/dashboard/yge-c92-33z/monitor-llm?from_ts={{eval \"last_triggered_at_epoch-10*60*1000\"}}&to_ts={{eval \"last_triggered_at_epoch+10*60*1000\"}}&live=false\n\n@slack-test-datadog-dc-integration","tags":[],"options":{"thresholds":{"critical":0.2,"warning":0.4},"enable_logs_sample":false,"notify_audit":false,"on_missing_data":"default","include_tags":false,"variables":[{"data_source":"logs","name":"query","indexes":["*"],"compute":{"aggregation":"count"},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"@deepchecks_interaction.estimated_annotation:good"},"storage":"hot"},{"data_source":"logs","name":"query1","indexes":["*"],"compute":{"aggregation":"count"},"group_by":[{"facet":"@deepchecks_interaction.context.application_name","limit":10,"sort":{"order":"desc","aggregation":"count"}},{"facet":"@deepchecks_interaction.context.application_version_name","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":""},"storage":"hot"}],"new_group_delay":0,"silenced":{}},"priority":null,"restricted_roles":null}
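Per application/version group, the alert logic reduces to simple arithmetic. The ratio is the count of "good" interactions ("query") divided by the count of all interactions in the 5-minute window ("query1"). That ratio is then compared against the critical (0.2) and warning (0.4) thresholds. Below is a simplified Python model of that evaluation. It ignores Datadog details such as evaluation delay and missing-data handling.

```python
from typing import Optional


def good_ratio_monitor_state(good: int, total: int,
                             critical: float = 0.2,
                             warning: float = 0.4) -> Optional[str]:
    """Simplified model of the example monitor for one application/version group.

    good  -> count of interactions with estimated_annotation:good ("query")
    total -> count of all interactions in the window ("query1")
    """
    if total == 0:
        return None  # nothing to evaluate in this window
    ratio = good / total
    if ratio < critical:   # "< 0.2" in the monitor query
        return "ALERT"
    if ratio < warning:    # warning threshold for a "below" alert
        return "WARN"
    return "OK"
```

Because the comparison is strict, 2 good interactions out of 10 give exactly 0.2. That is not below the critical threshold, so the group ends up in the warning state rather than alerting.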

Next steps

Once you have enough data in Datadog, review the dashboard and tune it to your needs.
Do the same for monitors. Tune the existing monitor to balance false alarms against real issues, and add monitors based on other properties to help you catch production regressions in time.
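One starting point for additional monitors is to generate the example monitor's structure for other properties. The hypothetical helper below builds a log-alert definition that fires when too large a share of interactions has a property value inside a given range. It mirrors the "Bad Grounded In Context" widget of the dashboard. The field layout is copied from the example monitor rather than from Datadog's full monitor schema. The thresholds and notification handle in the usage are placeholders to tune.

```python
import json
from typing import Any, Dict

GROUP_BY_FACETS = (
    "@deepchecks_interaction.context.application_name",
    "@deepchecks_interaction.context.application_version_name",
)


def count_query(name: str, search: str) -> Dict[str, Any]:
    """A grouped log-count query, laid out like the variables in the example monitor."""
    return {
        "data_source": "logs",
        "name": name,
        "indexes": ["*"],
        "compute": {"aggregation": "count"},
        "group_by": [
            {"facet": facet, "limit": 10, "sort": {"order": "desc", "aggregation": "count"}}
            for facet in GROUP_BY_FACETS
        ],
        "search": {"query": search},
        "storage": "hot",
    }


def property_range_ratio_monitor(property_path: str, low: float, high: float,
                                 critical: float, warning: float,
                                 message: str, window: str = "5m") -> Dict[str, Any]:
    """Build a log alert that fires when the share of interactions whose
    property lies in [low, high] rises above `critical` (warns above `warning`)."""
    if not warning < critical:
        raise ValueError("for an 'above' alert, warning must be lower than critical")
    in_range = f"@deepchecks_interaction.{property_path}:[{low} TO {high}]"
    return {
        "name": f"{property_path} in [{low}, {high}] ratio alert",
        "type": "log alert",
        "query": f'formula("query / query1").last("{window}") > {critical}',
        "message": message,
        "tags": [],
        "options": {
            "thresholds": {"critical": critical, "warning": warning},
            # Same denominator as the example monitor: an empty search, grouped by app/version.
            "variables": [count_query("query", in_range), count_query("query1", "")],
        },
    }


if __name__ == "__main__":
    monitor = property_range_ratio_monitor(
        "output_properties.output_grounded_in_context", 0.0, 0.4,
        critical=0.3, warning=0.2,  # placeholder thresholds
        message="Grounded-in-context regression @your-notification-handle",  # hypothetical handle
    )
    print(json.dumps(monitor, indent=2))  # paste into Datadog's monitor JSON import
```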