CDC Observability Migration: From Datadog to Grafana with $XXX in Savings

Situation

I recently finished moving our monitoring system for the Change Data Capture (CDC) service from Datadog to Grafana as part of a company-wide cost reduction effort.

Before the migration, our CDC connectors sent custom metrics directly to the Datadog agent over UDP. We also used PagerDuty for alerting: when a Datadog monitor detected an issue, it triggered a phone call. The company has since discontinued both Datadog and PagerDuty, so we needed a replacement for each.

Task

What needed to be migrated:
  • Make the CDC connectors' custom metrics available in Prometheus
  • Recreate monitoring dashboards in Grafana
  • Set up alerts in Grafana Alerting
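
For the first item, Prometheus pulls metrics by scraping an HTTP endpoint, unlike the earlier push over UDP to the Datadog agent. The following is a minimal sketch of what such an endpoint returns, written with only the Python standard library. The metric name `cdc_connector_lag_seconds`, the `connector` label, and the `serve` helper are hypothetical and not taken from our connectors; a real deployment would more likely use an official Prometheus client library or an exporter.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: dict) -> str:
    """Render one gauge family in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to a float.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical values; a real connector would update these while it runs.
LAG_SECONDS = {(("connector", "orders-db"),): 1.5}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve metrics only on /metrics; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge(
            "cdc_connector_lag_seconds",
            "Replication lag per connector.",
            LAG_SECONDS,
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int) -> None:
    """Block forever, answering scrapes on the given port (hypothetical helper)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve(9102)` (the port is arbitrary) and pointing a Prometheus scrape job at `/metrics` would make the sample series queryable from Grafana.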

Action

I completed the migration in three steps:
  1. Set up metric collection
  2. Recreated dashboards
    • Built new monitoring dashboards in Grafana that matched our previous Datadog views
  3. Configured alerting
    • Set up alert rules in Grafana Alerting for critical conditions
    • Routed alert notifications to a Google Chat channel
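
To illustrate the notification step, the sketch below shows roughly what delivering an alert to Google Chat involves: an HTTP POST of a JSON body with a `text` field to an incoming webhook URL. Grafana Alerting handles this itself through its contact points, so this is not the configuration we used, only an illustration of the delivery path. The alert dictionary shape, the alert name `CdcConnectorDown`, and both function names are hypothetical.

```python
import json
import urllib.request


def build_chat_message(alert: dict) -> dict:
    """Turn a simplified alert dict into a Google Chat text-message payload."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    status = alert.get("status", "firing").upper()
    name = labels.get("alertname", "unknown alert")
    summary = annotations.get("summary", "(no summary)")
    return {"text": f"[{status}] {name}: {summary}"}


def post_to_chat(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload to an incoming webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example alert in the simplified, hypothetical shape used above.
example_alert = {
    "status": "firing",
    "labels": {"alertname": "CdcConnectorDown"},
    "annotations": {"summary": "Connector orders-db has stopped"},
}
message = build_chat_message(example_alert)
```

Passing `message` and the webhook URL of the target chat space to `post_to_chat` would post the alert text into that space. A chat message does not page anyone, so delivery succeeding says nothing about how quickly the alert is seen.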

Result

The migration successfully reduced our observability costs by $XXX, supporting the company's cost-saving goals.

However, the migration came with trade-offs. Without PagerDuty phone calls, we respond to alerts more slowly, because we now depend on someone noticing a Google Chat notification. We have also seen custom metric data occasionally go missing; the current setup has not been as reliable as Datadog was.