Reduce the annual cost of tracking services by $42,696

Situation


My company is currently using Mixpanel to monitor events within our app. However, to reduce operational expenses, the company has decided to cancel this service subscription in the second quarter of 2026.

Mixpanel has proven to be an invaluable tool for tracking event. It has become the primary source of truth for several analytics dashboards that monitor the performance of our quick-commerce application. From a data engineering perspective, the most significant feature of this service is its seamless integration process. With just a few clicks, We can access the data available in BigQuery in near-real-time, eliminating the need to create an in-house pipeline for data access. 

The company wants to eliminate the Mixpanel cost but doesn’t want to lose our events data, which makes the performance dashboard stale. Therefore, the data engineering team needs to collaborate with software engineering and infrastructure to provide the data.

Task


To achieve this, I need to follow these steps:
  1. Request the software engineering team to cutover the events data from Mixpanel to another storage system.
  2. Develop a pipeline to import the data into our BigQuery.

Action


The software engineering team has published event data to Kafka Topics.

I developed a data pipeline using Spark and Scala and deployed it on Google Dataproc. I opted for this approach because it’s well-suited for handling large volumes of data, such as search user data in our application. Additionally, I opted Google Dataproc rather than Dataflow because it is cheaper because no need to keep the service run 24 hours.

Result


I successfully set up a data pipeline that ingests event data into BigQuery. This enables the company to achieve its cost reduction initiative and maintain an up-to-date performance dashboard.

The pipeline incurs a very low cost of $0.12 per day or $3.56 per month because it runs only once a day.

The only drawback of the new pipeline is that the data has daily freshness, which is acceptable to our data stakeholders.

A Pragmatic Approach to Importing MongoDB Data into BigQuery

Situation

My company has a stock data problem. The physical inventory in our stores doesn't match what's shown in our warehouse management system (run by a vendor) or in our various inventory services.
Since this issue is critical and is being watched by executives and investors, we need a fast solution. We'll focus on getting the data flowing quickly first, then optimize later.

The problem is that we use multiple platforms to track stock, which makes comparing reports difficult since the data isn't stored in one place. To solve this, we need to bring all the data from these different platforms into our data warehouse.

In this article, I'll focus on integrating data from our vendor-managed warehouse management system. The specific data we want to bring into BigQuery is the stock log collection, which is very large. This collection tracks all stock movements: inbound shipments, outbound transfers, customer purchases on our platform, and other stock-related activities.

We need to fetch data directly from the vendor's MongoDB database because of several integration issues with our internal service. Making things worse, our CDC platform can't help us because it has technical debt and bugs for MongoDB connector, so MongoDB support isn't available right now.

Task

To make this happen, we need to complete two key steps:
  1. Work with the infrastructure team to set up MongoDB replica set in our environment.
  2. Find the right tool for processing the data, since we'll be ingesting a large collection.

Action

The Infrastructure team provisioned a MongoDB replica set in our environment. This setup ensures our data ingestion doesn't impact the warehouse management system's performance. Kudos to the Infra team for always helping us related to the data engineering tasks.

I built the data pipeline using Spark and Scala. I chose Spark because it can handle large amounts of data efficiently, which we need for this massive stock log collection. I then deployed the pipeline on a Dataproc cluster so it can run reliably in our cloud environment.

Result

The stock log data is now available in BigQuery and ready for analysis and monitoring. Our team can now compare inventory across all platforms to find mismatches between physical store counts and what the systems show.

The ingestion cost is also very low at $0.12 per day or $3.56 per month because the pipeline only runs once a day.

CDC Observability Migration: From Datadog to Grafana with $XXX in Savings

Situation

I recently finished moving our monitoring system for the Change Data Capture (CDC) service from Datadog to Grafana as part of a company-wide cost reduction effort.

Before the migration, our CDC connectors sent custom metrics directly to the Datadog agent using UDP. We also used PagerDuty for alerts - when a Datadog monitor detected an issue, it would trigger a phone call. The company has now discontinued both Datadog and PagerDuty, so we needed to find replacement solutions.

Task

What needed to be migrated:
  • Enable CDC connector custom metrics in Prometheus
  • Recreate monitoring dashboards in Grafana
  • Set up alerts in Grafana Alerting

Action

I completed the migration in three steps:
  1. Set up metric collection
  2. Recreated dashboards
    • Built new monitoring dashboards in Grafana that matched our previous Datadog views
  3. Configured alerting
    • Set up alert rules in Grafana Alerting for critical conditions
    • Set up notification channel to Google Chat Channel

Result

The migration successfully reduced our observability costs by $XXX, supporting the company's cost-saving goals.

However, we did encounter some trade-offs. Without PagerDuty phone calls, we can't respond to alerts as quickly since we now rely on Google Chat notifications. We've also noticed that some custom metrics data is occasionally lost due to the current setup not being as reliable as Datadog.

2025 in Review: First Mandate Working with Quick-Commerce Company

Situation


I started working at a quick-commerce company a year ago, after leaving the first Indonesian unicorn company in December 2024. The company is a subsidiary of my previous employer, and I joined through an internal transfer.

At first, our data platform was managed by the holding company, and I supported it while on call. Later, the holding company stopped providing support. Because of this, my company asked me to build a new data platform from scratch in a separate Google Cloud Platform (GCP) environment.


Task


The first mandate was clear. The data manager asked me to create a data platform that could seamlessly integrate with the existing reporting platform. At that time, there were only two data analysts, and they were already busy supporting data stakeholders across the company.


Action


I built the data platform using a Lambda Architecture. I ensured that the outputs in our new environment matched the data warehouse tables used by existing reports in the holding company’s environment. This allowed us to migrate smoothly without disrupting reports.


Result


I successfully built the data platform in our environment, and completed the migration smoothly with a single click by switching the Looker data sources from the holding company’s GCP project to our own. The entire migration took less than 30 minutes, and there were no issues with existing reports.