Reduce the annual cost of tracking services by $42,696

Situation


My company is currently using Mixpanel to monitor events within our app. However, to reduce operational expenses, the company has decided to cancel this service subscription in the second quarter of 2026.

Mixpanel has proven to be an invaluable tool for tracking events. It has become the primary source of truth for several analytics dashboards that monitor the performance of our quick-commerce application. From a data engineering perspective, the most significant feature of this service is its seamless integration process. With just a few clicks, we can access the data in BigQuery in near real time, eliminating the need to build an in-house pipeline for data access.

The company wants to eliminate the Mixpanel cost but doesn’t want to lose our events data, because that would leave the performance dashboards stale. Therefore, the data engineering team needs to collaborate with the software engineering and infrastructure teams to keep the data available.

Task


To achieve this, I need to follow these steps:
  1. Ask the software engineering team to cut the events data over from Mixpanel to another storage system.
  2. Develop a pipeline to import the data into BigQuery.

Action


The software engineering team now publishes the event data to Kafka topics.

I developed a data pipeline using Spark and Scala and deployed it on Google Dataproc. I opted for this approach because it’s well suited to handling large volumes of data, such as the user search data in our application. Additionally, I chose Google Dataproc over Dataflow because it is cheaper for this workload: the pipeline runs as a batch job, so there is no need to keep a service running 24 hours a day.
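To illustrate the shape of such a job, here is a minimal sketch of a daily batch run that reads from Kafka and appends to BigQuery. It is not the production code. The broker address, topic name, temporary bucket, and target table are all hypothetical placeholders, and it assumes the Spark Kafka source and the spark-bigquery-connector are available on the Dataproc cluster.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, to_date}

object DailyEventsToBigQuery {
  def main(args: Array[String]): Unit = {
    // Hypothetical argument: the day to load, formatted as yyyy-MM-dd.
    val runDate = args(0)

    val spark = SparkSession.builder()
      .appName("daily-events-to-bigquery")
      .getOrCreate()

    // Batch (not streaming) read of everything currently retained in the topic.
    // Broker address and topic name are placeholders.
    val raw = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-broker:9092")
      .option("subscribe", "app-events")
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()

    // Kafka delivers the message value as bytes; keep it as a string payload
    // and keep only the records whose Kafka timestamp falls on the run date
    // (evaluated in the Spark session time zone).
    val events = raw
      .select(
        col("value").cast("string").as("payload"),
        col("timestamp").as("kafka_ts"))
      .where(to_date(col("kafka_ts")) === lit(runDate).cast("date"))

    // Append to BigQuery through the spark-bigquery-connector.
    // Bucket and table names are placeholders.
    events.write
      .format("bigquery")
      .option("temporaryGcsBucket", "my-temp-bucket")
      .mode("append")
      .save("analytics.app_events")

    spark.stop()
  }
}
```

Filtering on the Kafka timestamp keeps the sketch short, but it scans the whole retained topic on every run. A real job would more likely track the offsets it has already consumed, and parse the payload into typed columns before writing.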

Result


I successfully set up a data pipeline that ingests event data into BigQuery. This enables the company to meet its cost-reduction goal while keeping the performance dashboards up to date.

The pipeline costs very little, about $0.12 per day or $3.56 per month, because it runs only once a day.

The only drawback of the new pipeline is that the data is refreshed once a day rather than in near real time, which is acceptable to our data stakeholders.