ilyas ahsan: 2025

SITUATION

I was working at a holding company that owned more than twenty subsidiaries. The holding company managed the end-to-end data platform for all its subsidiaries.

Unexpectedly, the holding company decided to stop managing the subsidiaries' data platforms. This meant each subsidiary had to host and manage its own data platform. Adding to the surprise, I joined one of the subsidiary companies, an e-commerce business specializing in fresh products like butchery, vegetables, etc.

At that time, the subsidiary had no data engineering team, and we needed to build the data platform from scratch. However, the holding company offered us their data platform's source code, which could be installed in our environment.

TASK

As the newly formed data engineering team, we considered the pros and cons of building our data platform. Should we create a platform from scratch using the latest open-source tools, or should we adopt the holding company's stack, which consisted mostly of in-house applications and no longer maintained.

ACTIONS

Before deciding on the tools to use, we compared the following architectures:

Lambda Architecture

Layers: 3 Layers
Data Processing: Batch and Real-time
Complexity: High
Latency: High
Scalability: Moderate

Kappa Architecture

Layers: 2 Layers
Data Processing: Continuous stream
Complexity: Moderate
Latency: Low
Scalability: High

Medallion Architecture

Layers: 3 Layers
Data Processing: Incremental data transformation
Complexity: High
Latency: Moderate
Scalability: Moderate

RESULTS

We chose the Lambda Architecture for our data platform because it was more cost-effective, familiar to the team, and most of our reports did not require real-time data. Although it was more complex due to the need to maintain code for each layer, it aligned better with our requirements compared to the other architectures. Additionally, We are using the data platform's source code from the holding company which is also compatible with the Lambda Architecture.

Below is the architecture diagram of our data platform using the Lambda Architecture approach.

SITUATION

Looker was very helpful for our data scientists because its LookML feature made it easy to build data models, even with many dependencies. It also allowed our business users to explore and analyze data on their own, making self-service analytics simple with easy slice-and-dice features.

However, Looker is expensive. For our team, the cost was $48,000 every year based on the number of accounts we had.

Looker Annual Subscription Cost   = Number of Looker accounts × USD 100 × 12 months
                                                   = 40 × 100 × 12
                                                   = USD 48,000

Even though we liked using Looker, our company wanted a cheaper solution. After looking at other tools, we chose dbt to replace LookML and Apache Superset to replace Looker Studio.

TASK

As the data engineering team, my main tasks were:

Create a new dbt project.
Migrate Looker views to dbt models.
Set up and configure Apache Superset as our new data visualization tool.

ACTIONS

Create a new dbt project

To save costs, we chose to use free and open-source tools. For dbt, we used dbt-core (the free version) instead of dbt-cloud.

Dbt Project Architecture

We created a new repository in GitLab and set up GitLab CI/CD pipelines to run dbt-core. This lets us automatically build our data models in BigQuery without extra costs.

We built two CI/CD pipelines:

Staging pipeline: For testing and validating new changes.
Production pipeline: For deploying changes to production.

You can see the details of our CI/CD setup in the images below.

To automate building all dbt models on a schedule, we set up an Airflow DAG.

For documentation, we serve dbt docs as a Kubernetes deployment and use Istio for ingress, so our team can easily access the latest documentation.

Migrate Looker views to dbt models.

The migration process was quiet easy, and i've created a separate page for it:

https://ilyas-ahsan.blogspot.com/2025/05/how-we-automatically-convert-looker.html

Set up and configure Apache Superset as our new data visualization tool.

We compared several tools to replace Looker Studio. Our top choices were Metabase and Apache Superset. We chose Apache Superset because it offers a free caching feature, which helps reduce the workload on BigQuery, especially since most dashboards only need daily data updates.

For installation, we followed the official guide for setting up Superset on Kubernetes: https://superset.apache.org/docs/installation/kubernetes

RESULTS

As a result, we successfully migrated and managed 99 Looker views into dbt models. These dbt models are made up of 396 models, 297 sources, and 491 macros.

Below is the largest dbt model lineage, showing the dependencies needed to build the fact transaction models.

ilyas ahsan

How We Ingest OLTP data into OLAP System

SITUATION

TASK

ACTIONS

RESULTS

How We Saved $48,000 by Replacing Looker with Dbt and Superset

SITUATION

TASK

ACTIONS

RESULTS