SITUATION
I was working at a holding company that owned more than twenty subsidiaries. The holding company managed the end-to-end data platform for all its subsidiaries.
Unexpectedly, the holding company decided to stop managing the subsidiaries' data platforms, which meant each subsidiary had to host and manage its own. To add to the surprise, I then joined one of the subsidiaries, an e-commerce business specializing in fresh products such as meat and vegetables.
At that time, the subsidiary had no data engineering team, and we needed to build the data platform from scratch. However, the holding company offered us their data platform's source code, which could be installed in our environment.
TASK
As the newly formed data engineering team, we weighed the pros and cons of how to build our data platform. Should we create a platform from scratch using the latest open-source tools, or should we adopt the holding company's stack, which consisted mostly of in-house applications that were no longer maintained?
ACTIONS
Before deciding on the tools to use, we compared the following architectures:
- Lambda Architecture
  - Layers: 3
  - Data Processing: Batch and real-time
  - Complexity: High
  - Latency: High
  - Scalability: Moderate
- Kappa Architecture
  - Layers: 2
  - Data Processing: Continuous stream
  - Complexity: Moderate
  - Latency: Low
  - Scalability: High
- Medallion Architecture
  - Layers: 3
  - Data Processing: Incremental data transformation
  - Complexity: High
  - Latency: Moderate
  - Scalability: Moderate
RESULTS
We chose the Lambda Architecture for our data platform because it was more cost-effective for us, it was familiar to the team, and most of our reports did not require real-time data. Although it is more complex, since code has to be maintained for each layer, it fit our requirements better than the other architectures. Additionally, we are using the holding company's data platform source code, which is also compatible with the Lambda Architecture.
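To make the "code for each layer" point concrete, here is a minimal Python sketch of the three Lambda layers: a batch layer that recomputes a view from all historical data, a speed layer that keeps a running view of recent events, and a serving layer that merges the two. It is an illustration under assumptions, not our actual platform code. The event shape `(order_id, category, amount)` and the names `batch_view`, `SpeedLayer`, and `serve` are hypothetical.

```python
from collections import defaultdict

# Hypothetical event shape: (order_id, category, amount).
# Names and values are illustrative only, not taken from the real platform.


def batch_view(master_events):
    """Batch layer: recompute sales totals per category from the full history."""
    totals = defaultdict(float)
    for _order_id, category, amount in master_events:
        totals[category] += amount
    return dict(totals)


class SpeedLayer:
    """Speed layer: keeps running totals for events not yet covered by a batch run."""

    def __init__(self):
        self._totals = defaultdict(float)

    def ingest(self, event):
        """Update the running totals with a single new event."""
        _order_id, category, amount = event
        self._totals[category] += amount

    def view(self):
        """Return a snapshot of the current real-time totals."""
        return dict(self._totals)

    def reset(self):
        """Clear the running totals once a batch run has absorbed these events."""
        self._totals.clear()


def serve(batch, speed):
    """Serving layer: merge the batch view and the speed view to answer a query."""
    merged = dict(batch)
    for category, amount in speed.items():
        merged[category] = merged.get(category, 0.0) + amount
    return merged


if __name__ == "__main__":
    history = [(1, "meat", 10.0), (2, "vegetables", 5.0), (3, "meat", 7.5)]
    speed = SpeedLayer()
    speed.ingest((4, "meat", 2.5))   # arrived after the last batch run
    speed.ingest((5, "fruit", 4.0))
    print(serve(batch_view(history), speed.view()))
```

Note that the aggregation logic appears twice, once in the batch layer and once in the speed layer. That duplication is the maintenance cost mentioned above, and it is the main thing a Kappa-style design would remove.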
Below is the architecture diagram of our data platform using the Lambda Architecture approach.
