Ranting mode ON!
In the Analytical System Design industry there are a lot of incorrect descriptions, bad semantics, and misunderstandings, at least on LinkedIn. And since people seem to use LinkedIn to show off their competence, I suspect the same is true in the workplace.
This rant is about the confusion between Platform and Application, and also about the fact that there are differences between different kinds of Applications.
From time to time these kinds of posts show up on LinkedIn:
- Snowflake (exchangeable with any platform) is a very good Data Warehouse
This is like saying, "This Road is a very good Car." If anyone said that, people would go nuts.
But in our industry, writing the above statement (about Snowflake) on LinkedIn gets hundreds of thumbs up.
That one person thinks the statement above is correct is somehow acceptable; not everyone can understand how things work. But that hundreds of people believe the statement is correct scares me.
You Build a Data Warehouse on a Platform!
You drive a Car on a Road!
This Data Warehouse I have built runs so smoothly and fast on Snowflake (exchangeable with any platform). That can be a correct statement, but…
Also remember that with bad design you can make any platform look bad.
An Application is not good by default just because it runs on a specific platform.
That would be like saying,
Any Application built with C# (exchangeable with any programming language) is a Good Application, regardless of how it was built. If you write that in a software design forum, they will probably kick you out, head first.
That leads us to the belief that everyone is building Data Warehouses.
There is a clear definition from the early 1990s of when an analytical data application can be called a Data Warehouse: it is when the application, in at least one layer of the architecture, represents the data in a format that is
- subject-oriented,
- integrated,
- non-volatile,
- time-variant,
which means that the data is reusable for any analytical use case that needs that data.
It is not structured for
- a specific system or
- a specific report.
If you don’t have that layer, you do not have a Data Warehouse.
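As a rough sketch of the rule above, the check can be written in a few lines of Python. This is an illustration only, not a formal test: the names `Layer`, `qualifies`, and `is_data_warehouse` are made up for this example, and the one-line comments on each property are a simplified reading of the four terms. Note that the platform is not an input at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One layer of an analytical application (hypothetical, for illustration)."""
    name: str
    subject_oriented: bool  # organized around business subjects, not source systems
    integrated: bool        # data from several sources brought into one consistent form
    non_volatile: bool      # loaded data is not overwritten or deleted
    time_variant: bool      # history is kept, so past states can be queried

    def qualifies(self) -> bool:
        # A layer qualifies only if it has all four properties.
        return all((self.subject_oriented, self.integrated,
                    self.non_volatile, self.time_variant))


def is_data_warehouse(layers: list[Layer]) -> bool:
    """True if at least one layer has all four properties."""
    return any(layer.qualifies() for layer in layers)


# Reports written directly against a raw layer: no qualifying layer.
raw_only = [Layer("raw", False, False, True, False)]
# The same application after adding a reusable, integrated layer.
with_core = raw_only + [Layer("core", True, True, True, True)]

print(is_data_warehouse(raw_only))   # False
print(is_data_warehouse(with_core))  # True
```

Which platform the layers live on never enters the check, and that is the whole point.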
When you move the “raw” data into a platform and write dashboards, reports or whatnot directly against the “raw” data layer, you are NOT building a Data Warehouse. Some people seem to believe that they are, most commonly when they move the “raw” data into a cloud platform. Again, the Platform does not define the Application.
I have raw data on Snowflake (exchangeable with any platform) that I build Reports and Dashboards against, so it is a Data Warehouse.
It is like saying:
I am riding my Bicycle on the Road, so it is a Car! (my brain explodes)
You are free to build whatever you want or need on any platform, but please understand the meaning, semantics, and nuances in our industry before you go out and make a fool of yourself and your peers on LinkedIn.
Some Examples to help you:
Platforms: HDFS, Snowflake, BigQuery, Oracle Database, MS SQL Server, PostgreSQL…
Applications: Data Warehouse, Operational Data Store, Data Lake…
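One way to keep the two lists apart is to treat them as different types. The sketch below is only an illustration; `Platform`, `Application`, `ApplicationKind`, `describe`, and the name "Sales Analytics" are hypothetical and chosen for this example. It shows that what an Application is does not change when it moves to another Platform.

```python
from dataclasses import dataclass, replace
from enum import Enum


class ApplicationKind(Enum):
    DATA_WAREHOUSE = "Data Warehouse"
    OPERATIONAL_DATA_STORE = "Operational Data Store"
    DATA_LAKE = "Data Lake"


@dataclass(frozen=True)
class Platform:
    name: str


@dataclass(frozen=True)
class Application:
    name: str
    kind: ApplicationKind  # decided by how the application is designed
    runs_on: Platform      # where it happens to be built


def describe(app: Application) -> str:
    return f"{app.name}: {app.kind.value}, built on {app.runs_on.name}"


dw = Application("Sales Analytics", ApplicationKind.DATA_WAREHOUSE,
                 Platform("Snowflake"))
# Moving to another platform changes runs_on, never kind.
moved = replace(dw, runs_on=Platform("PostgreSQL"))

print(describe(dw))     # Sales Analytics: Data Warehouse, built on Snowflake
print(describe(moved))  # Sales Analytics: Data Warehouse, built on PostgreSQL
```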
It is even more intricate if you really want to go into details: we can talk about the difference between hardware, the different layers of software on a platform, etc. But I will stop here. I hope I made my point clear and that we get more nuanced LinkedIn posts in this area in the future.
Patrik Lager