Is Kimball's dimensional modeling approach still applicable in modern data warehousing?
Kimball modelling was introduced in 1996 and rested on assumptions about the technology and businesses of that era. Several of those assumptions no longer hold in 2024. Back then:
Databases were slow and expensive.
SQL had significant limitations.
Joining fact tables was difficult, because one-to-many and many-to-one joins duplicate rows and inflate aggregates.
Businesses were slower to adapt.
A lot has changed since then:
Databases have become faster and more affordable.
SQL has gained a wide range of date and window functions, which have significantly reduced the effort required for ETL work.
Looker introduced a feature called Symmetric Aggregates, which keeps aggregates correct when a join between fact tables duplicates rows, provided primary keys are defined.
Business doesn't wait for you to finish your perfect data warehouse.
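To make the fact-table join problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables orders and order_items and their values are hypothetical. The second query is a simplified illustration of the idea behind symmetric aggregates (count each primary key only once); it is not the SQL that Looker actually generates.

```python
import sqlite3

# Hypothetical tables: one order can have many order items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 50);
    INSERT INTO order_items VALUES (10, 1), (11, 1), (12, 1), (20, 2);
""")

# Naive aggregate: the join repeats each order once per item,
# so order 1 (three items) is counted three times.
naive_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
""").fetchone()[0]

# Fan-out-safe aggregate: collapse the joined rows back to one row
# per order primary key before summing.
safe_total = conn.execute("""
    SELECT SUM(amount)
    FROM (
        SELECT DISTINCT o.order_id, o.amount
        FROM orders o
        JOIN order_items i ON i.order_id = o.order_id
    )
""").fetchone()[0]

print(naive_total, safe_total)
```

The naive query overstates revenue because of the duplicated rows, while the deduplicated query returns the true order total. This only works when each row has a reliable primary key to deduplicate on.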
1. Avoidance of Many-to-Many Relationships
Traditional Kimball modelling suggests avoiding many-to-many relationships in the star schema. This is still generally good advice, but such relationships are now easier to handle, for example with bridge tables or with analytics platforms that aggregate correctly across joins that duplicate rows.
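As an illustration of the bridge-table option, the sketch below models joint bank accounts, where one account can belong to several customers. All table names, column names, and values are hypothetical, and the weight column (each account's weights sum to 1) is one common way to avoid double counting, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_account_balance (account_id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- Bridge table: resolves the many-to-many link between accounts and customers.
    CREATE TABLE bridge_account_customer (account_id INTEGER, customer_id INTEGER, weight REAL);

    INSERT INTO fact_account_balance VALUES (1, 1000), (2, 400);
    INSERT INTO dim_customer VALUES (1, 'Ana'), (2, 'Ben');
    -- Account 1 is held jointly (50/50); account 2 belongs to Ben alone.
    INSERT INTO bridge_account_customer VALUES (1, 1, 0.5), (1, 2, 0.5), (2, 2, 1.0);
""")

rows = conn.execute("""
    SELECT c.name, SUM(f.balance * b.weight) AS weighted_balance
    FROM fact_account_balance f
    JOIN bridge_account_customer b ON b.account_id = f.account_id
    JOIN dim_customer c ON c.customer_id = b.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

balance_by_customer = dict(rows)
print(balance_by_customer)
```

Because the weights for each account sum to 1, the per-customer figures add up to the true total balance across both accounts.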
2. Batch Processing
The original Kimball methodologies were developed in an era when batch processing was more prevalent. With the rise of real-time analytics, streaming data, and technologies like Apache Kafka, the data ingestion and processing paradigms have evolved. Modern architectures often incorporate real-time or near-real-time data processing and analytics.
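The difference between the two paradigms can be sketched in a few lines of plain Python. The event list below is a hypothetical stand-in for a stream such as a Kafka topic, and the names batch_totals and RunningTotals are invented for this example. Batch processing recomputes an aggregate from the full history, while stream processing updates it as each event arrives.

```python
from collections import defaultdict

# Hypothetical sales events; in a real system these would arrive over time.
events = [
    {"product": "A", "amount": 10},
    {"product": "B", "amount": 5},
    {"product": "A", "amount": 7},
]


def batch_totals(all_events):
    """Batch style: recompute totals from the full history in one pass."""
    totals = defaultdict(int)
    for event in all_events:
        totals[event["product"]] += event["amount"]
    return dict(totals)


class RunningTotals:
    """Streaming style: keep state and update it for every incoming event."""

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, event):
        self.totals[event["product"]] += event["amount"]
        return dict(self.totals)  # snapshot available right after the event


stream = RunningTotals()
snapshots = [stream.apply(event) for event in events]

print(snapshots[-1] == batch_totals(events))
```

Both approaches end with the same totals. The streaming version simply has an up-to-date answer after every event instead of only after the next scheduled load.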
3. Single Version of the Truth
While the concept of having a single version of the truth remains essential, the approach to achieving it has evolved. With the advent of big data technologies, data lakes, and multi-source data integration, organisations often deal with diverse data sources, requiring more flexible and scalable architectures.
4. Structured Data Focus
In the 1990s, relational databases and structured data were predominant.
However, with the proliferation of unstructured and semi-structured data sources like IoT devices, social media, and logs, modern data architectures need to handle a broader variety of data types.
This shift has led to the adoption of data lakes, NoSQL databases, and other technologies designed for handling diverse data types.
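To show what handling semi-structured data can involve, here is a minimal sketch that flattens nested JSON records into rows with a common set of columns. The device payloads and the flatten helper are hypothetical, and the sketch only covers nested objects, not arrays.

```python
import json

# Hypothetical IoT payloads; the second record lacks the humidity reading.
raw_events = [
    '{"device_id": "d1", "ts": "2024-01-01T00:00:00Z", "readings": {"temp_c": 21.5, "humidity": 40}}',
    '{"device_id": "d2", "ts": "2024-01-01T00:00:05Z", "readings": {"temp_c": 19.0}}',
]


def flatten(record, parent_key="", sep="_"):
    """Turn nested dicts into a single-level dict with joined column names."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


rows = [flatten(json.loads(line)) for line in raw_events]

# Union of all columns seen; missing values become None (NULL in a table).
columns = sorted({col for row in rows for col in row})
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

Records do not need to share an identical shape: fields missing from a record surface as None, which would map to NULL in a relational table.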
5. Centralised Data Warehouses
While centralised data warehouses based on Kimball principles are still prevalent and effective for many organisations, the rise of distributed computing, cloud platforms, and edge computing has introduced more decentralised or hybrid data architectures.
Organisations may leverage distributed data stores, cloud-based data platforms, or edge computing solutions based on their specific needs and use cases.
Summary
While the core principles of Kimball modelling remain valuable and relevant, the evolving technological landscape, changing business requirements, and advances in data management and analytics tools have prompted adaptations in some areas.
Organisations often combine elements of Kimball's methodologies with modern practices to effectively design data architectures that meet their current and future needs.