Junk dimension are dimensions that contain miscellaneous data such as flags and indicators. When designing a data warehouse, you might come across a source system that has a bunch of yes/no indicator fields. In below article we will discuss junk dimension in details.
Check here for Degenerate Dimension in Data warehouse
What is Junk Dimension ?
When developing a dimensional model, we often encounter miscellaneous flags and indicators. These flags do not logically belong to the core dimension tables.
A junk dimension is grouping of low cardinality flags and indicators. This junk dimension helps in avoiding cluttered design of data warehouse. Provides an easy way to access the dimensions from a single point of entry and improves the performance of sql queries.
Check here for Conformed Dimension in Data warehouse
Example of Junk dimension : For example, assume that there are two dimension tables (gender and marital status). The data of these two tables are shown below:
Here both the dimensions have low cardinality flags. This will cause maintenance of two tables and decrease performance of sql queries.
We can combine these two dimensions into a single table by cross joining and can maintain a single dimension table. The result of cross join is shown below. This new dimension table is called a junk dimension. This will improve the manageability and improves the SQL queries performance.
Check here for Role Playing Dimension in Data warehouse
Advantages of Junk Dimension
- Improves data warehousing performance by combining low cardinality objects into one , hence single table to join
- It does not affect end user perspectives whatsoever, he sees them as different entities as “dimensional perspectives” as defined in Reporting layer will be implemented in a BI tool like cognos
- De-clutter design
- Track history /SCD2 in simplified & easier manner
- Can be scalable to all future requirements which can come at different roles & at different grains , which can be easily handled in a single data object.
- The another benefit normally derived from any junk & intended here is to make maintenance easier (not difficult !!). It is normally easy to maintain 1 SK (Surrogate Keys) instead of 3 , 1 SCD (slowly changing dimension) tracking instead of 3 (the joins will not be more complex for SCD as only the queries pertaining to role flag will be queried which is same as querying it as standalone entity , however etl jobs for these scd would be easier doing in once instead in 3 times ), and to maintain a single data object in database instead of (More entities can be combined over three which can be even easier)
- Avoid making this model as “Centipede dimensional model” and is one of key reasons why we use junk dimension anywhere
So in above data warehouse tutorial ,we have gone through some basic concept of Junk dimension , along with importance and its advantages.