On this second day Dan and his crew are going to talk to us in a more technical manner. We are ready to dive deeper into the Data Vault and better understanding the advantages and disadvantages of the Data Vault.
First a little rephrase of my questions after reading the material on Dan’s website and listening to him on the first day;
- How would Data Vault handle / cope with REVIVAL / REUSE of business keys?
- Why is there not a future dated ENDDATE in the SATELITES?
- How would Data Vault handle / cope with multiple identifiers?
- SALES uses 123ABC
- MARKETING uses ABD467
- LOGISTICS uses DEFx175
- And all are refering to the same person…
- How would Data Vault handle / cope with multiple source systems delivering the same information?
- How would Data Vault handle / cope with versioning of the EDW data model?
- How would Data Vault handle / cope with valuta conversions?
- Does Data Vault have a solutions / is it recommended to change the source system to the primary one?
- How does Model Driven Architecture fit into place with Data Vault?
And new questions which I got this day.
- Why are the IDs not unique across the model?
- Is the Data Vault contributing to the possibility to use Function Point Analysis?
- How does a METRIX satelite look like?
- How is a DELETE handled?
Let start with the three basic components and have them drawn up in a visual manner.

Here we see the three elements used in a (partial) Data Vault, where EMPLOYEES, STUDENTS and SCHOOLS are the defined as the HUBS, there are two LINKS defined; Link Assign and Enrollments. Finally we have 9 different SATELITES with descriptive information about 2 HUBS and 1 LINK. (For the course / in Dans paper the NothWind datamodel is used, which is not a good data model, but it visualizes very well which problems you might face when translating SOURCES to DATA VAULTS).
During the course we started by defining the different approach between the 3NF, Dimensional Modelling and Data Vault and how this might affect the project. In 3NF we see a TOP DOWN architecture, followed by a TOP DOWN implementation, which makes the model inflexible for changes that we might have been given during the implementation (architecture does not match implementation => back to the drawing board and start implementing later). You might say that we THINK BIG and START BIG. The dimensional Modelling takes a BOTTOM-UP architecture and BOTTON-UP implementation approach. Which is equal to THINK SMALL and START SMALL. We are not aware of the total environment and this might be why the conformed dimensions might endup as deconformed dimensions in the end.
Data Vault on the other hand, supports the well known phrase THINK BIG and START SMALL. By architecturing all the sources of an organisation one might see more, then looking only from the reports and what is needed for that small amount of information to be presented to the (end)user(s). But the Data Vault architecture is as robust to support adding elements (HUBS, LINKS and SATELITES) later in the project, so we start one part at a time. So the TOP DOWN architecture brings a high level overview of what might become the Data Vault and the BOTTOM UP implementation starts with only one Unit of Work preferably.
When the architecture was clear, Dan started to talk about the first element; HUBS. One of the most fundamental parts of the Data Vault. All business keys that are used within the organisation and which are valuable to this businesspeople and there jobs, are potential HUBS. E.g. CustomerNR => HUB_CUSTOMER, ProductNR => HUB_PRODUCT, EmployeeNR => HUB_EMPLOYEE, etc. This information is valuable to the business because it is used in their daily business and is part of their processes and stores within their systems.
When looking at the Business Information Model (see picture), we see these elements as wel, PEOPLE, PROCESS and TECHNOLOGY. The element INFORMATION is right in the center of these elements, which all contribute and rely on this INFORMATION.
The Data Vault does the same, it follows the business processes and is closely related to the business process models within the company. It facilitates a horizontal integration of information within the company. How long was this customer in SALES before he got in the SHIPMENT systems.
The second element within the Data Vault, LINKS is purely about relationships between HUBS, which provides the business information for transactions, events, hierarchies and RollUp / Aggregation facilities. The third element (SATELITES) was taken to the third day of this course.
So which questions do I now have an answer for;
- How would Data Vault handle / cope with multiple identifiers?
- SALES uses 123ABC
- MARKETING uses ABD467
- LOGISTICS uses DEFx175
- And all are refering to the same person…
- Data Vault is registering ALL information that has the same grain, so one HUB would be the case here, and if there is a connection between these numbers that is delivered by a source system, Data Vault can even register the relationship in LINK tables for the end user(s).
- How would Data Vault handle / cope with multiple source systems delivering the same information?
- The answer to this questions is already given in the previous answer, Data Vault registers all incoming data and if it is the same grain, then it is even registered in the same HUB.
- Does Data Vault have a solutions / is it recommended to change the source system to the primary one?
- Yes and No. Data Vault will not change anything (only the enddate in SATELITES is updated), so the delivery of a business key in a non-primary source system will be registered in another row as the existance of that same key in the primary source system.
This still leaves a few questions open for Dan and his crew, which will be answered tomorrow I hope…






















Posted in
Tags: 




Hi,
I hope to answer your questions over the next couple weeks. I realize that squeezing a 4 day class in to 3 days really put us at a lack of time to answer all the questions, so hopefully we’ll get the information out quickly.
Thank-you for your kind words,
Daniel Linstedt