Browse: HomeData Vault → Thoughts about Record Source in Data Vault Modelling

Thoughts about Record Source in Data Vault Modelling

Today I was at the BDWM course at my customer. The teacher explained to us how we should use the SOURCE SYSTEM COMPONENT within the IBM Banking Data Warehouse Model (BDWM). The resamblence with the Record Source field in a Data Vault Datamodel stroke me.

Both are used, in essence, to provide information about which source system was responsible for delivering the information / data, but only one of them has a 3NF which is right in my (humble) opinion. I am actually saying that the Data Vault is not complete, from how I see it… Let me explain my view on this matter;

IBM’s BDWM contains a SOURCE SYSTEM COMPONENT for each row in, for instance, the Involved Party Identifier entity. The structure for this entity looks much like the structure for a HUB in a Data Vault Model. It contains the natural identifier, the providing source system, the type of identifier, and a datetime stamp on which the row was first loaded. For history reasons, there are also effective and enddate time stamps in this entity.


If you look at the standard elements for a HUB, you see a BUSINESS KEY, LOAD DATETIME STAMP, and a RECORD SOURCE. If we take a closer look at the elements SOURCE SYSTEM COMPONENT and RECORD SOURCE, these are both the delivering / providing / originating source systems. The only difference is that the SOURCE SYSTEM COMPONENT is identified by a surrogate key (OID) in the model and RECORD SOURCE within a DATA VAULT model contains the ‘natural key’ of a source system.

Does the natural key not get translated within the Data Vault as well ?!? No, it seems this natural key, is the only key that is allowed to be Data Vault Model Record Sourceused within the Data Vault datamodel. But doesn’t this RECORD SOURCE, also contain other descriptive information like the name of the source system, location of the source system, owner of the source system, etc., etc. And is the RECORD SOURCE / HUB combination (in this example) not a LINK between two HUBS, one with the business keys for e.g. Product and one with the business key for the source system.

In the figure I visualized the structure which, in my opinion, is the right presentation of the RECORD SOURCE attribute in the HUB. It is a LINK between the natural key of the HUB and the natural key for the source system. Most companies speak about their source systems in abbreviations or names, so it might be called a business keys, and these need to be in a HUB according to the Data Vault Philosophy.

The structure for the LINK might look like; 


Link LineItem ID HUBSequence ID Source RecordSequence ID Load DTS Source RecordLink Sequence ID
1 1 2 2007-11-05 22:30:00 3
2 2 2 2007-11-05 22:30:00 3
3 3 2 2007-11-05 22:30:00 3


The structure for the RECORD SOURCE HUB might look like; 
Source RecordSequence ID SourceName Load DTS Source RecordSequence ID
0 NULL 2007-11-05 22:30:00 1
1 Record Sources 2007-11-05 22:30:00 1
2 HUB source System 2007-11-05 22:30:00 1
3 LinkInfo HUB-Source 2007-11-05 22:30:00 1

Looking at the HUB ‘Record Source’, you see that Source Record Sequence ID = 1, is the source for all record source systems used within the Data Vault. It provides all the information about the RECORD SOURCE systems. In the LINK, there are combinations made between the Record Source and HUB. The record source for the HUB is a single Record Source System, namely HUB Source System (Sequence ID = 2), which contains all the business key information for the HUB. Finally the information about which HUB – Source Records are connected, comes from … this might be visualised during the load of the HUB.

Thinking about the whole idea of using a HUB-LINK-SOURCE RECORD HUB, it might be that SOURCE RECORD is candidate for a REFERENCE TABLE structure, because It is not possible to provide the source for the combination of HUB business keys and Record Sources in a single Source System. This information CAN be derived from the HUB, but is only valid untill the next HUB load is performed. So let’s update our picture then… Or is the picture still valid and does the LINK structure and example need some update…

I think the Record Source field in the HUB should point to the HUB Record Source without a LINK interconnecting these HUBs, it’s a reference table. But I would keep the history, so the Satelite information about the Record Source stays in…

Question;

  • Are there more people thinking about the Record Source field as I do, or am I the only one that tries to translate the business key in a sequence id, just for the smaller size of the record (More records can be record into one block!!!)

RELATED

SHARE

1 COMMENT

[...] Weekly Wisdom for 6 mar 2008 [...]

March 10, March at 15:10

Leave a comment

© 2009 NEWSPRESS. POWERED BY WORDPRESS AND WPCRUNCHY.
Design by Acai Berry. In collaboration with Online Gambling, Credit Repair and spielautomat