In 2007 I posted an article about Source System Component (BDWM) and the resamblance with the Record Source (See article). My colleague Rob Mol, asked a few questions which I did never answer, so here’s a post dedicated to his questions and my answers.
First let’s rephrase his questions;
-
Why keep history of the Record Source?
-
What should be done with the Record Source for LINKS and SATS, because the Data Vault doesn’t allow you to connect SATS to HUBS and vice versa?
-
What about the LOAD strategy? Do you want to load the RECORD SOURCES while LOADING the HUBS?
To anwer the questions, I must admit I had some good readings about both the Data Vault and the Banking Data Warehouse Model (BDWM), and most are positively answered in the sense that Rob is right asking me these questions. Let start my monologue then…
I wanted to keep history in the record source because it could provide me with the information of releases and possible defects that originated from these releases. It’s more the descriptive information I want to store in the SATs, aside from the history it creates for my source system. Administrators could change. Locations could change. SourceType could change (e.g. Oracle => Teradata). But these are all metadata and perhaps do not belong in the Data Vault Model. Let’s ask the question to the user / customer and let then decide (I got the feeling it”s only maintenance that can benefit from this information, so maybe it’s better to skip this requirement…)
Rob is absolutely right about the SATs and LINKs connecting to the RECORD SOURCE HUB. Especially the SAT linking to the RECORD SOURCE HUB will give a lot of unnecessary lines in the data model and even a recursive relationship, because the RECORD SOURCE of the RECORD SOURCE for the RECORD SOURCE HUB… You get the picture
The last questions is about the LOAD strategy for the RECORD SOURCES. Since the LOAD strategy for the HUB, LINK and SAT contains explicitely the mentioning of the RECORD SOURCE in these entitities, I had the idea that the RECORD SOURCE could be loaded in parallel, but since there is only hardcoded mentioning of the LOADing RECORD SOURCE, the parallel LOADing of these RECORD SOURCE HUBs is pure crap!!! Hardcoded => LOADing… (Stupid thinking of myself). It’s only the METADATA I mentioned before that’s interesting to be LOADED, but that can be done sequential instead of loading it parallel with the HUBs, LINKs and SATs.
I think my only conclusion can be that the suggestion to use a RECORD SOURCE HUB is not a good suggestion. Let’s just stick to the hardcoded RECORD SOURCE for the HUBs, LINKs and SATs and maybe load / store the metadata about these Record Sources in a independent database or entity within the data model (regardingless of this being a Data Vault model or not).
=== CASE CLOSED ===






















Posted in
Tags: 




Good discussion Walter. Especially your observation: ‘But these are all metadata and perhaps do not belong in the Data Vault Model. Let’s ask the question to the user / customer and let then decide (I got the feeling it’’s only maintenance that can benefit from this information, so maybe it’s better to skip this requirement…)’
I absolutely feel this information as being vital but not vital in the sense that we should connect it to a record-source hub. Aside the fact that the loading of this meta information is cumbersome, I feel that it does not serve the purpose.
So how is this meta-data vital? Well – every object in your warehouse should be made meaningfull in terms of definition, domain-values, data-owner, ‘whatever-the-business-needs-to-in-terms-of-context-to-understand-the-meaning-of-the-objects’.
I call this Business metadata and I propose a Data vault like model for this kind of data (peeps need to report/query on it). However – this business metadata should ideally come from the source systems – but source systems nowadays does generally not provide this kind of metadata (although with SOA-like architectures this should be the case). So the business metadata-owner should maintain this data.
The Data Vault is excellent in storing this kind of metadata:
- it’s temporal (e.g. what was the definition a year ago – who altered it – etc? – what was the version of the record source then!!!)
- it’s extensible (You wanna have a model that can evolve. You wanna add attribute, construct new taxonomies)
- It’s normalized (you wanna define each object once and be able to link it to verious location in the warehouse)
So…my point; you are right in saying it’s vital information. You are also right in concluding that relating it to the hubs, aint the solution.
But we must store it….Business metadata…..there are no silver bullets here – you have to design it as an integral part of the warehouse.The Data Vault is truelly excellent for this purpose.
Oh by the way…..to make it even more complex (or not…). Handle your business metadata the same way as you handle your ‘normal’ data…
Like your post Walter…
Just wanted to let you guys know that we’ve put up a site where we will be putting up all the information about anchor modeling here: http://www.anchormodeling.com