In my post about modelling the RECORD SOURCE in a data-vault-like way, in 2007 (see POST), I got a reaction from Rob Mol (see Comments on the POST). I responded by posting my answers and conclusions to his views (see POST).
One of my colleagues in the business, Ronald Damhof, posted a reaction to this POST on his BLOG (see his article). His view is that business metadata is data, just like the ‘normal’ data regarding e.g. Marketing or Finance. It could even be handled like the ‘normal’ data and stored in a Data Vault-like model (TEMPORAL, EXTENSIBLE, NORMALIZED). I agree with him on that, but one of the questions that remains is how these information streams should be linked.
We are able to build Data Vault models for the Business Intelligence needs, and visualise the CSFs and dashboards for the end users. We can even build Data Vault models for storing the Business Metadata, which brings to mind another project of mine, where the customer wanted to store this Business Metadata in a central spot. I did a (very small) tool investigation, and we found that Microsoft SharePoint was the cheapest and best-fitting solution, since the other reports were also built towards a Microsoft SharePoint Portal (using Cognos Cubes / Reports). Unfortunately this business metadata was not modelled in the data vault way, so … the solution was not used (until recently, when they started to ask for it again). Writing down this experience, I come to another thought / insight / question: how do we handle unstructured data in the data vault model?
So, in my opinion, there are multiple questions to answer:
- Do we need to link the Business Metadata to the ‘normal’ data?
- If the first question is answered positively, how would we relate the Business Metadata to the ‘normal’ data model?
- How does the Data Vault model cope with unstructured data, e.g. Word, PDF, pictures, sound, etc.?
To answer the first question from my own perspective: I think it would be wise to do so (I think Ronald agrees with me on that). It gives context to the ‘normal’ data and provides valuable information about ownership, maintenance, definition changes, etc. In other words, data lineage: how did the element I am looking at originate from the source?
The second question is very tricky in my opinion. If we take the RECORD SOURCE as an example, we are not allowed to LINK to the SATs where the RECORD SOURCE is used, and linking to the LINK would result in a recursive LINK (the RECORD SOURCE of the RECORD SOURCE of the LINK…). So, following the discussion about RECORD SOURCES on this BLOG and on Ronald Damhof's BLOG, we conclude that we should use the natural key for RECORD SOURCES instead of LINKING to a RECORD SOURCE HUB. Thinking about this, I conclude that the DATA VAULT model for the ‘normal’ data is itself a source for the Business Metadata about this item. If we want to know more about the RECORD SOURCE, we just create a stand-alone HUB in another DATA VAULT model and LOAD it with all unique RECORD SOURCE natural keys found in the DATA VAULT model for the ‘normal’ data.
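To make that last step concrete, here is a minimal sketch in Python with SQLite of how such a stand-alone HUB could be loaded. All table and column names (hub_customer, sat_customer, link_customer_order, hub_record_source, record_source, load_dts) are made up for illustration, and it assumes every table in the ‘normal’ vault carries its RECORD SOURCE as a plain natural-key column.

```python
import sqlite3

# Hypothetical tables of the 'normal' Data Vault that carry a record_source column.
SOURCE_TABLES = ["hub_customer", "sat_customer", "link_customer_order"]


def build_demo_vault(conn):
    """Create a tiny 'normal' vault plus the stand-alone metadata hub (illustrative only)."""
    conn.executescript("""
        CREATE TABLE hub_customer (customer_bk TEXT, load_dts TEXT, record_source TEXT);
        CREATE TABLE sat_customer (customer_bk TEXT, load_dts TEXT, record_source TEXT, name TEXT);
        CREATE TABLE link_customer_order (customer_bk TEXT, order_bk TEXT, load_dts TEXT, record_source TEXT);
        -- Stand-alone hub in the separate metadata vault; its key is the record source itself.
        CREATE TABLE hub_record_source (record_source TEXT PRIMARY KEY, load_dts TEXT);
        INSERT INTO hub_customer VALUES ('C1', '2008-01-01', 'CRM');
        INSERT INTO sat_customer VALUES ('C1', '2008-01-01', 'CRM', 'Alice');
        INSERT INTO sat_customer VALUES ('C1', '2008-02-01', 'ERP', 'Alice B.');
        INSERT INTO link_customer_order VALUES ('C1', 'O1', '2008-02-01', 'ERP');
    """)


def load_record_source_hub(conn, load_dts):
    """Load every distinct record source from the 'normal' vault into the metadata hub."""
    for table in SOURCE_TABLES:
        # INSERT OR IGNORE skips record sources that are already present in the hub.
        conn.execute(
            f"INSERT OR IGNORE INTO hub_record_source (record_source, load_dts) "
            f"SELECT DISTINCT record_source, ? FROM {table}",
            (load_dts,),
        )
    conn.commit()
    rows = conn.execute(
        "SELECT record_source FROM hub_record_source ORDER BY record_source"
    )
    return [row[0] for row in rows]
```

Note that nothing in the ‘normal’ vault points at the metadata hub: the relation exists only through the shared natural key, so the recursive LINK problem described above does not arise in this sketch.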


























Walter,
We are having this same discussion over at the DVI forums. The issue with Satellite “links/FKs” depends on the DV loading. It is not possible to resolve links to satellites effectively within a DV load. However, this is only an issue for the *active* DV being loaded. Links to other, external (meta/master/metrics) DVs or datastores are not affected by this limitation, as long as you do not destroy source data in your sats. This means that for source data, business keys are the dominant foreign keys in satellites. For metadata columns you can (and should) use surrogate keys.
IMO a good solution to these metadata issues is a hybrid “one (metadata) column to rule them all” approach. This resolves most of the metadata referencing issues you are seeing.
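The comment does not spell out what the “one (metadata) column” looks like, so the following is only one possible reading, sketched in Python with SQLite: each satellite row keeps its business key for the source data and carries a single surrogate key that points to a row in an external metadata table. The names meta_load, meta_id, loaded_by and sat_customer are hypothetical.

```python
import sqlite3


def build_demo(conn):
    """Create an external metadata table and one satellite (illustrative names only)."""
    conn.executescript("""
        CREATE TABLE meta_load (
            meta_id INTEGER PRIMARY KEY,   -- surrogate key, the single metadata reference
            record_source TEXT,
            load_dts TEXT,
            loaded_by TEXT
        );
        CREATE TABLE sat_customer (
            customer_bk TEXT,              -- business key stays the FK for source data
            name TEXT,
            meta_id INTEGER REFERENCES meta_load(meta_id)
        );
    """)


def register_load(conn, record_source, load_dts, loaded_by):
    """Store the metadata of one load and return its surrogate key."""
    cur = conn.execute(
        "INSERT INTO meta_load (record_source, load_dts, loaded_by) VALUES (?, ?, ?)",
        (record_source, load_dts, loaded_by),
    )
    return cur.lastrowid


def load_satellite_rows(conn, rows, meta_id):
    """Insert (customer_bk, name) pairs, stamping each row with the same metadata key."""
    conn.executemany(
        "INSERT INTO sat_customer (customer_bk, name, meta_id) VALUES (?, ?, ?)",
        [(bk, name, meta_id) for bk, name in rows],
    )


def lineage(conn, customer_bk):
    """Return (name, record_source, load_dts) per satellite row, oldest load first."""
    return conn.execute(
        "SELECT s.name, m.record_source, m.load_dts "
        "FROM sat_customer AS s JOIN meta_load AS m ON m.meta_id = s.meta_id "
        "WHERE s.customer_bk = ? ORDER BY m.load_dts",
        (customer_bk,),
    ).fetchall()
```

In this reading, adding further metadata (owner, definition version, and so on) only changes the metadata table; the satellites keep their single reference column.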