Currently I am working on one of the functional designs for an Enterprise Data Warehouse, namely the one about Data Lineage. It struck me to see so many definitions of this subject on the internet and in books. The way I saw Data Lineage before was that it enables you to see the STRUCTURAL lineage of an item within the information chain. Simply select a source column and get the impacted target objects, or the other way around: select a target object and see where it came from… That was before I searched the internet and found some articles and books about the subject. Even a colleague of mine had written something about it. It turns out there are three levels of lineage known at the moment:
- Structural Level
- Occurrence Level
- Data Level
Structural Level (WHERE) This view of lineage targets the structure of the data, which means it is possible to track e.g. a report item (e.g. SalesValue) all the way back to its source elements (OrderAmount, SalesPrice and DiscountGiven). The other way around also works: starting from OrderAmount, it is possible to see where that source element is used and which report items, for instance, it contributes to.
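To make the two directions concrete, here is a minimal sketch in Python. It assumes the lineage metadata is available as a simple mapping from each target element to its direct sources. The report name RegionalSalesReport and the function names are made up for illustration; only SalesValue and its three source elements come from the example above.

```python
from collections import defaultdict

# Hypothetical column-level mappings: target element -> its direct sources.
# RegionalSalesReport is an invented report item, added to show a second hop.
MAPPINGS = {
    "SalesValue": ["OrderAmount", "SalesPrice", "DiscountGiven"],
    "RegionalSalesReport": ["SalesValue"],
}


def build_graph(mappings):
    """Index the mappings in both directions."""
    upstream = defaultdict(set)    # target -> direct sources
    downstream = defaultdict(set)  # source -> direct targets
    for target, sources in mappings.items():
        for source in sources:
            upstream[target].add(source)
            downstream[source].add(target)
    return upstream, downstream


def trace(start, edges):
    """Return every element reachable from start by following edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


UPSTREAM, DOWNSTREAM = build_graph(MAPPINGS)


def where_from(element):
    """Usage view: all elements the given element is derived from."""
    return trace(element, UPSTREAM)


def where_used(element):
    """Impact view: all elements the given element contributes to."""
    return trace(element, DOWNSTREAM)
```

With these mappings, where_from("SalesValue") gives the three source elements, and where_used("OrderAmount") gives SalesValue plus the invented report that is built on top of it. A real tool would read the mappings from the ETL repository instead of a hard-coded dictionary.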
Occurrence Level (WHY) The second view on Data Lineage is the occurrence level view. It has the same characteristics as Structural Level lineage, but it also provides information about the precise derivation of elements; in other words, it interprets any conditional transformations in the structure. E.g. people might be interested not only in the structural lineage of CustomerLevel (it originates from CheckAccountBalance and TotalHouseHold income for example), but also in why it is ‘GOLD’. They are interested in seeing that a value of $10.000 for CheckAccountBalance and a value of $25.000 for TotalHouseHold lead to a value of ‘GOLD’ for CustomerLevel.
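The CustomerLevel example could be sketched like this in Python. The thresholds, the STANDARD fallback level and the function name are my own assumptions, since the actual business rule is not given here; the point is only that the derivation returns the rule that fired together with the value.

```python
# Hypothetical rules for CustomerLevel, checked in order. Each rule is
# (resulting level, human-readable condition, predicate on the record).
# The thresholds and the STANDARD level are assumptions for illustration.
RULES = [
    (
        "GOLD",
        "CheckAccountBalance >= 10000 and TotalHouseHold >= 25000",
        lambda r: r["CheckAccountBalance"] >= 10000 and r["TotalHouseHold"] >= 25000,
    ),
    ("STANDARD", "no other rule matched", lambda r: True),
]


def derive_customer_level(record):
    """Return the derived level plus the rule and inputs that explain it."""
    for level, condition, predicate in RULES:
        if predicate(record):
            return {
                "element": "CustomerLevel",
                "value": level,
                "rule": condition,
                "inputs": dict(record),
            }
    raise ValueError("no rule matched")  # unreachable while a fallback rule exists


result = derive_customer_level(
    {"CheckAccountBalance": 10000, "TotalHouseHold": 25000}
)
```

Here result["value"] is 'GOLD' and result["rule"] holds the condition that produced it, which is the WHY that structural lineage alone cannot answer.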
Data Level The final level is the most detailed one: it contains both the transformation rules AND the data that went through the ETL process. This implies that not only the metadata is retained, but also the real-life data that contributed to the target value. E.g. end users might be interested in why the TotalAmountOfSales is $100.000 instead of the expected $95.000. When data level lineage is used, it might become clear that the AmountOfSales for Europe, used for the TotalAmountOfSales (Occurrence Level Lineage), was not the expected $25.000 but only $10.000, and that the AmountOfSales for Asia was $25.000 instead of $5.000.
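A rough Python sketch of the TotalAmountOfSales example follows. It assumes the ETL process stores the contributing amounts next to the target value. The Americas figure is invented to make the totals add up (the example only names Europe and Asia), and the function names are hypothetical.

```python
# Amounts per region as actually loaded. Europe and Asia come from the
# example; the Americas figure is an assumption to reach the 100000 total.
ACTUAL_ROWS = {"Europe": 10000, "Asia": 25000, "Americas": 65000}

# What the end user expected to see per region (sums to 95000).
EXPECTED_ROWS = {"Europe": 25000, "Asia": 5000, "Americas": 65000}


def total_with_lineage(rows):
    """Compute the target value and keep the data that contributed to it."""
    return {
        "element": "TotalAmountOfSales",
        "value": sum(rows.values()),
        "contributions": dict(rows),
    }


def explain_difference(lineage, expected_rows):
    """List the contributions that differ, as (expected, actual) pairs."""
    return {
        region: (expected_rows.get(region, 0), amount)
        for region, amount in lineage["contributions"].items()
        if amount != expected_rows.get(region, 0)
    }


lineage = total_with_lineage(ACTUAL_ROWS)
differences = explain_difference(lineage, EXPECTED_ROWS)
```

The lineage record holds the total of 100000 together with its inputs, and differences points at Europe (25000 expected, 10000 loaded) and Asia (5000 expected, 25000 loaded), which together account for the 5000 gap. Keeping the contributing data is what makes this level so much heavier to store than the other two.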
OK, now we know three levels of lineage, but this is also where it starts to become complicated. Most tools nowadays are able to support STRUCTURAL LEVEL LINEAGE, which implies they support a two-way visualization of objects: impact and usage. But for the other two levels, the two-way visualization seems to be lacking…

























