Extracting active data from operational sources (legacy systems, ERP, flat files, non-relational stores, etc.) can be a major chore. Typically the extract process grapples with a gamut of challenges: locating the appropriate data sources and data structures, analyzing and understanding them, and then sourcing, cleansing and transforming the data to meet the needs of the target system. This entire extract process is usually run on a regular and frequent basis.
But there is another extract process, run only once, that is essential for loading a data warehouse with all the off-line historical data. Historical data in this context refers to the non-active operational data, i.e. data representing previous months, quarters and years. Most operational systems hold no more than a few weeks or months of data, i.e. the active data. So where does this historical data originate? Usually from a nicely archived library of back-up tapes.
Can the extract process designed for the current release of the operational sources be used to extract this historical (initial load) data? Probably, but usually with some changes. These changes are necessary because the back-up tapes may represent an earlier version or release of the source system. That earlier version may have a completely different normalized data model, with different entities, attributes and relationships. Attributes may be missing or additional, and the data may contain unexpected values and codes. These are just a few examples of the differences.
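One way to surface such differences early is to compare the attribute lists of the current release and the archived release side by side. The following is a minimal Python sketch of that comparison, not a definitive tool. The table layout, attribute names and data types are hypothetical; in practice they would come from the data dictionaries or file layouts of the two releases.

```python
# Hypothetical attribute lists (name -> type) for one entity.
# These values are illustrative assumptions, not taken from any real system.
current_schema = {
    "customer_id": "INTEGER",
    "name": "VARCHAR(80)",
    "region_code": "CHAR(3)",
    "email": "VARCHAR(120)",
}
archived_schema = {
    "customer_id": "INTEGER",
    "name": "VARCHAR(40)",
    "region": "CHAR(2)",
}


def diff_schemas(current, archived):
    """Compare two attribute maps and report how the archive differs."""
    # Attributes the current extract expects but the archive does not have.
    missing_in_archive = sorted(set(current) - set(archived))
    # Attributes that exist only in the archived release.
    only_in_archive = sorted(set(archived) - set(current))
    # Attributes present in both releases whose type changed: (old, new).
    type_changes = {
        name: (archived[name], current[name])
        for name in sorted(set(current) & set(archived))
        if current[name] != archived[name]
    }
    return missing_in_archive, only_in_archive, type_changes


missing, extra, changed = diff_schemas(current_schema, archived_schema)
print("Missing in archive:", missing)
print("Only in archive:  ", extra)
print("Type changes:     ", changed)
```

Each non-empty result points to a place where the regular extract code is likely to need a variant for the historical load.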
If these differences are discovered unexpectedly while trying to load the back-up tapes just prior to the production date, the result can be an embarrassing delay in that date or the absence of promised data at launch. To address the problem, unplanned extract code has to be written and tested!
To avoid this situation, a back-up tape data audit/analysis should be planned and executed at the same time as the source system(s) analysis. Document the findings and highlight the differences between the current operational version and the back-up tapes. In addition, make sure that all back-up tapes are available, so that they provide a continuous data history. With this information as a basis, the need for separate code and routines to extract data from the archived back-up tapes can be identified early and any impact on the production date avoided.
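The continuity check on the tape library can be partly automated once the period covered by each tape is known. The sketch below, in Python, assumes a hypothetical inventory in which every tape is described by an inclusive start and end date; the dates shown are made up for illustration. It reports any date ranges that no tape covers.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def find_gaps(periods):
    """Return the uncovered (start, end) date ranges between tape periods.

    `periods` is a list of (start, end) date pairs, both inclusive.
    """
    if not periods:
        return []
    gaps = []
    ordered = sorted(periods)
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        # A gap exists if the next tape starts after the day following
        # the last covered day.
        if start > covered_until + ONE_DAY:
            gaps.append((covered_until + ONE_DAY, start - ONE_DAY))
        covered_until = max(covered_until, end)
    return gaps


# Hypothetical tape inventory: three quarterly tapes, with one quarter absent.
tapes = [
    (date(2000, 1, 1), date(2000, 3, 31)),
    (date(2000, 4, 1), date(2000, 6, 30)),
    (date(2000, 10, 1), date(2000, 12, 31)),
]
gaps = find_gaps(tapes)
for gap_start, gap_end in gaps:
    print(f"No tape covers {gap_start} to {gap_end}")
```

With the sample inventory above, the third quarter of 2000 is reported as missing, which is the kind of finding the audit should record before the initial load is scheduled.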