This is the sixth post in a series I am writing to convey lessons I have learned about Data Migration and the third post on ETL. A 3-part discussion of Knowing Your Data can be found here. The first post on ETL is here.
We at ArgonDigital are always happy to discuss data migration with you and explore ideas about making your project go smoothly.
This week, I want to continue discussing Extract, Transform, and Load (ETL).
- Extract is about getting data out of the legacy system.
- Transform is about preparing it for the new system: mapping fields, forming new record structures, and cleansing errant data.
- Load is about putting your clean, properly structured data into the new target system.
In my previous posts, I posited that ETL is not a process in and of itself, but rather a set of activities that must be organized into a process design specific to your project. I discussed the extract and transform activities, and how the conditions and assumptions around extracting and transforming data varied widely across three very large projects in which we helped customers replace legacy systems (a digital rights system, a financial system, and an asset management system). This week, we'll complete the tour of ETL by describing how different conditions affect the way teams approach the loading activities.
Loading Activities in the ETL Pattern
Loading is the set of activities through which your team completes the migration of your data into the target system. The complexity of loading depends largely on the capabilities and assumptions of the target system. Loading may include file uploads, inserting data with SQL tools, inserting data via application programming interface (API) calls, or even running the upload through a vendor-provided tool. It may also include reestablishing links between business data objects: for example, when migrating an Order Management system, each order must be linked with a customer record and with the records of the products on the order. Linking might be done in the intermediate environment or in the target system. For business data objects (BDOs) with a small number of records, linking could even be done manually through user interfaces.
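To make that concrete, here is a minimal sketch of loading one Order via API calls while re-establishing its links. Everything here is an assumption for illustration: the endpoint URL, the payload field names, and the legacy-key-to-new-ID maps; your target system's API will differ.

```python
import requests

BASE_URL = "https://target.example.com/api"  # hypothetical target-system endpoint
session = requests.Session()
session.headers.update({"Authorization": "Bearer <token>"})  # placeholder credential

def load_order(order, customer_ids, product_ids):
    """Insert one Order via the target system's API, linking it to
    already-loaded Customer and Product records. The id maps translate
    legacy keys to the IDs the target system assigned in earlier loads."""
    payload = {
        "order_number": order["order_number"],
        # Re-establish links using the target system's new identifiers
        "customer_id": customer_ids[order["legacy_customer_key"]],
        "line_items": [
            {
                "product_id": product_ids[item["legacy_product_key"]],
                "quantity": item["quantity"],
            }
            for item in order["line_items"]
        ],
    }
    resp = session.post(f"{BASE_URL}/orders", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["id"]  # keep the new ID for later linking and validation
```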
The general pattern of loading activities is, again, highly dependent on the way the target system stores your data and on its options for loading high volumes of data. It's critical to understand the order in which you will need to load BDOs – e.g., Customers, then Products, then Orders – and how that sequence plays out across your environments, such as Development, Test, and Production. Load may also include turning on connections with other production systems, so you'll need to decide when, and in which environment(s), to do that.
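One way to keep that ordering honest is to encode it in the load plan itself. The sketch below (BDO names and loader functions are illustrative) refuses to load a BDO before its parents are in place, and carries the legacy-to-new ID maps forward so later loads can resolve their links:

```python
# Each BDO is listed with the BDOs it depends on; parents load first.
LOAD_PLAN = [
    ("customers", []),
    ("products", []),
    ("orders", ["customers", "products"]),  # orders link to both
]

def run_load(loaders):
    """loaders maps a BDO name to a function that loads its records and
    returns a {legacy_key: new_id} map for use by dependent BDOs."""
    loaded, id_maps = set(), {}
    for bdo, deps in LOAD_PLAN:
        missing = [d for d in deps if d not in loaded]
        if missing:
            raise RuntimeError(f"Cannot load {bdo!r} before {missing}")
        id_maps[bdo] = loaders[bdo](id_maps)
        loaded.add(bdo)
    return id_maps
```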
Next, you will need to design the loading process, including any final refresh from source systems, and how you will link records in the new system. If human validation is required before the new system is “open for business”, that must be part of the process, and you will need to involve business leaders in planning for that phase. As you can see, Load will depend on many factors, which you can assess by answering these questions:
- How much data will you need to load in this increment of migration?
- Are there any system architecture considerations which might affect loading performance?
- Is there a defined time window for the migration event?
- Does the target system need to run calculations or computations during the loading process?
- Will linking be done during loading? If so, will it be done manually?
- In which environments will loading activities take place? If in a lower environment, what additional arrangements are needed to push the data into production?
- How will you validate the data once loading is complete? Is there a planned activity with committed resources? (A minimal validation sketch follows this list.)
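On that last question, a per-BDO record-count comparison between the intermediate environment and the target system is a cheap first gate before deeper field-level and financial checks. The function names and count sources below are assumptions, not a prescribed tool:

```python
def validate_counts(intermediate_counts, fetch_target_count):
    """intermediate_counts: {bdo_name: expected_count} from the staging store.
    fetch_target_count: callable that counts a BDO's records in the target.
    Returns per-BDO discrepancies for the validation team to chase down."""
    discrepancies = {}
    for bdo, expected in intermediate_counts.items():
        actual = fetch_target_count(bdo)
        if actual != expected:
            discrepancies[bdo] = {"expected": expected, "loaded": actual}
    return discrepancies
```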
Each of the three large migration efforts in which I have served has had a distinct flavor in its loading activities.
Project 1: Digital Rights System
A large media company was replacing its home-grown digital rights systems with a new SaaS solution built and maintained by a third-party vendor. The team designed its loading activities around key facts on the ground, such as:
- Source data for many of the BDOs changed frequently, so it was essential to begin migration with a final pull and a scripted cleanse prior to loading.
- Records for the cornerstone BDO were manually constructed in the intermediate environment and would have to be validated in the target system post-migration.
- Other BDOs would need to be linked to that cornerstone BDO – some by script in the intermediate environment (sketched after this list) and some via the target system's UI.
- Data could be moved successfully between lower environments and production, so loading could be performed and validated in Test.
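The scripted linking followed a pattern like the one below: match child records to the cornerstone BDO on a shared business key inside the intermediate environment, and surface the rows that fail to match for manual linking through the target UI. Table and column names are invented for illustration, and sqlite3 stands in for whatever database backs your staging area:

```python
import sqlite3  # stand-in for the intermediate environment's database

def link_to_cornerstone(conn: sqlite3.Connection) -> int:
    """Set each child record's cornerstone_id by matching a shared
    business key, then report how many rows failed to match."""
    conn.execute("""
        UPDATE child_bdo
           SET cornerstone_id = (SELECT c.id
                                   FROM cornerstone c
                                  WHERE c.business_key = child_bdo.business_key)
    """)
    unmatched = conn.execute(
        "SELECT COUNT(*) FROM child_bdo WHERE cornerstone_id IS NULL"
    ).fetchone()[0]
    conn.commit()
    return unmatched  # these rows get linked manually via the target UI
```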
Project 2: Financial System
A large financial services company was replacing its obsolete lease accounting system with a new on-premises system built and maintained by a third-party vendor. The source system would be deprecated the moment migration began, and the business would run on the target system as soon as migration completed. The team therefore had to design the Load activities to minimize downtime. Factors affecting the loading time included:
- Data changed constantly on the source system, so migration would involve the full cycle of extracting all in-scope records, applying transformations in the intermediate environment, and then loading to the target system.
- Linking would be done in the vendor-provided intermediate environment.
- Loading would be executed by the vendor's migration toolset and had to go directly into the production environment.
- Loading activities included calculating financial values, with a requirement to be accurate within tight tolerances compared to the old system’s calculations.
- All business units would need to run extensive validation to ensure all in-scope records were properly loaded and financials met the strict variance requirements (a variance-check sketch follows this list).
- Each business hour spent doing the migration represented downtime for the business: new contracts could not be created, customers would not be invoiced, and payments could not be posted. All daily system chores would need to be run in catch-up mode at the conclusion of migration.
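The variance check itself can be sketched simply. The tolerance below is illustrative only – the project's actual thresholds came from the business – and Decimal avoids the float rounding noise you do not want in a financial comparison:

```python
from decimal import Decimal

ABS_TOL = Decimal("0.01")  # illustrative tolerance, not the project's actual value

def variance_report(pairs):
    """pairs: iterable of (record_id, legacy_value, target_value) Decimals.
    Returns the records whose recalculated financials breach tolerance."""
    return [
        (record_id, legacy, target, abs(target - legacy))
        for record_id, legacy, target in pairs
        if abs(target - legacy) > ABS_TOL
    ]
```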
Project 3: Digital Asset Management System
A media company was replacing its in-house digital asset management system with a new cloud-based system built and maintained by a third-party vendor. As with the Transform activities, the team’s design for loading was influenced by the massive volume of digital assets to migrate:
- Millions of assets would need to be moved from the source system to the target. Each asset was print-ready, and thus very large on disk, so even a modest increment required pushing a lot of bits.
- The target system performed some computation during Load, limiting the top speed of loading the assets.
- Each asset had an associated metadata record, with dozens of user-entered fields, which would be cleansed after the Transform phase.
- Many assets contained embedded metadata from the camera, which the system normally would copy into attribute fields. The team wanted to suppress this behavior in favor of loading cleansed data from the intermediate environment.
- The source and target systems both lived in the same vendor's cloud, albeit in different system spaces, and the team wanted to avoid the time cost of downloading and re-uploading each asset (see the sketch after this list).
- Fortunately, there was no business need limiting the migration to a narrow window. Extract and Transform activities were not done “on the clock”.
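Staying inside the vendor's cloud meant asking the storage service to copy each asset server-side rather than routing bytes through the migration team's network. As a sketch only: here is what that looks like against an S3-compatible object store with boto3. The project's actual vendor tooling was different, and the bucket names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

def copy_asset(src_bucket: str, dest_bucket: str, key: str) -> None:
    """Server-side copy: the storage service moves the bytes itself, so a
    multi-gigabyte print-ready asset never leaves the vendor's cloud.
    boto3's managed copy handles multipart transfers for large objects."""
    s3.copy(
        CopySource={"Bucket": src_bucket, "Key": key},
        Bucket=dest_bucket,
        Key=key,
    )
```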
Project Conditions Drive Approach
The conditions under which each team needed to plan the Load (and overall ETL) drove very different approaches to the migration process. The table below summarizes each project's conditions and the resulting loading approach:
| Project | Conditions | Approach |
| --- | --- | --- |
| Digital Rights System Replacement | Frequently changing source data; manually constructed cornerstone records; data movable between environments | Final pull and scripted cleanse before loading; linking by script and via the target UI; load and validate in Test, then promote to Production |
| Financial System Replacement | Source deprecated at migration start; every hour of migration was business downtime; strict financial variance tolerances | Full extract-transform-load cycle in one window; vendor toolset loading directly to Production; extensive business validation, then catch-up processing |
| Digital Asset Management System Replacement | Millions of very large assets; target system computed during Load; source and target in the same vendor's cloud | No narrow migration window; in-cloud, server-side transfer to avoid download/re-upload; cleansed metadata loaded in place of embedded camera data |
We’ve spent a bunch of time examining Data and ETL patterns. We’ll change course next week and discuss People, through whom you will succeed or fail in your data migration.