Introduction
Previously I wrote an article about a number of problems we have seen with typical migration approaches. These problems boil down to the following major issues:
- Poor Data Quality: Most organizations planning a migration do not understand how bad the quality of their source data is. Engaging users to manually clean up this data is the most common source of delays.
- Source System Freeze: The typical migration process includes a “freeze” on source and target systems. During the freeze, users are prevented from changing or accessing critical business content. The source system freeze causes the loudest complaints from users. They need their content and they need it now! Compounding the issue is the fact that freezes often drag on well beyond the expected timeline because of the time it takes to resolve the multitude of data quality issues.
- Migration of Critical Content Delayed by Non-Critical: Often users are most concerned with a small subset of their important content. Traditional methods require the migration team to complete the migration of the entire repository before users can access their critical data in the target system.
- Data Problems Discovered Late in the Process: Using traditional migration methods, users don’t get a chance to see their data until it has been moved into the target, late in the process. Unfortunately, this means that most of the time-consuming and difficult-to-solve issues are not addressed or even identified until very late in the project lifecycle.
After repeatedly working with our clients to overcome these issues we have discovered a number of interesting insights.
- Rigid One-to-One Mappings are Limiting: Attribute transformation schemes involve mapping one attribute in the source to one in the target, often incorporating a data transformation. This is classic ETL thinking, which really isn’t effective in content migrations. It assumes a uniformity of data that just doesn’t exist. In the real world users are not consistent in applying conventions, populating mandatory attributes, etc. Mapping and transformations should be designed to mirror the way users think about their data.
- Problems Not Discovered Until Seen in the Target: Many problems with data quality in the source aren’t recognized until the users see their data in the target. Because this happens late in the traditional migration process, it leads to extensive rework and extended timelines.
- High-Priority Data is Better Quality Than Low-Priority Data: The critical data that users demand quick access to is also the data they use on a daily basis. Because it has more eyes on it, it is frequently reviewed and updated, which generally means better quality, fewer data problems, and an easier migration.
Design of a Better Migration Approach
Blue Fish has developed a new migration approach and tool, called Migration Workbench, based on our experiences and insights. Both are designed around the following concepts:
- Incremental Migration
- Live-to-Live Migration
- Dynamic Cutover
- Rules-Based Attribute Transformations
- Total Visibility into Migration Content
Let’s take a look at each of these concepts.
Incremental Migration
One of the biggest problems with traditional migration approaches is that problems usually aren’t discovered until the very end of the migration process when users start to see how their documents are filed in the target system or realize that some documents were “lost” during the migration. When this happens, the typical fix is to roll back all migrated content from the target, fix the underlying problems in the source repository, and then rerun the migration. This process may be repeated hundreds of times until all of the errors are corrected and the process executes flawlessly. No matter how much you focus on trying to get the migration requirements right up front, there is no practical way to do this without letting the users see their data in the target system.
Migration Workbench lives with this reality and operates on the expectation that data problems will be spotted in the target system. It expects that attribute mappings will need to be changed, that source data will need to be updated, and that documents will be added and deleted over the course of the migration effort. Rather than forcing an organization to roll back the migration each time one of these things happens, Migration Workbench will detect these changes and re-migrate the affected documents. As data errors are discovered, Workbench flags the relevant documents and continues processing other content that is not in error.
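To make the flag-and-continue idea concrete, here is a minimal Python sketch of how an incremental pass over the source might work. Everything in it (the SourceDoc and MigrationState structures, the rule_version fingerprint, and the migrate_one callback) is an assumption for illustration, not Migration Workbench’s actual design.

```python
from dataclasses import dataclass, field

@dataclass
class SourceDoc:
    doc_id: str
    modified: str        # last-modified timestamp read from the source system
    attributes: dict

@dataclass
class MigrationState:
    """Remembers what was migrated, under which rule version, and what failed."""
    migrated: dict = field(default_factory=dict)   # doc_id -> (modified, rule_version)
    flagged: dict = field(default_factory=dict)    # doc_id -> error message

def incremental_pass(docs, rule_version, state, migrate_one):
    """Re-migrate only documents whose source data or rules have changed;
    flag failures and keep going instead of rolling the whole run back."""
    for doc in docs:
        fingerprint = (doc.modified, rule_version)
        if state.migrated.get(doc.doc_id) == fingerprint:
            continue                               # unchanged since the last pass
        try:
            migrate_one(doc)                       # push the document to the target
            state.migrated[doc.doc_id] = fingerprint
            state.flagged.pop(doc.doc_id, None)
        except ValueError as err:                  # a data-quality problem was detected
            state.flagged[doc.doc_id] = str(err)   # record it and move on
```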
Live-to-Live Migration
The concept of live-to-live migration means that both target and source systems remain available during the migration. Documents migrated into a live target system can be individually ‘hidden’ until the time is right to make them available. If those same documents are changed in the source system, the target is simply updated with those changes.
This means users are not locked out of their critical systems while the migration is underway. In fact, they will be blissfully oblivious to the fact that, behind the scenes, millions of documents are making their way to the target system. When it’s time to cut the users over to the new target system, they don’t need to know that all you did was hide the documents in the source and expose them in the target. Let them be amazed at how quickly you ‘moved’ all of their content to the new system!
Dynamic Cutover
Another way Workbench makes the migration process less stressful is through dynamic cutover. Like your business users, we understand that not all content is created equal. Quickly migrating a small subset of key documents is often the key to keeping users happy. And since these documents are often the most viewed and used, they are generally the best quality data, requiring the least amount of cleanup.
Workbench allows you to identify, map, migrate, and validate your most valuable content first. The bulk of lower priority content can then be addressed on a longer timeline without holding the critical content hostage.
Rules-Based Attribute Transformation
Typical migration tools use many different schemes to map content from source to target, but they all essentially boil down to taking data from field A and putting it into field B. There may be transformations or conversions involved, but the data always goes from a field in the source to a field in the target. But what if field A was only populated for documents created in the past few months? What if some departments used field C instead of field A? In the real world, more data seems to follow the exceptions than the conventions. Basic transformation schemes, common to most ETL-based tools, simply don’t have the flexibility to address real-world content management issues.
Migration Workbench expresses attribute mappings and transformations as simple migration rules. Complex mappings that used to take Jedi-level manipulations of page after page of mapping expressions are often reduced to one or two simple rules. And, as an added bonus, those rules match how the business analyst thinks. One requirement statement documented by the business analyst becomes one easily recognized (and traceable!) migration rule in Workbench.
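As a rough illustration of “one requirement, one rule”, consider the hypothetical Python sketch below. The rule format, the field names (field_a, field_c, department), and the apply_rules helper are stand-ins for whatever Workbench actually uses; the point is simply that each analyst requirement becomes one named, traceable predicate-and-action pair.

```python
# Each rule pairs a human-readable requirement with a predicate and an action,
# so one analyst requirement maps to one traceable rule.
RULES = [
    {
        "name": "Legal department stores the contract id in field_c",
        "when": lambda doc: doc.get("department") == "Legal",
        "then": lambda doc: {"contract_id": doc.get("field_c")},
    },
    {
        "name": "Everyone else stores the contract id in field_a",
        "when": lambda doc: doc.get("department") != "Legal" and doc.get("field_a") is not None,
        "then": lambda doc: {"contract_id": doc.get("field_a")},
    },
]

def apply_rules(doc, rules=RULES):
    """Return the target attributes plus the names of the rules that fired,
    so every transformation stays traceable back to a requirement."""
    target, fired = {}, []
    for rule in rules:
        if rule["when"](doc):
            target.update(rule["then"](doc))
            fired.append(rule["name"])
    return target, fired

# Example: a Legal document that never populated field_a still maps cleanly.
print(apply_rules({"department": "Legal", "field_c": "C-1042"}))
```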
Total Visibility into Migration Content
Many times in a migration, the real problem isn’t what users see in the target – it’s what they don’t see. It might be easy for a user to detect that a migrated document in a list of 100 documents has the wrong title. However, it is not as easy to see that a list of 100 documents should’ve been 101 because a document was missing.
Migration Workbench provides “Total Visibility” into your migrated documents. Every document is traceable throughout the migration process, including which rules triggered which data transformations or exceptions. This data can be retrieved in a set of migration reports that allow for detailed tracking of the migration’s progress, including full audit traceability.
True, it’s not easy to detect that a single document is missing from a list of 100 by looking. And even if you run a query that tells you one document is missing, it’s hard to tell why. However, if the migration immediately generates a report stating 100 documents were migrated, and 1 was not because it was missing data in mandatory field X, you suddenly have total visibility into the status of your migration.
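A report like that could be built from a per-document audit trail. The sketch below is purely illustrative; the record layout and field names are assumptions, not Workbench’s actual report format.

```python
from collections import Counter

# Hypothetical per-document audit records of the kind such a report could be
# built from; the field names are illustrative only.
audit_log = [
    {"doc_id": f"doc-{i}", "status": "migrated", "reason": None} for i in range(100)
] + [
    {"doc_id": "doc-100", "status": "failed", "reason": "mandatory field X is empty"},
]

def migration_report(log):
    """Summarize the run and list every document that did not make it, with the reason."""
    totals = Counter(rec["status"] for rec in log)
    failures = [(rec["doc_id"], rec["reason"]) for rec in log if rec["status"] != "migrated"]
    return totals, failures

totals, failures = migration_report(audit_log)
print(dict(totals))    # {'migrated': 100, 'failed': 1}
print(failures)        # [('doc-100', 'mandatory field X is empty')]
```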
Benefits of a Better Migration Approach
Example Scenario: Acme Plastics
To help show how this new approach to migration works, I’ve put together an example scenario outlining a few key benefits. This example looks at Acme Plastics, a company moving their Finance Department’s documents from an old legacy repository to an updated one with a revamped object model. I’ll discuss some of the problems Acme encounters along the way and how these issues are resolved with Migration Workbench and the Blue Fish approach.
Source Repository
Acme’s finance documents consist of Purchase Orders (POs) and Invoices, currently stored in two folders in a Documentum repository, /Acme/POs and /Acme/Invoices, respectively. All documents are represented by the object type “acme_document” which, in addition to the standard attributes owner_name and object_name, has the following custom attributes (see Figure 1):
- contact_name – the name of the contact at the vendor or customer,
- retired – a Boolean indicating the file is over two years old, and
- contract_date – the date of the PO or Invoice.
Acme personnel are supposed to follow the naming convention of including the PO number or Invoice number at the beginning of every document name, using the format “POxxxxx” for POs and “INVxxxxx” for invoices. At the end of the fiscal year, all documents older than two years are supposed to be filed in a subfolder called “old”.
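For illustration, the naming convention could be parsed with a couple of regular expressions. This helper is hypothetical and not part of any Acme or Workbench code; it just shows how a PO or invoice number might be pulled from object_name.

```python
import re

# The convention from the text: names start with "POxxxxx" for purchase orders
# and "INVxxxxx" for invoices.
PO_PATTERN = re.compile(r"^PO(\d+)")
INV_PATTERN = re.compile(r"^INV(\d+)")

def parse_document_number(object_name):
    """Return ('po', number) or ('invoice', number), or (None, None) if the
    name does not follow the convention."""
    if (m := PO_PATTERN.match(object_name)):
        return "po", m.group(1)
    if (m := INV_PATTERN.match(object_name)):
        return "invoice", m.group(1)
    return None, None

print(parse_document_number("PO12345 Acme Widgets"))     # ('po', '12345')
print(parse_document_number("INV00042 March order"))     # ('invoice', '00042')
```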
Figure 1: Source Repository Structure
Target Repository
The target repository has a new object model and folder structure (see Figure 2). Acme is anticipating expanding use of the system to other departments, so the root object_type for the documents to be migrated is changing to “finance_document”, with two subtypes called “po_document” and “invoice_document”. Additionally, the PO and Invoice folders contain subfolders for each year, indicating the fiscal_year to which the document belongs.
Figure 2: Target Repository Structure
Migration Process
Acme’s migration analyst has configured Workbench to evaluate all documents in the /Acme cabinet and has identified the following set of migration requirements (a code sketch of these rules follows the list):
- If the document exists in the /Acme/POs folder or its subfolders, map it to type “po_document”
- If the document exists in the /Acme/Invoices folder or its subfolders, map it to type “invoice_document”
- For type “po_document”, parse the PO number from the object_name and put it into po_number
- For type “invoice_document”, parse the invoice number from the object_name and put it into invoice_number
- Parse the year from contract_date and put it into fiscal_year
- Put all POs into a subfolder under /Acme/POs based on their fiscal_year value (e.g., /Acme/POs/2007).
- Put all Invoices into a subfolder under /Acme/Invoices based on their fiscal_year value (e.g., /Acme/Invoices/2007).
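Expressed in code, those seven requirements might look something like the sketch below. The classify and transform helpers, the dictionary-shaped documents, and the date format are all assumptions for illustration; Workbench’s actual rule configuration is not shown here.

```python
import re

def classify(doc):
    """Map a source document to its target type based on its folder path
    (requirements 1 and 2)."""
    if doc["folder"].startswith("/Acme/POs"):
        return "po_document"
    if doc["folder"].startswith("/Acme/Invoices"):
        return "invoice_document"
    return None

def transform(doc):
    """Apply requirements 3-7: parse the document number, derive fiscal_year,
    and build the target folder path."""
    target_type = classify(doc)
    target = {"object_type": target_type}
    fiscal_year = doc["contract_date"][:4]           # e.g. "2007-06-30" -> "2007"
    target["fiscal_year"] = fiscal_year
    if target_type == "po_document":
        target["po_number"] = re.match(r"PO(\d+)", doc["object_name"]).group(1)
        target["folder"] = f"/Acme/POs/{fiscal_year}"
    elif target_type == "invoice_document":
        target["invoice_number"] = re.match(r"INV(\d+)", doc["object_name"]).group(1)
        target["folder"] = f"/Acme/Invoices/{fiscal_year}"
    return target

print(transform({"folder": "/Acme/POs/old",
                 "object_name": "PO10007 resin order",
                 "contract_date": "2007-06-30"}))
```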
Acme enters these rules into Workbench and runs the migration. However, after the users see their data in the target repository, they discover a number of issues with the migrated documents. Below is a description of these issues and how Acme is able to quickly and easily resolve them.
Problem 1: Missing PO Documents
The first issue discovered is that there are not nearly as many POs in the target system as the business users expected to see. After looking back at the source system, Acme realizes that some POs were mistakenly filed in another folder called /Acme/Purchasing/POs.
If Acme were using a traditional ETL tool, fixing this issue would require defining an additional batch for the newly discovered folder, configuring attribute transformations for the new batch, then executing the new batch. The difficulty with this approach is that the new batch definitions will overlap with older batch definitions, causing documents to be unnecessarily migrated twice.
Solution
Workbench makes resolving this problem simple. The migration analyst adjusts the existing rule to also include the newly discovered folder:
- If the document exists in the /Acme/POs folder or its subfolders, or /Acme/Purchasing/POs or its subfolders, map it to type “po_document”
Figure 3: Discovering the Missing PO Documents
With that one simple rule modification, Workbench now knows to also look for POs in the /Acme/Purchasing folder and will identify the additional documents that are impacted. The incremental migration feature ensures that only the newly discovered documents are moved, and rules-based mappings mean that the same attribute transformations are applied to the new documents just as they were to the first set of migrated documents.
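In terms of the earlier classification sketch, the fix amounts to widening a single predicate (again, hypothetical code rather than Workbench’s rule syntax):

```python
# The newly discovered folder is simply added to the list of known PO locations.
PO_FOLDERS = ("/Acme/POs", "/Acme/Purchasing/POs")

def is_po(doc):
    # A document is a PO if it lives under any of the known PO folders.
    return doc["folder"].startswith(PO_FOLDERS)
```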
Problem 2: Additional Missing PO Documents
Although adding the additional folder corrected some of the missing files, Acme users think there are still more POs to migrate. The team discovers that during a previous summer, an intern had filed a number of documents somewhat randomly throughout the Finance Department’s folder structure. But he did at least apply the naming convention of starting all file names with the PO number. As in the example above, if an ETL tool were employed, a new batch would need to be defined, mappings replicated for the new batch, and due diligence performed to make sure no documents are duplicated.
Solution
Using Workbench, this issue can be resolved by updating one simple rule. The migration analyst updates the mapping rule as follows:
- If the document exists in the /Acme/POs folder or its subfolders, or /Acme/Purchasing/POs or its subfolders, or is owned by the intern and the object_name starts with “PO”, map it to type “po_document”
Figure 4: Discovering the Additional Missing PO Documents
When this updated rule is applied, the missing POs show up in the target. But something unexpected happens: Workbench marks them as “partially” imported. This is because the intern is not a valid user in the new system and cannot be assigned as the documents’ owner. In this case Workbench migrates each document, assigns a default owner, identifies the issue for resolution, and continues processing. The migration team notes the documents that were partially migrated and adds a rule to map documents owned by the intern to a valid user. On the next iteration, Workbench sees the updated rule and migrates those documents again, this time completing the import.
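One way to picture the owner handling is the hypothetical sketch below; the user names, the OWNER_MAP rule, and the default owner are invented for illustration and are not Workbench’s actual behavior.

```python
VALID_TARGET_USERS = {"jsmith", "mjones", "finance_admin"}   # hypothetical target-system users
OWNER_MAP = {"intern01": "finance_admin"}                    # rule added after the partial imports were spotted

def resolve_owner(doc):
    """Return (owner, issues): map unknown owners to a valid user and record
    the reason so the document is imported 'partially' rather than skipped."""
    owner = doc["owner_name"]
    if owner in VALID_TARGET_USERS:
        return owner, []
    if owner in OWNER_MAP:
        return OWNER_MAP[owner], []
    # Unknown owner: fall back to a default and flag the document for follow-up.
    return "finance_admin", [f"owner '{owner}' does not exist in the target"]

print(resolve_owner({"owner_name": "intern01"}))   # ('finance_admin', [])
print(resolve_owner({"owner_name": "ghost"}))      # ('finance_admin', ["owner 'ghost' does not exist in the target"])
```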
Problem 3: Users Delete Documents in the Source
For business reasons, Acme must keep the source system live during the migration process. While the Acme migration team is performing the migration, users delete several documents from the source system. Using traditional migration approaches, there is no easy way to keep the target system in synch with changes made in the source.
Solution
Live-to-Live migrations are possible with Workbench’s incremental migration and history features. Each time the tool runs, it identifies documents that have been added, updated, or deleted since the last execution. It then performs the necessary operations on the impacted documents automatically: adding new documents, updating changed documents, creating new versions, or deleting documents as necessary. Figure 5 below shows how Migration Workbench identifies the documents deleted during the migration and removes them from the target repository.
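Here is a minimal sketch of that diffing step, assuming the tool keeps a record of each document’s last-modified timestamp from the previous run. The data shapes are illustrative, not Workbench’s internal model, and the timestamps are shortened to integers for the example.

```python
def diff_since_last_run(source_docs, previously_migrated):
    """Compare the current source inventory with what was migrated last time
    and work out what needs to be added, updated, or deleted in the target.

    source_docs:          {doc_id: last_modified} as read from the live source
    previously_migrated:  {doc_id: last_modified} recorded by the previous run
    """
    added   = [d for d in source_docs if d not in previously_migrated]
    deleted = [d for d in previously_migrated if d not in source_docs]
    updated = [d for d in source_docs
               if d in previously_migrated and source_docs[d] != previously_migrated[d]]
    return added, updated, deleted

# A document deleted from the live source simply shows up in 'deleted'
# and can be removed from the target on the next pass.
print(diff_since_last_run({"a": 1, "b": 2}, {"a": 1, "b": 1, "c": 3}))
# -> ([], ['b'], ['c'])
```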
Figure 5: Users Delete Documents in the Source
Workbench also provides total visibility into the life of each document as it passes through the migration process. This helps users understand which documents have been migrated and which have not. More importantly, if a rule excludes a document from the migration, Workbench records the reason why.
Problem 4: Some POs are More Critical than Others
Acme has a critical need to get the POs associated with the current year into the new system in order to meet internal reporting requirements. Because of the large number of documents in the system, they estimate that migrating the entire repository will cause them to miss this deadline.
With traditional migration approaches it is very difficult to migrate one subset of documents and make it available in the target while continuing to migrate additional lower priority documents.
Solution
Workbench’s Dynamic Cutover feature anticipates this problem and makes it easy to cut content over on the users’ schedule. As Workbench migrates documents into a target system, it uses repository-specific permissions (i.e., ACLs in Documentum) to make the documents invisible to the general user population but visible to the test team for quality assurance. Based on the quality of the data and how quickly it needs to be in the target, the user can define the criteria that identify high-priority documents and assign an appropriate cutover schedule.
In Acme’s case, the analyst can add a rule stating that all documents with fiscal_year equal to the current year should be cut over much sooner than the rest of the documents:
- If the fiscal_year is 2008, then set cutover priority to high.
When Workbench cuts a document over, it changes permissions on the document in the source repository so that general users can no longer access it there. At the same time, it updates permissions on the migrated copy so that it becomes accessible in the target.
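The cutover itself might look something like the following sketch. The set_acl helper stands in for whatever repository-specific call actually changes permissions (an ACL update in Documentum, for example); it, the ACL names, and the cutover_priority attribute are assumptions for illustration.

```python
HIDDEN_ACL   = "migration_hidden"    # visible only to the migration/QA team
EVERYONE_ACL = "finance_readers"     # normal business-user access

def set_acl(repository, doc_id, acl_name):
    # Placeholder for the repository-specific permission change
    # (e.g., updating the ACL on a Documentum sysobject).
    print(f"{repository}: {doc_id} -> {acl_name}")

def cut_over(doc_id):
    """Flip visibility: hide the source copy from general users and expose
    the already-migrated target copy at the same moment."""
    set_acl("source", doc_id, HIDDEN_ACL)
    set_acl("target", doc_id, EVERYONE_ACL)

def cut_over_batch(docs, priority):
    # High-priority documents (e.g., fiscal_year == 2008) cut over first.
    for doc in docs:
        if doc["cutover_priority"] == priority:
            cut_over(doc["doc_id"])

cut_over_batch([{"doc_id": "PO10007", "cutover_priority": "high"}], "high")
```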
Figure 6: Some POs are More Critical than Others
Conclusion
If you found yourself out on the ledge after reading my previous article about the many problems encountered during a typical migration, I hope reading this has talked you back in! Here at Blue Fish, we’ve been working hard to create an approach and tool designed to overcome those problems and tame even the most complex migrations. I’ve introduced a few of the important points here. If these have caught your interest, please give us a shout or request an evaluation of Migration Workbench.