This article explores six hypothetical migration scenarios at a high level. Its intent is to help administrators determine the type of migration they are facing and what the effort will involve. The article should also raise awareness of the different types of ‘migrations’ and clarify the distinction between content migrations and system upgrades.
Note: All of the scenarios discussed in this article involve a content management system. Documentum is used as the example throughout the article, but the general concepts apply to all enterprise-level content management systems.
In the content management field, some confusion exists surrounding the definition of the term ‘migration’. This is probably due to overloading of the word by software vendors and industry publications. Merriam-Webster defines migration as “movement from one country, place, or locality to another.” As it applies to corporate data, a migration really means the physical movement of the data from one place to another. This is not to be confused with system upgrades, which, although they may alter the structure of the underlying data, are typically done in place purely for the purpose of updating software.
To make things a little more confusing, system upgrades and content migrations are often carried out simultaneously as part of enterprise-wide software “overhauls” that involve multiple software subsystems. Nevertheless, the two are distinct efforts that, although they must be coordinated, involve discrete consideration and planning. This article tries to clarify this distinction and help administrators determine the size and nature of the migration efforts they face.
Six common scenarios are described below. Each scenario includes a high-level description of the procedures involved, the major considerations and dependencies, and the potential risks.
In-Place Upgrade
The in-place upgrade requires the least effort and yet poses the greatest risk of any of the scenarios covered in this article. An in-place upgrade involves the installation of new or patched software over an existing production system. This type of task is typically performed after hours by a single system administrator. Although it sounds simple, this scenario should not be attempted without a full, recent, and proven backup of the affected data. If the upgrade process fails and cannot be rolled back or recovered, then the original system must be restored from backup.
In-place upgrades, when they work as planned, are normally quick operations that can be executed overnight or on a weekend, when the organization can tolerate a few minutes or hours of downtime. One reason that in-place upgrades are fast is that, by definition, they do not involve the export and re-import of any managed content or metadata. System upgrades by themselves are not content migrations.
Clone and Upgrade
The additional step of cloning or copying the original system before an in-place upgrade helps to mitigate the risk of a failed upgrade process by providing the ability to perform one or more proof-of-concept upgrades and, if necessary, an instant roll-back to the original state of the system. In the clone and upgrade scenario, a complete, functioning copy of the original system is made so that it can be brought up in parallel with the original system. System administrators can practice the upgrade process on the cloned system until they are confident that the upgrade can be executed without error. Finally, when that comfort level is reached, an up-to-date clone is created and then upgraded to become the new production system.
The clone and upgrade process increases the required downtime for an upgrade by the amount of time needed to create the copy of the system. This is the safest approach for system upgrades.
Bulk Import
This common migration scenario occurs when an organization needs to import a large amount of content from a file system into a managed repository. This scenario is known as a bulk import or a file ingestion process. Many vendors of content repository software include a built-in facility for performing bulk imports of content and metadata into the system.
One major consideration when planning a bulk import is that it is often useful to populate the attributes (metadata) of imported objects and map them into appropriate repository folder locations. Unfortunately, there may be no readily available source for the required attribute values and folder locations. There are several options for providing metadata during a bulk import.
Solutions can use one or more of these options:
- Extract metadata values from the file system objects being imported. For example, Windows stores the creation date, modify date, title, and other information for each file on disk. Although it requires custom code, this information can be extracted programmatically through the OS API.
- Default metadata values based on folder path and/or file name. This option requires custom code that decides on default metadata values by applying a provided mapping or other business logic to the location and file name of imported objects.
- Provide metadata values for each imported object in a spreadsheet or database. In this scenario, custom code marries the provided metadata to the files as they are imported.
- Queue objects for manual metadata population. This can be done during the import process or in the repository after the bulk import is complete. Expert users key in the necessary metadata before or after the files are imported into the repository.
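The first three options above can be combined in a single harvesting step. The sketch below is a minimal, hypothetical illustration in Python, not a vendor API: the `department` attribute, the folder-name mapping, and the CSV-style row format are all invented for the example.

```python
import os
import time

def harvest_metadata(path, provided_rows=None):
    """Gather candidate metadata for one file before import.

    provided_rows: optional mapping of file name -> metadata supplied in a
    spreadsheet or database (option three). All names here are illustrative.
    """
    # Option one: extract values stored by the file system.
    stat = os.stat(path)
    meta = {
        "title": os.path.splitext(os.path.basename(path))[0],
        "modify_date": time.strftime("%Y-%m-%d", time.localtime(stat.st_mtime)),
    }
    # Option two: default a value from the folder the file sits in.
    parent = os.path.basename(os.path.dirname(path)).lower()
    meta["department"] = {"hr": "Human Resources", "eng": "Engineering"}.get(
        parent, "Unfiled"
    )
    # Option three: values provided per file win over harvested defaults.
    if provided_rows and os.path.basename(path) in provided_rows:
        meta.update(provided_rows[os.path.basename(path)])
    return meta
```

Anything the harvesting step cannot resolve would then fall to option four: queuing the object for manual metadata entry by an expert user.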
Simple Migration

Option One: Backend Copy – This option requires that the underlying database schema be the same for the source and target repository. Generally, this means that the source and target systems are running the same software and software version (for example, moving from a Documentum 5.3 SP2 repository to another Documentum 5.3 SP2 repository). In this scenario, the system administrator works with a DBA to extract the backend tables from the source system and then import them into the target system. Afterwards, the source repository’s filestore (content on the file system) is copied to the corresponding location in the new system. All of this work is performed at the database and file system levels.
A complete technical description of the backend copy process is covered in another article titled, “Moving a Docbase.” Because of its low risk and simplicity, this is the preferred scenario for a simple content migration. It is important to note that this procedure bypasses the software API of both the source and target systems. Depending on the size of the source repository and its proximity to the target system, this procedure can usually be completed in one or two days.
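The filestore half of a backend copy amounts to replicating a directory tree and verifying the copy before cutover. A minimal sketch, assuming nothing vendor-specific; the paths are placeholders, and the matching database-level table extract is a separate DBA task not shown here:

```python
import os
import shutil

def copy_filestore(source_root, target_root):
    """Copy a repository filestore tree and verify the file inventory.

    Illustrative only: a real cutover would also verify sizes/checksums
    and confirm the database tables reference every copied file.
    """
    shutil.copytree(source_root, target_root)

    def inventory(root):
        # Set of relative file paths under root.
        return {
            os.path.relpath(os.path.join(base, name), root)
            for base, _dirs, files in os.walk(root)
            for name in files
        }

    missing = inventory(source_root) - inventory(target_root)
    if missing:
        raise RuntimeError(f"filestore copy is missing {len(missing)} file(s)")
    return len(inventory(target_root))
```

Comparing relative-path inventories catches a truncated copy, which is the most common failure when the filestore is large and the copy runs across a network.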
Option Two: Extract and Re-import – This option involves using the repository system’s API to dump the contents of the source repository to an intermediate location on a file system, and then importing all of the contents into the target. Some repository systems include a built-in utility for extracting all of the contents and metadata to the file system. For example, Documentum provides a utility called ‘dump and load’ (see the “Moving a Docbase” article for more information). Third parties also offer extract and re-import utilities, such as DIXI (Documentum Import eXport Interface) for Documentum from ArgonDigital. Some third-party utilities offer the advantage of dumping the metadata to an accessible format such as XML or CSV. This allows for intermediate data inspection and transformation in more complex migration scenarios.
Complex Migration
Complex migration scenarios occur when content and metadata must be moved from one or more source repositories to a target system that has a different object model and/or folder structure. This introduces the need to map locations and attribute values between the source and target models. Sometimes attribute values must be transformed, defaulted, or validated against new constraints in the target system. This scenario requires considerably more planning and preparation than a simple ‘one-to-one’ migration. Depending on the level of complexity, such a migration may take a team of two to five system administrators, business analysts, and developers several months to design, implement, rehearse, and complete.
One proven method of attack for complex migrations is to export the source data to XML using a utility like DIXI, and then perform the necessary mappings and transformations on the XML programmatically (for example using XSLT or Java programs). Finally, the re-tooled XML is used to import the appropriately attributed objects into correct locations in the target system.
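The transformation step can be sketched as follows. This is a hypothetical example: the `<object>` record layout, attribute names, and mapping tables are invented for illustration and do not reflect the actual schema of a DIXI dump.

```python
import xml.etree.ElementTree as ET

# Illustrative mappings from the source object model to the target model.
ATTR_MAP = {"doc_kind": "document_type"}          # rename attributes
VALUE_MAP = {"document_type": {"memo": "correspondence"}}  # translate values
FOLDER_MAP = {"/Old/Memos": "/Records/Correspondence"}     # relocate objects

def transform(xml_text):
    """Rewrite one exported object record for the target object model."""
    obj = ET.fromstring(xml_text)
    for attr in obj.findall("attr"):
        name = ATTR_MAP.get(attr.get("name"), attr.get("name"))
        attr.set("name", name)
        # Translate the value if the target model constrains it.
        attr.text = VALUE_MAP.get(name, {}).get(attr.text, attr.text)
    folder = obj.find("folder")
    folder.text = FOLDER_MAP.get(folder.text, folder.text)
    return ET.tostring(obj, encoding="unicode")
```

The same mappings could equally be expressed as an XSLT stylesheet; the advantage of either approach is that the mapping tables become reviewable artifacts that business analysts can sign off on before the import is rehearsed.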
Another consideration in complex migrations is the need for manual data enrichment or inspection while the data is in its intermediate state (manually checking or changing the data before it is imported). This can be accomplished with custom database applications for indexing and inspecting the data, which can import and export the metadata in its intermediate XML form. See the article, “Business Considerations for Content Migrations,” for more detailed information.
Migration with Upgrade
This scenario involves both a content migration and a system upgrade. These are separate efforts that, because of their inter-dependencies, should be planned and coordinated together. For example, the content migration may need to map attributes based on a changed object model in the upgraded system.
The high-level tasks in this scenario can be approached in the following order:
- Create an empty instance of the new/upgraded repository system. Install the new object model and folder structure in the target system, but do not create any content.
- Based on the new repository’s structure and object model, determine the requirements for mapping source objects to target objects (attribute values, folder locations, security model, etc.).
- Export the source content and metadata to an intermediate format that can be accessed programmatically (like XML).
- Create scripts or programs that can transform the intermediate data based on the specified mappings.
- Optionally create enrichment and/or inspection databases that can view or edit the intermediate data.
- Import the transformed metadata and its corresponding content into the target repository. Manually or programmatically validate the migrated data. Rehearse this step as many times as necessary.
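The validation in the final step above can be partly automated. The sketch below is a minimal illustration, assuming the exported and imported metadata have been reduced to plain records keyed by a stable object id; the record shape and attribute names are hypothetical, not a vendor API.

```python
def validate_migration(exported, imported, required_attrs=("title", "document_type")):
    """Cross-check exported records against what landed in the target.

    exported/imported: dicts mapping object id -> metadata dict.
    Returns a list of human-readable problems (empty means the check passed).
    """
    problems = []
    # Every exported object must arrive, and nothing extra may appear.
    if set(exported) != set(imported):
        problems.append("object count/id mismatch")
    # Every imported object must carry the attributes the target model requires.
    for oid, record in imported.items():
        for attr in required_attrs:
            if not record.get(attr):
                problems.append(f"{oid}: missing {attr}")
    return problems
```

Running a check like this after each rehearsal gives the team an objective "done" signal before committing to the production cutover.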
A general understanding of the tasks at hand and their inter-dependencies is critical to planning and executing a successful content migration, system upgrade, or both. Determine which scenario(s) describe your problem, address the risks, and plan and coordinate the required activities with those inter-dependencies in mind.