Introduction
Failover matters in critical applications, where users must continue to have access to the application even if some of its components fail. In most cases, however, failover capability is only one of the objectives that guide the design of a solution. Critical applications usually need other capabilities as well, such as load management and scalability. Often the means through which failover is achieved also provides several of these related benefits. Any discussion of failover therefore necessarily involves discussion of other capabilities.
Components of a Documentum Application
Documentum applications can consist of many components and configurations, and enumerating failover options for all of them is beyond the scope of this article. This article discusses failover options in the context of the following three layers, which are present in most Documentum application architectures, and the components within each layer.
- The Web based Documentum client layer, typically containing a Web Server and an Application Server;
- The DocBroker layer, containing a DocBroker; and
- The Content Repository layer, containing Content Servers and Docbases.
Web Based layer
With the release of Documentum 5, the Web based layer has become an important and integral component of the Documentum Platform and nearly every failover strategy will need to provide options for this layer.
The web based layer of the Documentum 5 platform is distinguished from the earlier versions of the platform in two critical aspects.
- It uses the J2EE platform as the standard for all web based applications; and
- It provides in the Web Development Kit (WDK) a base framework for building all client applications.
All core Documentum 5 platform client applications (Documentum Webtop, Web Publisher, Documentum Administrator (DA) and others) are implemented using WDK and are designed to run on the J2EE platform.
Failover in the Web Layer
Web Applications provide failover through the use of application server clusters. Clusters contain multiple servers that can serve client requests, and their usage ranges from the provision of load balancing capabilities to failover. In a simple load balancing configuration, HTTP requests are distributed across the web server cluster using a variety of mechanisms that differ with respect to the algorithm used to distribute HTTP requests across servers.
Although failover and load balancing both use multiple servers in their configurations, their concerns are fundamentally different. A load balancer need only distribute load by assigning clients to servers, whereas failover must actively move an existing client connection from a failed server to another running server.
For most web applications, including all WDK applications, this means ensuring that all active session state information associated with the client is available on the server to which the connections from the failed server are transferred. This is achieved through a state replication mechanism built into the server cluster.
The state information carried by these servers has two distinct components: state information related to the HTTP session, and state information related to the application components and objects. HTTP session state consists of information such as the current page requested (for example, the page that displays the contents of a particular folder) and information remembered by the HTTP session (in the form of cookies, etc.) such as the current Docbase and the current user. Application component and object state consists of the state of underlying objects such as DMCL sessions and connections. For seamless failover, both the web state and the application state must be replicated across the cluster.
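The difference between load balancing and failover can be illustrated with a toy model. The sketch below is not WDK or any application server's API: the Cluster class, its replicate flag, the server names and the session fields are all hypothetical names chosen for illustration. It shows that a request rerouted after a server failure only succeeds when the session state was also present on the surviving server.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    alive: bool = True
    sessions: dict = field(default_factory=dict)  # session id -> state


class Cluster:
    """Toy cluster (hypothetical model, not a real application server API)."""

    def __init__(self, names, replicate):
        self.servers = [Server(n) for n in names]
        self.replicate = replicate
        self._next = 0

    def _pick(self):
        # Round-robin over the servers that are still running.
        live = [s for s in self.servers if s.alive]
        if not live:
            raise RuntimeError("no server available")
        server = live[self._next % len(live)]
        self._next += 1
        return server

    def login(self, session_id, state):
        home = self._pick()
        # With replication, every server receives a copy of the state.
        targets = self.servers if self.replicate else [home]
        for s in targets:
            s.sessions[session_id] = dict(state)
        return home.name

    def fail(self, name):
        for s in self.servers:
            if s.name == name:
                s.alive = False

    def request(self, session_id, home_name):
        # Sticky routing: stay on the home server while it is alive.
        home = next(s for s in self.servers if s.name == home_name)
        server = home if home.alive else self._pick()
        state = server.sessions.get(session_id)
        if state is None:
            return server.name, "redirect to login"
        return server.name, f"folder view for {state['user']}"


def simulate(replicate):
    cluster = Cluster(["app1", "app2"], replicate=replicate)
    home = cluster.login("s1", {"user": "alice", "docbase": "docs"})
    cluster.fail(home)
    return cluster.request("s1", home)
```

With replicate set to True the rerouted request is served by the surviving server; with it set to False the same request ends at the login page, because the surviving server has never seen the session.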
WDK and Clustering
Presently (as of Q4 2004 / WDK 5.2), WDK applications do not support failover, because the state information of a WDK application cannot be replicated. For the same reason, even load balancing of WDK applications does not work with balancers that can redirect users to a different server in the middle of a session. Deploying WDK in an application cluster and then simulating failover will result in the user seeing a timeout and being redirected to the login page instead of being able to continue their web session seamlessly.
The lack of failover capability for WDK applications has primarily to do with the DFC/DMCL state not being replicated and consequently the lack of session failover. Documentum has plans for providing session failover in a near term future release of WDK.
The DocBroker layer
DocBrokers play a relatively transient, albeit important, role in a Documentum application by providing clients with connection information to a particular Docbase. DocBrokers keep track of Content Servers and their Docbases.
The mechanism through which this is achieved is fairly straightforward. Whenever a Content Server process starts, it notifies the DocBroker of its availability and continues to do so periodically. The DocBroker in turn makes this information available to clients that need to connect to a Docbase.
A Content Server can notify many DocBrokers of its availability, and multiple Content Servers can notify a single DocBroker. The first of these forms the foundation for providing failover in the DocBroker layer.
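The notification scheme can be sketched as a small registry. This is a toy model rather than Documentum code: the class and method names are hypothetical, and the keep_alive expiry window is an assumption added so that a server which stops notifying eventually drops out of the DocBroker's answers.

```python
import time


class DocBroker:
    """Toy model of a DocBroker's server map (hypothetical, not Documentum code)."""

    def __init__(self, name, keep_alive=60.0):
        self.name = name
        self.keep_alive = keep_alive      # assumed: seconds an entry stays valid
        self._entries = {}                # (docbase, server) -> time last seen

    def register(self, docbase, server, now=None):
        self._entries[(docbase, server)] = time.time() if now is None else now

    def servers_for(self, docbase, now=None):
        # Answer a client lookup with the servers heard from recently.
        now = time.time() if now is None else now
        return sorted(
            server
            for (db, server), seen in self._entries.items()
            if db == docbase and now - seen <= self.keep_alive
        )


class ContentServer:
    def __init__(self, name, docbase, docbrokers):
        self.name = name
        self.docbase = docbase
        self.docbrokers = docbrokers      # one server may notify many DocBrokers

    def project(self, now=None):
        # Called at startup and then periodically.
        for broker in self.docbrokers:
            broker.register(self.docbase, self.name, now)
```

Because each Content Server keeps its own list of DocBrokers, adding a second DocBroker requires no change to the first one; the server simply notifies both.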
DocBroker failover
Given their transient role, the term failover is a bit of a misnomer in the case of DocBrokers, since the term is normally used in the context of resources or processes that are continually being accessed. Nonetheless, in an environment that has only one DocBroker, the failure of that DocBroker will leave clients unable to get information on Docbases from the point of failure onwards, hence the need to provide some basic failover capability.
Failover in the DocBroker layer is achieved by providing means through which:
- A Content Server can notify multiple DocBrokers; and
- Clients can check availability using multiple DocBrokers.
Documentum provides the means to achieve both objectives. These capabilities are built into the Content Servers as well as the client layer (DMCL), and they are configured using simple settings in the respective configuration files.
Configuring multiple DocBrokers
Documentum provides two means of specifying the DocBrokers that the Content Server notifies (the Documentum term for this is projection). The traditional way of doing this has been through the server.ini file, but it can also be done using the server config object.
The DOCBROKER_PROJECTION_TARGET sections of the Content Server initialization file (server.ini) are used to specify the projection targets. Up to 51 projection targets can be specified, with the first being specified in a section named DOCBROKER_PROJECTION_TARGET and additional targets specified in sections ranging from DOCBROKER_PROJECTION_TARGET_0 to DOCBROKER_PROJECTION_TARGET_49. Alternatively, Documentum Administrator can be used to make similar changes to the server config object.
This simple setup provides servers with the ability to project to multiple DocBrokers. The next step is to set up clients to use multiple DocBrokers; the client configuration file, dmcl.ini, contains sections for defining multiple DocBrokers.
Each client configuration must have one section named DOCBROKER_PRIMARY and can have up to 256 backup DocBroker sections named DOCBROKER_BACKUP_0 to DOCBROKER_BACKUP_255. These sections define the available DocBrokers. Two additional keys control how connections are handled when multiple DocBrokers are available, and they are of more importance in the design of failover.
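The section naming scheme is easy to get wrong by one, so a small helper that renders the sections may be useful. The sketch below only encodes the naming rule and the limit of 51 targets stated above. The host and port key names, the host names and the port number are assumptions made for illustration and should be checked against the Content Server documentation before use.

```python
MAX_EXTRA_TARGETS = 50  # sections _0 .. _49, per the limit described above


def projection_sections(targets):
    """Render server.ini projection sections for a list of (host, port) pairs."""
    if len(targets) > 1 + MAX_EXTRA_TARGETS:
        raise ValueError("at most 51 projection targets are supported")
    lines = []
    for index, (host, port) in enumerate(targets):
        # The first target has no suffix; later ones are numbered from 0.
        suffix = "" if index == 0 else f"_{index - 1}"
        lines.append(f"[DOCBROKER_PROJECTION_TARGET{suffix}]")
        lines.append(f"host = {host}")   # key name is an assumption
        lines.append(f"port = {port}")   # key name is an assumption
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder host names and port, for illustration only.
    print(projection_sections([("broker1", 1489), ("broker2", 1489)]))
```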
- auto_forward_request: If this parameter is set to T, requests will be forwarded down the line of DocBrokers in the event that the primary DocBroker fails to respond. If set to F, only the primary DocBroker will be checked, and if it fails a connection error will be reported to the user.
- docbroker_search_order: This specifies the order in which DocBrokers are searched. The possible values are sequential and random. If set to sequential, the client layer selects the next DocBroker in the list; if set to random, it selects a DocBroker randomly. If the auto_forward_request parameter is set to T, the remaining DocBrokers continue to be searched according to this parameter.
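How the two keys interact can be sketched as follows. This is one reading of the description above, not the DMCL implementation: it assumes the primary DocBroker is always tried first and that the search order applies to the backups. The function names and the is_up callback are hypothetical.

```python
import random


def docbroker_candidates(primary, backups, auto_forward_request=True,
                         docbroker_search_order="sequential", rng=random):
    """Return the DocBrokers a client would try, in order (illustrative model)."""
    if not auto_forward_request:
        return [primary]                 # only the primary is checked
    rest = list(backups)
    if docbroker_search_order == "random":
        rng.shuffle(rest)                # backups are tried in random order
    elif docbroker_search_order != "sequential":
        raise ValueError("docbroker_search_order must be sequential or random")
    return [primary] + rest


def locate_docbroker(candidates, is_up):
    """Ask each candidate in turn; the first one that responds is used."""
    for broker in candidates:
        if is_up(broker):
            return broker
    raise ConnectionError("no DocBroker responded")
```

With forwarding disabled the candidate list contains only the primary, so a single DocBroker failure surfaces as a connection error, which is why the forwarding key matters most for failover.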
The Content Repository Layer
The Content Repository Layer comprises the Content Server and the Docbase. The Content Server is a process that provides content related functionality, while the Docbase represents the underlying data of the repository.
The Content Server processes and manages all the resources associated with the Docbase. It does not, however, directly deal with storage of the data. All attribute data associated with the content (as well as base server configuration data) is stored in a Relational Database with the content itself being stored in file stores typically implemented as Network Storage devices.
The primary failover need from a Documentum perspective is to manage the failure of the Content Servers. The responsibility for managing failures of stored data can be delegated to the Relational Databases and Storage Devices, all of which provide failover capabilities, most of them quite robust. While Documentum does provide Docbase replication capabilities, they are primarily useful for managing performance in a distributed environment.
Content Server Failover
There are a number of options available for providing failover at the Content Server level. The use of underlying operating system level clusters was covered in the previous article, and it remains an attractive option for Content Server failover.
Documentum also provides an inherent means of offering some failover capability at the Content Server level without the use of operating system level clustering. The Content Server and Docbase architecture allows multiple Content Servers to serve a single Docbase. These multiple server configurations are designed primarily for Load Balancing and High Availability. They provide failover support in the sense that users continue to have access to the Docbase in the event of a server failure, albeit with loss of the current session / transaction.
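The limits of this form of failover can be made concrete with a toy model. The classes below are hypothetical and do not correspond to the DFC or DMCL API. They show a client whose session dies with its Content Server and who must reconnect, without the earlier uncommitted work, to another server for the same Docbase.

```python
class SessionLost(Exception):
    """Raised when the Content Server behind a session has failed."""


class Docbase:
    def __init__(self, name, servers):
        self.name = name
        self.up = {server: True for server in servers}

    def connect(self):
        # Any running Content Server can serve the Docbase.
        for server, running in self.up.items():
            if running:
                return Session(self, server)
        raise ConnectionError(f"no Content Server available for {self.name}")


class Session:
    def __init__(self, docbase, server):
        self.docbase = docbase
        self.server = server
        self.pending = []                 # uncommitted work in this session

    def update(self, change):
        if not self.docbase.up[self.server]:
            self.pending.clear()          # session state is not carried over
            raise SessionLost(self.server)
        self.pending.append(change)
```

The Docbase stays reachable through the second server, but the new session starts empty. That is the sense in which this configuration gives high availability rather than seamless failover.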
Multiple Servers
The process of installing a new Docbase sets up the primary server associated with the Docbase. Additional servers can then be set up to provide load balancing, high availability and some associated failover. There are a few restrictions that apply to these additional servers:
- They must be set up on machines that are homogeneous with respect to the operating system; and
- They must run the same version of the operating system.