Introduction
WebCache is a Documentum tool that provides quick access to
content and (optionally) its associated attributes by storing
them on a flat filesystem and in an RDBMS, respectively.
WebCache lets a website use content that lives in Documentum
without the overhead of talking directly to the Documentum server.
Once you have configured WebCache in your environment, you can
write programs for your website that do anything you want with the
content and its associated attributes.
WebCache consists of two major components:
WebCache Source: The machine on which the docbase resides.
This component of WebCache is responsible for sending changed
content and attributes to the WebCache target, either at periodic
intervals (nightly, daily, hourly) or when a dm_job is invoked
manually.
WebCache Target: This consists of at least one component:
a copy of the content from the source docbase that is to be
cached. Optionally, it may also include a database component
(the RDBMS doesn’t have to reside on the same machine), which
provides attribute information about the content that has been
cached to the filesystem.
This article covers several details of Documentum WebCache
in a question/answer format.
Where do documents go on the target?
After a WebCache operation, the content files are saved
on the filesystem of the WebCache target. The root directory
under which they are stored is configurable. Beneath that root,
files are organized into directories that mirror their folder
paths in the source docbase.
How, exactly, are attributes stored in the Target database?
All attributes of WebCached content are stored in two special
tables. The prefix of the table names is configurable. In this
example, and throughout this document, we assume WebCache has
been configured to name these tables starting with PROPS.
Single-valued attributes are stored in a table called PROPS_S:
A_WEBC_URL VARCHAR2(544)
I_CHRONICLE_ID VARCHAR2(16)
R_OBJECT_ID VARCHAR2(16)
I_CONTENTS_ID VARCHAR2(16)
OBJECT_NAME VARCHAR2(255)
R_VERSION_LABEL VARCHAR2(32)
R_FOLDER_PATH VARCHAR2(255)
I_FULL_FORMAT VARCHAR2(32)
More columns are added to the single-valued attribute table if you configure
WebCache to include additional attributes from your source documents. Any
attribute can be used, but you must specify which ones.
The A_WEBC_URL column is the unique identifier of the content being
described. It looks like a relative file path. A_WEBC_URL is also the
key into the multi-valued attribute table.
Multi-valued attributes are stored in a table called PROPS_R:
A_WEBC_URL VARCHAR2(544)
STATES VARCHAR2(32)
In this example, STATES is the name of a repeating attribute of a
custom object type. There will be as many rows in this table for
a given A_WEBC_URL as there are STATES for that document.
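The following sketch makes the two-table layout concrete. It recreates a cut-down PROPS_S and PROPS_R in an in-memory SQLite database, used here only as a stand-in for the real target RDBMS (the VARCHAR2 columns become TEXT), and fetches one document’s name together with its STATES values by joining on A_WEBC_URL. The sample rows and the object ID are made up for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a cut-down PROPS_S / PROPS_R pair filled with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PROPS_S (
            A_WEBC_URL    TEXT PRIMARY KEY,
            R_OBJECT_ID   TEXT,
            OBJECT_NAME   TEXT,
            I_FULL_FORMAT TEXT
        );
        CREATE TABLE PROPS_R (
            A_WEBC_URL TEXT,
            STATES     TEXT
        );
        INSERT INTO PROPS_S VALUES
            ('TestFolder/testdoc2.txt', '0900000000000001', 'testdoc2.txt', 'crtext');
        INSERT INTO PROPS_R VALUES
            ('TestFolder/testdoc2.txt', 'CA'),
            ('TestFolder/testdoc2.txt', 'NY'),
            ('TestFolder/testdoc2.txt', 'TX');
        """
    )
    return conn


def states_for(conn: sqlite3.Connection, url: str) -> Optional[Tuple[str, List[str]]]:
    """Return (OBJECT_NAME, [STATES...]) for one A_WEBC_URL, or None if absent."""
    row = conn.execute(
        "SELECT OBJECT_NAME FROM PROPS_S WHERE A_WEBC_URL = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    # NULL entries (padding rows that appear when a document has several
    # repeating attributes) are skipped. SQL only guarantees row order with
    # ORDER BY; SQLite's rowid stands in for insertion order here.
    states = [
        r[0]
        for r in conn.execute(
            "SELECT STATES FROM PROPS_R WHERE A_WEBC_URL = ? AND STATES IS NOT NULL "
            "ORDER BY rowid",
            (url,),
        )
    ]
    return row[0], states
```

For example, `states_for(build_demo_db(), "TestFolder/testdoc2.txt")` yields the object name and its three STATES values.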
Things to keep in mind about repeating attributes:
- The order of repeating attribute values is preserved. If you
put values in a specific order in Documentum, you can expect to
find them in the same order in the WebCache target DB’s PROPS_R
table.
- If you have multiple repeating attributes, each A_WEBC_URL gets
as many rows as the largest number of values held by any one of
those attributes. Where an attribute has fewer values, its
remaining entries are NULL.
For example, if you have repeating attributes ABC and XYZ, and
a certain document has 3 ABC values and 10 XYZ values, there will
be ten rows for the associated A_WEBC_URL, seven of which have
NULL for ABC.
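As an illustration of that padding rule (not WebCache’s own code), the sketch below lays out several repeating attributes as PROPS_R-style rows, filling the shorter attributes with None, which corresponds to NULL in the database. Representing each row as a dict is simply a convenience for the example.

```python
from itertools import zip_longest
from typing import Dict, List, Optional


def repeating_rows(
    url: str, attrs: Dict[str, List[Optional[str]]]
) -> List[Dict[str, Optional[str]]]:
    """Lay out repeating attribute values as one dict per PROPS_R row.

    attrs maps attribute name -> list of values in docbase order. The number
    of rows equals the longest list; shorter lists are padded with None.
    """
    names = list(attrs)
    rows: List[Dict[str, Optional[str]]] = []
    # zip_longest walks all value lists in parallel, padding exhausted ones.
    for values in zip_longest(*(attrs[n] for n in names), fillvalue=None):
        row: Dict[str, Optional[str]] = {"A_WEBC_URL": url}
        row.update(zip(names, values))
        rows.append(row)
    return rows
```

With 3 ABC values and 10 XYZ values this produces ten rows, and ABC is None in the last seven, matching the example above.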
How do you control which documents are published to the cache?
For each WebCache configuration object in the docbase, you define a
starting folder, a version, and (optionally) an effective label.
There is no configurable “where” clause. However, if you have Documentum
WebPublisher installed, you can publish one document at a time, given its object ID.
So, you could emulate “where” clause behavior by writing your own
program that queries Documentum for a list of content, then calls this
WebPublisher publish method for each of those objects.
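The emulation described above amounts to a filter loop. In this sketch, `publish_where` and the `publish_one` callback are hypothetical names: `publish_one` stands in for whatever wraps WebPublisher’s single-object publish call, whose real name and signature are not given here, and the object list would come from your own docbase query.

```python
from typing import Callable, Iterable, List, Mapping


def publish_where(
    objects: Iterable[Mapping[str, str]],
    where: Callable[[Mapping[str, str]], bool],
    publish_one: Callable[[str], None],
) -> List[str]:
    """Emulate a "where" clause: publish only the objects that pass `where`.

    objects     -- rows from your own docbase query; each needs an r_object_id
    where       -- the filter you would have liked to configure in WebCache
    publish_one -- hypothetical wrapper around WebPublisher's single-object
                   publish call
    Returns the object IDs handed to publish_one, in order.
    """
    published: List[str] = []
    for obj in objects:
        if where(obj):
            object_id = obj["r_object_id"]
            publish_one(object_id)
            published.append(object_id)
    return published
```

Passing the filter and the publish call in as functions keeps the selection logic testable without a live docbase.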
WebCache optionally pays attention to the a_effective_label, a_effective_date, and
a_expiration_date attributes of each document. If a document has an
a_effective_label matching the effective label specified in the
WebCache configuration object, it will be made available on the target
only for the period between that document’s a_effective_date and
a_expiration_date.
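The rule above can be written as a small predicate. This is a sketch under stated assumptions: the article does not say what happens to documents whose label does not match, so the function returns None in that case; missing dates are treated as unbounded; and the window is assumed to include the effective date and exclude the expiration date. WebCache’s actual boundary handling may differ.

```python
from datetime import datetime
from typing import Optional


def is_within_effective_window(
    doc_label: Optional[str],
    config_label: str,
    effective: Optional[datetime],
    expiration: Optional[datetime],
    now: datetime,
) -> Optional[bool]:
    """Whether a labeled document should currently be available on the target.

    Returns None when the labels don't match (behavior not covered above).
    Assumption: the window is [effective, expiration); None means unbounded.
    """
    if doc_label != config_label:
        return None
    if effective is not None and now < effective:
        return False  # not yet effective
    if expiration is not None and now >= expiration:
        return False  # already expired
    return True
```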
Note: Because Documentum WebPublisher uses these special attributes,
they shouldn’t be used in conjunction with WebCache if WebPublisher is
running on the source docbase.
How does WebCache support multiple renditions?
Multiple formats (renditions) of an object are supported. In the WebCache
configuration object in the docbase, you specify which formats
should be published when multiple formats exist.
When multiple formats are published, they are placed in the same directory.
The primary format’s filename is the object name. Other formats of the
same object are named with the object name (minus the extension, if one exists)
plus the dos_extension of the format (from dm_format).
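The naming rule can be sketched as a one-line decision. This helper is illustrative, not part of WebCache; it only restates the rule above, so for instance testdoc2.txt with an html rendition whose dos_extension is htm becomes testdoc2.htm.

```python
import os


def rendition_filename(object_name: str, dos_extension: str, primary: bool) -> str:
    """File name for one format of an object, following the rule described above.

    The primary format keeps object_name as-is; any other format gets
    object_name minus its extension (if any) plus "." + dos_extension.
    """
    if primary:
        return object_name
    # splitext returns ("testdoc2", ".txt"), or ("testdoc1", "") with no extension.
    stem, _old_ext = os.path.splitext(object_name)
    return f"{stem}.{dos_extension}"
```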
For example, say you have a document called testdoc2.txt.
In the docbase, here are the relevant attributes (from the
DM_DOCUMENT and DM_FORMAT tables):
OBJECT_NAME    I_FULL_FORMAT   DOS_EXTENSION
------------   -------------   -------------
testdoc2.txt   crtext          txt
testdoc2.txt   html            htm
The target database gets a unique A_WEBC_URL entry for each
format. The I_FULL_FORMAT value is also propagated to
the target DB:
A_WEBC_URL I_FULL_FORMAT
---------------------- -----------------------
TestFolder/testdoc2.txt crtext
TestFolder/testdoc2.htm html
The A_WEBC_URL represents the path to the document relative to the root
directory of WebCache’s file output on the target.
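A program on the target that starts from an A_WEBC_URL in the database needs to turn it into a file path under the configured root. A minimal sketch follows; `webc_url_to_path` is a hypothetical helper, and rejecting absolute paths and ".." segments is a defensive choice of this example rather than something WebCache requires.

```python
from pathlib import Path, PurePosixPath


def webc_url_to_path(target_root: str, a_webc_url: str) -> Path:
    """Map an A_WEBC_URL to the cached file beneath target_root.

    A_WEBC_URL uses forward slashes, so it is parsed as a POSIX path and
    re-joined with the local path type. Absolute values and ".." segments
    are rejected so that a crafted value cannot escape the cache directory.
    """
    rel = PurePosixPath(a_webc_url)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe A_WEBC_URL: {a_webc_url!r}")
    return Path(target_root).joinpath(*rel.parts)
```

With a hypothetical root of /var/webcache, TestFolder/testdoc2.htm maps to /var/webcache/TestFolder/testdoc2.htm.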
Note: Because documents are given more or less “standard” extensions
during the WebCache process, if you’re serving them directly from a webserver,
the target webserver should usually deliver them with the correct MIME type.
For non-standard extensions, you may need to add the mappings manually to
your webserver’s configuration. The relevant MIME type data can be found in
the mime_type and dos_extension fields of the docbase’s dm_format table.
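If the webserver is Apache, the dm_format data can be turned into mod_mime `AddType` directives. The sketch below assumes you have already fetched (mime_type, dos_extension) pairs, for example with a DQL query against dm_format; the fetching step is not shown. Other webservers need their own syntax.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def apache_addtype_lines(
    formats: Iterable[Tuple[Optional[str], Optional[str]]],
) -> List[str]:
    """Turn (mime_type, dos_extension) pairs into Apache AddType directives."""
    mapping: Dict[str, str] = {}
    for mime_type, ext in formats:
        if not mime_type or not ext:
            continue  # rows without both values are skipped
        ext = ext.strip().lstrip(".").lower()
        if not ext:
            continue
        # The first MIME type seen for an extension wins.
        mapping.setdefault(ext, mime_type.strip())
    # Sorted by extension so the generated config is stable between runs.
    return [f"AddType {mapping[e]} .{e}" for e in sorted(mapping)]
```

In practice you would likely keep only the extensions your webserver does not already know, then paste or include the output in its configuration.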
Is it possible to publish from two distinct WebCache sources to one target?
Documentum advises against this, because files from the two
sources can overwrite each other on the target, leaving it in
an inconsistent state.
Sometimes, however, it’s desirable to have content and attributes
from separate docbases available on the same website.
You could do this by:
- (for content files) Setting up multiple targets and
presenting them as one target from the webserver’s point
of view. You can create symbolic links from your webserver’s
document tree into the target content directories, or (a
more drastic approach) write a website front-end that doesn’t
map the URL directly onto the filesystem, but instead takes
each request and decides which target’s WebCache file area
to retrieve it from.
- (for attributes) Publishing to differently named tables
for each of the multiple targets. Set database triggers on
these tables that reflect changes to a master table on the
fly. This way, you’ll have only one table to query for
attributes, instead of two.
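The trigger approach can be sketched with SQLite standing in for the target RDBMS. The table names SITEA_S and SITEB_S, the MASTER_S table, and its SOURCE column are all hypothetical; an Oracle target would use its own trigger syntax. Keying the master table on (SOURCE, A_WEBC_URL) means two sources that happen to share an A_WEBC_URL do not overwrite each other’s rows.

```python
import sqlite3

MASTER_DDL = """
CREATE TABLE MASTER_S (
    SOURCE      TEXT NOT NULL,
    A_WEBC_URL  TEXT NOT NULL,
    OBJECT_NAME TEXT,
    PRIMARY KEY (SOURCE, A_WEBC_URL)
);
"""


def add_target_table(conn: sqlite3.Connection, table: str, source: str) -> None:
    """Create one per-target props table plus triggers mirroring it into MASTER_S.

    `table` and `source` are interpolated into SQL, so they must be trusted
    identifiers, never user input.
    """
    conn.executescript(
        f"""
        CREATE TABLE {table} (A_WEBC_URL TEXT PRIMARY KEY, OBJECT_NAME TEXT);

        CREATE TRIGGER {table}_ins AFTER INSERT ON {table} BEGIN
            INSERT OR REPLACE INTO MASTER_S
            VALUES ('{source}', NEW.A_WEBC_URL, NEW.OBJECT_NAME);
        END;

        CREATE TRIGGER {table}_upd AFTER UPDATE ON {table} BEGIN
            DELETE FROM MASTER_S
            WHERE SOURCE = '{source}' AND A_WEBC_URL = OLD.A_WEBC_URL;
            INSERT OR REPLACE INTO MASTER_S
            VALUES ('{source}', NEW.A_WEBC_URL, NEW.OBJECT_NAME);
        END;

        CREATE TRIGGER {table}_del AFTER DELETE ON {table} BEGIN
            DELETE FROM MASTER_S
            WHERE SOURCE = '{source}' AND A_WEBC_URL = OLD.A_WEBC_URL;
        END;
        """
    )
```

Applications then query only MASTER_S, while each WebCache target keeps writing to its own table.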
What gets copied to the cache when a source document is linked to another folder?
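If the links reside under the same WebCache-configured root source
folder, a copy is made on the target for each instance of the document,
and, if RDBMS functionality is enabled for WebCache, each copy gets
its own set of attributes in the database.
For example, in the following target listing, testdoc1 is linked into
both the top-level folder and InnerFolder, so a copy of it appears in
each directory: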
$ ls -la
total 48
drwxr-xr-x 3 dmin staff 512 Jun 1 13:57 .
drwxr-xr-x 3 dmin staff 512 May 24 15:51 ..
drwxr-xr-x 2 dmin staff 512 Jun 1 13:57 InnerFolder
-rw-r--r-- 1 dmin staff 23 May 24 16:07 testdoc1
-rw-r--r-- 1 dmin staff 18603 May 24 15:51 testdoc2.htm
-rw-r--r-- 1 dmin staff 46 May 24 15:51 testdoc2.txt
$ cd InnerFolder
$ ls -la
total 6
drwxr-xr-x 2 dmin staff 512 Jun 1 13:57 .
drwxr-xr-x 3 dmin staff 512 Jun 1 13:57 ..
-rw-r--r-- 1 dmin staff 23 Jun 1 13:57 testdoc1
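Because each linked copy gets its own attribute rows, the database can reveal which cached files are really the same document. The sketch below assumes (not confirmed above) that every copy’s PROPS_S row carries the same R_OBJECT_ID, since linking does not create a new object. It groups on I_FULL_FORMAT too, so that the renditions of one object, which also share R_OBJECT_ID, are not mistaken for links.

```python
import sqlite3
from typing import Dict, List, Tuple


def linked_copies(conn: sqlite3.Connection) -> Dict[Tuple[str, str], List[str]]:
    """Map (R_OBJECT_ID, I_FULL_FORMAT) -> [A_WEBC_URL, ...] for content that
    is cached more than once in the same format (documents linked into
    several folders)."""
    rows = conn.execute(
        """
        SELECT p.R_OBJECT_ID, p.I_FULL_FORMAT, p.A_WEBC_URL
        FROM PROPS_S AS p
        JOIN (
            SELECT R_OBJECT_ID, I_FULL_FORMAT
            FROM PROPS_S
            GROUP BY R_OBJECT_ID, I_FULL_FORMAT
            HAVING COUNT(*) > 1
        ) AS d
          ON p.R_OBJECT_ID = d.R_OBJECT_ID AND p.I_FULL_FORMAT = d.I_FULL_FORMAT
        ORDER BY p.R_OBJECT_ID, p.I_FULL_FORMAT, p.A_WEBC_URL
        """
    )
    result: Dict[Tuple[str, str], List[str]] = {}
    for object_id, fmt, url in rows:
        result.setdefault((object_id, fmt), []).append(url)
    return result
```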
Can contentless objects be exported?
Although the documentation states that they can, this is currently not
possible (as of WebCache version 4.2). This has been reported as
a bug.
A workaround is to attach a 0-byte piece of content to items that would
otherwise have no content. If a version later than 4.2 has been released
since this article was published, the workaround may no longer be needed.
Does WebCache copy virtual documents or multiple versions of the same document?
When a document is copied, only one version is pushed to a given
WebCache target.
When a virtual document is copied, its components are copied if they
are all present in the to-be-webcached directory, but the parent/child
relationships are not.
To get around the virtual document limitation, you could:
- Not use or rely on virtual documents for your website
or
- Instead of using the built-in relationship management mechanism
for parent/child VDoc relationships, use your own attribute
(e.g. a new attribute called child_object_ids and/or
parent_object_ids)
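With a custom repeating attribute such as child_object_ids, the website can rebuild the hierarchy from the target database. In this sketch the input pairs are assumed to come from joining PROPS_S (for R_OBJECT_ID) to PROPS_R on A_WEBC_URL and reading a hypothetical CHILD_OBJECT_IDS column; `build_vdoc_tree` and `walk` are illustrative helpers, not WebCache functions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple


def build_vdoc_tree(rows: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Rebuild parent -> [child, ...] links from (parent_id, child_id) pairs.

    NULL children (padding rows) are ignored; child order follows row order.
    """
    tree: Dict[str, List[str]] = defaultdict(list)
    for parent, child in rows:
        if child is not None:
            tree[parent].append(child)
    return dict(tree)


def walk(
    tree: Dict[str, List[str]],
    root: str,
    depth: int = 0,
    seen: Optional[Set[str]] = None,
) -> Iterator[Tuple[int, str]]:
    """Yield (depth, object_id) depth-first from root, visiting each object once.

    The `seen` set guards against cycles, which nothing prevents when the
    IDs are maintained by hand in a custom attribute.
    """
    if seen is None:
        seen = set()
    if root in seen:
        return
    seen.add(root)
    yield depth, root
    for child in tree.get(root, []):
        yield from walk(tree, child, depth + 1, seen)
```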
To get around the multiple versions limitation, you could:
- Create a job in Documentum that splits the versions into
separate folders beforehand. Then run multiple WebCache
jobs, one for each source folder.
or
- Create a job in Documentum that splits the versions into
different objects beforehand, each with a name that indicates
its version. Then copy them into the to-be-webcached folder
and run a WebCache job. You should get all versions that way.
or
- Create multiple WebCache targets for the same source, each
configured to copy a specific version.
Of course, this creates the problem that multiple targets
aren’t seen as one cohesive set.
See the answer to the question “Is it possible to publish
from two distinct WebCache sources to one target?” for ideas
on making two targets appear as one.
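For the second option, the job needs a naming scheme that encodes the version. The `_v<label>` convention below is purely hypothetical. Any scheme works as long as the resulting names are unique within the to-be-webcached folder. The original extension is kept last so that WebCache’s extension-based rendition naming still applies.

```python
import os


def versioned_name(object_name: str, version_label: str) -> str:
    """Build a version-specific object name, e.g. report.txt + 1.2 -> report_v1.2.txt.

    The "_v<label>" convention is a hypothetical example.
    """
    stem, ext = os.path.splitext(object_name)
    return f"{stem}_v{version_label}{ext}"
```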