Victor Spivak gave his annual Documentum Architecture presentation today. The main themes he covered were Service Orientedness (my term), Enterprise Readiness, and Complexity Reduction.
He also dropped a real bombshell about a potential change in the architecture.
Service Orientedness
Documentum Foundation Services (DFS) is Documentum’s web services API. It’s been around for a few years, and EMC is promoting it as the best way to interact with the Content Server. Developers are encouraged to use DFS instead of Documentum Foundation Classes (DFC), which EMC now considers a low-level API. You use DFC to create services, and you use DFS to create applications.
EMC is taking this advice to heart. Centerstage and Media Workspace are two recent Documentum products that are built on DFS (as opposed to Webtop, which is built using DFC).
There are 30 Core Services in DFS, but not all the functionality in Documentum is exposed through DFS. They only add the services that they need (or for which there is popular demand).
DFS is a SOAP-based web service, which is suitable for strongly typed languages such as Java and C#, but is apparently not so good for dynamic languages like Ruby or JavaScript. Developers who use dynamic languages prefer RESTful web services, which EMC does not currently support.
Victor is a fan of RESTful services, and he was looking for some support from the attendees that he could use to convince his managers to invest in adding support for REST. Victor asked how many people would really use RESTful services if he were to build them, and in a room of over 200 people, only TEN raised their hands. Victor was crestfallen.
This raises a question. Why don’t Documentum developers care about REST when other developers in other fields are so passionate about it? My (completely speculative) opinion is that the day of the Documentum developer is mostly behind us. Most people in the room were Documentum implementers, not developers. They are not tasked with building interesting new applications that mash up data from different sources while integrating the result into some internal portal or other kind of web application. They don’t spend all day every day developing, so they don’t value using scripting languages like JavaScript to decrease development time.
Contrast this with Alfresco’s developer community, whose members are as likely to be building a content-enabled web application as they are to be simply implementing the technology as a repository for their office documents. As a result, Alfresco’s SOAP API has been placed on the back burner while their RESTful services get constant attention.
CenterStage – An Example of Using DFS
Victor cited Centerstage as the shining example of how EMC wants to build apps from now on.
He noted that one of the big limitations of WDK is that it does not separate presentation logic from application logic. This is a problem because presentation technologies evolve very quickly, changing every couple of years. Keeping up with these changes would mean re-writing the entire application constantly, but ignoring these changes makes your application seem outdated and clunky.
With Centerstage, the presentation logic lives in the browser, while the application logic lives on the server. EMC uses a JavaScript toolkit to make requests to the DFS services, which return their results as XML. The browser then renders the user interface, treating the XML as data. This is different from WDK, which generates the UI on the server as HTML that the browser simply displays.
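To make the "XML as data" idea concrete, here is a minimal sketch of the client side of that split. It is written in Python purely for illustration (Centerstage does this in JavaScript), and the XML layout, element names, and IDs are invented for the example rather than taken from any real DFS response.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical payload: the element names and IDs are assumptions,
# not the actual format returned by DFS.
RESPONSE_XML = """
<documents>
  <document id="doc-1">
    <name>Budget 2010.xls</name>
    <owner>jsmith</owner>
  </document>
  <document id="doc-2">
    <name>Project Plan.doc</name>
    <owner>mjones</owner>
  </document>
</documents>
"""


def parse_documents(xml_text):
    """Turn the service's XML into plain data, with no markup decisions yet."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": node.get("id"),
            "name": node.findtext("name"),
            "owner": node.findtext("owner"),
        }
        for node in root.findall("document")
    ]


def render_list(documents):
    """Presentation step: the client decides how the data should look."""
    items = "".join(
        f"<li>{html.escape(doc['name'])} ({html.escape(doc['owner'])})</li>"
        for doc in documents
    )
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    print(render_list(parse_documents(RESPONSE_XML)))
```

Because the server only ships data, swapping `render_list` for a different presentation technology would not require touching the service.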
One of the benefits of this approach is that it allows the presentation tier to use a mix of technologies, such as Ajax and Flex. Centerstage is mostly a JavaScript application, but it has a few Flex components for doing things like viewing thumbnails.
Because DFS is a SOAP-based web service, and those are hard to use in applications like this, EMC has written their own middle tier in the browser. This allows the browser to use direct web remoting (DWR) to communicate with DFS.
Victor told us about a couple of interesting technology decisions they made. When Centerstage began development 3 years ago, ExtJS was the JavaScript toolkit they chose. But if they were to make the decision today, they would probably use GWT (Google Web Toolkit).
The other interesting decision they made was to use Mozilla XUL to define the layout of their user interface. Then they translate the XUL XML file into ExtJS in the browser via XSLT.
Enterprise Readiness
A big focus area for Documentum 6.6 was to improve what EMC calls “Enterprise Readiness”, which I took to mean “making it easier for big giant enterprises to run and support Documentum.”
One way they have done this is to improve Documentum’s ability to ingest very large volumes of data. Version 6.5 is already very good at this, and with 6.6, they are making it even better.
In version 6.5, Documentum added batching (importing multiple documents in a single batch), scoping (batching related documents together), partitioning (storing related documents together), and lightweight sysobjects (storing fewer attributes when the complete set of metadata is not needed). Together, they allow Documentum to ingest 5 times as many documents in an hour as previous versions could.
In version 6.6, Documentum has improved the batching and partitioning and has added some additional improvements:
- True null support – reduces the storage required for attributes with no value
- Use of small int (2 bytes) and tiny int (1 byte) where possible to save database storage
- Storing boolean values more efficiently to save storage space
- Storing ID attributes more efficiently in MS SQL Server
Documentum also added some DQL enhancements that improve sub-selects and joins (they added support for left outer joins).
I’m excited about the new support in DQL for pagination, or returning only a certain subset of the rows that result from the query. This is useful in web applications when you have “pages” of results that you want the user to navigate through.
In the past, you had two options, neither of them good. You could keep the database cursor open and wait for the user to request the next page of results (this is expensive and you never know if the user is going to request another page or not). Or you could rerun the entire query and manually discard the rows that you’ve already returned. Pagination lets you rerun the query each time while specifying the row number to start with. DQL will only return the small result set you want to show to the user.
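Here is a rough sketch of the difference between "rerun and discard" and range-based pagination. It does not use real DQL syntax or any Documentum API; `run_query` and `run_query_range` are hypothetical stand-ins backed by an in-memory list so that the example is runnable.

```python
# Stand-in for a sorted query result of 25 rows.
ROWS = [f"doc_{i:03d}" for i in range(1, 26)]


def run_query():
    """Hypothetical full query: the server produces every matching row."""
    yield from ROWS


def page_by_discarding(page, page_size):
    """Old approach: rerun the query and throw away rows already shown."""
    skip = (page - 1) * page_size
    rows = []
    for index, row in enumerate(run_query()):
        if index < skip:
            continue  # fetched from the server, then discarded
        if len(rows) == page_size:
            break
        rows.append(row)
    return rows


def run_query_range(start_row, page_size):
    """Hypothetical paginated query: only the requested rows come back.

    start_row is 1-based, matching the "row number to start with" idea.
    """
    return ROWS[start_row - 1 : start_row - 1 + page_size]


def page_by_range(page, page_size):
    """New approach: ask the server for just one page of rows."""
    start_row = (page - 1) * page_size + 1
    return run_query_range(start_row, page_size)


if __name__ == "__main__":
    print(page_by_discarding(2, 10))
    print(page_by_range(2, 10))
```

Both functions return the same page, but the first one pulls every earlier row across the wire just to ignore it, and that cost grows with the page number.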
Documentum Search Services
Documentum is releasing a new search engine called Documentum Search Services (DSS). This will replace the FAST search engine. Apparently, FAST makes it much too difficult to support a really large number of documents (100 million+).
DSS stores the metadata in xDB (EMC’s XML database) and the full text index in Lucene. This allows it to issue queries across both metadata and full text very quickly. One advantage is that since xDB contains all the ACL metadata, it can weed out any results that the user does not have permission to see. With FAST, each item in the search result had to be checked against the Content Server’s ACL metadata before it was returned to the user. EMC says that this alone makes DSS 10 times faster.
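The permission-filtering difference can be sketched like this. It is a toy model under my own assumptions: the documents, ACL names, and functions are invented, and nothing here reflects how FAST, xDB, or Lucene are actually called.

```python
# Toy repository: every name below is hypothetical.
DOCUMENTS = {
    "doc-1": {"text": "quarterly budget forecast", "acl": "finance"},
    "doc-2": {"text": "budget meeting notes", "acl": "everyone"},
    "doc-3": {"text": "holiday schedule", "acl": "everyone"},
}
ACL_MEMBERS = {
    "finance": {"alice"},
    "everyone": {"alice", "bob"},
}


def can_read(user, acl):
    """Stand-in for a permission check against the Content Server."""
    return user in ACL_MEMBERS[acl]


def search_then_check(term, user):
    """Post-filtering: find full-text hits, then check each hit's ACL.

    Returns the visible hits and the number of per-hit checks performed.
    """
    hits = [doc_id for doc_id, doc in DOCUMENTS.items() if term in doc["text"]]
    checks = 0
    visible = []
    for doc_id in hits:
        checks += 1  # one permission lookup per hit
        if can_read(user, DOCUMENTS[doc_id]["acl"]):
            visible.append(doc_id)
    return visible, checks


def search_with_acl(term, user):
    """In-index filtering: the ACL condition is part of the query itself."""
    readable = {acl for acl, members in ACL_MEMBERS.items() if user in members}
    return [
        doc_id
        for doc_id, doc in DOCUMENTS.items()
        if term in doc["text"] and doc["acl"] in readable
    ]


if __name__ == "__main__":
    print(search_then_check("budget", "bob"))
    print(search_with_acl("budget", "bob"))
```

With three documents the saving is invisible, but the per-hit check in the first function is the step that would hurt when a query matches thousands of documents the user cannot see.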
DSS has other “enterprise readiness” features as well.
- It provides low-cost high availability (n+1 servers instead of 2n servers)
- Support for VMware (FAST didn’t support it)
- Support for NAS
- Support for Disaster recovery
- If you are running a distributed architecture, less data will be shipped across the wide area network during indexing because the text extraction takes place at the remote server and only the “raw text” is sent across the WAN to DSS
Other Platform Improvements in 6.6
- Documentum added support for ATMOS, EMC’s cloud storage solution.
- Java Messaging Service (JMS) will fail over to a JMS server on another Content Server if one is available
- DCTM Messaging Service will now fail over (this is mostly used in distributed content architectures)
- 6.6 adds Kerberos support at the content server, DFC, and DFS layers
- In response to requests from IT shops that don’t want to support Java on their Windows desktops, EMC is releasing UCF.net, a pure Microsoft version of Documentum’s content transfer solution
Complexity Reduction
The big talk around Momentum is about xCP, Documentum’s xCelerated Composition Platform. Victor says that everything about xCP is about reducing complexity so that it’s easier to create and deploy Documentum-powered content applications. He called it the “Visual Basic of ECM.”
xCP uses a model driven (declarative) approach to reduce the need to write code. You “declare” what you want the application to do (via configuration) and xCP will assemble the application that you have described.
xCP 1.5 is based on WDK, but it’s being shelved for a new version, xCP 2.0, which will be based on a rich internet application model. Victor hinted that the architectural patterns will be similar to Centerstage, but the implementation technologies are likely to be different in xCP 2.0.
Victor called this a “very ambitious project”. xCP 2.0 is targeted for the end of next year (2011).
Documentum Rejecting the Relational Database?
The final item I’ll leave you with is the most exciting from an architecture perspective. Victor told us that they have been thinking about ways to use xDB as the metadata storage mechanism for Documentum, replacing the relational database that they use today.
This is 3-5 years away, if it happens at all, but it’s an incredibly intriguing idea. Eliminating the database would dramatically reduce the complexity and overhead required to operate a large, distributed Documentum environment, and while Victor said that he personally thinks it’s the right thing to do, there is a lot of thinking and testing to be done first. He has a small team working on it now, and Victor told the audience that while he’s making no promises, he hopes to be standing at an EMC World 2-3 years from now revealing it as an upcoming architectural change.