EMC World 2009 – Day 4 Recap and Day 5 Plans
Today, it seemed like every session that I attended was talking about xDB, EMC’s high-performance XML database. xDB is the product formerly known as X-Hive/DB, acquired by EMC in 2007. For the past couple of years, EMC has been selling xDB as a stand-alone product, but at the same time, they were busy integrating it with several of their existing products. But before we talk about that, I should probably bring you up to speed on what an XML database does.
An XML database is similar to a relational database except that instead of storing structured data in rows and columns, it stores semi-structured data in the form of XML files. xDB allows the use of XQuery to query the database and return chunks of XML, from an entire document (or several documents) to a single element. xDB treats the elements in XML files like the columns of a database.
Documentum has supported XML documents for over 10 years, but xDB brings that support to a whole new level. Take the following scenario, for example. Imagine you have a large XML document, like a product catalog from one of your suppliers. You get the catalog in XML format from your supplier, and you use it to generate pages on your web site that list the products you sell.
This scenario poses a challenge. You need to somehow get the data in this XML file into a form that can be displayed on your web site. Your web site is dynamic, with a product catalog that can be sorted and searched, so you need to be able to query for individual products. What you need is a list of products, but all you have right now is a giant XML file with a <product> tag delineating each product.
In the past, to deal with this scenario, you would create an XML Application – a set of rules for processing the XML when it is imported into the repository. Your XML application would do two main things: first, it would “chunk” the XML file into smaller XML files, one for each product. These would all be linked into a virtual document that represented the main XML file. Your XML application would also parse out certain elements of the product (description, price, image URL, etc.) and write the values into metadata attributes on the document object. Now you have a list of product objects in the repository with attribute values that can be searched and displayed in a list. You would then write code on your web site to query the docbase and display your product catalog.
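To make the chunk-and-extract idea concrete, here is a rough sketch in Python using only the standard library. It is not how a Documentum XML Application is actually configured; the catalog layout, the element names, and the chunk_catalog helper are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplier catalog; the element names are assumptions.
CATALOG_XML = """
<catalog>
  <product>
    <description>Running shoe</description>
    <price>79.99</price>
    <image_url>http://example.com/shoe.jpg</image_url>
  </product>
  <product>
    <description>Rain jacket</description>
    <price>120.00</price>
    <image_url>http://example.com/jacket.jpg</image_url>
  </product>
</catalog>
"""

# The elements to copy into metadata attributes. Note that this list
# has to be decided before any content is imported.
ATTRIBUTE_ELEMENTS = ("description", "price", "image_url")


def chunk_catalog(xml_text):
    """Split a catalog into one chunk per <product> and pull out attributes."""
    root = ET.fromstring(xml_text.strip())
    chunks = []
    for product in root.iter("product"):
        # Copy the chosen child elements into a flat attribute dictionary.
        attributes = {name: product.findtext(name) for name in ATTRIBUTE_ELEMENTS}
        chunks.append({
            "xml": ET.tostring(product, encoding="unicode").strip(),
            "attributes": attributes,
        })
    return chunks


chunks = chunk_catalog(CATALOG_XML)
for chunk in chunks:
    print(chunk["attributes"])
```

Each entry in chunks stands in for one product object in the repository: the XML fragment is the content, and the dictionary is the set of searchable attributes.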
This approach has been used for years, and I used to build applications for the airline industry that used these techniques to manage huge Aircraft Maintenance Manuals. But there are some drawbacks.
First, in order for this to work, you must have a custom object type that contains attributes for all the elements of the XML file that you want to be able to search. That means you have to know which attributes will be displayed on the web site before you start managing your content. That’s often not the case. Second, you have to have a different object type for each type of product, because you’ll receive different XML files from different suppliers – the schemas will be different and the relevant attributes will be different (for one type of product, color may be important while for another product, size is important). Third, if the supplier changes their XML format or adds new properties, you have to update your object model and do it all over again.
With xDB, you don’t have to do any of this planning or object modeling. xDB can handle any arbitrary XML format, so you just import your document when you receive it from the supplier. And it doesn’t need to be chunked into smaller files, because xDB can return XML fragments when you query it. In effect, you can ask xDB for all the products of type “Shoe” where size is “11” and color is “Black”, and xDB will return a collection of XML fragments, one for each product element that matches those criteria. No chunking or attribute population is required, and there’s no database design involved, either, because xDB does not need any up-front knowledge of the XML schemas you plan to use. Just import your documents and xDB does the rest.
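The “Shoe, size 11, Black” query can be approximated in a few lines of Python. This sketch uses the limited XPath support in the standard library’s ElementTree as a stand-in for XQuery; it does not use the xDB API, and the catalog layout and the find_products helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the element names are assumptions.
CATALOG_XML = """<catalog>
  <product><type>Shoe</type><size>11</size><color>Black</color><name>Oxford</name></product>
  <product><type>Shoe</type><size>9</size><color>Black</color><name>Loafer</name></product>
  <product><type>Shirt</type><size>M</size><color>Black</color><name>Polo</name></product>
</catalog>"""


def find_products(xml_text, **criteria):
    """Return the <product> fragments whose child elements match every criterion."""
    root = ET.fromstring(xml_text)
    # Build one predicate per criterion, e.g. [type='Shoe'][size='11'].
    # Values containing quote characters are not handled in this sketch.
    predicates = "".join(f"[{tag}='{value}']" for tag, value in criteria.items())
    return [
        ET.tostring(product, encoding="unicode").strip()
        for product in root.findall(f"./product{predicates}")
    ]


matches = find_products(CATALOG_XML, type="Shoe", size="11", color="Black")
for fragment in matches:
    print(fragment)
```

Nothing was chunked or copied into attributes ahead of time: the query runs against the whole document and hands back only the matching fragments.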
It turns out that xDB is very fast and very scalable. Importing documents is fast, and so is querying them. I saw a demo today where xDB processed thousands of documents a second on some crappy laptop hardware. Some say that xDB is faster and better than a relational database, not only for hierarchical XML data (which it certainly is), but also for normal structured data. I’m beginning to think that those people may be right.
EMC has decided to make xDB a huge part of its technology strategy, and xDB is now being integrated and embedded in several applications.
- As Part of the Content Server – In one of the first major architecture changes to the Content Server since the introduction of the Method Server, Documentum will now contain an embedded instance of xDB where XML documents will be stored. You’ll be able to combine DQL and XQuery to get access from within Documentum to the features I described above. This is a big deal.
- As Part of Enterprise Search Services (ESS) – Documentum is replacing the FAST full-text engine with ESS, and ESS is powered by xDB. You heard me right. For the past several months, they have been adding full-text capabilities to xDB (leveraging Lucene, an open source search engine), and in an upcoming release of Documentum, the embedded full-text engine will be xDB.
- As Part of Dynamic Delivery Services (DDS) – DDS is a new product built on top of Interactive Delivery Services (IDS). It allows you to publish your web content into xDB and then query it from your web site. Intel is using this to run their entire Intel.com web site, and I can see how something like this would add lots of power to our web content management solutions while actually simplifying the solution.
Tomorrow is the last day of the conference, and I’m only planning on attending two sessions before I head off to the airport:
- THURSDAY 8:30 – Customer Experience – Web 2.0 and Personalized Customer Communications Product Overview and Strategies
- THURSDAY 10:00 – CMIS – Changing the World One Application at a Time