XML Content Management Systems: Are They Right For Technical Publications?

As more and more technical publications teams make the move to content management, it's important to examine and understand the difference between relational database and object-oriented XML content management systems. Determining which type of system will work best for you depends on several factors. In this Astoria Blogs interview with Eric Kuhnen, Director of Product Management for Astoria Software, we'll help you understand which type of system is most appropriate for managing granular, topic-based technical documentation and we'll explore the reasons why one type of system is more desirable than another.

Eric and I also examine several alternative approaches some creative tech pubs departments have considered (for instance, using a code database as a content management system -- or using Microsoft SharePoint) and why they're not appropriate solutions for technical documentation teams. And, to close out the interview, we discuss the issue of cost (what is the cost of free?) when considering whether to implement an open source or a commercial content management system. This is a technology interview every technical publications professional needs to read.

CHIP: There's some confusion in the technical publications space about the various types of content management systems on the market today. A common question relates to the appropriateness of one type of content management system over another. I was hoping you would help our readers better understand why some types of solutions are appropriate for the demands of technical publications.

Our first question is a basic one. What is a content management system and what is it designed to do at its most basic level?

ERIC: That's a good place to start. In its most basic form, a content management system controls the storage, indexing and retrieval of things, be they text, images, or some combination of the two. The basic functions of any content management system are the same: open (or select), close, read (or retrieve), write (or update), create, copy, and delete. It's not surprising then that several classes of systems each qualify as a content management system, from common file systems to highly specialized native-XML databases. What separates one class from another is the type of content under management. For example, a file system gives you the basic tools to manage files; a relational database is well suited to managing characters and numbers; and an XML content management system is optimized for XML-based content.

CHIP: Okay, so can you provide us with a basic description of an XML content management system?

ERIC: Well, like all content management systems, an XML CMS controls the storage, indexing and retrieval of XML objects. Nevertheless, and this is a key point, precisely because the content is XML, an XML CMS is optimized around hierarchical and attribute management of XML content. Hierarchical refers to the parent/child relationships that exist between XML objects in a document, document type definition or schema. Attribute refers to the characteristics of an XML object itself. It's the ability to handle basic CMS functions combined with a specialized implementation for managing XML hierarchies and attributes that defines an XML content management system.
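
To make "hierarchy" and "attribute" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The topic, its element names, and its attribute names are invented for illustration; they do not come from Astoria or from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic; element and attribute names are illustrative only.
SAMPLE = """
<topic id="install" audience="admin">
  <title>Installing the widget</title>
  <body>
    <p id="p1">Unpack the box.</p>
    <p id="p2" product="pro">Attach the pro bracket.</p>
  </body>
</topic>
"""

root = ET.fromstring(SAMPLE)


def walk(element, depth=0):
    """Yield (depth, tag, attributes) for an element and all its descendants."""
    yield depth, element.tag, dict(element.attrib)
    for child in element:
        yield from walk(child, depth + 1)


# Depth reflects the parent/child hierarchy; the dict holds each object's attributes.
outline = list(walk(root))
for depth, tag, attrs in outline:
    print("  " * depth + tag, attrs)
```

The indentation in the printed outline is the hierarchy, and the dictionary next to each tag is that object's attributes. These are the two things Eric says an XML CMS is optimized to manage.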

CHIP: XML content management systems are often misunderstood. How are they different from content management systems built upon relational databases?

ERIC: I like this question not simply for what it asks but also what it reveals; specifically, the inherent difference between what an XML content management system is, and how an XML content management system is implemented. Let me draw on a common example: transportation. In considering the problem of urban traffic congestion, it would be reasonable to compare two modes of travel, perhaps trains and cars, without considering which automobile manufacturer is better than another. Similarly, while charting the rise in structured content, it's reasonable to debate the various methods for managing structured content, and then address various implementations separately. No question; an XML content management system is the best method for managing structured content encoded in XML. And there are both established and fledgling vendors offering management of XML content. But, which implementation is best suited to the problem of managing structured content? Logically, you would have to give the nod to a system that handles XML object hierarchies and attributes through pointers versus a system that uses tables and relational calculus.

CHIP: Why? Doesn't that lead eventually to a question of preference?

ERIC: Not at all, because you can measure the technical and business advantages of an object-based XML content management system over a relational-based XML CMS. Look, in an object-based XML CMS, hierarchy and attribution are represented by object pointers. If you borrow text from another document in a content reuse exercise, or add attributes to an object to allow for a new output format, an object-based system merely rewrites or adds object pointers; the objects themselves remain intact. But a relational-based XML CMS uses tables, rows and SQL statements to mimic object hierarchies and attributes, and the resulting overhead to manage and join tables and rows imposes processing bottlenecks. Altering object relationships, particularly when revising an XML schema, involves the expense and time of a relational database administrator. Driving higher levels of content reuse means managing more rows in each table, which drags down the performance of SQL table joins and overall query processing. These costs and penalties simply don't exist in an XML CMS built on a true object database. Any business executive will appreciate metrics that show significantly more efficient output from Team Z using an object-based XML CMS than from Team Y using a relational-based XML CMS, and will simply subordinate any claims to preference. The metrics will show that he's got the right tool for the right job.
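
The pointer-based reuse Eric describes can be sketched in a few lines of Python. This is an illustration of the idea only, not how Astoria or any other product is implemented; the `ContentObject` class and the sample documents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """Hypothetical content object: text, attributes, and pointers to children."""
    text: str = ""
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def render(self):
        """Assemble output text by following child pointers depth-first."""
        parts = [self.text] if self.text else []
        for child in self.children:
            parts.append(child.render())
        return " ".join(parts)


warning = ContentObject("Disconnect power before servicing.")
install_guide = ContentObject("Install guide.", children=[warning])
service_manual = ContentObject("Service manual.")

# Reuse: add a pointer to the existing object; the object itself is not copied.
service_manual.children.append(warning)

# One edit to the shared object shows up in every document that points to it.
warning.text = "Disconnect power and wait 30 seconds before servicing."

print(install_guide.render())
print(service_manual.render())
```

Both documents hold a reference to the same `warning` object, so reuse changes a pointer list and leaves the object itself intact.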

CHIP: Tech pubs departments are some of the first to embrace content management and XML technologies. But, they are not very experienced tool shoppers. Can you help us understand the types of organizations that would need an XML content management system?

ERIC: Speaking broadly, any organization or department looking for higher levels of content reuse needs an XML content management system based on object technology. The thinking goes like this: it costs less to reuse high quality content than to create new, high quality content; so the more you reuse, the more high quality output you generate per unit of time. Then someone will ask, "How granular does my object model need to be?" because there have to be trade-offs between reuse of entire documents versus reuse of paragraphs and sentences. The answer to that question tells you which departments and organizations need an object-based XML CMS. For example, marketing communications departments really need an object-based XML CMS because they constantly snip a sentence here and a sentence there from different sources to build data sheets, product brochures, and similar non-narrative collateral. Similarly, customer support organizations and application engineering teams have a huge need for topic-based, non-narrative content for their product notes, problem-resolution knowledge systems, and other customer-specific documentation.

CHIP: Can't relational databases handle these tasks? If not, why not?

ERIC: Oh, it would be wrong to say that they can't handle object reuse, but if you read closely what XML CMS vendors who use a relational database are saying, they always talk about "chunking" or "minimum reuse object size", or words to that effect. This is where the problem with a relational database comes to the fore. As you break up content into smaller and smaller elements of reusable content, the system adds more rows to relational tables and there is an increasingly negative impact on query processing and table joins. This means that organizations that need high levels of object reuse are limited by the underlying architecture of their relational-based XML CMS. Hence, there is a tool-based limit to the cost savings of reusing high quality content. It's not that these systems cannot manage XML content at high levels of granularity; they can. It's just that the dollar costs for efficiency -- you know, high quality output per unit of time -- rise dramatically when object granularity increases, to the point where it doesn't make financial sense to pour more resources into the relational-based system. These dollar costs include being forced to purchase additional processing, storage, and memory capacity in order to maintain acceptable response times to SQL queries over an exponentially increasing data set, or limiting the granularity of reusable objects (to maintain performance on existing platforms) to the point where cut-n-paste operations nullify the efficiency improvements of content reuse. Basically, the relational-based tool is ill-suited to managing structured content where output efficiency relies on significant content reuse.
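
The link between granularity and row count can be shown with a small sketch using Python's built-in `sqlite3` module. The two-table layout (`chunk` and `doc_chunk`) and the sample text are assumptions made for illustration, not the schema of any real relational XML CMS. The sketch shows only that finer chunks mean more rows to join; it does not measure performance.

```python
import sqlite3

# Hypothetical document: two paragraphs, five sentences in total.
DOC = {
    "p1": ["Unpack the box.", "Check the contents."],
    "p2": ["Attach the bracket.", "Tighten both screws.", "Plug in the unit."],
}


def load(granularity):
    """Store DOC in an assumed two-table layout and return the open connection."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunk (id INTEGER PRIMARY KEY, text TEXT)")
    db.execute("CREATE TABLE doc_chunk (doc TEXT, position INTEGER, chunk_id INTEGER)")
    if granularity == "paragraph":
        chunks = [" ".join(sentences) for sentences in DOC.values()]
    else:  # "sentence"
        chunks = [s for sentences in DOC.values() for s in sentences]
    for position, text in enumerate(chunks):
        cur = db.execute("INSERT INTO chunk (text) VALUES (?)", (text,))
        db.execute(
            "INSERT INTO doc_chunk VALUES ('guide', ?, ?)", (position, cur.lastrowid)
        )
    return db


def assemble(db):
    """Rebuild the document text with a join across both tables."""
    rows = db.execute(
        "SELECT c.text FROM doc_chunk d JOIN chunk c ON c.id = d.chunk_id "
        "WHERE d.doc = 'guide' ORDER BY d.position"
    ).fetchall()
    return " ".join(text for (text,) in rows)


def row_count(db):
    """Number of rows the join must visit to rebuild the document."""
    return db.execute("SELECT COUNT(*) FROM doc_chunk").fetchone()[0]


coarse, fine = load("paragraph"), load("sentence")
print(row_count(coarse), row_count(fine))
```

Both versions reassemble to the same text, but the sentence-level version needs more rows in each table to do it. That growth is the starting point for the cost argument Eric makes above.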

CHIP: That seems like a logical explanation. But what about software code databases? They promote reuse, too. Why can't we just jimmy-rig our software code database to be a content management system? If that's not a good idea, why exactly is it a bad idea?

ERIC: Yeah, why can't we? After all, software reuse and content reuse are cousins, right? Yes, they are. But the distinctions around what is an object and what is reuse highlight their different parentage. One distinction: a good code management system supports the concept of information hiding, which means that you choose an object for reuse by the nature of its inputs and outputs, not by what it contains. In fact, the information about how the object is implemented (i.e., what is inside of it) is hidden from you. Not so with XML content, where the opening and closing sentences of a paragraph are just as important as the sentences and images within the paragraph. Another distinction: a code management system is a file-based vault; the interactive development environment must be employed to provide an object-based overlay to the files so that developers can find and reuse objects. As with all overlays, the interface between the object-based development environment and the file-based code management system introduces a bottleneck during any storage, search, indexing, or retrieval operation as the object names are mapped into file names. An object-based XML content management system suffers no such performance bottlenecks because there is no overlay. XML content objects are stored, searched, indexed, and retrieved in their native form; assembly into document form occurs only during a publishing operation. Perhaps a third distinction is that code management systems do not have innate capabilities for handling object hierarchies and attributes; such capabilities must be grafted onto the system, which introduces another awkward interface layer. As I said earlier, native support for object hierarchies and attributes is the defining characteristic of an object-based XML CMS.

CHIP: With the entry of Microsoft into the content management space, some IT directors are telling their business clients that they don't need to buy a content management system because they already have SharePoint. Is SharePoint an XML content management system? Can't we just use SharePoint for technical publications management functions? If not, why not?

ERIC: SharePoint, Xythos, and other such systems offer basic content services: check-in, check-out, file-level versioning, highly granular access control, workflow routing, and so on. SharePoint is the natural result of a file system married to WebDAV. The user sees folders and files, just like on his local disk, but he also gets these other services that a file system alone doesn't have, and everything is accessible over the Internet. But SharePoint cannot offer anything higher than document-level reuse, so the dollar costs for efficiency are very high for the organizations I mentioned earlier that have a compelling need to reuse topic-based, non-narrative XML content. The fact that Microsoft Office 2007 products use an XML-based file structure is incidental to SharePoint; SharePoint doesn't manage the XML fragments within these files, just the files themselves; so it isn't an XML CMS.

CHIP: It seems like there's another battle sidetracking some technical content creators when they start to think about tools. One of the most common questions we hear is "What about open source?" Can't open source XML content management systems do the same things as their commercial counterparts?

ERIC: First of all, in talking about open-source software, we should all agree that we're living in the golden age of open source. And with any golden age, what is popularly perceived to be possible outstrips what is practically accomplished. Java went through its golden age, and open source is in it now. Java matured and found its appropriate application, and so will open source once the golden age has reached its zenith. Second, my sense is that open source will yield its biggest gains when it follows a monopolization cycle. The respective rises of Linux, the Firefox browser, and OpenOffice seem to bear this out, as does the relative obscurity of open-source databases like PostgreSQL, Ingres, and MySQL (although MySQL is making a good run). I expect that after object-based XML content management systems reach near-monopoly pricing conditions, enterprise-quality open-source systems will become viable alternatives to their commercial counterparts. Until then, technical content creators will read about pilot implementations based on open source, but they won't be able to talk to anyone who bet the farm on an open-source, object-based XML content management system and came away unscathed.

CHIP: There also seems to be some confusion around the open source concept. For instance, it's often argued that open source is free. Is this true? And, if not, what are the costs of an open source solution that may not be apparent to the average technical communication professional?

ERIC: Open-source is free the way water is free. You don't pay for water; you pay for the service to deliver water to your location. Open-source is similar. You don't pay for the technology; you pay for the services to support that technology. These services take many forms. For an open-source, object-based XML CMS, you probably need to keep programmers on staff to tweak the software to meet specific requirements, compile and maintain updates to the source code itself, communicate feature requests to the source community, and tasks like that. Some open-source XML CMS systems can be installed and executed without the need to compile. However, since open-source solutions are relatively new compared to their commercial counterparts, it will be impossible to find an object-based XML CMS from the open-source community that is as thoroughly tested in enterprise-class conditions as a commercially available solution. I mean, more than a few of these object-based XML content management systems have almost 15 years of commercial service; any open-source alternative could hardly approach that level of rigorous improvement.

CHIP: We've talked a lot about when tools are appropriate. Now, I'd like to ask the reverse. When is an XML content management system NOT a good option?

ERIC: This is a very good question. An XML CMS yields few benefits to organizations that need document-level management. By document-level management I certainly mean the basic content services I mentioned earlier. But I also mean services like records management, form-based routing (such as moving an insurance claim form through an approval process), or traditional data warehousing. There are scores of enterprise-quality solutions optimized for document- or file-level management, and customers have extracted enormous value by embracing and integrating document-level management solutions. Interestingly, though, you will note that there is something of a shake-out going on in this segment of content management, signaled by the acquisition of Hummingbird by Open Text and of FileNet by IBM. A shake-out period immediately precedes an era of monopoly pricing, which should then trigger the rise of viable open-source, document-level content management alternatives.

CHIP: If an organization is planning to move to XML content management, what new skills would their staff require? And, can they get started developing these skills before they've selected tools? Can a head start (for instance, the development of structured XML authoring skills) help an organization get their content management project completed faster?

ERIC: To your second and third questions, yes and yes; a head start in structured authoring is imperative because it is the first in a sequence of skill upgrades that ultimately lead to higher quality output within a unit of time. In fact, recent survey data from Aberdeen Group (The Next Generation Product Documentation Report: Getting Past the 'Throw It Over The Wall' Approach) indicate that best-in-class companies are 46% more likely to author structured documentation. These companies meet documentation deadlines 92% of the time (on average) or more; take half as long as some others to translate product documentation content; and make 2/3 fewer post-product release documentation changes. To your first question, the skill upgrade sequence starts with structured authoring and progresses quickly to adoption of topic-based authoring and minimalism. The last upgrade is effective reuse, which answers the question, "How minimal does a topic's text need to be to drive the highest level of reuse within the organization?"

CHIP: Is there anything else you'd like to add?

ERIC: Well, yes, there are a few things. First, the arguments in favor of an object-based XML CMS over a relational-based XML CMS can drift into the realm of religious warring if you ignore the axiom that in the long run, the best software tool harmonizes the object model with the object implementation. Second, our company deploys open-source software in its Astoria On-Demand product; notably, the Lucene search engine. Lucene enjoys increasingly widespread adoption as a check on Google's status as a benevolent despot with a near-monopoly. Finally, what works well for document-based content management cannot work well for XML content management. After all, an elephant cannot count to five on one hoof, even if you change the size of the hoof.

CHIP: Eric, thanks for your time today. You certainly helped us clear up a few misconceptions. I appreciate it.

ERIC: No problem, Chip. It's been my pleasure. If your readers have any questions about XML content management systems they would like to ask me, they can do so by leaving a comment at the end of this post. I'll gladly answer any questions posted.
