XML Support for DB2

Shelton Reese, ISC Chicago

New XML support and the DB2 XML Extender will give DB2 a whole new range of e-commerce and Web-publishing possibilities.

You’ve just concluded a successful business meeting in New York City and you’re in a cab on the way to LaGuardia Airport. Things are looking good: you got out of Manhattan and through the Midtown Tunnel in record time. Just then a page comes in on your mobile phone, and the display shows that your flight has been canceled. However, as you page forward in the display, you see that there’s another flight out of Kennedy Airport just a little later. You tell your cab driver to head for Kennedy. Meanwhile, you use your mobile phone to change your reservation.

How will such convenience be possible? XML is the key. A new travel service based on Wireless Markup Language (WML), an XML grammar, is being developed by The Sabre Group, IBM, and Nokia. The Sabre Group makes the computerized reservation system used by many traditional travel agencies and by the Web-based service Travelocity.com. Once The Sabre Group converts its travel-related information into XML, the information can easily be filtered and adapted to different output devices so that, for example, a mobile phone can display the same information as a Web browser.

This travel service is just one example of the kind of application XML makes possible. In fact, ever since the World Wide Web Consortium approved the XML 1.0 specification in February 1998, support for XML has grown rapidly among companies in computer and other industries.

XML’s popularity results from its usefulness in Web publishing and content management, application integration, and e-commerce, particularly as an enabler of data interchange for business-to-business e-commerce applications. To support these capabilities, DB2 Universal Database (UDB) v.6.1 for Unix, OS/2, and Windows includes XML support. And a DB2 XML Extender is now in beta testing.

XML is a text-based markup language. The term XML is often used to refer to a collection of related specifications, including Extensible Stylesheet Language (XSL), XML Linking Language (XLink), XML Pointer Language (XPointer), and the Document Object Model (DOM).

XML syntax is implemented as a set of tags used to mark up a document. Although it’s similar in style to HyperText Markup Language (HTML), XML allows content authors to extend the language by defining their own custom tags. This feature makes XML a meta-language, meaning that it can be used to define markup languages, or grammars, specialized to the needs of a company, industry, or discipline that can be read by any XML-enabled system.

HTML tags are meant for Web browsing; they enable interaction between humans and computers. To understand the differences between HTML and XML, consider these examples.

<p><b>Mrs. Juanita Darby</b> <br>
200 East Randolph  <br>
Chicago, IL 60681

If you’re familiar with HTML, you know that this code looks something like this when rendered by a browser:

Mrs. Juanita Darby
200 East Randolph 
Chicago, IL 60681

The HTML tags don’t contain any information about what the data is; they only describe how it should look. If you want to extract the ZIP code from this address, you could write an algorithm like this:

Look for a paragraph tag that contains text in boldface, followed immediately by two text strings, each preceded by a break tag. In the text following the second break tag, assume everything up to the comma is the name of the city; it will be followed by two tokens, the second of which is the ZIP code.
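To make the brittleness concrete, here is a rough Java sketch of that layout-based algorithm. The class and method names are made up for illustration, and the regular expression is just one way to encode “a boldface name followed by two <br> lines.” It handles the sample address, then returns null for two perfectly valid variations: a missing comma and a name that isn’t in boldface.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A deliberately naive, layout-based ZIP code extractor (hypothetical names).
class HtmlZipScraper {
    // <p>, a boldface name, then two lines each introduced by <br>. The closing
    // </p> is optional in HTML, so the end of input is accepted as well.
    private static final Pattern ADDRESS = Pattern.compile(
            "<p>\\s*<b>(.*?)</b>\\s*<br>(.*?)<br>(.*?)(?:</p>|$)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String extractZip(String html) {
        Matcher m = ADDRESS.matcher(html);
        if (!m.find()) {
            return null;                       // layout not recognized
        }
        String cityLine = m.group(3).trim();   // e.g. "Chicago, IL 60681"
        int comma = cityLine.indexOf(',');
        if (comma < 0) {
            return null;                       // no comma: the heuristic gives up
        }
        // Everything after the comma should be exactly two tokens: state and ZIP code.
        String[] tokens = cityLine.substring(comma + 1).trim().split("\\s+");
        return tokens.length == 2 ? tokens[1] : null;
    }

    public static void main(String[] args) {
        String html = "<p><b>Mrs. Juanita Darby</b> <br>\n"
                + "200 East Randolph  <br>\n"
                + "Chicago, IL 60681";
        System.out.println(extractZip(html));  // 60681
        // Perfectly valid addresses that break the heuristic:
        System.out.println(extractZip(
                "<p><b>Mrs. Juanita Darby</b><br>200 East Randolph<br>Chicago IL 60681</p>"));  // null
        System.out.println(extractZip(
                "<p>Mrs. Juanita Darby<br>200 East Randolph<br>Chicago, IL 60681</p>"));        // null
    }
}
```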

While this algorithm works for this HTML sample, it’s easy to think of perfectly valid addresses that break this algorithm. Now let’s look at some sample XML code that represents the same data:


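<address>
  <name>Mrs. Juanita Darby</name>
  <street>200 East Randolph</street>
  <city>Chicago</city>
  <state>IL</state>
  <zipcode>60681</zipcode>
</address>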


Our algorithm for finding the ZIP code is now much simpler:

The ZIP code is the text of the <zipcode> tag.
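For comparison, here is a minimal Java sketch of the XML version using the DOM parser in the standard JAXP API. It assumes the address is marked up with a <zipcode> element as described; the other element names in the sample string are illustrative. Note that the lookup doesn’t care where <zipcode> sits or what surrounds it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlZipReader {
    // "The ZIP code is the text of the <zipcode> tag": parse, look it up, done.
    static String extractZip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList zips = doc.getElementsByTagName("zipcode");
        return zips.getLength() > 0 ? zips.item(0).getTextContent().trim() : null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<address>"
                + "<name>Mrs. Juanita Darby</name>"
                + "<street>200 East Randolph</street>"
                + "<city>Chicago</city>"
                + "<state>IL</state>"
                + "<zipcode>60681</zipcode>"
                + "</address>";
        System.out.println(extractZip(xml));   // 60681
    }
}
```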

XML is a simplified subset of SGML, the international text-processing standard that has been in use for more than a decade as a format for large, complex documents such as aircraft design specifications and automotive manufacturing parts catalogs. XML is also becoming the successor to HTML for commercial Web content. As the demands for more complex Web content have increased over the past few years, HTML has been enhanced and extended to its limit.

XML tags describe document content, not layout. In fact, this separation of data and layout is a key concept in XML. XML documents do not contain any explicit rendering instructions. To be viewed in a Web browser, XML documents must be translated to another form, usually HTML, using style sheets. Style sheets are collections of rules for how to render various entities and document types. You can define these style sheets with Extensible Stylesheet Language (XSL), a companion language to XML.

The separation of data and layout provides key benefits to both the server and the client. Because the data and its presentation logic are separate, the same XML document can provide data to a number of different client applications, with each client’s presentation controlled by a different style sheet. Content providers wanting to specialize content for specific clients only have to maintain different style sheets, not multiple versions of an entire application or Web site. For example, the same content could be used with style sheets customized for a text-only environment, a handheld device, or an advanced multimedia workstation.
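As a sketch of that idea, the Java example below applies two different XSL style sheets to one address document using the JAXP Transformer API: one produces HTML for a browser, the other plain text for a text-only device. Both style sheets and the element names (other than <zipcode>) are hypothetical; the point is that the XML document itself never changes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StyleSheetDemo {
    // One XML document: data only, no layout.
    static final String ADDRESS = "<address><name>Mrs. Juanita Darby</name>"
            + "<street>200 East Randolph</street><city>Chicago</city>"
            + "<state>IL</state><zipcode>60681</zipcode></address>";

    private static final String XSL_HEAD =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Style sheet for a Web browser: bold name, one <br> per line.
    static final String HTML_XSL = XSL_HEAD
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:template match='/address'>"
            + "<p><b><xsl:value-of select='name'/></b><br/>"
            + "<xsl:value-of select='street'/><br/>"
            + "<xsl:value-of select='city'/>, <xsl:value-of select='state'/>"
            + "<xsl:text> </xsl:text><xsl:value-of select='zipcode'/></p>"
            + "</xsl:template></xsl:stylesheet>";

    // Style sheet for a text-only client such as a small handheld display.
    static final String TEXT_XSL = XSL_HEAD
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/address'>"
            + "<xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
            + "<xsl:value-of select='street'/><xsl:text>&#10;</xsl:text>"
            + "<xsl:value-of select='city'/>, <xsl:value-of select='state'/>"
            + "<xsl:text> </xsl:text><xsl:value-of select='zipcode'/>"
            + "</xsl:template></xsl:stylesheet>";

    // Applies a style sheet to a document and returns the rendered output.
    static String render(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(ADDRESS, HTML_XSL));
        System.out.println(render(ADDRESS, TEXT_XSL));
    }
}
```

Adding a third client would mean writing a third style sheet; the document and the code that produces it stay as they are.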

XML documents can act much like simple data structures or database tables, and their standard format makes them accessible to anyone with an XML parser or XML-enabled Web browser. If you’re using DB2 v.6.1, you can use the XML Parser for Java included with the copy of WebSphere Application Server that ships with v.6.1.

Document Type Definitions (DTDs) define the kinds of XML elements that can appear in a document, just as a schema defines what can appear in a database. Because a document contains or points to its DTD, XML is a self-describing data format. A validating XML parser reads the embedded or referenced DTD to learn which XML tags are valid in the document. Using that information, the parser can then verify that the document contains only valid tags and that the tags appear in the correct order with the correct nesting. In the previous address example, the parser can verify that ZIP code tags appear only within an address tag.
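Here is a small Java sketch of DTD validation with JAXP, assuming a hypothetical DTD for the address example carried in the document’s internal subset. Turning on validation and installing an error handler that rethrows errors makes the parser reject a document in which <zipcode> is nested inside <street> instead of sitting directly inside <address>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class AddressValidator {
    // Hypothetical grammar for the address example, as an internal DTD subset.
    static final String DOCTYPE = "<!DOCTYPE address ["
            + "<!ELEMENT address (name, street, city, state, zipcode)>"
            + "<!ELEMENT name (#PCDATA)>"
            + "<!ELEMENT street (#PCDATA)>"
            + "<!ELEMENT city (#PCDATA)>"
            + "<!ELEMENT state (#PCDATA)>"
            + "<!ELEMENT zipcode (#PCDATA)>"
            + "]>";

    // True if the document body conforms to the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // check elements, order, and nesting against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // Validity errors are only reported by default; rethrow to stop parsing.
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<address><name>Mrs. Juanita Darby</name><street>200 East Randolph</street>"
                + "<city>Chicago</city><state>IL</state><zipcode>60681</zipcode></address>";
        // <zipcode> nested inside <street>, which the DTD declares as text only.
        String bad = "<address><name>Mrs. Juanita Darby</name>"
                + "<street>200 East Randolph<zipcode>60681</zipcode></street>"
                + "<city>Chicago</city><state>IL</state></address>";
        System.out.println(isValid(good));   // true
        System.out.println(isValid(bad));    // false
    }
}
```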

XML provides substantial benefits in three solution categories that are experiencing significant, if not explosive, growth:

Web publishing. XML initially sparked the most interest among groups publishing and managing Web content. It was generally viewed as the successor to HTML. Already convinced of the value of structured information, the SGML community in particular had been looking for a way to leverage such information on the Web. Most initial XML-based products, including those from Inso Corp., Vignette Corp., ArborText Inc., Textuality, and Interleaf Inc., were designed for Web publishing and content management.

There are a number of advantages to using XML for Web publishing and content management applications. Once you structure data with XML tags, for example, you can easily combine data from different sources. And once XML documents are delivered to the desktop, they can be viewed in different ways as determined by client configuration, user preference, or other criteria. For example, you could look at a product manual in “expert” mode, where only reference information is displayed, or in “novice” mode, where tutorial information is also displayed. XML tags also enable more meaningful searches, because a search can be restricted to specific tagged parts of a document. For example, rather than searching the full text of documents for ZIP codes that match 94402, you can simply search the <zipcode> element.
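Both ideas can be sketched in a few lines of Java with the JAXP DOM parser. The element names (<manual>, <reference>, <tutorial>) and the helper names are assumptions for illustration. The first method picks what to show based on a novice/expert setting; the second restricts a search to <zipcode> elements, so a part number that happens to be 94402 doesn’t produce a false hit.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TaggedContent {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // View selection: "expert" shows only <reference> blocks, "novice" adds <tutorial> blocks.
    static List<String> view(String manualXml, boolean novice) throws Exception {
        List<String> shown = new ArrayList<>();
        NodeList all = parse(manualXml).getElementsByTagName("*");   // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String tag = e.getTagName();
            if (tag.equals("reference") || (novice && tag.equals("tutorial"))) {
                shown.add(e.getTextContent().trim());
            }
        }
        return shown;
    }

    // Element-restricted search: a match counts only inside <zipcode>.
    static boolean hasZip(String xml, String zip) throws Exception {
        NodeList zips = parse(xml).getElementsByTagName("zipcode");
        for (int i = 0; i < zips.getLength(); i++) {
            if (zips.item(i).getTextContent().trim().equals(zip)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String manual = "<manual><section>"
                + "<reference>Torque: 45 ft-lb</reference>"
                + "<tutorial>Use a calibrated torque wrench.</tutorial>"
                + "</section></manual>";
        System.out.println(view(manual, false));  // [Torque: 45 ft-lb]
        System.out.println(view(manual, true));   // [Torque: 45 ft-lb, Use a calibrated torque wrench.]

        String order = "<order><part-number>94402</part-number>"
                + "<address><zipcode>60681</zipcode></address></order>";
        System.out.println(order.contains("94402"));   // true: a full-text match
        System.out.println(hasZip(order, "94402"));    // false: not a ZIP code here
    }
}
```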

Some companies are using XML as part of their Internet or intranet information portals. For example, Dell uses a background XML application designed for content management and personalization on 17 different sites in Europe, the Middle East, and Africa. Before moving to XML content, Dell had to create separate HTML pages for each country-specific site.

Other companies are providing content syndication and subscription via the Web. For example, Dow Jones Interactive Publishing collects data feeds in various formats from publishers of 6,000 periodicals and converts the data to XML before sending it to the intranets of about 100 business customers.

E-commerce. The real excitement over XML stems not from its Web publishing or application integration capabilities but from its potential as an enabler of the data interchange necessary for business-to-business e-commerce. Forrester Research has projected that business-to-business e-commerce in the United States will explode from $43 billion in 1998 to $1.3 trillion by 2003, an annual growth rate of 99 percent. With this much money at stake, any technology that promises to make such solutions easier to implement, as XML does, is bound to be adopted rapidly.
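The three Forrester figures are at least consistent with one another: compounding 99 percent annual growth over the five years from 1998 to 2003 gives

```latex
\$43\ \text{billion} \times (1 + 0.99)^{5} \approx \$43\ \text{billion} \times 31.21 \approx \$1{,}342\ \text{billion} \approx \$1.3\ \text{trillion}
```

which rounds to the projected $1.3 trillion.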

Many businesses today want to automate procurement of office supplies, for example, to lower costs and to take advantage of emerging Internet-based auction-style spot markets. Companies also want to automate their production supply chains. Both of these cross-organizational business processes are accomplished by passing electronic documents (such as purchase orders, invoices, inventory queries, and shipment tracking requests) between organizations.

Electronic Data Interchange (EDI) has been around for a number of years and can handle many of these processes. So why do we need XML to define documents with a new data interchange format?

XML-based data interchange formats are very flexible. Without XML, two communicating applications must agree in advance on the format of the messages sent between them, the data elements that will be passed, and the order in which the data elements are arranged. When XML is the message format, however, the two applications can interpret the message format dynamically using an XML parser. XML message formats are also extensible: using the same application that created the XML document, you could add a data element to support another application, and the original application that used the document would be unaffected. For example, let’s say you defined a document in XML containing the results of an inventory query. This document could include an element called “Part,” which contains “Part-Number,” “SKU,” and “Quantity.” Even if a new element (such as “Price”) were later added to support a new application, the original applications would be unaffected, because they use an XML parser to look only for “Part-Number,” “SKU,” and “Quantity.”
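A Java sketch of that inventory example, using the JAXP DOM parser: the reader asks for Part-Number, SKU, and Quantity by name, so a document that later gains a Price element parses to exactly the same result. The element names come from the example above; the <Inventory> root, the class names, and the sample values are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InventoryReader {
    // The fields the original application needs from each <Part>.
    static final class PartInfo {
        final String partNumber;
        final String sku;
        final int quantity;

        PartInfo(String partNumber, String sku, int quantity) {
            this.partNumber = partNumber;
            this.sku = sku;
            this.quantity = quantity;
        }

        @Override public String toString() {
            return partNumber + " / " + sku + " / " + quantity;
        }
    }

    // Looks up elements by name, so unknown siblings such as <Price> are simply ignored.
    static List<PartInfo> readParts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<PartInfo> parts = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("Part");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element part = (Element) nodes.item(i);
            parts.add(new PartInfo(
                    childText(part, "Part-Number"),
                    childText(part, "SKU"),
                    Integer.parseInt(childText(part, "Quantity"))));
        }
        return parts;
    }

    private static String childText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() > 0 ? found.item(0).getTextContent().trim() : null;
    }

    public static void main(String[] args) throws Exception {
        String original = "<Inventory><Part><Part-Number>A-100</Part-Number>"
                + "<SKU>4471</SKU><Quantity>12</Quantity></Part></Inventory>";
        // Later version of the format: <Price> added for a new application.
        String extended = "<Inventory><Part><Part-Number>A-100</Part-Number>"
                + "<SKU>4471</SKU><Quantity>12</Quantity><Price>3.50</Price></Part></Inventory>";
        System.out.println(readParts(original));   // [A-100 / 4471 / 12]
        System.out.println(readParts(extended));   // [A-100 / 4471 / 12]
    }
}
```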

Dozens of industry-specific XML markup languages have been defined, including:


XML lowers the technical barriers to data interchange over the Internet because it is easier to understand and implement than standards such as ASN.1 and EDI. The base specification is only 30 pages long and is easily understood by those already familiar with HTML. And because XML is a text format rather than a binary one, anyone can read it. Designed with the Internet in mind, XML documents are compatible with Internet infrastructure elements, including HTTP and firewalls. In contrast, EDI-formatted documents are not compatible with Internet standards such as HTTP and require custom (and expensive) value-added networks.

Application integration. XML plays a significant role in the efforts many companies are undertaking to integrate e-commerce and customer relationship management (CRM) applications with their enterprise systems. The data to support real-time e-commerce is contained in legacy, back-office systems, and comprehensive CRM solutions require access to data in a variety of disparate systems to achieve a complete picture of customer relationships. XML’s usefulness as a data interchange format also applies here: individual companies can define specific XML grammars for their internal use.

So where does DB2 fit into the XML story? For starters, DB2 UDB v.6.1 debuted XML support earlier this year, and a DB2 XML Extender is currently in beta testing. DB2 UDB supports all three of the XML application areas I’ve described: it can act as a repository of XML documents for Web publishing and content management, and it facilitates business-to-business e-commerce and application integration by using XML documents as an interchange format for business data.

Web publishing and content management. Let’s say an airline currently stores all of its aircraft maintenance documentation as SGML documents and wants to convert all documentation to XML. The SGML documents are stored in files accessed through a traditional “green-screen” application, so only the aircraft mechanics on the shop floor can reach them. If the documents are converted to XML, the company can use Internet technology to give other users, such as engineers defining new inspections, reviews, and maintenance procedures, easy access to this documentation.

The first step in such a transition is to bring all XML documents and related metadata (such as document number, document title, author or owner, date, type of procedure, and revision level) under DB2’s control. Some metadata (such as document title or document author/owner) is contained within the documents, and some is encoded in file names and directory structures. There are two ways to place documents under DB2 control: store large documents (up to 2GB) as character large objects (CLOBs) in DB2 or leave the files where they are and manage them with the DB2 Data Links Manager. Related metadata can be stored in DB2 columns using traditional SQL data types (such as character, numeric, and date), which allows you to use SQL queries to locate desired documents. Search performance may be improved if the metadata contained within each document is extracted into separate columns.
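To illustrate the extraction step, here is a Java sketch that builds one row of metadata column values for a maintenance document, taking the title and author from inside the XML and the document number, revision level, and procedure type from the file path. The file-naming convention, element names, and column names are all hypothetical; in practice the row would be written to DB2 with an ordinary SQL INSERT.

```java
import java.io.StringReader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MetadataExtractor {
    // Assumed file-naming convention: <document number>_rev<revision level>.xml,
    // stored in a directory named for the procedure type.
    private static final Pattern FILE_NAME = Pattern.compile("(.+)_rev(\\w+)\\.xml");

    // Builds one row of metadata column values (column names are hypothetical).
    static Map<String, String> extract(String path, String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> row = new LinkedHashMap<>();

        // Metadata carried inside the document.
        row.put("TITLE", firstText(doc, "title"));
        row.put("AUTHOR", firstText(doc, "author"));

        // Metadata encoded in the file name and directory structure.
        Path file = Paths.get(path);
        Matcher m = FILE_NAME.matcher(file.getFileName().toString());
        if (m.matches()) {
            row.put("DOC_NUMBER", m.group(1));
            row.put("REVISION", m.group(2));
        }
        Path dir = file.getParent();
        row.put("PROC_TYPE", dir == null ? null : dir.getFileName().toString());
        return row;
    }

    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<procedure><title>Main gear wheel well inspection</title>"
                + "<author>J. Ortiz</author><step>Open access panel 12.</step></procedure>";
        System.out.println(extract("manuals/inspection/MP-1234_revC.xml", xml));
        // {TITLE=Main gear wheel well inspection, AUTHOR=J. Ortiz, DOC_NUMBER=MP-1234, REVISION=C, PROC_TYPE=inspection}
    }
}
```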

The DB2 XML Extender includes a visual tool for mapping the extracted elements from the XML document to the columns and tables where they are stored. This mapping, called a document access definition (DAD), and the DTD describing the XML document are stored in tables managed by the XML Extender, so the applications accessing the document don’t have to keep track of the DADs and DTDs.
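The DAD has its own format, defined by the XML Extender, which I won’t reproduce here. As a rough illustration of the idea only, the Java sketch below represents a mapping as location paths paired with hypothetical TABLE.COLUMN names and applies it to a document with the standard javax.xml.xpath API. None of these names come from the XML Extender.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ElementMapping {
    // Hypothetical mapping in the spirit of a DAD: location path -> "TABLE.COLUMN".
    // This is not the XML Extender's DAD syntax, only the idea it captures.
    static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("/procedure/title", "PROC_SIDE.TITLE");
        MAPPING.put("/procedure/author", "PROC_SIDE.AUTHOR");
        MAPPING.put("/procedure/@revision", "PROC_SIDE.REVISION");
    }

    // Applies a mapping to one document, producing column values keyed by TABLE.COLUMN.
    // A path that matches nothing yields an empty string.
    static Map<String, String> apply(Map<String, String> mapping, String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            values.put(entry.getValue(), xpath.evaluate(entry.getKey(), doc));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<procedure revision='C'><title>Brake inspection</title>"
                + "<author>J. Ortiz</author></procedure>";
        System.out.println(apply(MAPPING, xml));
        // {PROC_SIDE.TITLE=Brake inspection, PROC_SIDE.AUTHOR=J. Ortiz, PROC_SIDE.REVISION=C}
    }
}
```

Keeping the mapping as data rather than code is what lets a tool generate it visually and lets the database, rather than each application, apply it.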

Once you’ve defined the DAD and enabled the XML column, you can use the user-defined functions (UDFs) provided with the XML Extender to simplify the load process. When the XML document is inserted into a column defined as type XML_Column, the elements specified in the DAD are automatically extracted from the XML document and loaded into the specified columns and tables, eliminating the need for the load application to handle this parsing, extraction, and insertion. When the contents of an XML_Column are updated, the elements that were extracted and stored in side tables are automatically updated.
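The behavior described here, where side tables track the XML column automatically, can be modeled in a few lines of plain Java. This is an in-memory stand-in with hypothetical names, not the Extender’s implementation: inserting or updating a document re-extracts the mapped elements, so the caller never touches the side table directly.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// In-memory model of an XML-enabled column plus its side table (hypothetical names).
class XmlColumnModel {
    private final List<String> mappedElements;                        // from the mapping
    private final Map<Integer, String> xmlColumn = new HashMap<>();   // whole documents
    private final Map<Integer, Map<String, String>> sideTable = new HashMap<>();

    XmlColumnModel(List<String> mappedElements) {
        this.mappedElements = mappedElements;
    }

    // Insert: store the document and write its side-table row in the same step.
    void insert(int id, String xml) throws Exception {
        Map<String, String> row = extract(xml);   // parse first, so bad XML changes nothing
        xmlColumn.put(id, xml);
        sideTable.put(id, row);
    }

    // Update: replace the document; the side-table row is refreshed along with it.
    void update(int id, String xml) throws Exception {
        if (!xmlColumn.containsKey(id)) {
            throw new IllegalArgumentException("no document with id " + id);
        }
        insert(id, xml);
    }

    Map<String, String> sideRow(int id) {
        return sideTable.get(id);
    }

    private Map<String, String> extract(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> row = new LinkedHashMap<>();
        for (String tag : mappedElements) {
            NodeList nodes = doc.getElementsByTagName(tag);
            row.put(tag, nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null);
        }
        return row;
    }

    public static void main(String[] args) throws Exception {
        XmlColumnModel docs = new XmlColumnModel(Arrays.asList("title", "revision"));
        docs.insert(1, "<procedure><title>Brake inspection</title><revision>B</revision></procedure>");
        System.out.println(docs.sideRow(1));   // {title=Brake inspection, revision=B}
        docs.update(1, "<procedure><title>Brake inspection</title><revision>C</revision></procedure>");
        System.out.println(docs.sideRow(1));   // {title=Brake inspection, revision=C}
    }
}
```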

DB2 Text Extender provides a rich set of search functions that can be used to find specific documents based on their text content. Because Text Extender supports XML documents, you can limit the text search functions to a specific section of the document, which can significantly improve the search’s precision. For example, if you want to search a database of articles for those written by President Clinton, you could search for “Clinton” only in the author section. This approach is faster and returns far fewer results than a search of the full text, which would also return every article that merely mentions Clinton in the body.
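A simplified Java sketch of section-restricted search, using plain case-insensitive substring matching rather than anything Text Extender actually does (it has its own query syntax and linguistic processing). The <article>, <author>, and <body> element names are assumptions. Searching the whole text matches every article that mentions Clinton; searching only <author> matches the one he wrote.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SectionSearch {
    // Counts documents containing the term, either anywhere (section == null)
    // or only inside elements with the given name, such as "author".
    static int countMatches(List<String> docs, String section, String term) throws Exception {
        String needle = term.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String xml : docs) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder searched = new StringBuilder();
            if (section == null) {
                searched.append(doc.getDocumentElement().getTextContent());
            } else {
                NodeList parts = doc.getElementsByTagName(section);
                for (int i = 0; i < parts.getLength(); i++) {
                    searched.append(parts.item(i).getTextContent()).append(' ');
                }
            }
            if (searched.toString().toLowerCase(Locale.ROOT).contains(needle)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        List<String> articles = Arrays.asList(
                "<article><author>William J. Clinton</author><body>On trade policy.</body></article>",
                "<article><author>A. Reporter</author><body>President Clinton said today.</body></article>",
                "<article><author>B. Analyst</author><body>The Clinton budget.</body></article>");
        System.out.println(countMatches(articles, null, "Clinton"));      // 3
        System.out.println(countMatches(articles, "author", "Clinton"));  // 1
    }
}
```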

Once the documents are all stored in or managed by DB2 UDB, they can be made available over the Internet. You can use IBM’s Net.Data, which supports XML, to quickly build applications to deliver the XML documents to a Web browser. An XML-capable browser (such as the latest versions of Internet Explorer and Netscape Navigator) can display the documents. Alternatively, you could build the application as Java servlets and use WebSphere Application Server to deliver the XML documents.

Business-to-business e-commerce. In business-to-business applications, XML documents are generally more like forms (purchase orders, for example) than true text documents. When XML is used to define a specific interchange format, the XML tags point to traditional data elements stored in a relational database, such as customer name, address, part number, description, price, order date, and so on. The XML document is a transitory form; for permanent storage, the XML document is separated into its various elements, which are stored in a relational database.
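Here is a Java sketch of that decomposition (often called shredding) for a hypothetical purchase order: the order header becomes one row for an ORDERS table and each <item> becomes a row for an ORDER_LINES table, keyed by the order ID. The element, table, and class names are illustrative assumptions; the rows would then be inserted with ordinary SQL.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PurchaseOrderShredder {
    // Row for a hypothetical ORDERS table.
    static final class OrderRow {
        final String orderId, customer;
        final LocalDate orderDate;
        OrderRow(String orderId, String customer, LocalDate orderDate) {
            this.orderId = orderId; this.customer = customer; this.orderDate = orderDate;
        }
    }

    // Row for a hypothetical ORDER_LINES table; orderId is the foreign key to ORDERS.
    static final class LineRow {
        final String orderId, partNumber;
        final int quantity;
        final BigDecimal price;
        LineRow(String orderId, String partNumber, int quantity, BigDecimal price) {
            this.orderId = orderId; this.partNumber = partNumber;
            this.quantity = quantity; this.price = price;
        }
    }

    // Result of shredding one purchase order: one parent row plus its line rows.
    static final class Shredded {
        final OrderRow order;
        final List<LineRow> lines;
        Shredded(OrderRow order, List<LineRow> lines) { this.order = order; this.lines = lines; }
    }

    static Shredded shred(String xml) throws Exception {
        Element po = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        String orderId = po.getAttribute("id");
        OrderRow order = new OrderRow(orderId, text(po, "customer"),
                LocalDate.parse(text(po, "order-date")));   // ISO date, e.g. 1999-10-04
        List<LineRow> lines = new ArrayList<>();
        NodeList items = po.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            lines.add(new LineRow(orderId, text(item, "part-number"),
                    Integer.parseInt(text(item, "quantity")),
                    new BigDecimal(text(item, "price"))));
        }
        return new Shredded(order, lines);
    }

    // Assumes the element is present; a real loader would validate against the DTD first.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String po = "<purchase-order id='PO-7731'>"
                + "<customer>Acme Aircraft</customer><order-date>1999-10-04</order-date>"
                + "<item><part-number>A-100</part-number><quantity>12</quantity><price>3.50</price></item>"
                + "<item><part-number>B-220</part-number><quantity>4</quantity><price>18.00</price></item>"
                + "</purchase-order>";
        Shredded s = shred(po);
        System.out.println(s.order.orderId + " " + s.order.customer + " " + s.order.orderDate);
        // PO-7731 Acme Aircraft 1999-10-04
        for (LineRow line : s.lines) {
            System.out.println(line.orderId + " " + line.partNumber + " " + line.quantity + " " + line.price);
        }
        // PO-7731 A-100 12 3.50
        // PO-7731 B-220 4 18.00
    }
}
```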

For example, suppose your company supplies parts to a number of manufacturers. Your major customers want to be able to submit inventory queries on these parts from their automated materials planning system and receive results as XML documents. All the information they need is available in your internal systems—your company just needs to provide ways for your customers to get to that information. To do so, your company is setting up an extranet application for your major customers that lets their automated materials planning systems submit requests (queries) and receive results from your systems. This application will be written as Java servlets, running in WebSphere Application Server.

The first step is to establish a common grammar (vocabulary) that describes the information involved in this business process. In other words, you need to define which data elements are needed and how they are represented in the DTD. Once you and your customers agree on these points, you can proceed with the implementation.

Elements in the agreed-to DTDs must be mapped to the existing tables and columns in your DB2 databases that represent those data elements in your systems. You can do this with the visual mapping function in XML Extender’s administration tool, which supports mappings for both simple and complex data structures. This creates DADs, which, along with the DTDs, are stored in tables managed by XML Extender.

Once the DAD is defined, you can use DB2 XML Extender’s stored procedures in the application to simplify the programming needed to process a request document. XML Extender’s stored procedures can be used to compose XML documents from data stored in DB2 tables (based upon the previously defined mapping). This reduces the amount of application logic (in Java servlets) that needs to be written. The resulting XML document will be sent over the Internet by WebSphere Application Server to a corresponding application server at your customer’s site.
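The Extender’s stored procedures do this composition inside DB2, driven by the DAD. For readers who want to see what the step amounts to, here is a hand-written Java equivalent using the JDK’s DOM API and an identity Transformer: it turns rows (as a JDBC query might return them) into an inventory-result document using the Part, Part-Number, SKU, and Quantity vocabulary from the inventory example. The <Inventory-Result> root and the row class are hypothetical, and this is not the Extender’s interface. One benefit of building the document through DOM rather than string concatenation is that characters such as & are escaped automatically.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class InventoryResponseComposer {
    // One row from a hypothetical INVENTORY table.
    static final class InventoryRow {
        final String partNumber, sku;
        final int quantity;
        InventoryRow(String partNumber, String sku, int quantity) {
            this.partNumber = partNumber; this.sku = sku; this.quantity = quantity;
        }
    }

    // Builds the response document from relational rows and serializes it.
    static String compose(List<InventoryRow> rows) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("Inventory-Result");
        doc.appendChild(root);
        for (InventoryRow r : rows) {
            Element part = doc.createElement("Part");
            append(doc, part, "Part-Number", r.partNumber);
            append(doc, part, "SKU", r.sku);
            append(doc, part, "Quantity", Integer.toString(r.quantity));
            root.appendChild(part);
        }
        // Identity transform = serializer; the XML declaration is omitted for brevity.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void append(Document doc, Element parent, String tag, String text) {
        Element e = doc.createElement(tag);
        e.setTextContent(text);   // escaped on output, e.g. & becomes &amp;
        parent.appendChild(e);
    }

    public static void main(String[] args) throws Exception {
        List<InventoryRow> rows = Arrays.asList(
                new InventoryRow("A-100", "4471", 12),
                new InventoryRow("B-220", "R&D-7", 4));
        System.out.println(compose(rows));
    }
}
```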

As support for this new standard grows, XML will become more and more a part of the database environment. Just as DB2 was Web-enabled a few years ago, it is now being XML-enabled, allowing you to store new data types, perform more powerful searches, and use XML-based interchange formats. I’ve identified some of the ways DB2 UDB v.6.1 and XML Extender can support business-to-business e-commerce applications. However, we can only guess at all the ways XML interchange formats will facilitate new e-commerce, CRM, and other e-business solutions. It should be fun to watch.

Shelton Reese was a senior programmer at IBM’s Boca Lab, with more than 20 years of experience in all aspects of software development, including design, implementation, testing, and management of software projects. He has worked on DB2, DB2 Extenders, DB2 Digital Library, and IBM’s business intelligence products. You can reach him at sjreese@isc-consulting.com
