Intranet Journal   Earthweb  

An XML Development Kit
Professional tools enable the next web generation

by Norbert H. Mikula
Senior Online Information Architect, DataChannel
 
Executive Summary
The need for XML documents
Where HTML falls short
XML - Preserving the Richness of Information
XML as metadata
XML as data
XML for object serialization
XML for distributed computing
DataChannel XML Development Kit
DataChannel XDK - Professional
XDK for databases
XDK for spreadsheets
XDK for generic applications
XML power for professionals

Executive Summary

Even as Internet and intranet technologies gain acceptance in the enterprise, they continue to fall short of solving a crucial business problem: how to get the right information to the right person at the right time. DataChannel's products are designed to enable effective communication between people and to support business-to-business transactions. XML is the strategic format for encoding this information: it preserves the richness and context that an active channel for communication needs.

XML by itself, however, does not solve any problems. This article describes the state of the practice in XML and presents DataChannel's XML Development Kit (XDK), a product suite that enables software developers and integrators to make their applications XML-aware. In doing so, the XDK strengthens the Web's value by helping rich information reach its appropriate audience.

The need for XML documents

 

The Web can be seen as a worldwide file system. Anybody with access to the Internet can use this storage system to publish and access electronic documents. Organizing this information by building indexes and databases that enable people to find the information they need was an important first step toward successful exploitation of the medium. But as anyone who has spent valuable time hunting for specific answers with a search engine knows, more effective techniques are needed.

XML attacks this problem at the source: the lack of semantic structure in most content. Documents typically contain information that goes well beyond the letters and pictures that make up a text. Consider this article: it has a specific title, a specific subtitle, a specific abstract, and so on. If authors made this information explicit and available to search engines and other agents, a user could simply ask a database to locate all documents that contain "XML" in the abstract. Such a query would return more relevant results than a full-text search on the same keyword.
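To make the difference concrete, here is a minimal sketch in Python (chosen for brevity; the article's own tools are Java-based) using the standard library's xml.etree.ElementTree. The Article, Title, Abstract, and Body element names and both sample articles are hypothetical. A full-text search matches any document that mentions the keyword anywhere; a structured search looks only inside the Abstract element.

```python
import xml.etree.ElementTree as ET

# Two hypothetical articles; the Article/Title/Abstract/Body element names
# are assumptions made for this sketch.
ARTICLES = [
    "<Article><Title>An XML Development Kit</Title>"
    "<Abstract>How XML preserves the richness of information</Abstract>"
    "<Body>Full text of the first article.</Body></Article>",
    "<Article><Title>Planning an Intranet</Title>"
    "<Abstract>Budgeting and staffing for a small company</Abstract>"
    "<Body>One chapter mentions XML in passing.</Body></Article>",
]

def full_text_search(docs, keyword):
    """Titles of documents containing keyword anywhere in their text."""
    hits = []
    for doc in docs:
        root = ET.fromstring(doc)
        if keyword in "".join(root.itertext()):
            hits.append(root.findtext("Title"))
    return hits

def abstract_search(docs, keyword):
    """Titles of documents whose Abstract element contains keyword."""
    hits = []
    for doc in docs:
        root = ET.fromstring(doc)
        # findtext returns None when the element is missing
        if keyword in (root.findtext("Abstract") or ""):
            hits.append(root.findtext("Title"))
    return hits

print(full_text_search(ARTICLES, "XML"))  # ['An XML Development Kit', 'Planning an Intranet']
print(abstract_search(ARTICLES, "XML"))   # ['An XML Development Kit']
```

The second article only mentions XML in passing, so the structured query correctly leaves it out.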

Here's another example: you have a collection of brochures about used cars gathered from different dealers' web sites. Wouldn't it be nice to pose the query, "I am only interested in a Saturn or a Dodge costing between $10,000 and $15,000"? Such a search clearly adds value over a series of far less efficient full-text searches on "Saturn OR Dodge," followed by labor-intensive price comparisons.

Where HTML falls short
Current web standards do not enable this kind of semantically rich search. HyperText Markup Language (HTML) uses tags, or "markup," to add explicit structural information to a document over and above its core content. HTML made it simple for everybody to create documents that can be read by a particular kind of software program, the browser, on any platform. HTML was designed for documents that contain simple instructions defining specific components of a text. For example:

<HTML>
<H1>Intranet Design and XML</H1>
<P>A discussion of XML and its role for Intranet design</P>
</HTML>

Here we are saying that the phrase "Intranet Design and XML" is a top-level heading (H1), while the phrase "A discussion of …" is a normal paragraph (P). These and a growing number of other structural and formatting instructions make up the evolving HTML standard, which has been used to design and publish millions of web pages.

HTML's success lies in its simplicity: it has a fixed, relatively small set of tags. That was enough to support the current generation of search engines. However, as the Web has grown, so has the demand for features that enhance business communication, and here HTML's simplicity becomes a weakness.

For instance, standard HTML contains the tag <TITLE>, which lets an author specify the title of a published document. Many authors never used it, but a deeper problem soon emerged: many words are ambiguous and can appear in a variety of contexts. Think about the string "1999." Does it refer to a date, to the price of a television set, or to the serial number of a product? HTML gives an author no way to distinguish these meanings in the markup.

For a human being, this is less of a problem: we understand "1999" because we see the context, the phrase in which it is used. For a machine it is difficult, if not impossible, to capture the intended meaning.

XML - Preserving the Richness of Information

XML, the Extensible Markup Language, was designed to go beyond the boundaries of HTML. In XML, document authors can define their own tags. Take the "1999" example: an author could write <Price>1999</Price>, and the meaning of the string "1999" would immediately be clear.

Other intuitive examples are <Author>Norbert H. Mikula</Author> or <Abstract>About this paper</Abstract>.

XML provides a means to preserve information that would otherwise be lost. A typical word processor, for instance, does not know whether you are writing a title or a subtitle; to the word processor, it is simply text in a large font.

Let's look at a few examples comparing XML and HTML. First, consider a simple book marked up in HTML:

<HTML>
<TITLE>From HTML to XML</TITLE>
<H1> From HTML to XML</H1>
<H1>By Norbert H. Mikula</H1>
<P>Once upon a time …</P>
</HTML>

Now a simple book marked up in XML:

<BOOK>
<COVER>
  <TITLE>A book about XML</TITLE>
  <AUTHOR>Norbert H. Mikula</AUTHOR>
</COVER>
<CHAPTER>
  <CHAPTERTITLE>From HTML to XML</CHAPTERTITLE>
  <PARAGRAPH>Once upon a time …</PARAGRAPH>
</CHAPTER>
</BOOK> 

XML can be used for any kind of document, in any kind of industrial or commercial environment. For instance, a document that describes cars might contain an XML segment like this:

<CAR>
  <MAKE>Dodge</MAKE>
  <MODEL>Caravan</MODEL>
  <PRICE>20000</PRICE>
  <COLOR>Red</COLOR>
</CAR>

Clearly, context-specific XML tags like these make programmatic search and analysis of this kind of document possible.
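A minimal sketch of the used-car query described earlier, again with Python's xml.etree.ElementTree. The CATALOG wrapper element and the listings are invented for illustration; only the CAR, MAKE, MODEL, PRICE, and COLOR elements come from the segment above. Because PRICE is a tagged value, the program can convert it to a number and compare ranges, which a full-text search cannot do.

```python
import xml.etree.ElementTree as ET

# Hypothetical brochures from several dealers; CATALOG is a made-up wrapper
# element and the listings are illustrative, not real data.
BROCHURES = """
<CATALOG>
  <CAR><MAKE>Dodge</MAKE><MODEL>Caravan</MODEL><PRICE>20000</PRICE><COLOR>Red</COLOR></CAR>
  <CAR><MAKE>Saturn</MAKE><MODEL>SL1</MODEL><PRICE>11500</PRICE><COLOR>Blue</COLOR></CAR>
  <CAR><MAKE>Dodge</MAKE><MODEL>Neon</MODEL><PRICE>14000</PRICE><COLOR>White</COLOR></CAR>
  <CAR><MAKE>Ford</MAKE><MODEL>Escort</MODEL><PRICE>12000</PRICE><COLOR>Green</COLOR></CAR>
</CATALOG>
"""

def find_cars(xml_text, makes, min_price, max_price):
    """Return (make, model, price) for cars of the given makes in a price range."""
    root = ET.fromstring(xml_text)
    matches = []
    for car in root.iter("CAR"):
        make = car.findtext("MAKE")
        price = int(car.findtext("PRICE"))  # the tag tells us this text is a price
        if make in makes and min_price <= price <= max_price:
            matches.append((make, car.findtext("MODEL"), price))
    return matches

print(find_cars(BROCHURES, {"Saturn", "Dodge"}, 10000, 15000))
# [('Saturn', 'SL1', 11500), ('Dodge', 'Neon', 14000)]
```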

XML as metadata

Metadata is data about data. Very often we can say something about a document that is not actually part of the document itself: for instance, how many pages it has, who wrote it, or when it was last saved. Metadata can be attached to a document or can be a document by itself.

Metadata is valuable because it helps us catalog, search, and index information about a document without needing access to the document itself. Libraries use metadata widely: whenever you look for a book, whether you query the computer or search the card catalog, you are searching metadata. You can look for the same book in a variety of ways, by author, by publisher, and so on.

For metadata to be truly useful on the World Wide Web, it needs to be exchangeable. XML was designed to make information exchangeable, so using XML to express metadata is a logical step.

To make the remainder of the discussion more concrete, I will illustrate how XML is being used today in a successful implementation: DataChannel RIO. This product uses metadata in several ways: DataChannel provides information about documents, and about the way documents are organized, to make access and navigation easy.

Metadata about documents

DataChannel RIO is a metadata router: it informs subscribed parties about new or updated documents not by sending the document itself but by sending a small metadata packet that describes it. This packet might include the document's title, a description, and its location (its URL). Because only small metadata objects travel across the wire instead of large documents, DataChannel RIO is fast to use. DataChannel RIO uses XML as the core format for this metacontent so that client software can reuse and repurpose the transmitted data as effectively as possible.
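The article does not specify RIO's packet format, so the following is only a sketch of the idea in Python: a publisher builds a small XML packet carrying a document's title, description, and URL, and a subscriber reads the fields back without ever fetching the document. The DOCUMENT, TITLE, DESCRIPTION, and URL element names and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def make_metadata_packet(title, description, url):
    """Publisher side: describe a document in a small XML packet.

    The element names are hypothetical, not RIO's actual format.
    """
    packet = ET.Element("DOCUMENT")
    ET.SubElement(packet, "TITLE").text = title
    ET.SubElement(packet, "DESCRIPTION").text = description
    ET.SubElement(packet, "URL").text = url
    return ET.tostring(packet, encoding="unicode")

def read_metadata_packet(xml_text):
    """Subscriber side: recover the fields as a dictionary keyed by tag."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

packet = make_metadata_packet(
    "Q3 Sales Report",
    "Quarterly sales figures by region",
    "http://intranet.example.com/reports/q3.html",
)
print(packet)
print(read_metadata_packet(packet)["URL"])
```

Only the packet crosses the wire; a subscriber that wants the full document follows the URL.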

Metadata about structures

DataChannel RIO provides a sophisticated navigational interface that lets a user reach information with only a few mouse clicks. This navigational framework can also be seen as metadata. The Channel Definition Format (CDF) is an XML-based format that can describe a navigational structure such as the one DataChannel RIO uses.

DataChannel RIO uses CDF to send this sort of metadata to client software. Because Microsoft Internet Explorer 4.0 also supports CDF, Internet Explorer can serve as one of many possible clients in the DataChannel system architecture. CDF is one example of how XML can describe metadata that different systems can use and interpret, regardless of operating system and platform.
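As an illustration of how a client might interpret navigational metadata, the Python sketch below walks a small channel description and prints it as an outline. It is modeled on CDF's CHANNEL, ITEM, TITLE, and ABSTRACT elements and HREF attribute, but it is not a complete CDF file (real ones can carry further elements such as LOGO and SCHEDULE), and the titles and URLs are made up.

```python
import xml.etree.ElementTree as ET

# A CDF-style channel tree; titles and URLs are hypothetical.
CDF_TEXT = """
<CHANNEL HREF="http://intranet.example.com/">
  <TITLE>Company Intranet</TITLE>
  <ABSTRACT>Top-level navigation</ABSTRACT>
  <CHANNEL HREF="http://intranet.example.com/sales/">
    <TITLE>Sales</TITLE>
    <ITEM HREF="http://intranet.example.com/sales/q3.html">
      <TITLE>Q3 Sales Report</TITLE>
    </ITEM>
  </CHANNEL>
  <ITEM HREF="http://intranet.example.com/news.html">
    <TITLE>Company News</TITLE>
  </ITEM>
</CHANNEL>
"""

def outline(element, depth=0):
    """Recursively turn nested CHANNEL/ITEM elements into indented lines."""
    lines = ["  " * depth + f"{element.findtext('TITLE')} <{element.get('HREF')}>"]
    for child in element:
        if child.tag in ("CHANNEL", "ITEM"):  # skip TITLE, ABSTRACT, etc.
            lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(CDF_TEXT))))
# Company Intranet <http://intranet.example.com/>
#   Sales <http://intranet.example.com/sales/>
#     Q3 Sales Report <http://intranet.example.com/sales/q3.html>
#   Company News <http://intranet.example.com/news.html>
```

A browser would render the same tree as menus; a text client could print it as above. That independence from the client is the point of describing navigation in XML.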

XML as data

Imagine a database that needs to exchange data with another database operating on a different system from a different vendor. Think about data on a spreadsheet that needs to be sent to a database, or a workflow application that needs to integrate with a spreadsheet.

It is a simple concept: take data from system A and transfer it to system B. Yet there is still no common, open way to exchange data from system to system, and without one, any attempt to make the World Wide Web a platform for open information exchange is severely hindered. Exchanging pure data, outside the context of a document, remains a fundamental challenge for the software industry.

XML promises to be the platform that makes this vision a reality, because XML is about preserving context and semantics. Consider a (very) simple employee database. An employee has a name, an address (street, number, zip code, state), and a position.

How can we use this data in order to send it to a spreadsheet located across the Atlantic? Here's one way:

<Employee>
  <Name>Norbert H. Mikula</Name>
  <Address>
    <Street>108th Avenue NE</Street>
    <Number>155</Number>
    <Zip>98004</Zip>
    <State>WA</State>	
  </Address>
  <Position>Sr. Online Information Architect</Position>
</Employee>

This information can easily be interpreted by any XML-aware spreadsheet and used to populate its cells.
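A minimal sketch of that step, assuming a CSV file stands in for the spreadsheet: the code flattens the Employee record above into one row of cells under a header row. The column order is an arbitrary choice, and the sketch uses Python's standard xml.etree.ElementTree and csv modules rather than any DataChannel component.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The Employee record from the example above.
EMPLOYEE_XML = """<Employee>
  <Name>Norbert H. Mikula</Name>
  <Address>
    <Street>108th Avenue NE</Street>
    <Number>155</Number>
    <Zip>98004</Zip>
    <State>WA</State>
  </Address>
  <Position>Sr. Online Information Architect</Position>
</Employee>"""

HEADER = ["Name", "Number", "Street", "Zip", "State", "Position"]

def employee_to_row(xml_text):
    """Flatten one Employee element into a list of cell values."""
    employee = ET.fromstring(xml_text)
    address = employee.find("Address")
    return [
        employee.findtext("Name"),
        address.findtext("Number"),
        address.findtext("Street"),
        address.findtext("Zip"),
        address.findtext("State"),
        employee.findtext("Position"),
    ]

def rows_to_csv(rows):
    """Write a header row plus data rows as comma-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()

print(rows_to_csv([employee_to_row(EMPLOYEE_XML)]), end="")
# Name,Number,Street,Zip,State,Position
# Norbert H. Mikula,155,108th Avenue NE,98004,WA,Sr. Online Information Architect
```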

XML for object serialization

XML-based object serialization can serve two objectives: we can serialize objects to make our data structures persistent, or to make our in-memory objects interchangeable over the Internet or an intranet.

The DataChannel XML Parser (DXP) and the DataChannel DOM Module form the two components that are needed in order to use XML for object serialization.
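The article names DataChannel's Java components for this; as a language-neutral illustration, the following Python sketch serializes a flat object to XML and rebuilds it, using xml.etree.ElementTree for both the writing and the reading side. The Employee class is hypothetical, and the sketch handles only string fields; nested objects and typed values would need more machinery.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

@dataclass
class Employee:
    # Hypothetical class; all fields are strings to keep the round trip exact.
    name: str
    position: str
    zip_code: str

def to_xml(obj):
    """Serialize a flat dataclass instance: one child element per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")

def from_xml(cls, xml_text):
    """Rebuild an instance of cls from the XML that to_xml produced."""
    root = ET.fromstring(xml_text)
    if root.tag != cls.__name__:
        raise ValueError(f"expected <{cls.__name__}>, got <{root.tag}>")
    return cls(**{f.name: root.findtext(f.name) for f in fields(cls)})

original = Employee("Norbert H. Mikula", "Sr. Online Information Architect", "98004")
text = to_xml(original)
print(text)
print(from_xml(Employee, text) == original)  # True
```

The same text could be written to disk for persistence or sent to another machine; either way, the reader needs only an XML parser and the class definition.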

XML for distributed computing

Distributed computing, especially in the context of a global intranet, needs to be built on open industry standards. One of the biggest challenges is to find a means for the effective exchange of information.

XML can be used as a format for message interchange between software components. The idea is to have the actual messages coded in XML. This allows any application that can read XML messages to participate in this global computing model.
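A minimal Python sketch of XML message interchange between components: one side encodes a message as XML, the other decodes it and dispatches on a TYPE attribute. The MESSAGE element, the PRICE_QUOTE type, and the handler registry are hypothetical, and the transport (HTTP, sockets, a message queue) is left out; the point is only that any component able to parse XML can join in.

```python
import xml.etree.ElementTree as ET

def make_message(msg_type, **fields):
    """Encode a message as <MESSAGE TYPE="..."><FIELD>value</FIELD>...</MESSAGE>."""
    root = ET.Element("MESSAGE", TYPE=msg_type)
    for name, value in fields.items():
        ET.SubElement(root, name.upper()).text = str(value)
    return ET.tostring(root, encoding="unicode")

HANDLERS = {}  # message type -> handler function (hypothetical registry)

def handles(msg_type):
    """Decorator that registers a handler for one message type."""
    def register(func):
        HANDLERS[msg_type] = func
        return func
    return register

@handles("PRICE_QUOTE")
def on_price_quote(msg):
    return f"{msg.findtext('PART')} costs {msg.findtext('PRICE')}"

def dispatch(xml_text):
    """Decode an incoming XML message and pass it to the matching handler."""
    msg = ET.fromstring(xml_text)
    handler = HANDLERS.get(msg.get("TYPE"))
    if handler is None:
        return None  # unknown message types are ignored in this sketch
    return handler(msg)

wire = make_message("PRICE_QUOTE", part="Widget", price=12)
print(wire)            # <MESSAGE TYPE="PRICE_QUOTE"><PART>Widget</PART><PRICE>12</PRICE></MESSAGE>
print(dispatch(wire))  # Widget costs 12
```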

Figure 1: Distributed Computing using XML

The XML modules found in the DataChannel XML Development Kit form some of the basic software components needed to create such inter-networked architectures.

DataChannel XML Development Kit

DataChannel has put together some of its award-winning products and tools to provide a development kit that enables system integrators as well as software developers to unleash the full power of XML. The DataChannel product suite was designed to help create next-generation systems and to integrate existing systems into the new world of open information exchange based on XML.

The DataChannel XML Development Kit contains all the important components that are needed to XML-enable applications. The core components are:

DataChannel XML Parser (DXP)

This Java-based, validating XML parser is one of the industry's leading cross-platform parsers. A parser is the module an application needs in order to consume XML data and integrate it into its own data structures. DXP is based on NXP (Norbert's XML Parser), which has been around since XML's early days.

DXP supports standard industry interfaces such as SAX (the Simple API for XML) and the W3C Document Object Model (DOM).
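DXP's own Java API is not shown in the article. To illustrate the difference between the two interface styles it mentions, here is a sketch using Python's standard xml.sax and xml.dom.minidom modules: the SAX handler receives events as the parser streams through the document and never builds a tree, while the DOM version loads the whole document into memory and then navigates it.

```python
import xml.sax
from xml.dom import minidom

DOC = "<CAR><MAKE>Dodge</MAKE><MODEL>Caravan</MODEL><PRICE>20000</PRICE></CAR>"

class ElementNameCollector(xml.sax.ContentHandler):
    """SAX: react to parse events as they stream past; no tree is built."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def sax_element_names(text):
    handler = ElementNameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names

def dom_price(text):
    """DOM: build the whole tree in memory, then navigate it."""
    doc = minidom.parseString(text)
    price = doc.getElementsByTagName("PRICE")[0]
    return int(price.firstChild.data)

print(sax_element_names(DOC))  # ['CAR', 'MAKE', 'MODEL', 'PRICE']
print(dom_price(DOC))          # 20000
```

SAX suits very large inputs or one-pass processing; DOM suits applications that need to move back and forth through the document.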

DataChannel DOM Module

The DataChannel DOM Module is a component that can be used both to create and to consume XML data, with or without DXP. Used as an XML creation API, it can build and export XML data from within virtually any application.

Through the DataChannel DOM Module and the DataChannel XML Parser you can use XML as a standard format for your object serialization.
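To illustrate the DOM used as a creation API, the sketch below builds the CAR segment from earlier node by node with Python's xml.dom.minidom, whose createElement, createTextNode, and appendChild methods follow the W3C DOM interfaces. It is not the DataChannel DOM Module's API; build_car is a hypothetical helper.

```python
from xml.dom import minidom

def build_car(make, model, price):
    """Create a CAR document node by node through the DOM interface."""
    impl = minidom.getDOMImplementation()
    doc = impl.createDocument(None, "CAR", None)  # no namespace, no doctype
    root = doc.documentElement
    for tag, value in (("MAKE", make), ("MODEL", model), ("PRICE", str(price))):
        element = doc.createElement(tag)
        element.appendChild(doc.createTextNode(value))
        root.appendChild(element)
    return doc

doc = build_car("Dodge", "Caravan", 20000)
print(doc.documentElement.toxml())
# <CAR><MAKE>Dodge</MAKE><MODEL>Caravan</MODEL><PRICE>20000</PRICE></CAR>
```

The exported text can be handed to any XML parser, which is what makes the DOM useful for serialization in both directions.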

DataChannel XML Generator

The DataChannel XML Generator is built on top of DXP and transforms comma-delimited files into rich XML. For decades, comma-delimited files have been the common way to exchange information between applications; with the DataChannel XML Generator, those applications can be brought into the new world of XML.
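The article does not describe the generator's mapping rules, so the following is only a minimal Python sketch of the general technique: each CSV row becomes a record element, and each header becomes a child element name. The RECORDS and RECORD defaults are hypothetical, and the header values must be valid XML names (no spaces, not starting with a digit).

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text, root_tag="RECORDS", row_tag="RECORD"):
    """Turn comma-delimited text with a header row into XML."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for row in reader:
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():  # header order is preserved
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")

SAMPLE = "MAKE,MODEL,PRICE\nDodge,Caravan,20000\nSaturn,SL1,11500\n"
print(csv_to_xml(SAMPLE, row_tag="CAR"))
```

The output wraps each row in a CAR element with MAKE, MODEL, and PRICE children, which is the same shape as the car segment shown earlier.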

DataChannel XDK - Professional

XML-enabling applications is the first step toward unleashing the full power of the Internet and intranets. The biggest challenge is to publish documents and data and to route the appropriate metadata to the right consumers of information. In other words, the challenge is to get the right information to the right person at the right time.

The DataChannel XML Development Kit - Professional contains, in addition to its XML-enabling components, DataChannel RIO and the XML-based DataChannel RIO integration APIs.

DataChannel RIO is a next-generation publishing, metadata routing, and navigation system based on XML message exchange. DataChannel RIO's API allows software developers as well as integrators to build powerful applications based on DataChannel RIO functionality and the tools provided by the DataChannel XML Development Kit.

There are many ways to employ the development kit. The following sections look at a few generic examples.

XML Development Kit for databases

Enabling all kinds of database systems, whether relational or object-oriented, to exchange data openly opens up a new world for Internet and intranet applications. One of the biggest obstacles is finding a single, accepted, and powerful encoding convention rich enough to provide the syntactic elements this requires.

XML is a solution to this basic problem. The DataChannel XML Development Kit provides the building blocks that database vendors and database developers can use to allow easy integration of these powerful information sources into the world of XML.

Figure 2: Reading and writing XML using database systems
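A minimal sketch of reading and writing XML from a database, using Python's built-in sqlite3 module as a stand-in for any relational system: one database exports a table as XML, and a second one imports it. The ROW element and the table layout are hypothetical; the code splices table and column names into SQL, so it assumes trusted input, and it does not handle NULLs or column types.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(conn, table):
    """Export every row of a table as <ROW> elements named after the columns."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name must be trusted
    columns = [d[0] for d in cursor.description]
    root = ET.Element(table.upper())
    for row in cursor:
        row_el = ET.SubElement(root, "ROW")
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_table(conn, table, xml_text):
    """Import <ROW> elements into an existing table, matching by column name."""
    for row_el in ET.fromstring(xml_text).iter("ROW"):
        columns = [child.tag for child in row_el]
        values = [child.text for child in row_el]
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            values,
        )

# Round trip between two independent databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE employee (name TEXT, position TEXT)")
source.execute("INSERT INTO employee VALUES (?, ?)",
               ("Norbert H. Mikula", "Sr. Online Information Architect"))
exported = table_to_xml(source, "employee")

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE employee (name TEXT, position TEXT)")
xml_to_table(target, "employee", exported)
print(target.execute("SELECT name FROM employee").fetchall())  # [('Norbert H. Mikula',)]
```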

XML Development Kit for spreadsheets

Spreadsheets are at the heart of millions of office applications throughout the computing world, so enabling them to read and write XML data must be one of the primary goals of an XML development kit. Open import and export facilities, or programmatic interfaces that create and consume XML dynamically, let these applications migrate toward powerful intranet and Internet integration.

Figure 3: From a spreadsheet to the open world of XML

Serializing spreadsheet data into XML allows more than spreadsheet-to-spreadsheet communication. XML data makes no assumptions about its intended use, so XML content created from a spreadsheet can be consumed by database systems, workflow systems, and any other kind of application that can read XML data.

XML Development Kit for generic applications

A standard format for data exchange between applications, for configuration files, and for the persistence of application data has been on application developers' wish lists for decades. XML seems to offer solutions to this type of problem.

Figure 4: XML helps in many different ways

The XML Development Kit provides the components and documentation an application developer needs in order to use XML to solve some of the problems mentioned above.
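As one small example of the configuration-file case, the sketch below reads settings from a hypothetical XML file with Python's xml.etree.ElementTree. The CONFIG, SERVER, and USER elements and their attributes are made up; note that XML carries every value as text, so the code converts types (here the port number) explicitly.

```python
import xml.etree.ElementTree as ET

# A hypothetical application configuration file.
CONFIG_XML = """
<CONFIG>
  <SERVER HOST="intranet.example.com" PORT="8080"/>
  <USER NAME="guest" ROLE="reader"/>
  <USER NAME="admin" ROLE="editor"/>
</CONFIG>
"""

def load_config(xml_text):
    """Read settings into plain Python values, converting types explicitly."""
    root = ET.fromstring(xml_text)
    server = root.find("SERVER")
    return {
        "host": server.get("HOST"),
        "port": int(server.get("PORT")),
        "users": {u.get("NAME"): u.get("ROLE") for u in root.findall("USER")},
    }

print(load_config(CONFIG_XML))
# {'host': 'intranet.example.com', 'port': 8080, 'users': {'guest': 'reader', 'admin': 'editor'}}
```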

XML power for professionals

XML-enabled applications will certainly change the way information is handled across the World Wide Web. However, the real power lies in the effective dissemination of information. DataChannel RIO offers a system that can be used for powerful XML-based metacontent routing.

What does that mean? When two or more applications need to exchange information, the exchange often has to be driven by the type of information each recipient needs and when it needs it. For instance, a system that monitors supply chains might need to inform certain parties, be they software processes or people, that a certain component is running low and needs to be ordered. Not everyone in the organization needs to know this. Only certain parties need to be informed, but they MUST be informed, and promptly. Again, the recipients may be machines, or humans reached via machines.

Figure 5: Profile based routing

The challenge is twofold: how to integrate custom applications into this framework, and how to administer these multidimensional relationships. DataChannel RIO provides means to help with both.

Figure 6: Information can be routed to different kinds of applications

A powerful XML-based API allows for easy integration. In addition, a sophisticated administrative user interface lets a human party control the flow of information directly.

A recipient/group approach allows processes, as well as human users, to be bundled into groups based on a common interest (that is, the data they need to carry out a job).
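The article does not describe how RIO matches messages to profiles, so the following Python sketch is a generic, hypothetical scheme: each message carries a TOPIC attribute, individual recipients and groups list the topics they care about, and a message goes to every matching recipient plus every member of a matching group. The class names, topics, and recipients are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Recipient:
    """A software process or a person, with a profile of topics of interest."""
    name: str
    interests: set = field(default_factory=set)

@dataclass
class Group:
    """Recipients bundled together by a common interest."""
    name: str
    interests: set = field(default_factory=set)
    members: list = field(default_factory=list)

def route(message_xml, recipients, groups):
    """Return the sorted names of everyone whose profile matches the message topic."""
    topic = ET.fromstring(message_xml).get("TOPIC")
    targets = {r.name for r in recipients if topic in r.interests}
    for group in groups:
        if topic in group.interests:
            targets.update(member.name for member in group.members)
    return sorted(targets)

reorder = Recipient("reorder-process", {"inventory"})
sales_lead = Recipient("sales-manager", {"sales"})
buyers = Group("buyers", {"inventory"}, [Recipient("alice"), Recipient("bob")])

alert = '<ALERT TOPIC="inventory"><PART>Widget</PART><STATUS>low</STATUS></ALERT>'
print(route(alert, [reorder, sales_lead], [buyers]))
# ['alice', 'bob', 'reorder-process']
```

Grouping keeps administration manageable: adding a new buyer means adding one member to a group rather than editing every profile.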

© 1998 DataChannel, Inc.


The Company

DataChannel Inc., based in Bellevue, Washington, is the leader in XML-enabled active content technology. DataChannel's flagship product, DataChannel RIO, simplifies the process of delivering critical information to the right people at the right time through instant distribution of organized content, the ability of anyone to save content directly to the Web, and the provision of an open API (application programming interface). To find out more, visit the company's web site at www.datachannel.com.