Little has captured the imagination of publishers like the idea of
disseminating information over the Internet. This paper looks first at the
requirements that shaped Internet publishing in the past, then at some of the
more successful formats. Next, the focus shifts to emerging stylesheet
standards and XML (eXtensible Markup Language), each of which fills important
gaps in Internet publishing. Finally, the conceptual model of Internet
publishing is challenged and revised.

Network administrators face the demand to accommodate Internet publishing
within existing bandwidth, while business units are charged with making sure
that the right information gets to the right person at the right time. The
next generation of publishing standards will satisfy these goals.

New standards for document formatting, metadata, document rendering, and
document/information interchange protocols all point naturally to the next
conceptual leap: meta-content routing.

Delivery of structured information via the World Wide Web has long been a
challenge to professional publishers. Four basic requirements determine the
optimal format for electronic publishing:

Structure/Markup
Markup enables the creator of a document to preserve information about the
parts of a document. For instance, marking the title explicitly enables
powerful processing of the document. Document formats vary in how much
structure and markup they allow.

Standards
A standard should also be independent, not under the control of a single
vendor. Industry consortiums or similar groups with widespread participation
are better suited to control a standard.

Rendering
Rendering information should always be kept separate from the content of a
document. This guarantees that the rendering can be context driven. The
content of a document might be rendered as a paper print-out from a laser
printer, audio from a speech-synthesis program, or the textured output of a
Braille device.

As rendering is of special importance to electronic publishing, a separate
section is devoted to the discussion of this particular subject.

Acceptance/Support

This paper evaluates four document formats with respect to the criteria
defined above: PDF, SGML, HTML, and XML. All are important approaches to
Internet publishing.

PDF
This rules out the employment of PDF in business applications that require
machine-driven post-processing and other tasks that require indexing and
querying.
Finally, PDF does not separate rendering and content, which makes it
unsuitable for many application scenarios.

SGML
Transferring SGML documents from one place to another is ideal since no
information is lost during that process. An SGML-based electronic publishing
system seems an obvious solution.

The problem with SGML is that it is complex. SGML-based Internet publishing
requires that all participating parties use SGML systems in their existing
infrastructure. SGML is as complex as it is powerful. The complexity of SGML
and the effort it takes to understand and build SGML systems make the use of
SGML very costly. SGML software is traditionally very expensive and SGML
experts are hard to find (when found, their salaries tend to be "upper level").
For these reasons, SGML is not widely used. The SGML solution breaks down on
the acceptance criterion.

HTML
As successful as HTML has been, certain problems have arisen. The simplicity
of HTML has resulted in documents with limited rendering information. There is
very little opportunity to include information about the document itself.
Finally, HTML has only limited support for hierarchies.

Publishers that maintain their documents in SGML format can translate these
repositories to HTML for publishing on the Internet. Much is lost in the
translation from SGML to HTML, however; once converted, the richness of the
original document is gone.

XML
XML is a subset of SGML, formulated to a large extent by simply leaving out
features that are rarely used or that cause problems in terms of processing
speed.

The W3C is the organization that controls the standardization of many
WWW-related formats and protocols, such as HTML and HTTP. XML has all the
characteristics of a standard, as it is controlled by the W3C. A further
reason to consider XML a reliable standard is that it is explicitly compatible
with SGML, in the sense that any system conforming to the SGML standard is
also able to read and process XML documents.

Since its first days in the public light, XML has been very successful. This
success can be seen in the presence of a multitude of free or inexpensive
software systems. XML has the clear potential to become the lingua franca for
information exchange on the WWW.

XML is not in competition with SGML or HTML. Rather,
XML has been designed to fill the gap between the two standards.

© 1998 DataChannel, Inc.
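
The structure and rendering requirements discussed earlier can be illustrated
with a small sketch. The Python example below is not taken from this paper. It
assumes a hypothetical XML document whose element names (article, title,
section, heading, para) are invented for illustration, and it relies only on
Python's standard xml.etree.ElementTree module. It suggests how explicit title
markup makes the title directly retrievable by a program, and how the same
content might be rendered differently depending on context.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all element names are assumptions for illustration.
DOCUMENT = """\
<article>
  <title>Internet Publishing Formats</title>
  <section>
    <heading>Structure</heading>
    <para>Markup preserves information about the parts of a document.</para>
  </section>
</article>
"""


def extract_title(root):
    """Return the explicitly marked title; no guessing from layout is needed."""
    return root.findtext("title")


def render_print(root):
    """Render the content as plain text, as it might appear on a printed page."""
    lines = [root.findtext("title").upper(), ""]
    for section in root.findall("section"):
        lines.append(section.findtext("heading"))
        for para in section.findall("para"):
            lines.append("  " + para.text)
    return "\n".join(lines)


def render_speech(root):
    """Render the same content as a script for a speech synthesizer."""
    parts = ["Title: " + root.findtext("title") + "."]
    for section in root.findall("section"):
        parts.append("Section: " + section.findtext("heading") + ".")
        for para in section.findall("para"):
            parts.append(para.text)
    return " ".join(parts)


root = ET.fromstring(DOCUMENT)
print(extract_title(root))
print(render_print(root))
print(render_speech(root))
```

Because the document carries no rendering instructions of its own, a further
output context (a Braille device, for example) could be supported by adding
another rendering function without touching the content.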