The Content-Type: header is used to identify the
content value contained within a message. A content type is identified by
the following properties:
a type, which gives general guidance as to the resources required
in order to process the content;
a subtype, which refines the content; and,
zero or more parameters, which allow for the customization of
the content.
By convention, when people talk about a content type they say both the type
and subtype. The two are separated by a solidus (aka forward slash, "/"),
for example text/plain or application/MSWord. There are seven
predefined types and several subtypes associated with each content type.
The original content types are defined broadly enough that there probably won't
be any more than the original seven.
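To make the three properties concrete, here is a minimal sketch that splits a header value into its type, subtype and parameters. It assumes Python and its standard email package, neither of which is prescribed by this chapter, and the helper name describe_content_type is hypothetical.

```python
from email.message import Message


def describe_content_type(header_value):
    """Split a Content-Type value into (type, subtype, parameters).

    Hypothetical helper for illustration; it leans on the parsing that
    Python's standard email package already performs.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() returns (name, value) pairs; the first pair is the
    # type/subtype itself, so only the remainder are real parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


kind = describe_content_type('text/plain; charset="us-ascii"')
```

Note that the type and subtype come back in lowercase, since they are matched without regard to case.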
The multipart type is the most complex. It is used to convey a content value
that contains subordinate parts. Basically, a multipart content, regardless
of its subtype, contains zero or more body parts, each separated by a delimiter.
Each of the body parts is structured in a similar fashion to an electronic
mail message. Unlike a message, however, no header fields need be present.
Hence, any of the body parts could start with a blank line. However, there
are usually headers present and they should all be named with a prefix of
Content-. If no Content-Type: header is present, then the value text/plain
is used as a default, which means that the body part contains unstructured
ASCII text.
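The default can be observed directly. The following sketch, again assuming Python's standard email package, parses a made-up multipart message whose first body part starts with a blank line and therefore has no headers at all; the boundary string and the text are invented for illustration.

```python
import email

# Hypothetical two-part message. The first body part begins with a blank
# line, so it carries no header fields of its own.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="sep"\n'
    "\n"
    "--sep\n"
    "\n"
    "This part has no headers at all.\n"
    "--sep\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>This part is labelled.</p>\n"
    "--sep--\n"
)

msg = email.message_from_string(RAW)
# For a multipart message, get_payload() returns the list of body parts.
part_types = [part.get_content_type() for part in msg.get_payload()]
```

The unlabelled part reports text/plain, while the second part reports the type it declared.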
There are eight subtypes of multipart in common use. We'll describe five
here.
multipart/mixed, which indicates that the subordinate body parts
should be processed in sequence.
multipart/parallel, which indicates that the subordinate body parts
should be processed in parallel. However, if more than one body part requires
exclusive access to a common resource (e.g., if two or more body parts require
access to the user's keyboard when rendering them), or if the software processing
the message is incapable of simulating parallel processing, then sequential
processing is acceptable.
multipart/digest, which indicates that each subordinate body part
is an electronic message, having type message/rfc822 (discussed in just
a moment). When messages are forwarded, this is the content type to use.
Unfortunately, much "modern" desktop software simply includes
the message as text, without actually structuring it as an included message.
As a result, humans can figure out what's going on, but programs can't.
multipart/alternative, which indicates that while there are multiple
subordinate body parts present, they all have identical semantic content.
As such, only one should be processed. The body parts are ordered in terms
of expressive power, with the least expressive content being the first,
and the most expressive content being the last. The reason for this is to
make things simpler for pre-MIME software. Such software will display
the entire message to the user; hopefully, the first body part will be legible
to a human.
multipart/report, which indicates that the message is an error
report.
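The ordering rule for multipart/alternative is easiest to see in code. This is a sketch only, assuming Python's standard email package; the subject, the text and the helper pick_best are invented for illustration. It builds a message with a plain-text part first and an HTML part last, then selects the last part the recipient can handle.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Quarterly summary"
# Least expressive alternative first ...
msg.set_content("Sales rose this quarter.")
# ... most expressive alternative last.
msg.add_alternative("<p>Sales <b>rose</b> this quarter.</p>", subtype="html")


def pick_best(message, supported):
    """Return the last body part whose type is in `supported`.

    Hypothetical helper: because parts are ordered from least to most
    expressive, the last supported part is the richest one available.
    """
    chosen = None
    for part in message.iter_parts():
        if part.get_content_type() in supported:
            chosen = part
    return chosen
```

A recipient that understands only text/plain ends up with the first part; one that also understands text/html ends up with the second.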
The easy way to think of the multipart type is that it is interpreted directly
by the desktop software and the user should be completely unaware of its existence.
The same is largely true of the next type, message, which has three commonly
used subtypes and one unpopular subtype:
message/rfc822, which indicates that the content value is an electronic
mail message. When forwarding messages, the multipart/digest content type
is used and each subordinate body part is of type message/rfc822.
message/partial, which indicates that the content value is part
of a fragmented message. When a message is too large to send, typically
due to administrative controls, it can be divided into several fragments.
Each fragment has a common id and a unique number. The final fragment must
(and the other fragments usually do) have an indication as to the total
number of fragments. Upon receiving all fragments, the original message
can be reconstructed. The only particularly tricky part about the process
is that the Content- headers and the Message-ID: of the original message
are placed at the front of the value put in the first fragment. This prevents
any confusion between the headers identifying each fragment and the headers
in the original message.
message/delivery-status, which is contained inside a structured
error report. As described in Chapter 2, it's the second part of an error
report that contains machine-readable information about the problems in
delivering the message.
message/external-body, which indicates that the content value
is a pointer to the content, rather than the actual value. This subtype
is falling out of use. The reason is that it is proving easier to send a
message containing HTML, which embeds a link to the external content rather
than constructing a separate external body part.
As a note for protocol historians, this last subtype was developed
at approximately the same time as the Web technologies. For various reasons,
it didn't use the same syntax as the Web. This was, in retrospect, a mistake,
given that an HTML fragment has equivalent functionality to an external body
part.
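Returning to message/partial for a moment, the fragmentation scheme can be sketched in a few lines. The sketch assumes Python's standard email package and treats the serialized original message as an opaque string, so the original headers land at the front of the first fragment as described. The helpers fragment and reassemble are hypothetical, and the full rules for merging the outer headers during reassembly are deliberately left out.

```python
from email.message import Message


def fragment(serialized, size, frag_id):
    """Split a serialized message into message/partial fragments."""
    chunks = [serialized[i:i + size] for i in range(0, len(serialized), size)] or [""]
    parts = []
    for number, chunk in enumerate(chunks, start=1):
        part = Message()
        # Every fragment shares one id and carries its own number; this
        # sketch also puts the total on every fragment, not just the last.
        part.add_header("Content-Type", "message/partial",
                        id=frag_id, number=str(number), total=str(len(chunks)))
        part.set_payload(chunk)
        parts.append(part)
    return parts


def reassemble(parts):
    """Rebuild the serialized message once all fragments have arrived."""
    if len({p.get_param("id") for p in parts}) != 1:
        raise ValueError("fragments belong to different messages")
    ordered = sorted(parts, key=lambda p: int(p.get_param("number")))
    # Only the final fragment is required to state the total.
    total = int(ordered[-1].get_param("total"))
    if [int(p.get_param("number")) for p in ordered] != list(range(1, total + 1)):
        raise ValueError("fragments are missing or duplicated")
    return "".join(p.get_payload() for p in ordered)
```

Fragments may arrive in any order; sorting on the number parameter restores the sequence before the pieces are joined.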
The remaining content types are meaningful to the user. The standardized
ones in common use are:
text/plain, which indicates that the content value is plain text.
A parameter indicates which character set should be used when rendering
the text. In general, the simplest character set that faithfully represents
the value should be chosen. For example, the characters contained in the
US-ASCII set are a subset of those contained in the ISO-8859-1 repertoire.
If a message makes use of only those characters in the former character
set, then that should be the character set indicated by the e-mail program.
However, as we'll see later in this chapter, not all products have been
implemented in this fashion.
text/html, which indicates that the content value is HTML, as
used by the Web. The same character set issues apply as for text/plain.
text/richtext, which indicates that the content value is input
to a simple text formatter. This is another casualty of the early development
of MIME not foreseeing the popularity of HTML.
image/gif, which indicates that the content value is image data
encoded using the Graphics Interchange Format (GIF).
image/jpeg, which indicates that the content value is image data
encoded using the Joint Photographic Experts Group (JPEG) format.
audio/*, which indicates that the content value is audio data
encoded using the indicated subtype (and parameters). Originally, there
was the audio/basic content, which was phone-quality, single-channel audio,
but this lacks the sizzle required by the people marketing today's Internet.
video/*, which indicates that the content value is video encoded
using the indicated subtype (and parameters).
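The character set advice given for text/plain can be expressed as a short search: try the simplest candidate first and stop at the first one that can represent the text. This is a sketch in Python; the function name is hypothetical, and the final utf-8 fallback is our assumption, since only US-ASCII and ISO-8859-1 are named above.

```python
def simplest_charset(text, candidates=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first candidate character set that can encode `text`.

    Candidates are listed from simplest to most general. The utf-8
    fallback is an assumption added for this sketch.
    """
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate character set can represent the text")
```

A message containing only US-ASCII characters is thus labelled us-ascii even though iso-8859-1 could also represent it.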
As might be imagined, there are many subtypes of text, image, audio and
video used for specialized applications throughout the Internet. However,
we haven't yet described the seventh content type, which is where most of
the customized behavior is found: the application type. Although the original
intent of the application type was to convey a content value for mail-enabled
applications, in practice any time something needs to be sent that is more
complex than one of these four types (text, image, audio or video), the
application type is used.
For example, if you need to send a spreadsheet, a word processing document
or a slide presentation, then the company that wrote the authoring program
has already registered the application subtype that conveys the appropriate
kind of file.[2] Among other things, the
MIME standard documents the procedure wherein a vendor may register content
types with a registration authority. In addition, there is one other common
subtype:
application/octet-stream, which indicates that the content is
arbitrary binary data. Parameters indicate a textual explanation of the
contents. This subtype is generally used when the appropriate company has
not registered a specific application subtype.
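As an illustration, the sketch below attaches a few arbitrary bytes as application/octet-stream. It assumes Python's standard email package; the subject, the filename and the data are made up.

```python
from email.message import EmailMessage

# Arbitrary binary data standing in for a real file.
DATA = b"\x00\x01\x02\xff"

msg = EmailMessage()
msg["Subject"] = "Build artifact"
msg.set_content("The file is attached.")
# The filename is hypothetical; it travels with the attached part.
msg.add_attachment(DATA, maintype="application", subtype="octet-stream",
                   filename="artifact.bin")

attachment = next(msg.iter_attachments())
```

Adding the attachment turns the message into a multipart/mixed container that holds the text part followed by the binary part.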
In effect, the application/octet-stream type provides a simple file transfer
facility over e-mail. Let's look at two examples.
First, let's combine the foregoing concepts to deconstruct a typical structured
error report:
A structured error report consists of two or three subordinate body parts.
So, we know that it's going to be a multipart content type. The particular
content type is multipart/report.
The first part is a textual explanation as to the problem and includes
the three-digit reply code. The particular content type is text/plain.
The second part looks like a small message: it has a collection of headers.
These headers include precise information as to what the problem was and
where it occurred. This information is carefully generated to be machine
readable. The particular content type is message/delivery-status.
The third part, if present, is the original message. The particular content
type is message/rfc822.
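The deconstruction above can be reproduced with a parser. The sketch below assumes Python's standard email package and a made-up three-part report; every address, the boundary and the field values are hypothetical. The parser represents each group of headers in the message/delivery-status part as a small nested message.

```python
import email

# Hypothetical error report; all addresses and values are invented.
RAW_REPORT = """\
From: postmaster@mail.example.com
To: sender@example.com
Subject: Delivery failure
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="XYZ"

--XYZ
Content-Type: text/plain

550 No such user: nobody@example.net

--XYZ
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.com

Final-Recipient: rfc822; nobody@example.net
Action: failed
Status: 5.1.1

--XYZ
Content-Type: message/rfc822

From: sender@example.com
To: nobody@example.net
Subject: Hello

Hi there.
--XYZ--
"""

report = email.message_from_string(RAW_REPORT)
# This sample has all three parts; a real report may omit the third.
explanation, status, original = report.get_payload()

# Collect the machine-readable fields from every header group.
status_fields = {}
for block in status.get_payload():
    status_fields.update(block.items())

# A message/rfc822 part holds exactly one nested message.
original_message = original.get_payload(0)
```

A program can now act on status_fields without having to interpret the human-readable explanation in the first part.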
As a second example, consider the correct way to generate a Bcc: message:
Strip the Bcc: header out of the message, but remember the addresses
contained therein.
Send that message to the recipient addresses in the To: and cc: fields.
For each address in the Bcc: header, construct a new message of type multipart/digest.
It should have one subordinate body part, message/rfc822, which contains
the message that was sent in the previous step.
The headers of each new message sent should be identical to the original
message sent except that the Content-*: headers should be removed and replaced
with a Content-Type: of multipart/digest and the To: and cc: headers should
be replaced with a To: header containing the address of the Bcc: recipient.
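The steps above can be sketched as follows, assuming Python's standard email package. The function split_bcc is hypothetical and only builds the messages; actually sending them is left out. It also skips copying MIME-Version:, because the wrapper class supplies its own.

```python
from email import message_from_string
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.utils import getaddresses


def split_bcc(original):
    """Return (public_message, [one wrapper per Bcc: recipient])."""
    # Step 1: remember the Bcc: addresses, then strip the header.
    bcc_addrs = [addr for _name, addr in getaddresses(original.get_all("Bcc", []))]
    del original["Bcc"]

    wrappers = []
    for addr in bcc_addrs:
        # Step 3: a new multipart/digest message for each Bcc: recipient.
        wrapper = MIMEMultipart("digest")
        for name, value in original.items():
            lowered = name.lower()
            # Step 4: drop Content-*, To: and Cc:; copy everything else.
            if lowered.startswith("content-") or lowered in ("to", "cc", "mime-version"):
                continue
            wrapper[name] = value
        wrapper["To"] = addr
        # The single subordinate body part is the message sent in step 2.
        wrapper.attach(MIMEMessage(original))
        wrappers.append(wrapper)
    return original, wrappers


msg = message_from_string(
    "From: a@example.com\nTo: b@example.com\nCc: c@example.com\n"
    "Bcc: d@example.com, e@example.com\nSubject: Hi\n\nBody\n"
)
public, copies = split_bcc(msg)
```

The message in public goes to the To: and Cc: recipients, and each entry in copies goes to exactly one Bcc: recipient, who sees the original message as an enclosed part.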
[2] Of course, there is still plenty of room for user error when sending
attachments.