Lossy compression
Regardless of the advances in UDP and RTSP transmission protocols, streaming media would not be possible without rapid innovation in encoding algorithms, or codecs, that compress and decompress audio and video data. Uncompressed audio files are huge. One minute of CD-quality stereo audio requires about 10 MB of data, approximately enough disk space to hold a small library of books or a 200-page web site.
Standard modem-speed connections--including cable modems and xDSL systems--do not have the capacity to deliver pure, uncompressed CD-quality 16-bit, 44.1 kHz audio. To stream across the limited bandwidth of the Web, audio has to be compressed and optimized with codecs (compression-decompression algorithms). In general, compression schemes can be classified as either "lossy" or "lossless."
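The arithmetic behind these figures can be checked with a short sketch. It assumes the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels). The 56 kbit/s modem rate is an assumed example chosen for comparison, not a figure from the text.

```python
# CD-quality audio parameters (standard values, assumed here for illustration)
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def uncompressed_bits_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM data rate: every sample of every channel is stored in full."""
    return sample_rate_hz * bits_per_sample * channels


bit_rate = uncompressed_bits_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
bytes_per_minute = bit_rate // 8 * 60

# Hypothetical dial-up connection speed, used only as a point of comparison
MODEM_BITS_PER_SECOND = 56_000
shortfall = bit_rate / MODEM_BITS_PER_SECOND

print(f"Data rate: {bit_rate / 1000:.1f} kbit/s")
print(f"One minute of audio: {bytes_per_minute / 1_000_000:.1f} MB")
print(f"Audio rate divided by modem rate: {shortfall:.1f}")
```

Under these assumptions, one minute of audio comes to roughly 10.6 MB, in line with the 10 MB figure quoted earlier, and the raw stream needs about 25 times the capacity of the assumed modem.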
Lossy compression schemes reduce file size by discarding some amount of data during the encoding process, before the file is sent over the Internet. Once the file is received on the client side, the codec attempts to reconstruct the information that was discarded. The benefit of this sort of compression lies in the smaller file size that results. The JPEG image format uses lossy compression to sample an image and discard unnecessary color information. Similarly, lossy audio compression discards frequencies on the high and low ends of the spectrum and attempts to locate and remove unnecessary audio data. The technique is often referred to as "perceptual encoding" since the user is unlikely to notice the absence of this information. Lossy compression offers file-size reductions on the order of 10:1.
Since small file size is so important on the Internet, practically all of the formats we're interested in employ lossy compression. Here's how it works. First, the client player decompresses the audio file as it downloads to your computer. Then it fills in the missing information according to the instructions set by the codec. To illustrate why lossy compression is so crucial, consider the phrase, "Now is the time for all good men to come to the aid of their country". One way to compress this would simply be to remove all the vowels and spaces: "Nwsthtmfrllgdmntcmtthdfthrcntry".
That cuts the message from 68 characters to 31, a 54% savings, but of course our compressed message is unintelligible. Imagine, however, that our codec has appropriate rules for decompressing this message with minimal distortion. The conversion likely wouldn't be perfect, but it would be good enough to understand the message, something like, "Now's tha ti'm for oll gudm en to com to the aad of their country".
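The toy scheme is small enough to write out. The following is only a sketch of the vowel-and-space example, not of any real audio codec, and the function name is made up for illustration. It counts the characters itself rather than relying on a quoted figure.

```python
PHRASE = "Now is the time for all good men to come to the aid of their country"


def strip_vowels_and_spaces(text):
    """Toy lossy 'encoder': drop a, e, i, o, u and spaces; keep the rest (including y)."""
    return "".join(ch for ch in text if ch.lower() not in "aeiou ")


compressed = strip_vowels_and_spaces(PHRASE)
savings = 1 - len(compressed) / len(PHRASE)

print(compressed)
print(f"{len(PHRASE)} characters before, {len(compressed)} after, {savings:.0%} saved")
```

Nothing in the output records which letters were removed or where the spaces were, which is why a decoder for this scheme can only guess at the original.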
This is exactly what happens with lossy audio compression. The compressed file is unintelligible to the listener; the decompressed file is intelligible but of a lower quality than the original.
For example, a RealAudio speech file encoded from a standard AIFF or WAV file is generally one-tenth the size of the original. To achieve that reduction, the encoder preserves the 1,000 Hz to 4,000 Hz frequency range of the human voice and discards the frequencies above and below it. By eliminating the unnecessary low- and high-end frequencies, the encoder reduces the file size while maintaining speech intelligibility. Note, however, that speech has aural characteristics that extend into the 7,000 Hz range. When the area between 4,000 Hz and 7,000 Hz is reduced or removed entirely, encoded speech remains intelligible, but it may lose clarity and sound unnatural. Furthermore, since some voices and sounds reach into even higher frequency ranges, lossy compression and encoding can result in dull, muted, or abrasive sounds.
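The idea of keeping one frequency band and discarding the rest can be sketched with a plain discrete Fourier transform. This is an illustration under stated assumptions, not the way RealAudio or any particular codec is implemented: the 16,000 Hz sample rate, the three test tones, and all function names are hypothetical, and the sketch shows only the band-limiting step, not the encoding that follows it.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (O(n^2), fine for a short demo)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def band_limit(samples, sample_rate, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz], then rebuild the signal."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the lower half, so map them back to a positive frequency
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)


def tone(freq_hz, sample_rate, n):
    """A pure sine tone of n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 16_000  # assumed sample rate
N = 160               # 10 ms of audio; each DFT bin is 100 Hz wide

rumble = tone(500, SAMPLE_RATE, N)    # below the preserved band
voice = tone(2000, SAMPLE_RATE, N)    # inside the preserved band
hiss = tone(6000, SAMPLE_RATE, N)     # above the preserved band
mixed = [a + b + c for a, b, c in zip(rumble, voice, hiss)]

filtered = band_limit(mixed, SAMPLE_RATE, 1000, 4000)
max_error = max(abs(f - v) for f, v in zip(filtered, voice))
print(f"Largest difference from the 2,000 Hz tone alone: {max_error:.2e}")
```

After filtering, only the 2,000 Hz component survives. The 6,000 Hz tone, which falls in the range where the text says speech still carries some character, is removed along with the low-frequency rumble.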
Lossless compression
In contrast, lossless compression squeezes data into smaller packets of information without permanently discarding any of it. Instead, lossless compression removes information only temporarily and provides a "map" with which the codec can reconstruct the original file exactly. Lossless compression results in superior audio quality but lower compression ratios.
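A lossless round trip can be demonstrated with a general-purpose compressor. The sketch below uses DEFLATE through Python's standard zlib module as a stand-in; it is not an audio codec, and the repeated sample text is an assumption chosen so that there is obvious redundancy to remove.

```python
import zlib

# Sample input: the same sentence repeated, so there is plenty of redundancy
original = b"Now is the time for all good men to come to the aid of their country. " * 20

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes (about {ratio:.0f}:1)")
print("Restored copy identical to original:", restored == original)
```

The restored bytes match the original exactly. The ratio here is flattering because the input repeats itself; data with little repetition, such as recorded audio, generally compresses far less under a lossless scheme.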
In the lossy example, our codec had some general rules for reconstructing the message--basically to add vowels and spaces in order to form English words. It wasn't perfect because it didn't know which English words to choose, and it wasn't always sure where one word ended and the next began.
Lossless codecs, on the other hand, are perfect. To reconstruct our message perfectly, however, would mean having a much more sophisticated set of rules. A lossless text codec would have to reproduce not only words but sensible phrases. It would have to be able to break words correctly. And it would have to have a mastery of the English language's inconsistent spelling patterns. It would in fact be, as the computer scientists say, a nontrivial endeavor.
The same goes for lossless audio codecs. They are difficult to develop (and thus expensive to license), they require substantial computing power on the user's machine, and the file savings are not as great as with lossy compression. Unfortunately, for the time being, lossy compression is necessary for knocking large audio files down to Internet-appropriate size. The good news is that lossy compression schemes are becoming more advanced, and over time the differences will become less and less noticeable to the human ear.
Now that we have discussed lossy and lossless compression and the types of protocols that enable the efficient delivery of compact audio files across the Internet, let's review the audio formats available on the market. Most of these formats will be discussed in greater detail in the rest of the book.
About the Authors
Josh Beggs is co-founder and president of Raspberry Media, a design firm in the San Francisco Bay Area specializing in Web-smart architecture, interface design, and brand development for Internet start-ups. Josh began his career in the multimedia industry as a recording engineer and sound designer. In 1995 he produced the interactive soundtrack for EMI Records' flagship CD-ROM, Queensrÿche's Promised Land. After receiving impressive reviews from Billboard Magazine (March 1996) for the soundtrack, Josh went on to explore interactive media design with Raspberry Media. In addition to designing some of the top Web sites on the Internet, he also follows his musical passions as a pianist and recording artist.
Dylan Thede's multimedia experience began in the cultural mecca of the San Francisco Bay Area in 1985. At a young age, he was designing sound systems and multimedia presentations for the University of California at Berkeley. At the University of California at Santa Cruz, Dylan became a pioneer in the emerging fields of Digital Audio, Digital Video, and Multimedia and later graduated with a degree in Multimedia and Psychology. He was one of the pioneers in web design when the World Wide Web burst onto the scene in 1994. In 1995, Dylan founded AudioVisualize, a multimedia consulting company that caters to companies that wish to integrate multimedia into their web sites and corporate operations. Besides writing and creating multimedia projects, he is also a musician and is currently composing and recording music for an upcoming multimedia CD release.