Intranet Journal
The online resource for intranet professionals
Josh Beggs & Dylan Thede
Internet streaming media changed the Web as we knew it, transforming it from a static text- and graphics-based medium into a multimedia experience populated by sound and moving pictures. Now streaming media is poised to become the de facto global media broadcasting and distribution standard, incorporating all other media, including television, radio, and film. The low cost, convenience, worldwide reach, and technical simplicity of using one global communications standard make web broadcasting irresistible to media publishers, broadcasters, corporations, and individuals. Businesses and individuals once denied access to such powerful means of communication are now using the Web to connect with people all over the world.
The remarkable technology that allows a web site visitor to click on a button and seconds later listen to a sporting event, tradeshow keynote, or CD-quality music is the result of a rather simple but powerful technical innovation: streaming media. Streaming works by first compressing a digital audio file and then breaking it into small packets, which are sent, one after another, over the Internet. When the packets reach their destination (the requesting user), they are decompressed and reassembled into a form that can be played by the user's system. To maintain the illusion of seamless play, the packets are "buffered": a number of them are downloaded to the user's machine before playback begins. As those buffered or preloaded packets play, more packets are being downloaded and queued up for playback. However, when the stream of packets gets too slow (due to network congestion, for example), the buffer empties, the client audio player has nothing to play, and you get the all-too-familiar drop-out that every user has encountered.
The big breakthrough that enabled the streaming revolution was the adoption of the User Datagram Protocol (UDP) for media delivery, together with new encoding techniques that compressed audio files into extremely small packets of data. UDP made streaming media feasible by moving data from the host server over the Internet to the client player or end listener with less overhead than TCP: it does not wait for acknowledgments or resend lost packets. More recent protocols such as the Real Time Streaming Protocol (RTSP) are making the transmission of data even more efficient. UDP and RTSP are ideal for audio broadcasting since they place a higher priority on continuous streaming than on delivering every packet intact. When a UDP audio packet drops out, the server keeps sending information, causing only a brief glitch instead of a huge gap of silence. TCP, on the other hand, keeps trying to resend the lost packet before sending anything further, causing greater delays and breakups in the audio broadcast.
Prior to UDP and RTSP transmission, data was sent over the Web primarily via TCP and HTTP. TCP, in contrast to UDP and RTSP, is designed to transfer text documents, email, and HTML web pages over the Internet with maximum reliability and data integrity rather than timeliness. Since HTTP is built on TCP, it too is poorly suited to multimedia presentations that rely on time-based operation and to large-scale broadcasting. Later in the chapter, you will learn why protocols are important.
Some streaming technologies such as RealAudio and Windows Media use dedicated servers that support superior UDP and RTSP transmission. Other formats such as Shockwave, Flash, MIDI, QuickTime, and Beatnik are primarily designed to stream from a standard HTTP web server. While these formats are cheaper and often easier to use, since they do not require the installation of a new server, they are typically not used in professional broadcasting situations that require the delivery of hundreds or thousands of simultaneous streams.
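The difference between a brief UDP glitch and a longer TCP stall, and the role of the playback buffer, can be sketched with a toy simulation. This is a simplified model, not how any particular player or server works: the function names, the one-packet-per-tick timing, and the five-tick resend delay are all assumptions made for illustration.

```python
from typing import List, Optional, Set


def arrival_times(num_packets: int, lost: Set[int], policy: str,
                  resend_delay: int = 5) -> List[Optional[int]]:
    """Tick at which each packet becomes playable (None = never arrives).

    Assumed model: one packet is sent per tick. Under "udp" a lost packet
    is simply gone. Under "tcp" it arrives resend_delay ticks late and,
    because delivery is in order, every later packet waits behind it.
    """
    times: List[Optional[int]] = []
    held_until = 0
    for i in range(num_packets):
        arrival = i
        if i in lost:
            if policy == "udp":
                times.append(None)        # never resent; the sender moves on
                continue
            arrival += resend_delay       # resent copy shows up late
        held_until = max(held_until, arrival)
        times.append(held_until)
    return times


def count_dropouts(times: List[Optional[int]], prebuffer: int) -> int:
    """Play one packet per tick once `prebuffer` packets are in; count silent ticks."""
    arrived = sorted(t for t in times if t is not None)
    now = arrived[prebuffer - 1]          # playback starts when the buffer is primed
    silent = 0
    for t in times:
        if t is None:                     # missing packet: one tick of silence
            silent += 1
        elif t > now:                     # late packet: silence until it arrives
            silent += t - now
            now = t
        now += 1                          # each slot, played or skipped, takes one tick
    return silent


if __name__ == "__main__":
    for policy in ("udp", "tcp"):
        times = arrival_times(10, {5}, policy)
        print(policy, "silent ticks:", count_dropouts(times, prebuffer=3))
```

With these assumed numbers, losing packet 5 costs one silent tick in the UDP-style run and three in the TCP-style run. Raising `prebuffer` to 6 hides the TCP stall entirely, at the price of a later start.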
Streaming from a standard HTTP web server is therefore referred to as pseudo-streaming: it is technically possible to stream via HTTP, but doing so is much more likely to cause major packet drop-outs, and it cannot deliver nearly as many simultaneous streams as UDP and RTSP transmission. Herein lies the difference between most low-end solutions and more professional broadcasting solutions that require dedicated servers and extra bandwidth and server capacity.
Regardless of the advances in UDP and RTSP transmission protocols, streaming media would not be possible without the rapid innovation in encoding algorithms, or codecs, that compress and decompress audio and video data. Uncompressed audio files are huge. One minute of CD-quality stereo audio requires roughly 10 MB of data, approximately enough disk space to hold a small library of books or a 200-page web site. Typical consumer connections, including standard modems and many cable modem and xDSL systems, do not have the capacity to sustain pure, uncompressed CD-quality 16-bit, 44.1 kHz audio, which needs about 1.4 Mbps. In order to stream across the limited bandwidth of the Web, audio has to be compressed and optimized with codecs (compression-decompression algorithms).
In general, compression schemes can be classified as "lossy" or "lossless." Lossy compression schemes reduce file size by discarding some amount of data during the encoding process, before the file is sent over the Internet. Once the file is received on the client side, the codec attempts to reconstruct the information that was discarded. The benefit of this sort of compression lies in the smaller file size that results from discarding the "lost" information. The JPEG image format, for example, uses lossy compression to sample an image and discard unnecessary color information. Similarly, lossy audio compression discards frequencies on the high and low ends of the spectrum and attempts to locate and remove unnecessary audio data.
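The size of uncompressed CD-quality audio can be checked with a few lines of arithmetic. The sketch below uses the 16-bit, 44.1 kHz stereo parameters from the text. The 56 kbps modem speed is an assumed figure chosen only for comparison, and the function names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100   # CD-quality sampling rate
BITS_PER_SAMPLE = 16      # CD-quality sample size
CHANNELS = 2              # stereo

ASSUMED_MODEM_BPS = 56_000  # assumption: a dial-up modem, for comparison only


def pcm_bits_per_second() -> int:
    """Raw data rate of uncompressed audio, in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def pcm_bytes_per_minute() -> int:
    """Storage needed for one minute of uncompressed audio, in bytes."""
    return pcm_bits_per_second() // 8 * 60


def required_compression_ratio(link_bps: int) -> float:
    """How many times smaller the stream must be to fit the given link."""
    return pcm_bits_per_second() / link_bps


if __name__ == "__main__":
    print("bits per second: ", pcm_bits_per_second())
    print("bytes per minute:", pcm_bytes_per_minute())
    print("ratio needed for the assumed modem:",
          round(required_compression_ratio(ASSUMED_MODEM_BPS), 1))
```

The raw rate works out to 1,411,200 bits per second, or 10,584,000 bytes per minute, which is where the figure of roughly 10 MB comes from. Fitting that into the assumed 56 kbps link would take a compression ratio of about 25:1.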
Discarding audio data in this way is often referred to as "perceptual encoding," since the listener is unlikely to notice the absence of this information. Lossy compression offers file-size savings on the order of 10:1. Since small file size is so important on the Internet, practically all of the formats we're interested in employ lossy compression. Here's how it works. First, the client player decompresses the audio file as it downloads to your computer. Then it fills in the missing information according to the instructions set by the codec.
To illustrate how lossy compression works, consider the phrase "Now is the time for all good men to come to the aid of their country". One way to compress this would simply be to remove all the vowels and spaces: "Nwsthtmfrllgdmntcmtthdfthrcntry". That cuts the message from 68 characters to 31, a 54% savings, but of course our compressed message is unintelligible. Imagine, however, that our codec has appropriate rules for decompressing this message with minimal distortion. The conversion likely wouldn't be perfect, but it would be good enough to understand the message, something like "Now's tha ti'm for oll gudm en to com to the aad of their country". This is exactly what happens with lossy audio compression. The compressed file is unintelligible to the listener; the decompressed file is intelligible but of a lower quality than the original.
For example, a RealAudio speech file encoded from a standard AIFF or WAV file is generally one-tenth the size of the original file. To reduce the file's size, the encoder preserves the integrity of the 1,000 Hz to 4,000 Hz frequency range of the human voice and discards the frequencies above and below that range. By eliminating the unnecessary low- and high-end frequencies, the encoder is able to reduce the file size while maintaining speech intelligibility. It should be noted, however, that speech tends to have aural characteristics that extend into the 7,000 Hz range. When the area between 4,000 Hz and 7,000 Hz is reduced or removed entirely, encoded speech will still be intelligible, but it may lose clarity and sound unnatural. Furthermore, since some voices and sounds reach into even higher frequency ranges, lossy compression and encoding can result in dull, muted, or abrasive sounds.
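The vowel-removal illustration above can be reproduced in a few lines. This is only a sketch of the analogy, not an audio codec: the function names are made up for the example, and the character counts are computed from the phrase as written, without surrounding quotation marks or a closing period.

```python
VOWELS = set("aeiouAEIOU")

PHRASE = "Now is the time for all good men to come to the aid of their country"


def lossy_compress(text: str) -> str:
    """Discard every vowel and every space; the discarded data is gone for good."""
    return "".join(ch for ch in text if ch not in VOWELS and not ch.isspace())


def savings_percent(original: str, compressed: str) -> int:
    """Size reduction as a whole-number percentage of the original length."""
    return round(100 * (1 - len(compressed) / len(original)))


if __name__ == "__main__":
    packed = lossy_compress(PHRASE)
    print(packed)
    print(len(PHRASE), "->", len(packed), "characters,",
          savings_percent(PHRASE, packed), "% smaller")
```

Nothing in the compressed string records which vowels were removed or where the spaces were, so a decompressor can only guess at them from rules. That guesswork is what makes the scheme lossy, and it is why the reconstructed message comes back slightly distorted.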
About the Authors
Josh Beggs is co-founder and president of Raspberry Media, a design firm in the San Francisco Bay Area specializing in Web-smart architecture, interface design, and brand development for Internet start-ups. Josh began his career in the multimedia industry as a recording engineer and sound designer. In 1995 he produced the interactive soundtrack for EMI Records' flagship CD-ROM, Queensrÿche's Promised Land. After receiving impressive reviews from Billboard Magazine (March 1996) for the soundtrack, Josh went on to explore interactive media design with Raspberry Media. In addition to designing some of the top Web sites on the Internet, he also follows his musical passions as a pianist and recording artist.
Dylan Thede's multimedia experience began in the
cultural mecca of the San Francisco Bay Area in 1985. At a young age, he was
designing sound systems and multimedia presentations for the University of
California at Berkeley. At the University of California at Santa Cruz, Dylan
became a pioneer in the emerging fields of Digital Audio, Digital Video, and
Multimedia and later graduated with a degree in Multimedia and Psychology. He
was one of the pioneers in web design when the World Wide Web burst onto the
scene in 1994. In 1995, Dylan founded AudioVisualize, a multimedia consulting
company that caters to companies who wish to implement multimedia into their web
sites and corporate operations. Besides writing and creating multimedia
projects, he is also a musician and is currently composing and recording music
for an upcoming multimedia CD release.