File Format Comparisons
by Paul Arnote
Audio File Formats
In our comparisons of file formats, this month we will take a look at the more common audio file formats. Hopefully, this brief overview of the different audio file formats will help you figure out which audio file format is best for your intended use.
Just as with graphic file formats (see the July, 2009 issue of PCLinuxOS Magazine), audio formats can employ either lossy or lossless file compression. With lossless data compression, the original data can be reconstructed exactly from the compressed data, without any loss of sound quality. Lossy data compression, meanwhile, offers only an approximation of the original data, sacrificing some of the original sound quality in the quest for smaller files.
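To make the difference concrete, here is a minimal Python sketch. It is not how any real audio codec works: zlib is a general-purpose compressor used here only as a stand-in for lossless compression, and the "lossy" scheme (throwing away the low 8 bits of each sample) is a toy invented for illustration. The sample values are made up as well.

```python
import zlib

# A short run of made-up 16-bit sample values standing in for audio data.
samples = [0, 1200, 2399, 3581, 4740, 5868, 6958, 8003, 8995, 9929] * 50
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Lossless: decompressing gives back exactly the bytes that went in.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
print("lossless round trip identical:", restored == raw)

# Lossy (toy scheme): keep only the top 8 bits of each 16-bit sample.
# The discarded bits are gone for good, so decoding can only approximate.
lossy = [s >> 8 for s in samples]
approx = [v << 8 for v in lossy]
print("lossy round trip identical:", approx == samples)
print("largest error per sample:", max(abs(a - b) for a, b in zip(samples, approx)))
```

The lossless round trip returns the original bytes, while the toy lossy round trip returns values that are close to, but not the same as, the originals.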
Sample rate refers to how many samples are taken per second. It is commonly expressed in hertz (Hz), where one hertz equals one cycle (here, one sample) per second. So, an audio file with a sample rate of 22.5 kHz holds 22,500 samples for every second of audio. An audio file with a sample rate of 44.1 kHz holds 44,100 samples per second. DVD audio is commonly done with a sample rate of 48 kHz, or 48,000 samples per second.
Sample depth refers to how many bits each sample of data occupies. Although sample depths of 2, 4, and 8 bits are available, the 16-bit sample size is the most common for accurate sound reproduction, and is the size most commonly used to save sound files in stereo.
One other term we have to understand is pulse code modulation, also referred to as PCM. PCM is a digital representation of an analog signal: the signal is sampled at regular, uniform intervals, and each sample is then quantized (converted) to a binary code that represents the original signal. PCM is the standard for digital audio in the computing world, and is also commonly used in digital telephone networks and electronic musical keyboards. It exists all around us, forming the foundation for audio (Red Book) CDs, and even for the sound we all hear when we play a DVD in our home DVD players.
So with that taken care of, let's take a look at some of the more common audio file formats.
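Sample rate, sample depth, and PCM come together in the following rough Python sketch. It samples a sine wave at regular intervals and quantizes each sample to a 16-bit signed integer. The 440 Hz test tone and the function name pcm_samples are assumptions chosen for illustration; real recording hardware does this work with an analog-to-digital converter.

```python
import math

SAMPLE_RATE = 44_100   # samples per second (44.1 kHz)
SAMPLE_DEPTH = 16      # bits per sample
TONE_HZ = 440          # hypothetical test tone

def pcm_samples(duration_s):
    """Sample a sine wave at regular intervals and quantize to signed integers."""
    peak = 2 ** (SAMPLE_DEPTH - 1) - 1        # largest 16-bit signed value
    count = int(SAMPLE_RATE * duration_s)
    return [
        round(peak * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE))
        for n in range(count)
    ]

one_second = pcm_samples(1.0)
print(len(one_second), "samples in one second")
print("first five values:", one_second[:5])
```

Raising SAMPLE_RATE produces more samples per second, while raising SAMPLE_DEPTH gives each sample a finer range of values to land on.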
AIFF (Audio Interchange File Format)
This sound file format was co-developed by Apple Computer in 1988, and is based on Electronic Arts' IFF (Interchange File Format), which was in wide use on Amiga computers. Most commonly, you will find the AIFF format in use on Apple Macintosh computers. It can usually be recognized by the .aiff or .aif file extension.
Traditionally, the AIFF format is uncompressed. As such, it is considered to be a lossless audio file format. It is frequently used by professional-level audio and video applications. Since it is uncompressed, it is possible to stream multiple audio files from the disk to the application used for playback.
Because it is uncompressed, an AIFF file takes considerable space to save. One minute of stereo audio will take approximately 10 MB of hard disk space to store, at a sample rate of 44.1 kHz and a sample size of 16 bits. Despite the enormous amount of space it takes to store audio in the AIFF format, you can expect accurate and faithful sound reproduction.
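The 10 MB figure can be checked with a little arithmetic. The short Python sketch below assumes two-channel (stereo) audio, which the figure implies; the function name pcm_bytes is made up for illustration.

```python
def pcm_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Size of uncompressed audio: samples x bytes per sample x channels."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

one_minute = pcm_bytes(60)
print(one_minute, "bytes")                       # 10584000 bytes
print(round(one_minute / 1_000_000, 1), "MB")    # 10.6 MB
```

The same function shows that a mono recording at the same settings needs about half the space: pcm_bytes(60, channels=1).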
WAV (Waveform Audio Format)
WAV, also known as "Audio For Windows", is a format for storing bit-stream audio on PCs, and was developed by Microsoft and IBM. Formally released in 1992, it is very similar to the AIFF format (above) that was co-developed by Apple Computer. It is the main format used to store audio on Microsoft Windows® systems. It is usually recognized by the .wav file extension.
Like the AIFF format, the WAV file is most commonly uncompressed. This makes it a lossless audio file format. WAV files are compatible with Linux and Macintosh operating systems, besides Windows. Because they are uncompressed, WAV files are commonly used by professional users or audio experts to preserve maximum audio quality.
The WAV file format has lost some of its popularity, due to its traditionally uncompressed nature and inherently large file sizes. These traits make WAV files an unpopular choice for internet file sharing or transfer. Just as with AIFF files, WAV files take up a lot of space. One minute of stereo audio recorded at 44.1 kHz with a 16-bit sample size will consume approximately 10 MB of disk space.
However, its popularity does continue, as the WAV file makes a good choice for first generation audio recordings, which can be edited repeatedly without any loss of quality. It is also popular because of its relatively simple file structure, and because it is familiar to a very large number of computer users. In many cases, the WAV file represents the "least common denominator" when it comes to the exchange of file formats between different applications or operating systems.
One significant limitation of standard WAV files is that they can be no larger than 4 GB. This is because the file size is recorded in the header as a 32-bit unsigned integer. Although this is large enough to store roughly 6.8 hours of stereo audio recorded at 44.1 kHz and 16-bit sample size, it may be necessary at times to exceed this inherent limit. Some programs change this to a 64-bit unsigned integer to circumvent the file size limitation, but that is not yet considered a standard implementation of the WAV file format.
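For the curious, the size field can be seen directly. The Python sketch below uses the standard library's wave module to write one second of silence to an in-memory WAV file, then reads back the first 12 bytes of the header. It assumes 16-bit stereo audio at 44.1 kHz, and the final calculation simply divides the largest value a 32-bit unsigned integer can hold by the number of bytes used per second.

```python
import io
import struct
import wave

BYTES_PER_SECOND = 44_100 * 2 * 2   # 44.1 kHz, 16-bit (2 bytes), stereo (assumed)

# Write one second of silence to an in-memory WAV file.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44_100)
    wav.writeframes(b"\x00" * BYTES_PER_SECOND)
wav_bytes = buf.getvalue()

# The file opens with "RIFF", a 32-bit little-endian size, then "WAVE".
# The size field counts everything after the first 8 bytes of the file.
tag, riff_size, kind = struct.unpack("<4sI4s", wav_bytes[:12])
print(tag, kind, "size field:", riff_size, "file length:", len(wav_bytes))

# The largest value an unsigned 32-bit field can hold caps the file size.
max_hours = (2**32 - 1) / BYTES_PER_SECOND / 3600
print("longest recording: about", round(max_hours, 2), "hours")
```

Lower sample rates or mono recordings stretch the limit proportionally, since they use fewer bytes for each second of audio.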
In PCLinuxOS, most programs that offer audio file playback will play WAV files.
MP3 (Moving Picture Experts Group [MPEG]-1 Audio Layer 3)
Approved as an ISO/IEC standard in 1991, the MP3 file format is probably one of the most popular and most used of the current crop of audio file formats. It has become the de facto standard for the vast majority of consumer portable audio players. Besides being widely accepted, one of the biggest attractions of MP3 files is their compression. At a bit rate of 128 kbps, encoded from 16-bit source audio, MP3 can store the audio information represented by the WAV format in a file roughly 1/10th of the size. This means that one minute of audio recorded in the MP3 format at a 128 kbps bit rate will occupy only approximately 1 MB of disk space, compared to the 10 MB it takes to save the same audio in either the WAV or AIFF file formats.
The trade-off is in the sound quality. MP3 files use a lossy compression algorithm, and the format is often compared to the JPEG graphics file format. The encoder works by discarding frequencies that are difficult for the human ear to hear, and recording the remaining frequencies in an efficient manner. At bit rates of 128 kbps and above, it is difficult for many people to discern the loss of quality. And, of course, the higher the bit rate, the lower the compression, and the better the quality. Thus, audio encoded at the 256 kbps bit rate setting will preserve more of the audio quality of the original than the same recording encoded at the 128 kbps bit rate setting.
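Because a constant bit rate fixes how many bits are spent on each second of audio, the size of an MP3 file is easy to estimate. The Python sketch below is only an approximation: it assumes a constant bit rate, ignores tags and other overhead, and compares against uncompressed 16-bit stereo audio at 44.1 kHz. The function name encoded_bytes is made up for illustration.

```python
def encoded_bytes(seconds, kbps):
    """Approximate size of a constant bit rate file: bits per second / 8."""
    return seconds * kbps * 1000 // 8

# One minute of uncompressed 44.1 kHz, 16-bit stereo audio, for comparison.
pcm_minute = 60 * 44_100 * 2 * 2

for kbps in (128, 256):
    size = encoded_bytes(60, kbps)
    ratio = round(pcm_minute / size, 1)
    print(kbps, "kbps:", size, "bytes per minute, about 1/" + str(ratio), "of the original")
```

Doubling the bit rate doubles the file size, which is the price paid for the improved quality.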
Because of the outstanding compression employed by the MP3 file format, it has become a favorite of those who like to exchange and share files on the internet. The smaller file size means quicker downloading and file transfers. However, as with JPEG graphic files, MP3 files will suffer from what is called "generational degradation." This means that re-encoding an MP3 file will cause it to be re-compressed, and even more audio fidelity will be lost. Do this multiple times, and the loss will be significant and noticeable.
The MP3 file format has not cleared the patent hurdles, however. Numerous patents related to the MP3 file format have been filed, and they expire somewhere between 2007 (already expired, for the original MPEG-1 Audio Layer 3 specification) and 2017, depending on which patent is being applied. If only the MP3 patents filed by December, 1992 are considered, the MP3 file format may be patent free by December, 2012. The Fraunhofer Society, commonly referred to as Fraunhofer IIS, patented the MP3 file format, and is generally considered to be the owner. Licensing the use of the MP3 file format generated €100,000,000 (or over $180,000,000 U.S.) for the Fraunhofer Society in 2005 alone. Thomson Consumer Electronics also claims to hold the licensing rights on the MP3 file format in the U.S., Japan, Canada, and the EU countries. Both are known to actively pursue enforcement of the MP3 patents.
The enforcement of these patents on the MP3 format led to the development of other audio file formats, in an effort to avoid the need to license the patents.
Most programs that allow audio playback in PCLinuxOS support the playback of MP3 files.
OGG (Ogg-Vorbis)
The Ogg Vorbis format was born in the wake of a September, 1998 letter from the Fraunhofer Society announcing plans to charge licensing fees for the MP3 format, and began as an open source project to circumvent that licensing. Headed up by the Xiph.org Foundation, the Ogg Vorbis format employs a lossy compression algorithm, and is intended to be an open source replacement for the patent-riddled, proprietary MP3 format.
The first stable version (v 1.0) of the software was released in July, 2002. The most recent version of the software, libvorbis, is v 1.2.0, and is the version that is included in the PCLinuxOS repository. Ogg Vorbis files can often be recognized by their .ogg file extension.
The Ogg Vorbis file format is popular among those who support free software. It is free, and unencumbered by patents. Many of its supporters claim that it has a higher sound fidelity, when compared with MP3 files recorded with the same bit rate and sample size. The file format has also become popular in many modern computer games for storing in-game audio.
Listening tests have shown the Ogg Vorbis format to outperform MP3, WMA, and other lossy audio formats across the board, when the competing formats are encoded at the same bit rates and sample sizes.
Just as with MP3 files, however, the Ogg Vorbis file also suffers from generational degradation, where editing an Ogg Vorbis file, and re-saving it, will result in a loss of audio fidelity. As with MP3 files, doing this several times will result in a noticeable and significant loss of audio quality.
While not as popular as the MP3 format, its popularity is increasing. This is primarily due to the fact that the Ogg Vorbis format can be used free of charge, without the licensing hassles. As a result, personal audio players are appearing on the market that will play Ogg Vorbis encoded files, in addition to the more traditional (and more popular) MP3 encoded files. Devices that are known to play Ogg Vorbis files include SanDisk's Sansa Clip (1.01.29 firmware and higher), SanDisk's Sansa Fuze player (1.01.15 firmware update), devices using Google Android, Samsung YP series audio players, the iAudio X5, and most of the iRiver audio players, to name a few. More audio devices are being added to the list as the file format increases in popularity.
Due to its open source nature, many of the audio playback programs in the PCLinuxOS repository support the playback of Ogg Vorbis files.
WMA (Windows Media Audio)
Microsoft created the WMA format in 1999 to circumvent the MP3 licensing issue, and to address perceived deficiencies of the MP3 format. It is a proprietary format that uses lossy compression, and forms a part of the Windows Media Framework. To keep things "simple," the WMA format actually consists of four different codecs: the original WMA format, WMA Professional (supporting multi-channel sound at higher fidelity), WMA Lossless (utilizing lossless compression to improve sound fidelity), and WMA Voice (utilizing lower sampling bit rates to achieve greater compression). Unfortunately, none of the latter three are compatible with the original WMA format. For this article, we'll restrict our discussion to the original WMA format.
WMA files are often "encapsulated" in the ASF (Advanced Systems Format) container. WMA files themselves are not "encoded" to provide DRM (Digital Rights Management); rather, it is the ASF container that provides this "functionality."
WMA is one of the most popular audio codecs, presumably due to Windows' share of the computer market. Numerous portable audio player devices exist that allow playback of WMA files. Many home/set-top DVD players also come with the ability to play WMA files.
In PCLinuxOS, there are several programs that can play WMA files (well, the ones unencumbered by DRM, anyway). They include VLC, MPlayer, and XMMS, to name a few.
FLAC (Free Lossless Audio Codec)
FLAC files differ from MP3, Ogg Vorbis, and WMA files in that they employ lossless compression. As a result, a comparable FLAC file will be larger than any of the lossy file formats, but the sound quality will be the same as the source. In fact, many audiophiles frequently "back up" their audio CD collections to FLAC files; then, should anything happen to their audio CDs, they can simply burn another copy, and that copy will have the same quality as the original CD. A FLAC file is typically somewhere in the neighborhood of 30% to 50% smaller than the original CD track.
Another popular use for FLAC files is for transcoding to MP3 files. Because FLAC files use lossless compression, the files can be edited, saved, re-edited, saved again … and so on, as many times as you want, without any loss of audio quality. Then, once the file is edited to how you want it, it can then be saved as an MP3 file.
As a free and open source codec, there is strong support for FLAC files in Linux. And, the same free and open source nature is gaining some support among portable audio players. Some of these include the SanDisk Sansa Fuze, SanDisk Sansa Clip, Rio Karma, and the iRiver E100.
Started in 2000, the FLAC format is now under the banner of the Xiph.org Foundation, right alongside Ogg Vorbis. FLAC files are also DRM free. In fact, the FLAC Project encourages developers not to incorporate any kind of copy prevention schemes.
On PCLinuxOS, there are several programs that support the encoding of FLAC files. They include aTunes, Audacity, FFmpeg, and VLC. There are also several programs that support the decoding of FLAC files, including aTunes, Audacity, FFmpeg, VLC, MPlayer, Songbird, Squeezebox, and XBMC Media Center. Several more programs support CD ripping, including aTunes, Songbird, Asunder, Banshee, MEncoder, Grip, K3b, and Konqueror (see the PCLinuxOS Magazine, July, 2009 issue) … to name a few.
Summary
Hopefully, this overview of audio file formats is helpful in sorting out performance issues with the various file formats, and helps you decide which format to use for your audio library. Of course, some of the decisions will already be dictated to you, based on the capabilities of the audio playback device you are using.
Most users are going to be perfectly satisfied with using one of the file formats that employ lossy compression (MP3 or Ogg Vorbis). But, to avoid generational degradation, it's best to convert first to a lossless format (WAV, FLAC, AIFF), perform your edits, then transcode to one of the lossy formats for the smaller file size.
Now … where did I put that Garth Brooks CD?