Multimedia Tips: Dealing With Digital Audio File Formats – Dan Blank: Publishing, Innovation & the Web

By Peter Welander, Control Engineering process industries editor

While working through the process of recording and editing podcasts, I have found digital recording formats to be one of the most confusing topics. I’m an analog kind of guy, and working with digital recorders and computer editing has been an eye-opening experience. I’d like to try to explain a few things, but first I have to lay out some technical background. If you don’t want to read it all, just skip down to the subhead “So what does it mean?”

Consider for a moment the more familiar topic of digital photography. Those of us in publishing understand basic concepts of file size, format (.jpg, .gif), resolution, and the like. Some digital photos are beautiful while others look jagged and “stair-steppy.” We know that has to do with file size, pixel counts, dots per inch, etc., and we have specific minimums for what we consider publishable. If an image is too pixelated, we generally don’t run it in print.

Your correspondent in his “studio,” a quiet corner of the basement.

Digital audio is much the same. It is an attempt to duplicate the waveform of a sound with bits. Just as a curved line in a digital photo can be smooth or jagged depending on the nature of the file, so can the reproduced sound wave. High-quality digital audio, such as you hear on a commercially produced CD, is virtually indistinguishable from the best analog audio (although audiophiles who cling to their vinyl LPs may argue the point). The problem with high-quality digital audio, like high-quality digital photography, is file size. Stereo CD-quality sound is roughly 10 MB (megabytes) per minute, so it isn’t practical for someone to download as a podcast.
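To see where that per-minute figure comes from, here is a minimal sketch of the arithmetic in Python. It assumes the standard CD parameters (44.1 kHz sampling, 16 bits per sample, two channels); the function name is my own and purely illustrative.

```python
def pcm_bytes_per_minute(sample_rate_hz, bits_per_sample, channels):
    """Size of one minute of uncompressed PCM audio, in bytes."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 60

# Assumed CD parameters: 44.1 kHz, 16-bit, stereo.
cd_stereo = pcm_bytes_per_minute(44_100, 16, 2)
print(f"{cd_stereo / 1_000_000:.1f} MB per minute")  # 10.6 MB per minute
```

Dropping to one channel halves the result, which is why mono is attractive for speech.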

To make file sizes more manageable, there are compression methods that reduce the size, but the lossy ones in common use, such as MP3, hurt the quality at least to some extent. For example:

Uncompressed high-quality CD audio is typically recorded as 16-bit linear PCM (pulse code modulation) at 44.1 kHz. These files are normally designated as .WAV. There are even higher-quality settings, such as 48 or 96 kHz sampling, but these are designed for sophisticated music recording and are excessive for podcasts.

The most common compressed audio format is MPEG-1 Audio Layer III, more commonly known as MP3. It is one format, but it covers a wide range of compression levels indicated by sampling frequency and bit rate, so not all MP3s are created equal. Frequency can range from 16 to 48 kHz, lowest to highest quality, respectively. Bit rate for mono files can start as low as 32 kbps and go as high as 160 kbps. If the file is stereo, the bit rate number doubles for the same quality per channel. (Don’t try to compare the numbers of .MP3 and .WAV files to each other or you’ll get really confused.)

There are other formats (.MP2, .BWF, .WMA, Ogg Vorbis, etc.) and other bit rates, but there is less likelihood that you will encounter these.

A simple recording setup that can produce high-quality (relatively speaking) audio on a budget. An Electro-Voice RE16 microphone, an old Behringer mic preamp, and a Marantz PMD670 recorder work well in combination. All of it was purchased second-hand on eBay at a fraction of new prices. The fancy mic support used to be a desk lamp. The mic preamp isn’t required, as I could feed the mic directly to the recorder, but it helps reduce hiss. Towels on the table reduce noise and reverberation. A few simple steps make for better sound.

So what does it mean?

If you’ve made it through this technical discourse, you may ask yourself, “OK, so what does all this mean to me? Should this affect the way I record discussions for podcasts?” As you might have guessed, the answer is yes, or I wouldn’t have gone through all that explanation.

Digital recorders normally have a range of recording options:

MP3, with various bitrates;
WAV, with one or two options, perhaps;
WMA, Windows Media Audio;
DSS, on Olympus recorders; and
Possibly some other proprietary formats.

So which should you use?

The general answer is, record in a higher-quality format than the final product will be, and as high as is useful and practical. To put it in more specific terms: We post podcasts as 44.1 kHz, 64 kbps mono .MP3 files. This is considered high quality for speech, but not good enough for serious music. It’s capable of the kind of fidelity we expect from AM talk radio. We want sound quality as high as possible, but with a file size that does not take an excessive amount of time to download. (To hear the difference, listen to this sound file that illustrates the point. If you aren’t getting sound quality equal to this, there’s room for improvement.) With that understanding, how should you do your field recording?
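To put a number on the download size, here is a rough sketch in Python. It assumes a constant bit rate and ignores the small overhead of headers and tags; the 20-minute episode length is a hypothetical value chosen only for illustration, and the function name is my own.

```python
def mp3_megabytes(bit_rate_kbps, minutes):
    """Approximate size of a constant-bit-rate MP3, in megabytes (10**6 bytes)."""
    bits = bit_rate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000

PODCAST_MINUTES = 20  # hypothetical episode length, not a figure from our podcasts

# The posted format: 64 kbps mono MP3.
print(mp3_megabytes(64, PODCAST_MINUTES))  # 9.6

# The same episode as uncompressed 16-bit mono PCM at 44.1 kHz, for comparison.
wav_megabytes = 44_100 * 2 * 1 * 60 * PODCAST_MINUTES / 1_000_000
print(f"{wav_megabytes:.1f}")  # 105.8
```

Under these assumptions the compressed file is about one-eleventh the size of the mono original, which is the whole reason for compressing before posting.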

First choice: Record in Linear PCM at 44.1 kHz. This generates a .WAV format file. Edit the podcast in this format, and your pre-press department will compress it to its final form for posting. (Recording at 48 or 96 kHz is overkill and will only shorten available recording time and complicate editing.) If you have a 1 GB memory card, you can record about 90 minutes in stereo or 180 minutes in mono.
Second choice: If your recorder only does MP3s, set the frequency at 44.1 kHz and use a bit rate of at least 80 kbps (160 kbps if in stereo). 128 kbps mono or 256 kbps stereo is even better, and almost as good as a .WAV file.
Wrong choice: Don’t record as a .WMA file if you can avoid it. Windows Media files have the poorest compatibility with editing platforms and will usually have to be converted.
None of the above? If your recorder doesn’t offer any of these suggested options, use whatever the instructions recommend for the highest voice quality. Stereo is not necessary.
If you are recording a phone conversation via a conference service, ask for the recording as a .WAV file. You’ll probably have to wait for them to send you a CD. If you can’t wait for the CD and they offer to let you download it as an MP3, find out what the compression specs are and follow the same suggestions above.
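The recording times quoted under the first choice can be checked with a few lines of Python. This is only a sketch: it assumes 16-bit linear PCM, a decimal gigabyte (1,000,000,000 bytes), and no space lost to the card’s file system; the function name is illustrative.

```python
def recording_minutes(capacity_bytes, sample_rate_hz=44_100,
                      bits_per_sample=16, channels=2):
    """Minutes of uncompressed linear PCM that fit in capacity_bytes."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return capacity_bytes / bytes_per_second / 60

ONE_GB = 1_000_000_000  # assumed decimal gigabyte; real cards vary slightly

print(round(recording_minutes(ONE_GB, channels=2)))             # 94 (stereo)
print(round(recording_minutes(ONE_GB, channels=1)))             # 189 (mono)
print(round(recording_minutes(ONE_GB, sample_rate_hz=96_000)))  # 43 (96 kHz stereo)
```

The last line shows why the higher sampling rates are overkill for speech: at 96 kHz the same card holds less than half as much stereo recording.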

The fixation on 44.1 kHz is to keep some level of consistency from format to format. Editing platforms (such as GarageBand and Audacity) like this frequency and use it as a default.
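If you want to confirm that a batch of .WAV recordings really share the same settings before editing, Python’s standard wave module can read them back. The sketch below is an illustration under stated assumptions: it writes two short files of silence with hypothetical names so it has something to inspect, and it handles only uncompressed .WAV files, not MP3s.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()

def write_silence(path, sample_rate_hz, seconds=1):
    """Write a mono 16-bit WAV of silence so the example has a file to inspect."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * sample_rate_hz * seconds)

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file names standing in for two field recordings.
    first = os.path.join(folder, "interview_a.wav")
    second = os.path.join(folder, "interview_b.wav")
    write_silence(first, 44_100)
    write_silence(second, 48_000)
    formats = {os.path.basename(p): describe_wav(p) for p in (first, second)}

for name, fmt in formats.items():
    print(name, fmt)
if len(set(formats.values())) == 1:
    print("consistent")
else:
    print("mismatch: convert before mixing")
```

Here the second file was deliberately written at 48 kHz, so the check reports a mismatch; with real recordings you would point describe_wav at your own files instead.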

It’s true that digital editing does not cause the kind of generational deterioration you get with analog (making a copy of a copy), but some deterioration still occurs when an audio file is uncompressed, edited, and then re-compressed. The sound does not improve through these steps. The better the original, the better the final product.

Whatever you do, choose one format and stick with it. If you do some of your recording as MP3 and some as WAV, you will have problems if you try to mix them during editing. You will probably have to convert them all to one format or the other. Moreover, given the complexity of most digital recorders, the less fiddling you do, the better.
