For the podcast I’m recording [How the Tiger got her Stripes…so to speak], I’m including a few details on the settings I record and edit with and why. My research on what each number means is a bit rusty, so I’m giving myself a brief refresher course on How Digital Recording. Or more specifically, why I choose the settings I do and use the gear I use.
In order to understand the settings, you gotta know the gear:
Recorder: I’m recording on a MixPre 3 digital recorder. Audiobook narrators take note - you simply cannot beat this device for sound quality. The preamps are beautiful, the noise floor is all but non-existent, ease of use is excellent, the list goes on. Two features I love about this thing, both unheard of in other devices of its class, are its built-in limiters and low-cut filters…and those limiters and low-cut filters are analog. My personal holy grail. The more you can do to capture your sound in the world of analog, the more faithful your sound capture will be.
Mics: Oktavamod large diaphragm mics modified to nail the Neumann U 87 mic sound to a T, built on an initial mic design known for its competition-beating low noise floor. The U 87 sound is no exaggeration; if you go to the website you can hear this mic compared to the U 87 in a blind sound-test…and actually? my ridiculously picky ears chose the Oktavamod. Michael Joly went the extra mile for me and further modified my mics with a diaphragm that would be even better at controlling my highly sibilant S’s. I recommend Michael and his mics/mods unreservedly. He is a delight to deal with.
Now! On to the more specific point of the article…
The MixPre is set for 24-bit depth, 96 kHz sample rate, and .wav uncompressed format. All my gear was selected for the lowest noise floor I could get. Why? Because less noise…allows me to make more noise. Let’s keep going, ’cause that really will make sense in a couple of minutes!
Why I use 24-bit depth:
Why record at 24-bit depth? Especially since the final Audible file will be at 16? Well, where I really need the extra dynamic range & excellent signal-to-noise ratio is in the actual recording process. Having the room to be ‘too loud’ and not clip, or ‘too soft’ and have clean sound I can boost in post, means far fewer technical issues to go back and re-record. Basically, I can retain subtle sounds without increasing overall signal noise; ‘recording for noise…without noise!’ Recording at 24 means retaining a clean too-loud, retaining a clean too-soft, with the best overall signal-to-noise ratio my machine can give. I’m a dynamic performer, so this is important to me.
The extra headroom provided by 24-bit can also be heard as the physical soundspace in which the recording was made – and this sense of physical presence matters hugely to me. I want to sound like a real person, not a flat voice emerging out of a void. This is the same logic behind retaining most inhales and subtle mouth noises in my performances – I want to sound like a real person reading to you, performing for you. I carefully plan breaths and subtle noises into the performance – they are never random; they are all intentional.
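To put rough numbers on that ‘clean too-soft’ idea, here is a minimal Python sketch. It assumes an idealized converter (plain rounding, no dither, no analog noise at all), and the test tone, its -60 dBFS level and the function names are hypothetical choices of mine for illustration, not anything the MixPre does internally.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0) to the nearest step of a signed `bits`-bit grid."""
    steps = 2 ** (bits - 1)  # e.g. 32768 steps per unit of amplitude at 16-bit
    return round(sample * steps) / steps

def quiet_tone_snr_db(bits, level_dbfs=-60.0, freq_hz=997.0,
                      sample_rate=96_000, n_samples=9_600):
    """Signal-to-noise ratio (dB) of a quiet sine after idealized quantization."""
    amplitude = 10 ** (level_dbfs / 20)  # -60 dBFS is 0.001 of full scale
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        error = quantize(x, bits) - x    # rounding error = quantization noise
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

snr_16 = quiet_tone_snr_db(16)
snr_24 = quiet_tone_snr_db(24)
print(f"16-bit: {snr_16:.1f} dB, 24-bit: {snr_24:.1f} dB, "
      f"difference: {snr_24 - snr_16:.1f} dB")
```

Under these assumptions the 24-bit version of the quiet tone should come out something like 48 dB cleaner than the 16-bit one (the textbook figure is about 6 dB per extra bit). In a real room the mic, preamp and air will set the noise floor long before the 24-bit grid does, so treat this as the ceiling the format allows, not a measurement of any actual recording.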
This style decision is pretty controversial – either you love it or you hate it :) I’ve had more than one fan write to me saying, ‘You have ruined me for all other audiobooks! No one breathes! You can’t hear the soundspace! Other books all sound airless!’ This gives me non-malicious glee. Yes – this is what I intend as a producer. Similarly, non-fans have written exhorting me to edit, for god’s sake! They don’t realize that the final product is edited to within an inch of its life, and that it retains, on purpose, the indicators that a real person read the book. Anyone with an ear for editing or knowledge of production or the experience of listening to hundreds of audiobooks can hear the careful editing. The difference between untutored ‘real sounds’ recording and what I do is vast – but I can appreciate that at first glance one could make that snap judgement, or just dislike the result, no matter how polished.
So! What are the technical advantages of that 24 bit depth that allow these performance elements to shine in the recording?
From: Sample Rate and Bitrate: The Guts of Digital Audio by Dan Connor [emphasis mine]
In general, the higher bitrate the ‘smoother’ the sound will be. 8-bit sounds rather grainy and harsh whereas 16-bit sound sounds quite a bit better. 24-bit sound is used by most audio professionals these days not because it sounds so much better than 16-bit sound but because the higher accuracy is useful because so much is done to the audio in the recording, mixing, and mastering process. Higher bitrate means that each change that is done to the sound produces a more accurate result. Imagine only being able to describe the sounds you’re recording with two volumes: on or off. It would be impossible to produce any music at all with such a low bitrate.
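For the numerically inclined, the 8 vs. 16 vs. 24 comparison in that quote (where ‘bitrate’ is referring to bit depth) can be sketched with the standard rule of thumb for linear PCM: each bit adds roughly 6 dB of theoretical dynamic range. The function name below is just my own label for the formula, and the figures are theoretical ceilings, not what any particular recorder achieves.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# prints:
# 8-bit: 48.2 dB
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```

That extra ~48 dB between 16 and 24 is the room that lets boosts, limiting and clean-up happen in post without the rounding errors of each step piling up where you can hear them.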
96kHz sample rate:
I record and edit at 96kHz sample rate. Why record sounds literally above the range of the human ear? Here’s a great explanation – one that my ear agrees with:
From: 16 Bit vs. 24 Bit Audio by The Tweak (tweakheadz.com) [emphasis mine]
Nyquist Theory and Sample Rate
This theory is that the actual upper threshold of a piece of digital audio will top out at half the sample rate. So if you are recording at 44.1, the highest frequencies generated will be around 22kHz. That is 2khz higher than the typical human with excellent hearing can hear. Now we get into the real voodoo. Audiophiles have claimed since the beginning of digital audio that vinyl records on an analog system sound better than digital audio. Indeed, you can find evidence that analog recording and playback equipment can be measured up to 50khz, over twice our threshold of hearing. Here’s the great mystery. The theory is that audio energy, even though we don’t hear it, exists as has an effect on the lower frequencies we do hear. Back to the Nyquist theory, a 96khz sample rate will translate into potential audio output at 48khz, not too far from the finest analog sound reproduction. This leads one to surmise that the same principle is at work. The audio is improved in a threshold we cannot perceive and it makes what we can hear “better”. Like I said, it’s voodoo.
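The half-the-sample-rate arithmetic in that quote is easy to check for yourself. Here is a small Python sketch; the function names and the 30 kHz example tone are my own hypothetical choices, and the fold-back formula assumes an ideal sampler with no anti-aliasing filter in front of it (real converters filter out content above the limit before sampling).

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2

def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which a tone shows up after ideal, unfiltered sampling.

    Tones below the Nyquist limit come back unchanged; tones above it
    fold back down into the representable range.
    """
    return abs(tone_hz - round(tone_hz / sample_rate_hz) * sample_rate_hz)

for rate in (44_100, 96_000):
    print(f"{rate} Hz sample rate -> tops out at {nyquist_hz(rate):.0f} Hz")
# prints:
# 44100 Hz sample rate -> tops out at 22050 Hz
# 96000 Hz sample rate -> tops out at 48000 Hz

# A hypothetical 30 kHz overtone: captured as-is at 96 kHz,
# but it cannot exist as 30 kHz in a 44.1 kHz file.
print(alias_hz(30_000, 96_000))  # 30000
print(alias_hz(30_000, 44_100))  # 14100
```

Whether that captured ultrasonic content makes what we can hear “better” is exactly the voodoo part; the code only shows what each sample rate is able to hold.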
Pretty cool, eh?
In the end, everything I record for Audible (I do record elsewhere) is down-sampled *sigh*. Audible’s file parameters are 16 bit, 44.1kHz, 192kbps compressed MP3s. (here insert an entire soliloquy full of pathos bewailing MP3 compression losses)
So why go to all this trouble? Because the more detailed and cleaner the original raw data, the more fidelity you retain even after effects, editing & processing. Let’s look at each of those in a bit more detail.
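To give a sense of how much gets squeezed out on the way to that MP3, here is a back-of-the-envelope Python sketch comparing the raw data rate of my recording settings with a 192 kbps MP3. It assumes a single mono channel (a stereo file would double the raw figure), and the function name is a hypothetical label for simple multiplication.

```python
def pcm_kbps(bit_depth, sample_rate_hz, channels=1):
    """Data rate of uncompressed PCM audio in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000

raw_kbps = pcm_kbps(24, 96_000)   # recording settings, mono assumed
mp3_kbps = 192                    # delivery bitrate

print(f"raw recording: {raw_kbps:.0f} kbps")                      # 2304 kbps
print(f"MP3 delivery:  {mp3_kbps} kbps")
print(f"raw is {raw_kbps / mp3_kbps:.0f}x the size of the MP3")   # 12x
```

So under the mono assumption, the delivered file carries about one twelfth of the data I captured. Data rate is not the same thing as perceived quality, but it shows how much room there is between the raw recording and the final product.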
Processing is a lot of steps - and each software step removes a bit of sound quality. Boosting volume, software limiter, de-esser, click removal, hiss removal… So where does this ‘recording for noise without noise’ come into play? Mouth noises! Retaining the almost subliminal emotion indicators created by a quiet inhale, or a lip lick, or a stuttering breath, or a swallow, or the many tiny noises that comprise actual enunciation but that change each time the spoken word expresses a different emotion. You’d be amazed how important these things can be to the way in which the performance is perceived on an emotional level.
You’d think this couldn’t be true! But an example of this can be heard when I voice a man. One of my strongest strategies for voicing men is to flatten out inflection - drastically. Compared to voicing women or a melodious narrative voice, it’s downright monotone. You don’t notice this, because it’s a real male speech pattern. It lets me evoke ‘essence of man-voice’ even though I’m not a baritone. But if you take inflection out of a line delivery, where exactly is the emotional content of that dialog going to come from? It’s going to come from all those almost-subliminal sounds. Sure, your male dialog expresses emotion with male-pattern cadence and pauses and such - but if you add in a delayed breath, a suppressed inhale, a dry-throat start to a line, a faint ‘mmm’, a trailing off of air as the dialog gives up its point as useless…all of these things pack enormous emotional punch. And if your settings aren’t refined enough to retain these subtleties through the dynamism of the performance, let alone survive all the post-production, your delicate performance goes right out the window. Bam - goodbye emotional wallop!
Books make us weep and rant and laugh and occasionally throw them against a wall or out the window. Audiobooks have the great fun of amplifying the text - but it’s a subtle performance that gives the best boost. To preserve that subtlety I strive for a production esthetic that’s as transparent as a window.
My goal is to reach a point at which the entire process sounds so effortless that people listening to it have no idea how much effort was expended. Of course there’s one drawback that I have already stumbled into on occasion - if your entire process sounds effortless, you get no credit for all the effort! As they say - no mistake goes unpunished, and yet excellence goes unremarked :)
Ah well - the end result is beautiful audiobooks for you, and that’s what really counts.