The WSRS would like to thank Nagra and John Owens for permission to reprint this article, a timely reminder that we need to attend to more aspects of the recording system than simply the narrow specmanship of bit depth and sampling rates.
In the days of analogue tape, recordings were dependent not only upon
the quality of the recorder, pre-amplifiers and microphones but also on
the limitations of the recording tape. Artifacts such as wow and
flutter, tape "hiss" and even print-through, along with limitations of
distortion, equalization and bias, had tremendous effects on the
quality of recordings, not to mention the alignment of the tape
transports, as tapes were often recorded on one machine and played back
on another.
Today, thanks to digital technologies, all of these media and
transport related problems have been removed, and the quality relies
more heavily on other parameters, which are often either misunderstood
or simply ignored.
For example, there is a common perception today that a "better"
recording will result if a recorder capable of making 24 bit 96kHz
recordings is used instead of one operating at 16 bit 44.1 kHz. Without
trying to define "a better recording" it is interesting to look at the
critical factors behind such a presumption.
The word length 16, 18, 20 or 24 bits can be related to two specific
areas of the recording chain: Firstly the word length used by the A/D
when digitizing the analogue signal, and secondly the word length which
is actually recorded on the media. The latter needs to be at least
equal to the A/D in the record chain, however it is perfectly
reasonable to record a 24-bit word length derived from a 20 or even 16
bit A/D converter. In such a case the least significant bits (LSBs)
will simply be recorded as zeros and will serve purely to make the bit
stream compatible with other equipment of the same format, but will
have no bearing on the audio quality. Therefore one cannot assume that
a 24 bit audio signal was created using a 24 bit A/D converter.
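The padding described above can be sketched in a few lines of Python. This is only an illustration, assuming signed integer samples; the function names pad_16_to_24 and padded_bits are hypothetical and do not come from any particular recorder or file format.

```python
def pad_16_to_24(sample_16: int) -> int:
    """Carry a signed 16-bit sample in a 24-bit word by appending eight zero LSBs."""
    if not -0x8000 <= sample_16 <= 0x7FFF:
        raise ValueError("expected a signed 16-bit sample")
    return sample_16 << 8  # the eight new low-order bits are all 0


def padded_bits(sample_24: int) -> int:
    """Return the eight least significant bits of a 24-bit word."""
    return sample_24 & 0xFF


# Every padded word has an all-zero low byte: the word is longer,
# but it carries no more audio information than the 16-bit original.
for s in (0x7FFF, 1, -1, -0x8000):
    word = pad_16_to_24(s)
    print(f"{s:6d} -> {word:9d}  low byte = {padded_bits(word)}")
```

A 24 bit file whose low byte is always zero is therefore a reasonable hint, though not proof, that the signal came from a shorter word length.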
Concerning the choice of sampling frequency, there have been many
studies arguing the virtues of higher sampling frequencies, and without
getting too sidetracked, it is safe to say that the more audio
information that is recorded at the outset, the better the chance of
reconstituting the original sound later. It is perhaps better simply
to decide which sampling frequency to use for each particular
recording.
It should be remembered that sound is generated as an analogue signal
and is perceived by the human ear as an analogue signal. Several
critical factors such as microphone choice, placement and recording
environment are far more important than the word length and sampling
frequency or even the recorder chosen. But, assuming these practical
factors are well understood, then the last remaining factor, as far as
a recorder is concerned, is the audio chain itself in terms of level,
frequency response and dynamic range. These factors in themselves have
an equally important bearing on the recording, and if misunderstood or
misused can produce poor recordings even with the best recording
equipment.
Level is probably one of the most difficult points to discuss, as there
is no "golden rule" or formula for setting it, and the setting will
have a bearing on the quality of the recorded sound. It is however
important to remember that in the old analogue days a signal peaking
comfortably at +3 or +6 dB above maximum level was quite normal, and
the gentle progressive distortion introduced gave warmth and depth to
the recording. In the digital world this is impossible, as digital
chains do not allow this approach, and therefore level setting is far
more critical than in the past. A digital signal cannot get "louder"
than 0 dB, or in digital terms "7FFF" (hexadecimal) for a 16-bit
sample, and once this point is reached the distortion produced is
total. So in principle, one tries to record as close to this point as
possible without going over it, to ensure maximum use of the available
dynamic range of the digital system. In reality though, as the
generally accepted tone reference is -18 dB, the peaks will be around
-10 dB at best. This implies that the average or mean audio level will
probably be some 6 - 8 dB below this, i.e. around the -16 to -18 dB
point.
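To make the 0 dB ceiling concrete, here is a small Python sketch that converts between levels in dB relative to full scale and 16-bit sample values, and shows what happens to anything above "7FFF". It assumes the common convention that 0 dB corresponds to the positive full-scale code; the helper names are hypothetical.

```python
import math

FULL_SCALE_16 = 0x7FFF  # 32767, the largest positive 16-bit sample


def clip_16(value: int) -> int:
    """Hard-limit a value to the signed 16-bit range, as a digital chain does."""
    return max(-0x8000, min(FULL_SCALE_16, value))


def dbfs(peak: int) -> float:
    """Level of a sample peak in dB relative to full scale (0 dB = 0x7FFF)."""
    if peak == 0:
        return float("-inf")
    return 20.0 * math.log10(abs(peak) / FULL_SCALE_16)


def level_to_sample(level_db: float) -> int:
    """Peak sample value that corresponds to a given level below full scale."""
    return round(FULL_SCALE_16 * 10.0 ** (level_db / 20.0))


# The levels mentioned in the text: full scale, typical peaks, tone reference.
for level in (0.0, -10.0, -18.0):
    print(f"{level:6.1f} dB -> peak sample {level_to_sample(level)}")

# Anything that tries to exceed full scale is simply flattened.
print(clip_16(40000))
```

Unlike tape, there is no progressive region above the maximum: the value 40000 does not come back slightly compressed, it comes back as 32767.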
Now, as the useable dynamic range in digital terms is calculated at 6
dB per bit, a 16 bit system should have a useable dynamic range of 96
dB (although in reality this equates to about 90 dB),
starting from the "0" point and counting back to -90 dB. If the
recording level is peaking at around -10 dB then the maximum available
dynamic range is in fact only about 80 dB. In this scenario, with the
best will in the world, you are only making a 13-14 bit recording.
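The arithmetic in this paragraph can be checked with a short Python sketch. It uses the article's rule of thumb of 6 dB per bit and its figure of about 90 dB for a real 16 bit system; the function name effective_bits is hypothetical.

```python
DB_PER_BIT = 6.0  # the article's rule of thumb


def effective_bits(usable_range_db: float, peak_level_db: float = 0.0) -> float:
    """Bits actually exercised when peaks sit peak_level_db below full scale."""
    # Headroom left unused above the peaks is subtracted from the usable range.
    return (usable_range_db + peak_level_db) / DB_PER_BIT


theoretical_16_bit_db = 16 * DB_PER_BIT          # 96 dB on paper
bits_used = effective_bits(90.0, peak_level_db=-10.0)  # 80 dB of real range

print(f"theoretical range: {theoretical_16_bit_db:.0f} dB")
print(f"bits in use with -10 dB peaks: {bits_used:.1f}")
```

With 90 dB of real range and peaks at -10 dB, the 80 dB that remains works out to a little over 13 bits, which is the "13-14 bit recording" referred to above.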
If we now look at the microphone pre-amplifier stages, a dynamic range
of 120 dB can be considered as the best currently available in any
audio recorder. Even using this to its absolute maximum ability, your
digital level is only going to brush the 20 bit mark (120 dB at 6 dB
per bit), so one could argue that the advantage of using 24 bits is, in
audio terms, somewhat academic.
This being said, if one looks at a microphone preamplifier stage and
sees that its dynamic range is limited to say 85 dB, there seems little
point in worrying whether the recording is 16, 18, 20 or 24 bit, as it
will make absolutely no difference to the "quality" of the recording.
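The point of the last two paragraphs is that the weakest stage sets the limit. A minimal Python sketch, assuming the chain is simply limited by whichever of the pre-amplifier and the word length has the smaller dynamic range (a deliberate simplification that ignores how noise sources add up), and again using 6 dB per bit. The function name chain_bits is hypothetical.

```python
DB_PER_BIT = 6.0  # the article's rule of thumb


def chain_bits(preamp_range_db: float, word_length_bits: int) -> float:
    """Equivalent bits delivered by a pre-amplifier feeding a given word length."""
    converter_range_db = word_length_bits * DB_PER_BIT
    # The narrower of the two ranges limits the whole chain.
    return min(preamp_range_db, converter_range_db) / DB_PER_BIT


# Best-case 120 dB pre-amplifier into a 24 bit word: about 20 bits.
print(f"120 dB preamp, 24 bit: {chain_bits(120.0, 24):.1f} bits")

# An 85 dB pre-amplifier gives the same result at every word length.
for bits in (16, 18, 20, 24):
    print(f" 85 dB preamp, {bits} bit: {chain_bits(85.0, bits):.1f} bits")
```

Under these assumptions the 85 dB stage yields roughly 14 bits whether the recorder is set to 16 or 24 bit, which is why the word length makes no audible difference in that case.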
A frequent comment is "Using external 24/96 converters makes it sound
so much better". In fact this is purely because such external
converters are relatively expensive pieces of equipment and contain
high quality analogue stages. Hence the subjective improvement has
little to do with the number of bits or sampling frequency.
The sampling frequency and frequency response go hand-in-hand, and
although, according to the Nyquist theorem, 44.1 kHz is sufficient to
record perfectly up to 22.05 kHz bandwidth, using higher sampling
frequencies does appear to reconstitute the original sound more
accurately.
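The relationship between sampling frequency and bandwidth can be sketched in Python as follows. The second function shows where a tone above the Nyquist limit would reappear if it reached the sampler unfiltered; this is an idealised illustration only, since real converters place an anti-alias filter ahead of the sampler. Both function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2.0


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which an unfiltered tone appears after sampling."""
    folded = tone_hz % sample_rate_hz
    # Frequencies above the Nyquist limit fold back below it.
    return min(folded, sample_rate_hz - folded)


for rate in (44_100.0, 48_000.0, 96_000.0):
    print(f"{rate / 1000:5.1f} kHz sampling -> {nyquist_hz(rate) / 1000:6.2f} kHz bandwidth")

# A 30 kHz tone is inside the bandwidth at 96 kHz but folds back at 44.1 kHz.
print(alias_hz(30_000.0, 96_000.0))
print(alias_hz(30_000.0, 44_100.0))
```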
In conclusion, sound recording is a combination of art and science, and
many years' experience enables the correct choices of equipment to be
made for any particular recording. Complicated technical explanations
often imply that better recordings are made using certain
technologies. However, the quality and positioning of the microphones,
the analogue pre-amplifiers and their useable dynamic range are far
more important than the digital word length or sampling frequency. Some
manufacturers will encourage the belief that these technical
specifications are of critical importance when, in fact, much of the
time they are completely irrelevant. This may simply be because it is
relatively easy to design circuits using high bit depth converters at
high sampling frequencies, compared with designing analogue microphone
pre-amplifiers with high gain, wide dynamic range and low distortion.
The best way to judge a piece of equipment is to listen to the
recording rather than to read the glossy brochures.
John OWENS
NAGRA AUDIO
Switzerland