Blind testing of Noise Reduction algorithms
A number of audio editing packages include Digital Noise Reduction routines. Some use NR plugins. I want to blind test how good they are.
These usually take a sample of noise (e.g. tape hiss) with no intended signal present and analyze it statistically in the frequency domain, then attempt to remove noise from the audio signal without removing the desired audio.
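To make the idea concrete, here is a minimal sketch of one simple way such a routine could work (a per-bin spectral gate). It is not the algorithm of any particular package: the function names, the `threshold` parameter, the naive DFT and the non-overlapping rectangular frames are all assumptions chosen for brevity. Real routines use fast FFTs, windowing and smoothing.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform (slow, but needs no libraries)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each output sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def noise_profile(noise, frame_len):
    """Average magnitude per frequency bin over all whole frames of the noise."""
    frames = [noise[i:i + frame_len]
              for i in range(0, len(noise) - frame_len + 1, frame_len)]
    mags = [[abs(c) for c in dft(f)] for f in frames]
    return [sum(m[k] for m in mags) / len(mags) for k in range(frame_len)]

def reduce_noise(signal, profile, reduction_db, threshold=2.0):
    """Attenuate every bin whose magnitude is below threshold * profile.

    'threshold' is a hypothetical tuning knob, not a setting from any product.
    """
    frame_len = len(profile)
    gain = 10 ** (-reduction_db / 20)
    out = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        spec = dft(signal[i:i + frame_len])
        gated = [c * gain if abs(c) < threshold * profile[k] else c
                 for k, c in enumerate(spec)]
        out.extend(idft(gated))
    out.extend(signal[len(out):])  # a leftover partial frame is passed through
    return out
```

Bins that look like the profiled noise are turned down by `reduction_db`; bins well above the profile are left alone. Anything in the music that resembles the noise (brushed drums, sibilants) risks being turned down with it.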
I want to find out which, if any, are audibly transparent, which types of material might cause problems, and which routines are the best (both the best of the free ones and the best regardless of cost). So I'm providing the method and a sample of noise, which you can try out in any software you have, and you can post samples here for all of us to rate or ABX.
The noise I'm providing is pretty low level as it's recorded after passing through Dolby B NR. It's the result of tape hiss in the side two leadout of a commercial prerecorded ferric tape mastered with Dolby B/Dolby HX-Pro. It's played back through a Sony WM-36 Walkman with Dolby B turned ON, Normal position, all EQ set to zero and volume reasonably below maximum to reduce distortion, fed to the line in of a fairly cheap 'PnP-16v' soundcard (with all other recording inputs deselected) and adjusted to zero DC offset during read-in (CoolEdit 2000 trial). Avg RMS Power = -55.41 dB (Left), -62.74 dB (Right). Peak value = -174.
I'm also providing an independent 3.07 second (135180 sample) noise sample from which to gather statistics of the noise, recorded with the same settings in the same session, but from the lead-in of side two of the tape (I used this on some real restoration of a quiet track with a very quiet intro). Avg RMS Power = -57.57 dB (Left), -61.97 dB (Right).
I guess that being at opposite ends of the same side of the tape might be an approximate worst case. I'm also providing a much longer (22.513 sec) sample of noise from before a very small click that preceded the 27.495 second sample, also at the leadout of the tape, in case its proximity or duration might make it a better independent sample for training the noise reduction routine to recognise the right kind of noise. Avg RMS Power = -55.38 dB (Left), -62.64 dB (Right), peak value = -131.
Noise samples: (all compressed using FLAC v1.0.3 at level 8)
Add this noise to your test audio:
• noise_DolbyB_sndcard_27_495_sec_leadout.flac (1.61 MB, 1212532 samples)
Use this sample to get noise profile:
• GetNoise_sample_DolbyB_sndcard_3_07sec_leadin.flac (0.18 MB, 135180 samples)
This is an alternative noise sample from the leadout to use for NR profile or adding to audio:
• extra_noise_22_513sec_leadout_before_pop.flac (1.32 MB, 992832 samples)
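As a quick cross-check of the sample counts against the quoted durations, here is a small sketch that assumes all three files are at the 44.1 kHz rate used in the method (the dictionary keys are just shorthand labels of mine):

```python
SAMPLE_RATE = 44100  # Hz; assumed CD-rate audio

def duration_seconds(samples, rate=SAMPLE_RATE):
    """Length of a clip in seconds, given its sample count."""
    return samples / rate

# Sample counts as listed above
clips = {
    "leadout noise": 1212532,
    "leadin profile": 135180,
    "extra leadout noise": 992832,
}
for name, count in clips.items():
    print(f"{name}: {duration_seconds(count):.3f} s")
```

The same division tells you how many samples to keep when trimming your own excerpt.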
Modern CDs are usually mastered louder than tapes would be, so you may wish to ReplayGain them (Album Gain?) before adding the noise to bring the samples into line. It so happens my noise was recorded with the tape and line-in amplification set where the cassette album provided 89.89 dB loudness (as measured by running Album Gain in Foobar2000), so this is close to the standard 89.0 dB for ReplayGain.
The tape restoration I did from this 1991-vintage tape was recorded so it peaked at about -2 dBFS (~0.80). The Album Gain then turned out to be -0.89 dB - i.e. loudness was 89.89 dB. The loudest track (with the -2 dB peaks) had a Track Gain of -2.52 dB (peaks ~0.80, volume = 91.52 dB) and audible hiss on the intro and on the fadeout, and the next track was the quiet one with a very quiet intro and the most audible hiss throughout, with Track Gain +3.37 dB (peaks ~0.45, volume = 85.63 dB). The intro to this track would need a track gain of +10.46 dB to reach 89 dB (peak =0.20, volume = 78.54 dB). The noise sample (after Dolby B) from the tape leadin has a track gain of +47.84 dB (i.e. the hiss is perceived as 37.38 dB quieter than this very quiet intro) which is clearly audible. For that restoration, I ended up applying 24 dB noise reduction in Exact Audio Copy (EAC) and the bitrate of MusePack and Lame APS files of the intro dropped by about 20-30 kbps after NR compared to the files before NR, indicating crudely how much of the hiss was considered audible by the psychoacoustic models.
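The arithmetic behind those figures can be sketched as follows, assuming loudness is simply the 89 dB ReplayGain reference minus the reported gain, and that peak levels convert from dBFS with the usual 20·log10 rule. The function names are mine, not from any tool.

```python
REFERENCE_DB = 89.0  # ReplayGain reference loudness

def loudness_from_gain(gain_db):
    """Loudness implied by a ReplayGain value (a gain of -0.89 dB means 89.89 dB)."""
    return REFERENCE_DB - gain_db

def dbfs_to_linear(db):
    """Convert a level in dBFS to a linear fraction of full scale."""
    return 10 ** (db / 20)

print(f"album loudness: {loudness_from_gain(-0.89):.2f} dB")
print(f"quiet intro:    {loudness_from_gain(10.46):.2f} dB")
print(f"hiss vs intro:  {47.84 - 10.46:.2f} dB quieter")
print(f"-2 dBFS peak:   {dbfs_to_linear(-2.0):.2f} of full scale")
```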
The method is:
• 1. Rip a 44.1 kHz stereo source file from CD (pref in EAC secure mode for accuracy)
• 2. Select and trim to an excerpt no longer than 27.495 sec (1212532 samples) - this is the length of the main sample of noise I provide and falls within the 30 sec guideline for test samples specified in the forum rules. Jot down the length in samples, then Save As... your original file, e.g. test_orig.wav
(Consider reducing the file to a lower volume if it's within 174 sample values of full scale: +/- 174 is the peak level of the noise to be added, and we should prevent digital clipping, which we're not trying to test. Also, an NR routine may adjust frequencies and phases slightly and could potentially cause clipping, so it's advisable to leave some headroom anyway. Furthermore, most tapes don't include music at an equivalent of 98 dB ReplayGain loudness, so reducing to 89 dB album gain might be in order. You might even wish to gently low-pass filter the result at around 15 or 16 kHz to simulate an audio tape's frequency response. I haven't filtered my examples below.)
• 3. Mathematically add the noise I provide to the original using a function such as Mix Paste/Combine/From File... (in CoolEdit 96). If the noise has extended the length of the file over the original, trim it back to the exact original length. Save As... e.g. test_noise.wav
• 4. Open the GetNoise sample which is an independent sample of noise from the leadin of the same tape. Get noise profile from this and save the profile if necessary or desired. (Some software, like EAC, can't get the noise profile from the whole waveform, so you may need to append your desired test_noise.wav file first, keeping the profile noise selected, then Get Noise Profile from Selection, then delete the selected noise).
• 5. Open the noisy file, test_noise.wav and apply Noise Reduction to the whole file, making a note of the settings used (in Audacity, a screenshot may be required for the less/more slider). Save the resulting audio, for example as test_EAC_NR24dB.wav
• 6. Now you can use blind-test tools such as WinABX to compare and rate the noise-reduced file (e.g. test_EAC_NR24dB.wav) with the original (test_orig.wav) and repeat step 5 with different settings if desired. If differences are obvious, it may be instructive to try very low (near zero?) amounts of reduction and blind test against test_noise.wav to judge how much the routine's FFT/inverse FFT and windowing will mangle the audio on their own (scaling down the added noise by about 20-30 dB would do the trick).
• 7. It'd be great if you'd supply the original sample and the post-NR sample here for us to try out, and if you try adding a different noise to your audio (e.g. 8-bit soundcard noise, vinyl noise, non-Dolby tape hiss, 50/60 Hz mains hum), tell us what it is and post it too if you can.
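For anyone without a Mix Paste function, the sample-by-sample addition in step 3 (and the scaled-down noise suggested in step 6) can be sketched like this. It assumes one channel of signed 16-bit samples already loaded as plain integer lists; reading and writing the WAV files is left out, and the function names are hypothetical.

```python
FULL_SCALE = 32767  # largest positive 16-bit sample value

def db_to_gain(db):
    """Convert a gain in dB to a linear multiplier."""
    return 10 ** (db / 20)

def mix_noise(signal, noise, noise_gain_db=0.0):
    """Add noise to signal sample by sample.

    The result is trimmed to the length of the signal. A ValueError is
    raised instead of clipping, so you know to lower the volume first.
    """
    if len(noise) < len(signal):
        raise ValueError("noise is shorter than the signal")
    gain = db_to_gain(noise_gain_db)
    # zip() stops at the end of the shorter list, i.e. the signal
    mixed = [s + round(n * gain) for s, n in zip(signal, noise)]
    # Conservative check: treats -32768 as out of range too
    if max((abs(v) for v in mixed), default=0) > FULL_SCALE:
        raise ValueError("mix would clip; reduce the signal level first")
    return mixed

# Example: noise added at 100%, then at 20 dB down
print(mix_noise([100, -200, 300], [5, -5, 7, 9]))
print(mix_noise([1000, 1000], [100, -100], noise_gain_db=-20))
```

For stereo, run it once per channel with the matching channel of the noise file.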
If you choose to rate things you've ABXd, this is the recognised scale, and you can use decimal fractions within the range:
5.0 = Imperceptible (not perceptible)
4.0 = Perceptible but not annoying
3.0 = Slightly annoying
2.0 = Annoying
1.0 = Very Annoying
I won't make a special effort to rate all my samples, so feel free to rate them yourselves.
Example:
I tried to choose a difficult track to test this and used some aggressive settings when trying NR, so you might consider this to be training so that you know what types of defect you're trying to spot. With more noise in the first place (e.g. Dolby OFF), the defects would have been worse, but this represents a real situation when Dolby ON was required to restore the correct timbre to the music on a cassette mastered with Dolby B.
Hopefully, this sample will demonstrate the shortcomings of digital noise reduction so you know what to listen for. From a promotional jazz CD called Light Jazz by Lucent, ripped with EAC secure mode, this is 27.49 sec of the intro to 08 - Marie Bergman - All The Way.
It includes a clean intro with low background noise, then introduces vocals, with sibilants (ss, sh, ch, th, etc.) that can be mangled by digital NR, and also introduces brushed drums, which sound very like tape hiss and might cause problems for the NR routine (listen for smushed/flanged sounds and high frequency twinkling or burbling like a 1960s sci-fi computer). I didn't choose to apply ReplayGain, but Track Gain was -3.94 dB.
Original CD control sample:
• AllWay_orig.flac (2.1 MB, EAC secure mode, 1212440 samples)
CD sample plus noise:
Used mix-paste in CoolEdit96 to add the noise, unscaled (100%), then trimmed off samples 1212440-1212531 to leave 1212440 samples.
• AllWay_noise.flac (2.2 MB, 1212440 samples)
Heavily Noise-Reduced examples:
I got the noise profiles from the 3.07 second sample then tried some fairly heavy NR to show the sort of damage it can do on this difficult sample. You can 'Keep Only Noise' to hear what's being removed, but this isn't necessarily psychoacoustically valid because some of what's removed might be adequately masked in the original signal.
CoolEdit 96 Noise Reduction:
Without testing for the best settings, I collected 263 statistical snapshots with 8192 point FFT, precision=11, smoothing=3, transition=0dB, then set the NR amount to 100% (values over 100% can also be typed).
The Help file in CoolEdit96, especially in the NR department, is an education in itself, by the way.
• AllWay_CoolEdit96_NR100.flac (1.83 MB)
I don't think CE96 uses overlapping (lapped) transform windows or anything clever like modern codecs do; it just smooths a number of FFT windows together. If you Keep Only Noise then amplify it, you can often hear the boundaries between windows. I first used this some years ago to reduce some dreadful FM noise from radio recordings (FM noise is very glitchy and hard to remove cleanly, so some of the worst recordings have pretty bad burbling and sibilants after cleanup). It took about 42 seconds to save undo data (1 sec) then perform the NR (on a PentiumII-350 with 320MB RAM under WinNT4.0).
Exact Audio Copy / Process WAV / Noise Reduction:
EAC's WAV editor works only with 44.1 kHz WAVs. It has no user-definable settings when getting the noise profile, and only a dB scale for the approximate amount of noise reduction (from 0 to 48 dB) and version 0.95pb2 now displays the selected number to 0.1 dB precision above the slider. It took about 25 seconds to process the same file on the same PC.
EAC, slider at 24 dB reduction:
• AllWay_EAC_NR24dB.flac (1.76 MB)
EAC, slider at 48 dB reduction:
• AllWay_EAC_NR48dB.flac (1.75 MB)
In normal use, I'd probably aim for about 12 to 18 dB reduction on this file, as the noise would become reasonably well masked. Another option is to noise-reduce only until the onset of the brushed drum, then apply no NR beyond that point. Usually it's safest to apply only light NR, but partly that's the point of this blind test exercise. If you keep only noise with EAC, it's not easy to hear boundary effects, so perhaps it uses overlapping (lapped) transform windows to overcome this (any clues, anyone?).
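I don't know what EAC or CE96 actually do internally, so the following is only an illustration of why overlapping windows can hide frame boundaries. It assumes a periodic Hann window with 50% overlap, which is a common textbook choice and not a documented setting of either program. The overlapping windows always add up to a constant weight, so each frame fades smoothly into the next.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def overlap_add_weight(window, hop, total):
    """Sum of all shifted copies of the window at each sample position."""
    weight = [0.0] * total
    for start in range(0, total - len(window) + 1, hop):
        for i, w in enumerate(window):
            weight[start + i] += w
    return weight

# 8-sample windows stepping 4 samples at a time (50% overlap)
weights = overlap_add_weight(hann(8), hop=4, total=32)
print([round(w, 3) for w in weights])
```

Away from the very start and end of the file the total weight is flat, whereas processing independent back-to-back frames lets the gain in a frequency band jump abruptly from one frame to the next, which is the kind of boundary you can hear in amplified Keep Only Noise output.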
Anyway, these may well need only rating, not ABXing, though certain sections might be indistinguishable from the original. This was a difficult sample, and I did use fairly aggressive amounts of NR. In the section from 20.0 to 22.6 sec, for example, the high frequency brush sound on the original is fairly constant, but is modulated more and is slightly lower in level on the EAC 24 dB NR version. On cheapish, pleasant-sounding headphones, the difference is readily perceptible, so not transparent (WinABX: 7/7; p = 0.8%), but not especially annoying given a moderate level of environmental noise, so I would probably rate it around 3.8 (EDIT: I mean just for that brief section, not the whole sample, which, as 2Bdecided points out, has lost some twang on the bass, for example). For more aggressive NR, worse algorithms and listening in a quiet environment, twinkling may be heard quite easily, or smushing of sibilants or transients might be considered much more annoying.
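The p-value quoted for the 7/7 result is the chance of doing at least that well by pure guessing. A minimal sketch of the calculation, assuming each ABX trial is an independent 50/50 guess (the function name is mine, not WinABX's):

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(f"{abx_p_value(7, 7):.2%}")  # 0.78%, i.e. 1 in 128
```

If you post ABX results, include both the number correct and the total number of trials so others can work this out.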
For some kind of figure of merit, I suppose this would need to be weighed against the annoyance of the noise in the original, which will depend largely on how well it's masked, how natural it sounds, and how easy it is to ignore the noise. If the noise in the original is considered less annoying than the distortions in the edited version, it's subjectively not worth using noise reduction with those settings.
(EDIT: In this particular case, I think it's not worth it, as the noisy version isn't intrusively noisy (and in fact noise is below the brushed drums later on), and even worse, as we'll see below, eliminating the noise eliminates some of the enjoyable parts of the signal)
These problems also tend to be clearer on headphones than speakers.
Feel free to post your own ratings, preferably with details of your listening environment.
Other NR routines to test would include the one built into Audacity (which I have), newer versions of Cool Edit / Cool Edit Pro, and plugins such as Waves' Noise Reduction. Any more suggestions?
Any volunteers to test certain packages or provide further original samples to which we can add noise?
Feel free to add anything except unverifiable opinion. Provide files so we can hear for ourselves whether product X sounds as good as you say, or you know how little weight your opinion will carry (unless you have a really good reputation around here!).
Regards,
Dick Darlington