Full Version: Blind test of Noise Reduction algorithms
Hydrogenaudio Forums > Hydrogenaudio Forum > General Audio
DickD
Blind testing of Noise Reduction algorithms

A number of audio editing packages include Digital Noise Reduction routines. Some use NR plugins. I want to blind test how good they are.

These usually take a sample of noise (e.g. tape hiss) with no intended signal present and analyze it statistically in the frequency domain, then attempt to remove noise from the audio signal without removing the desired audio.
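
For anyone who'd like to see the general idea in code, here is a minimal spectral-gating sketch in Python. It is not the algorithm of any package mentioned in this thread: the naive DFT, the tiny frame length, the `threshold` of 2x the profile and the `floor_gain` of 0.1 are all assumptions chosen to keep the example short. Real tools use much larger FFTs and overlapping, smoothed windows.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def noise_profile(noise, frame_len):
    """Mean magnitude per frequency bin over consecutive noise-only frames."""
    frames = [noise[i:i + frame_len]
              for i in range(0, len(noise) - frame_len + 1, frame_len)]
    sums = [0.0] * frame_len
    for frame in frames:
        for k, value in enumerate(dft(frame)):
            sums[k] += abs(value)
    return [s / len(frames) for s in sums]


def reduce_frame(frame, profile, threshold=2.0, floor_gain=0.1):
    """Attenuate bins whose magnitude is not above threshold * profile.

    Each complex bin is multiplied by a real gain, so its phase is kept.
    """
    spectrum = dft(frame)
    gated = [value if abs(value) > threshold * profile[k] else value * floor_gain
             for k, value in enumerate(spectrum)]
    return idft(gated)
```

A strong tone sails through this gate almost untouched, while anything that looks like the profiled hiss is turned down by `floor_gain`. The catch, of course, is that quiet broadband parts of the music look like hiss too.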

I want to find out which, if any, are audibly transparent, which types of material might cause problems, and which routines are the best (both the best of the free ones and the best regardless of cost). So I'm providing the method and a sample of noise, which you can try out in any software you have, and you can provide samples here for all of us to rate or ABX.

The noise I'm providing is pretty low level as it's recorded after passing through Dolby B NR. It's the result of tape hiss in the side two leadout of a commercial prerecorded ferric tape mastered with Dolby B/Dolby HX-Pro. It's played back through a Sony WM-36 Walkman with Dolby B turned ON, Normal position, all EQ set to zero and volume reasonably below maximum to reduce distortion, fed to the line in of a fairly cheap 'PnP-16v' soundcard (with all other recording inputs deselected) and adjusted to zero DC offset during read-in (CoolEdit 2000 trial). Avg RMS Power = -55.41 dB (Left), -62.74 dB (Right). Peak value = -174.

I'm also providing an independent 3.07 second (135180 sample) noise sample from which to gather statistics of the noise, recorded with the same settings in the same session, but from the lead-in of side two of the tape (I used this on some real restoration of a quiet track with a very quiet intro). Avg RMS Power = -57.57 dB (Left), -61.97 dB (Right).

I guess that being at opposite ends of the same side of tape might be an approximate worst case. I'm also providing a much longer (22.513 sec) sample of noise from before a very small click that preceded the 27.495 second sample, also at the leadout of the tape, in case its proximity or duration might make it a better independent sample for training the noise reduction routine to recognise the right kind of noise. Avg RMS Power = -55.38 dB (Left), -62.64 dB (Right), peak value = -131.
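
If you want to check RMS and peak figures like these outside an audio editor, a sketch along the following lines should do. It assumes 16-bit samples already decoded into a list of integers for one channel, and it references dB to a full-scale square wave; the post doesn't say which reference the editor used, so treat that as an assumption.

```python
import math

FULL_SCALE = 32768  # assumption: 16-bit signed samples


def rms_dbfs(samples):
    """RMS level in dB relative to a full-scale square wave.

    Not defined for pure digital silence (log of zero).
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square / FULL_SCALE ** 2)


def peak_dbfs(samples):
    """Largest absolute sample value in dB relative to full scale."""
    return 20 * math.log10(max(abs(s) for s in samples) / FULL_SCALE)
```

Under that 16-bit assumption, a peak sample value of 174 works out at roughly -45.5 dBFS.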


Noise samples: (all compressed using FLAC v1.0.3 at level 8)

Add this noise to your test audio:
noise_DolbyB_sndcard_27_495_sec_leadout.flac (1.61 MB, 1212532 samples)

Use this sample to get noise profile:
GetNoise_sample_DolbyB_sndcard_3_07sec_leadin.flac (0.18 MB, 135180 samples)

This is an alternative noise sample from the leadout to use for NR profile or adding to audio:
extra_noise_22_513sec_leadout_before_pop.flac (1.32 MB, 992832 samples)
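
The sample counts and durations quoted for the three files are related by the 44.1 kHz sample rate. A tiny helper (the function name is just for illustration) makes it easy to cross-check your own excerpts:

```python
SAMPLE_RATE = 44100  # Hz, CD audio


def duration_seconds(sample_count, sample_rate=SAMPLE_RATE):
    """Length of a clip in seconds, given its length in samples per channel."""
    return sample_count / sample_rate


for name, count in [("leadout noise", 1212532),
                    ("leadin profile", 135180),
                    ("extra leadout noise", 992832)]:
    print(f"{name}: {duration_seconds(count):.3f} s")
```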



Modern CDs are usually mastered louder than tapes would be, so you may wish to apply ReplayGain (Album Gain?) before adding the noise to bring the samples into line. It so happens my noise was recorded with the tape and line-in amplification set where the cassette album provided 89.89 dB loudness (as measured by running Album Gain in Foobar2000), so this is close to the standard 89.0 dB for ReplayGain.

The tape restoration I did from this 1991-vintage tape was recorded so it peaked at about -2 dBFS (~0.80). The Album Gain then turned out to be -0.89 dB - i.e. loudness was 89.89 dB. The loudest track (with the -2 dB peaks) had a Track Gain of -2.52 dB (peaks ~0.80, volume = 91.52 dB) and audible hiss on the intro and on the fadeout, and the next track was the quiet one with a very quiet intro and the most audible hiss throughout, with Track Gain +3.37 dB (peaks ~0.45, volume = 85.63 dB). The intro to this track would need a track gain of +10.46 dB to reach 89 dB (peak =0.20, volume = 78.54 dB). The noise sample (after Dolby B) from the tape leadin has a track gain of +47.84 dB (i.e. the hiss is perceived as 37.38 dB quieter than this very quiet intro) which is clearly audible. For that restoration, I ended up applying 24 dB noise reduction in Exact Audio Copy (EAC) and the bitrate of MusePack and Lame APS files of the intro dropped by about 20-30 kbps after NR compared to the files before NR, indicating crudely how much of the hiss was considered audible by the psychoacoustic models.
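
The loudness figures in that paragraph all follow from the 89 dB ReplayGain reference: loudness is the reference minus the suggested gain. A short sketch of the arithmetic (the function names are made up for illustration):

```python
REFERENCE_DB = 89.0  # ReplayGain reference loudness quoted above


def loudness_from_gain(gain_db):
    """Loudness implied by a ReplayGain adjustment of gain_db."""
    return REFERENCE_DB - gain_db


def db_to_linear(db):
    """Convert a level in dB relative to full scale into a linear peak value."""
    return 10 ** (db / 20)


for label, gain in [("album", -0.89), ("loudest track", -2.52),
                    ("quiet track", 3.37), ("quiet intro", 10.46),
                    ("leadin hiss", 47.84)]:
    print(f"{label}: gain {gain:+.2f} dB -> {loudness_from_gain(gain):.2f} dB")

# Hiss relative to the quiet intro is the difference of the two gains.
print(f"hiss is {47.84 - 10.46:.2f} dB below the intro")
```

`db_to_linear(-2)` gives about 0.79, in line with the ~0.80 peak quoted for -2 dBFS.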


The method is:

• 1. Rip a 44.1 kHz stereo source file from CD (pref in EAC secure mode for accuracy)

• 2. Select and trim to an excerpt no longer than 27.495 sec (1212532 samples) - this is the length of the main sample of noise I provide and falls within the 30 sec guideline for test samples specified in the forum rules. Jot down the length in samples, then Save As... your original file, e.g. test_orig.wav

(Consider reducing the file to a lower volume if it's within 174 sample values of full scale: +/- 174 is the peak level of the noise to be added, and we should prevent digital clipping, which we're not trying to test. Also, the NR routine may adjust frequencies and phases slightly and could potentially cause clipping, so it's advisable to leave some headroom anyway. Furthermore, most tapes don't include music at an equivalent of 98 dB ReplayGain loudness, so reducing to 89 dB album gain might be in order. You might even wish to gently low-pass filter the result at around 15 or 16 kHz to simulate an audio tape's frequency response. I haven't filtered my examples below.)

• 3. Mathematically add the noise I provide to the original using a function such as Mix Paste/Combine/From File... (in CoolEdit 96). If the noise has extended the length of the file over the original, trim it back to the exact original length. Save As... e.g. test_noise.wav

• 4. Open the GetNoise sample which is an independent sample of noise from the leadin of the same tape. Get noise profile from this and save the profile if necessary or desired. (Some software, like EAC, can't get the noise profile from the whole waveform, so you may need to append your desired test_noise.wav file first, keeping the profile noise selected, then Get Noise Profile from Selection, then delete the selected noise).

• 5. Open the noisy file, test_noise.wav and apply Noise Reduction to the whole file, making a note of the settings used (in Audacity, a screenshot may be required for the less/more slider). Save the resulting audio, for example as test_EAC_NR24dB.wav

• 6. Now you can use blind-test tools such as WinABX to compare and rate the noise-reduced file (e.g. test_EAC_NR24dB.wav) with the original (test_orig.wav) and repeat step 5 with different settings if desired. If differences are obvious, it may be instructive to try very low (near zero?) amounts of reduction and blind test against test_noise.wav to judge how much the routine's FFT/inverse FFT and windowing will mangle the audio on their own (scaling down the added noise by about 20-30 dB would do the trick).

• 7. It'd be great if you'd supply the original sample and the post-NR sample here for us to try out, and if you try adding a different noise to your audio (e.g. 8-bit soundcard noise, vinyl noise, non-Dolby tape hiss, 50/60 Hz mains hum), tell us what it is and post it too if you can.
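
If you'd rather script the mixing in steps 2 and 3 than use an editor's Mix Paste, something like this sketch would do for one channel of 16-bit samples held in plain Python lists (run it once per channel for stereo). The function names and the `noise_gain_db` parameter are assumptions for illustration; the latter covers the idea in step 6 of scaling the added noise down by 20-30 dB.

```python
def db_to_gain(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def mix_noise(original, noise, noise_gain_db=0.0, full_scale=32767):
    """Add noise to original sample by sample, keeping the original's length.

    Raises ValueError if the noise is too short or the sum would clip.
    """
    if len(noise) < len(original):
        raise ValueError("noise is shorter than the excerpt")
    gain = db_to_gain(noise_gain_db)
    # zip() stops at the end of the shorter list, i.e. the original excerpt,
    # which is the same as trimming the mix back to the original length.
    mixed = [round(s + gain * n) for s, n in zip(original, noise)]
    if any(abs(v) > full_scale for v in mixed):
        raise ValueError("mix would clip; lower the level of the original first")
    return mixed
```

For example, `mix_noise([100, -200, 300], [1, 2, 3, 4])` returns `[101, -198, 303]`, and an original sitting within the noise's peak of full scale raises an error instead of silently clipping.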

If you choose to rate things you've ABXd, this is the recognised scale, and you can use decimal fractions within the range:

5.0 = Imperceptible (not perceptible)
4.0 = Perceptible but not annoying
3.0 = Slightly annoying
2.0 = Annoying
1.0 = Very Annoying

I won't make a special effort to rate all my samples, so feel free to rate them yourselves.

Example:

I tried to choose a difficult track to test this and used some aggressive settings when trying NR, so you might consider this to be training so that you know what types of defect you're trying to spot. With more noise in the first place (e.g. Dolby OFF), the defects would have been worse, but this represents a real situation when Dolby ON was required to restore the correct timbre to the music on a cassette mastered with Dolby B.

Hopefully, this sample will demonstrate the shortcomings of digital noise reduction so you know what to listen for. From a promotional jazz CD called Light Jazz by Lucent, ripped with EAC secure mode, this is 27.49 sec of the intro to 08 - Marie Bergman - All The Way.

It includes a clean intro with low background noise, then introduces vocals, with sibilants (ss, sh, ch, th, etc) that can be mangled by digital NR, and also introduces brushed drums, which sound very like tape hiss and might cause problems for the NR routine (listen for smushed/flanged sounds and high frequency twinkling or burbling like a 1960s sci-fi computer). I didn't choose to apply ReplayGain, but Track Gain was -3.94 dB.


Original CD control sample:
AllWay_orig.flac (2.1 MB, EAC secure mode, 1212440 samples)


CD sample plus noise:
Used mix-paste in CoolEdit96 to add the noise, unscaled (100%), then trimmed off samples 1212440-1212531 to leave 1212440 samples.
AllWay_noise.flac (2.2 MB, 1212440 samples)


Heavily Noise-Reduced examples:
I got the noise profiles from the 3.07 second sample then tried some fairly heavy NR to show the sort of damage it can do on this difficult sample. You can 'Keep Only Noise' to hear what's being removed, but this isn't necessarily psychoacoustically valid because some of what's removed might be adequately masked in the original signal.

CoolEdit 96 Noise Reduction:
Without testing for the best settings, I collected 263 statistical snapshots with 8192 point FFT, precision=11, smoothing=3, transition=0dB, then set the NR amount to 100% (values over 100% can also be typed).
The Help file in CoolEdit96, especially in the NR department, is an education in itself, by the way.
AllWay_CoolEdit96_NR100.flac (1.83 MB)

I don't think CE96 uses lapping transform windows or anything clever like modern codecs do, and it just smooths a number of FFT windows together. If you Keep Only Noise then amplify it, you can often hear the boundaries between windows. I first used this some years ago to reduce some dreadful FM noise from radio recordings (FM noise is very glitchy and hard to remove cleanly, so some of the worst recordings have pretty bad burbling and sibilants after cleanup). It took about 42 seconds to save undo data (1 sec) then perform the NR (on a PentiumII-350 with 320MB RAM under WinNT4.0).


Exact Audio Copy / Process WAV / Noise Reduction:
EAC's WAV editor works only with 44.1 kHz WAVs. It has no user-definable settings when getting the noise profile, and only a dB scale for the approximate amount of noise reduction (from 0 to 48 dB) and version 0.95pb2 now displays the selected number to 0.1 dB precision above the slider. It took about 25 seconds to process the same file on the same PC.

EAC, slider at 24 dB reduction:
AllWay_EAC_NR24dB.flac (1.76 MB)

EAC, slider at 48 dB reduction:
AllWay_EAC_NR48dB.flac (1.75 MB)

In normal use, I'd probably aim for about 12 to 18 dB reduction on this file, as the noise would become reasonably well masked. Another option is to noise-reduce only until the onset of the brushed drum, then apply no NR beyond that point. Usually it's safest to apply only light NR, but partly that's the point of this blind test exercise. If you keep only noise with EAC, it's not easy to hear boundary effects, so perhaps it uses lapping transform windows to overcome this (any clues, anyone?).


Anyway, these may well need only rating, not ABXing, though certain sections might be indistinguishable from the original. This was a difficult sample, and I did use fairly aggressive amounts of NR. In section 20.0 - 22.6 sec for example, the high frequency brush sound on the original is fairly constant, but is modulated more and is slightly lower in level on the EAC 24 dB NR version. On cheapish, pleasant-sounding headphones, the difference is readily perceptible, so not transparent (WinABX: 7/7; p = 0.8%) but not especially annoying given a moderate level of environmental noise, so would probably rate around 3.8 (EDIT: I mean just for that brief section, not the whole sample, which, as 2Bdecided points out, has lost some twang on the bass for example). For more aggressive NR, worse algorithms and listening in a quiet environment, twinkling may be heard quite easily, or smushing of sibilants or transients might be considered much more annoying.
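
For reference, the p-value an ABX tool reports is the chance of scoring at least that well by pure guessing, i.e. a one-sided binomial test with a 50% hit rate per trial. A minimal sketch (the function name is made up for illustration):

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


print(f"7/7: p = {abx_p_value(7, 7):.2%}")
```

For 7/7 this gives 1 in 128, about 0.8%.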

For some kind of figure of merit, I suppose this would need to be weighed against the annoyance of the noise in the original, which will depend largely on how well it's masked, how natural it sounds, and how easy it is to ignore the noise. If the noise in the original is considered less annoying than the distortions in the edited version, it's subjectively not worth using noise reduction with those settings.

(EDIT: In this particular case, I think it's not worth it, as the noisy version isn't intrusively noisy (and in fact noise is below the brushed drums later on), and even worse, as we'll see below, eliminating the noise eliminates some of the enjoyable parts of the signal)

These problems also tend to be clearer on headphones than speakers.

Feel free to post your own ratings, preferably with details of your listening environment.

Other NR routines to test would include the one built into Audacity (which I have), newer versions of Cool Edit / Cool Edit Pro, plugins such as Waves' Noise Reduction. Any more suggestions?

Any volunteers to test certain packages or provide further original samples to which we can add noise?

Feel free to add anything except unverifiable opinion. Provide files so we can hear for ourselves whether product X sounds as good as you say, or you know how little weight your opinion will carry (unless you have a really good reputation around here!).

Regards,

Dick Darlington
2Bdecided
That's a good sample to test this kind of thing! To my ears, all three NR versions you've provided remove too much of the "string slapping" sound from the bass at the start of the piece. The life's gone. If I didn't have the original, I'd take the version with the hiss in preference to the NR ones. Sorry!


I tried the NR in Cool Edit Pro 1.2a using 300 snapshots, NR level 40, 4096 FFT, Reduce by 10dB, Precision Factor 8, Smoothing Amount 1.5, Transition Width 1. I also tried much greater "Reduce by" values, and slightly higher "NR level" values.

(most of these are default values - I've read about and tweaked most of them at some point, but I haven't put my Cool.ini file back from a recent re-install, so these are 1.2a defaults)

It left more of the string bass intact, but probably didn't remove so much noise. Since my PC is up on the desk next to me, it's hard to hear! However, it sounds OK in this environment. (Can't upload - no space - but I think a lot of people here are using CEP/CE2k so there will probably be lots of people playing around with its NR)


This is an interesting listening test, though it's as much a test of "listener taste" and "NR operator choice" as it is of product effectiveness. I look forward to reading/hearing other results.

Cheers,
David.
DickD
QUOTE(2Bdecided @ May 13 2003 - 04:21 PM)
That's a good sample to test this kind of thing! To my ears, all three NR versions you've provided remove too much of the "string slapping" sound from the bass at the start of the piece. The life's gone. If I didn't have the original, I'd take the version with the hiss in preference to the NR ones. Sorry!

Thanks for the feedback, David. I'm also stuck with a loud computer fan!
(BTW, you could try tripod.lycos.co.uk (or tripod in other lycos country domains) if you want a free webhost in exchange for top-of-page adverts).

I'm not supporting any of these as a good or universal solution. I think they should be used judiciously when they're really needed and provide improvement, and I didn't audition any of them carefully to judge the best compromise, as I would if I was doing it in real life.

Actually, my 3.8 rating for the 24 dB one wasn't for the whole sample anyway (just the short section I mentioned), and was a bit of an afterthought, so please disregard it.

I can clearly hear the loss of transient edge as the bass strings are plucked too, on all three samples I provided, and could ABX this easily (e.g. 8.0-11.0 seconds, 4/4 and it was so easy I stopped there, sure I'd get 7/7 if I could be bothered). This is something I hadn't clearly identified as a problem when all I had was a cassette of a less transient-rich (and much quieter) track, with no noiseless "control" sample to compare it to - only a noisy sample, which was obviously different thanks to the loud noise.

It isn't as bad as truly objectionable and non-musical artifacts, like smushed sibilants and loud twinkling, but it's a loss of enjoyment, realism and liveliness compared to the twangier original.

What will be interesting is whether a lower setting (or even 0dB) will still have these problems.

I forgot to mention that I also had an inkling that some of the heavy ambient reflections (delayed, attenuated repeats of the original that give it sparkle, atmosphere and stereo space) seemed to have been lost in the noise-reduced versions. I haven't been able to ABX this independently of the transients; I always seem to focus on the loss of transients. But it's also the case that reflected transients, arriving attenuated at each ear at different times after the initial sound, tend to create the impression of ambience and of the type of space one is in, whereas ambience around constant tones can't be detected this way. Impacts are brief, broadband impulses, so they look very like broadband noise on a spectral plot, which may be why the NR algorithm removes them. Having smoothed out those sharp edges, the NR has lost the ambience too. So I think this ambience thing I identified on casual listening was a symptom of the transient problem you identified.

I wonder if this could be improved by trying to detect transients and doing something different (e.g. making an assumption that the noise would be masked or somehow noting that a certain amount of broadband spectral content had come from that shorter time period). Or perhaps the FFT block/window size is too large to preserve the transients or to make them high enough in amplitude averaged over the window in question to be markedly above the known noise profile. I guess trying lower reduction settings (around 6 dB) might provide clues to those sorts of questions.
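
To put a rough number on the window-size idea: if a short burst is averaged over a whole analysis window, its power is diluted in proportion to the window length. The sketch below is a deliberate simplification (a rectangular burst, power averaged over the whole window rather than per frequency bin), and the amplitudes and the 44-sample burst (about 1 ms at 44.1 kHz) are invented for illustration.

```python
import math


def transient_to_noise_db(amplitude, duration, window_len, noise_rms):
    """Window-averaged power of a short rectangular burst relative to
    steady noise power, in dB."""
    burst_power = amplitude ** 2 * min(duration, window_len) / window_len
    return 10 * math.log10(burst_power / noise_rms ** 2)


for window_len in (512, 2048, 8192):
    ratio = transient_to_noise_db(1000, 44, window_len, 50)
    print(f"{window_len}-point window: burst is {ratio:.1f} dB above the noise")
```

Going from a 512-point to an 8192-point window costs the burst 10*log10(16), or about 12 dB, of margin over the noise in this simple model, whatever the amplitudes chosen.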

I'm not sure whether these routines are simply calculating the presumed noise using FFTs and inverse FFTs then subtracting that from the untransformed WAV or whether they're actually transforming and inverse transforming the whole WAV, editing it during the frequency-domain representation. Nor is it obvious that they're preserving the complex part of the FFT, which would include information to reconstruct the phase/timing of transients.
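
To illustrate why the complex part matters, here is a small sketch (naive DFT on a 16-sample frame, so purely a toy) comparing two ways of rebuilding a frame that contains a single click. Scaling the complex bins keeps the click where it was; rebuilding from magnitudes alone moves it to the start of the frame. Whether any of the packages discussed actually discard phase isn't established in this thread, and the function names are made up.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def scale_keep_phase(frame, gain):
    """Scale every complex bin by a real gain; phase (timing) is preserved."""
    return idft([gain * value for value in dft(frame)])


def discard_phase(frame):
    """Rebuild a frame from DFT magnitudes only (all phases set to zero)."""
    return idft([abs(value) for value in dft(frame)])


click = [0.0] * 16
click[5] = 1.0
kept = scale_keep_phase(click, 0.5)
lost = discard_phase(click)
print("phase kept, peak at index", max(range(16), key=lambda i: abs(kept[i])))
print("phase lost, peak at index", max(range(16), key=lambda i: abs(lost[i])))
```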

Later, I may get time to create and upload samples with less aggressive reduction to see if any of these problems reduce or disappear.

As ever, feel free to post more tests and samples, folks.

[Addendum: Just to point out, I've edited my first post near the end, to acknowledge some of these points and that this "All The Way" test sample is one where that level of noise is less objectionable than the losses of the noise-reduced version]
KikeG
Just one piece of advice: for this kind of comparison, which requires rating several samples, ff123's ABC/HR tool is more suitable than a plain ABX comparator, all the more so if the samples are clearly far from transparent.
DickD
KikeG, I quite agree about ABC/HR being better for this situation. It seems that impairments are large enough to be obvious in many/most cases, so rating the algorithms blindly is the better approach.

Just to double-check that the transients (e.g. plucking the bass) could be adequately heard with noise present, I tried adding the noise to the EAC 24 dB noise-reduced sample and ABX'd it against the original plus noise (AllWay_noise.wav). It was easy (9/9). So the noise reduction routine is trying too hard to remove noise or isn't psychoacoustically aware enough (e.g. short enough time windows) to recognise that transients are different from the hiss it had sampled.

The reason I carried on until 9/9 was that "X is A, Y is B" was the correct answer for the first 8 trials in succession, and being human, I wanted to see how long it would go on. For the 9th trial "X is B, Y is A" was the correct answer. Getting either of the two "lucky streaks" of 8 trials the same was only a 0.78% chance (1 in 128), so I'm hardly incredibly lucky and I won't be buying a lottery ticket today! ;)
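
The 1-in-128 figure can be checked by brute force: enumerate every possible run of fair A/B assignments and count the ones where all are the same. A quick sketch (the function name is made up for illustration):

```python
from itertools import product


def prob_all_identical(n):
    """Probability that n fair A/B assignments all come out the same."""
    outcomes = list(product("AB", repeat=n))
    same = sum(1 for outcome in outcomes if len(set(outcome)) == 1)
    return same / len(outcomes)


print(f"{prob_all_identical(8):.2%}")
```

Two of the 256 possible sequences of 8 qualify (all A or all B), which is where 1 in 128 comes from.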

I understand that CEP has both hiss reduction and noise reduction. I presume that hiss reduction (which I believe has three levels in CEP) is a simpler approach. Presumably it does one or both of low-pass filtering and dynamic range expansion (possibly multiband, and presumably with the major effect only on low amplitudes, so the rest of the signal doesn't have too much harmonic distortion).