Full Version: Noise Removal Algorithm
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific/R&D Discussion
Zeke
Hi all, I've been thinking about trying to implement a noise removal algorithm. I have an older version of Cool Edit which has this feature as does Audacity. It seems the way it works is that it wants a sample of the audio which the user considers "noise" and then it uses that to find and remove the noise in the recording.

Basically I would think that one would implement this by using an FFT to break the noise down into 'n' frequency bins and then taking an FFT, with the same number of bins, of the recording we wish to remove the noise from. From there it seems you could do one of two things:

1) You could simply attenuate the recording's frequency bins which correspond to the noise frequency bin. Just attenuate it by the amplitude of the corresponding noise frequency bin...

2) Try to exactly pinpoint the location of the noise in the recording's frequency bins by using correlation with the noise frequency bins. When finding the best "match" via correlation you could then eliminate the signal in the recording's frequency bin...

...then put the recording signal back together with an iFFT.
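
A minimal sketch of option 1 on a single frame, to make the idea concrete. It assumes a naive DFT as a stand-in for a real FFT library, and it uses a steady whistle as the "noise" so the effect is easy to see; real hiss is random, so its per-frame magnitudes would only match the profile on average. The function names are illustrative, not taken from Cool Edit or Audacity.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) DFT, a stand-in for a real FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def attenuate_by_noise(frame, noise_frame):
    """Reduce each bin's magnitude by the noise magnitude in that bin, keeping phase."""
    noise_mag = [abs(c) for c in dft(noise_frame)]
    out = []
    for c, nm in zip(dft(frame), noise_mag):
        mag = abs(c)
        new_mag = max(mag - nm, 0.0)          # never let a magnitude go negative
        out.append(c * (new_mag / mag) if mag > 0 else 0j)
    return idft(out)

n = 64
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]                    # wanted signal
whistle = [0.3 * math.sin(2 * math.pi * 20 * t / n + 0.7) for t in range(n)]    # noise in the recording
noise_sample = [0.3 * math.sin(2 * math.pi * 20 * t / n + 2.0) for t in range(n)]  # separate noise sample
recording = [a + b for a, b in zip(tone, whistle)]

cleaned = attenuate_by_noise(recording, noise_sample)
residual = max(abs(c - s) for c, s in zip(cleaned, tone))
print("largest remaining error:", residual)
```

Because only magnitudes are compared, the noise sample does not need to be time-aligned with the recording.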

Anybody have any thoughts on this? Just wanted to run it by you guys before I code the whole thing and find out I went down the wrong path. Thanks.

Zeke
HotshotGG
QUOTE
Anybody have any thoughts on this? Just wanted to run it by you guys before I code the whole thing and find out I went down the wrong path. Thanks.


This appears to be the most logical approach and what you're saying makes sense. I am by no means an expert at this, but what about windowing the data and then breaking it down into corresponding bins? Is that possible as well? Everything appears to be correct so far, though you could try using FFTW. ;)
Zeke
QUOTE
...what about windowing the data and then breaking it down into corresponding bins? Is that possible as well?

Yes, that's usually how I break down audio data into the frequency domain and then build it back together. I think I got a little sloppy with my lingo in the original post and misspoke w.r.t. analyzing the frequency domain w.r.t. time. The time domain and the frequency domain are two separate and independent things.

I'm by no means an expert here either, but I like working on these DSP algorithms. I'm kind of the DSP equivalent of a guy who monkeys with engines (and can fix them) but has not had a lot of formal training as a mechanic and would never refer to himself as a mechanic...

Zeke
Dynamic
I'd have thought that you could gather statistics from the noise sample and use those to decide.

For example, with the mean and standard deviation noise power in each bin you could analyse to see if it's approximately a Gaussian (Normal bell curve) probability distribution (as the Central Limit Theorem suggests it would be for many noise sources). (You might find that amplitude fits a normal distribution, rather than power, though - so analyse histograms on some real noise samples to see what is best)

If you take a particular frequency bin you could then estimate whether the contents of that bin are within the statistical limits you'd expect (e.g. for a Gaussian distribution, power from zero to the mean + 2 standard deviations implies about 97.7% confidence that it's just noise and no appreciable signal, or mean + 3 standard deviations implies about 99.9% confidence). Then you can either silence the selection or subtract something like the mean power in some way that's a little more gradual and graceful (meaning you might reduce the tinkling or burbling that some Noise Reduction algorithms introduce, but at the expense of slightly more noise).

If, however, the power in that bin is higher than the expected noise, you might assume that it contains a signal you wish to pass as well as the expected amount of noise, so you could pass it unchanged or with the noise subtracted. Subtracting some proportion of the expected noise might be good if it's only a fraction louder, thus reducing the amount of tinkling from suddenly letting noise through in one frequency bin because it was in the roughly 2.3% of noise that exceeded 2 standard deviations.
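
A rough sketch of the per-bin statistics described above, assuming the noise sample has already been turned into a list of frames of per-bin power values (that layout and the function names are assumptions for illustration). It also checks the one-sided Gaussian coverage at 2 and 3 standard deviations, which only applies if the bin values really are close to normally distributed.

```python
import statistics

def noise_stats(power_frames):
    """power_frames: list of frames, each a list of per-bin power values."""
    bins = list(zip(*power_frames))               # regroup values by bin
    means = [statistics.fmean(b) for b in bins]
    stdevs = [statistics.stdev(b) for b in bins]  # sample standard deviation
    return means, stdevs

def looks_like_noise(power, mean, stdev, k=2.0):
    """True if the bin power is within mean + k standard deviations of the noise."""
    return power <= mean + k * stdev

# Three noise frames with two bins each (toy numbers).
frames = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]]
means, stdevs = noise_stats(frames)

print(looks_like_noise(3.5, means[0], stdevs[0]))   # inside the limit for bin 0
print(looks_like_noise(4.5, means[0], stdevs[0]))   # above the limit for bin 0

# One-sided coverage of a normal distribution at 2 and 3 standard deviations.
coverage_2 = statistics.NormalDist().cdf(2.0)
coverage_3 = statistics.NormalDist().cdf(3.0)
print(round(coverage_2, 4), round(coverage_3, 4))
```

In practice the histogram check suggested above (amplitude versus power) would decide which quantity to feed into `noise_stats`.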

In my previous incarnation as DickD I made a couple of posts regarding NR techniques a few years ago.

One was regarding possible use of psychoacoustics to reduce the amount of NR applied during loud passages (in which the noise would be masked anyway).

The other was a method of blind-testing NR algorithms by superposing tape hiss (arithmetic add or subtract, for example) onto a CD recording and providing a different sample of the same tape's noise to seed the NR statistics, allowing the use of ABX or ABC/HR. In the second post in that thread, 2Bdecided pointed out quite correctly that the satisfying string-slapping transients of the original double-bass part were greatly diminished by all the NR settings I tried out. Atonal transients like these have a relatively white spread spectrum with rather low power spectral density in each frequency bin, making them look similar to noise, and putting them in danger of being removed or severely damaged by NR algorithms. Transients like these give a lot of the enjoyment to music and make me reluctant to use the CoolEdit96 and Audacity NR filters unless the sound is really bad.

Thoughts on approach

I'm sure that working in greater-than-16-bit depth (e.g. 24-bit) then dithering back to 16-bit at the end (if required - or remaining in 24-bit before further processing) is the right thing to do.

In general, my inclination would be to try a smoothly varying attenuation in each frequency bin as the power spectral density (PSD) in a bin varies close to the expected noise PSD or a few standard deviations from it, thus making more natural distortions than a hard cut-off would produce. (I guess such an attenuation should aim to scale down the real and imaginary components corresponding to each frequency bin by the same factor, that factor being unique to that frequency bin).
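
One way such a smoothly varying attenuation might look, as a sketch only. The smoothstep curve, the default of 3 standard deviations and the 0.1 gain floor are all assumptions chosen for illustration, not values from any existing NR tool. The gain is a real number, so multiplying a complex bin by it scales the real and imaginary parts equally and leaves the phase alone.

```python
import cmath

def smooth_gain(power, noise_mean, noise_std, k=3.0, floor=0.1):
    """Gain rising smoothly from `floor` at the noise mean to 1.0 at mean + k*std."""
    upper = noise_mean + k * noise_std
    if upper <= noise_mean:                      # degenerate case: no spread measured
        return 1.0 if power > noise_mean else floor
    t = (power - noise_mean) / (upper - noise_mean)
    t = min(max(t, 0.0), 1.0)                    # clamp to the transition region
    s = t * t * (3.0 - 2.0 * t)                  # smoothstep: no sudden jump in gain
    return floor + (1.0 - floor) * s

def apply_gain(bin_value, gain):
    """Scale real and imaginary parts by the same factor."""
    return complex(bin_value.real * gain, bin_value.imag * gain)

g_noise = smooth_gain(1.0, noise_mean=1.0, noise_std=0.5)    # at the noise mean
g_mid = smooth_gain(1.75, noise_mean=1.0, noise_std=0.5)     # halfway up
g_signal = smooth_gain(2.5, noise_mean=1.0, noise_std=0.5)   # at mean + 3 std

original = complex(3.0, -4.0)
scaled = apply_gain(original, g_mid)
print(g_noise, g_mid, g_signal, abs(scaled) / abs(original))
```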

I guess that a relatively short time window (for 44100 Sa/s PCM audio perhaps a 1024-sample FFT, maybe overlapping the previous frame by 512 samples?) would probably work reasonably well, though it needs testing. This is approximately the time-frequency resolution of the human ear (this is distinct from interaural time resolution of microseconds that gives spatial cues in binaural hearing) and might help preserve transients a little better than CoolEdit's 8192-sample typical NR settings. Then again, perhaps lessons could be learnt from transient detection and short windows used in lossy compression schemes.
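
To put numbers on that suggestion, here is a small sketch assuming a periodic Hann window (the window choice is an assumption; the post does not name one). With a 512-sample hop, overlapping Hann windows sum to a constant, which is what lets unmodified frames be added back together without ripple.

```python
import math

FS = 44100      # sample rate in Sa/s
N = 1024        # FFT / window length
HOP = 512       # each frame starts 512 samples after the previous one

def periodic_hann(n):
    """Hann window whose period is exactly n samples."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

window = periodic_hann(N)

# Where two neighbouring frames overlap, their window values should add up to 1.
overlap_sum = [window[i] + window[i + HOP] for i in range(HOP)]

frame_ms = 1000.0 * N / FS     # duration of one frame in milliseconds
bin_hz = FS / N                # spacing between frequency bins in Hz

print(round(frame_ms, 2), "ms per frame,", round(bin_hz, 2), "Hz per bin")
print("overlap sum range:", min(overlap_sum), max(overlap_sum))
```

For comparison, an 8192-sample window at the same sample rate spans roughly 186 ms, which is why short transients get smeared more.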

If quality is more important than processing time, I guess it's even possible to analyze both the noise sample and the signal+noise audio with two different window lengths. We could probably detect and preserve transients in the desired signal better because they would stand out above the noise floor better over a shorter window length, and we could even compare the degree by which the signal appears to be above the expected noise with two different window lengths that overlap in time.

I'm mainly throwing around some ideas in case it may help you to code things flexibly enough to test various options to fine-tune the NR algorithm. I'm sure there is scope to learn a good deal by experimenting with approaches to various parts of the algorithm, perhaps optimising each in turn to home in on a near-optimal overall solution, or perhaps using a design-of-experiments type of approach to explore the solution space.

I'd guess it makes sense to gather various samples of hiss to add to a selection of clean CD recordings (e.g. tape hiss from ferric, chrome with and without Dolby B being used, perhaps some vinyl noise at 45 and 33 rpm) and separate noise samples (uncorrelated to the noise added to the music).
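
A tiny sketch of how such a test pair could be built from one hiss recording, assuming both signals are plain lists of samples at the same sample rate and level (the helper name is hypothetical). One stretch of hiss is added to the clean music and a later, non-overlapping stretch is kept as the noise profile, so the profile is not the exact noise that was added.

```python
def make_test_pair(clean, hiss):
    """Return (noisy, profile) using disjoint parts of the same hiss recording."""
    n = len(clean)
    if len(hiss) <= n:
        raise ValueError("hiss must be longer than the clean clip")
    noisy = [c + h for c, h in zip(clean, hiss[:n])]   # arithmetic add
    profile = hiss[n:]                                  # never mixed into the music
    return noisy, profile

clean = [1.0, 2.0, 3.0]
hiss = [0.5, -0.25, 0.125, 0.0625, -0.5]
noisy, profile = make_test_pair(clean, hiss)
print(noisy, profile)
```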

Other possible noises to remove might include mains hum, TV line scan whistle etc., though it is possible that an algorithm that's good at gracefully removing Gaussian hiss without damaging wanted transients might be less effective at removing hum and whistle, which might be better dealt with by a plain notch filter.
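
For the hum case, a plain notch can be sketched as a single biquad. The coefficients below follow the widely used Audio EQ Cookbook notch formulas; the 50 Hz mains frequency and Q of 5 are assumptions for the example (60 Hz mains and a narrower notch are equally plausible), and real hum usually needs extra notches at the harmonics.

```python
import math

def notch_coefficients(freq_hz, sample_rate, q):
    """Normalised biquad notch coefficients (b, a) with a[0] == 1."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(samples, b, a):
    """Direct form I filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

FS = 44100
b, a = notch_coefficients(50.0, FS, q=5.0)

def tone(freq_hz):
    """One second of a unit-amplitude sine."""
    return [math.sin(2.0 * math.pi * freq_hz * i / FS) for i in range(FS)]

# Skip the first half second so the filter's start-up transient has died away.
hum_rms = rms(biquad(tone(50.0), b, a)[FS // 2:])
music_rms = rms(biquad(tone(1000.0), b, a)[FS // 2:])
print("50 Hz after notch:", hum_rms, " 1 kHz after notch:", music_rms)
```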
Zeke
Thanks for the great reply Dynamic - you've given me a lot to work with. After doing a little testing on my own I think my current idea is a two step approach:

1) Correlate the given "noise" sample of audio with the "recording" audio. If the correlation at any position results in a value above a certain threshold that would indicate a match I will subtract out the "noise" from the "recording". This will work well at removing a constant sort of noise such as a bass drum kick from a recording.

2) Do frequency analysis on the "noise" and work (in various ways) to attenuate matching frequency components in the "recording". I haven't gotten to this yet, but I plan to work on it this weekend a good bit. It seems this would work well for removing noise such as tape hiss.
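
Step 1 might be sketched like this, using normalised cross-correlation so the score is 1.0 only for an exact (scaled) match. The 0.9 threshold and the function names are assumptions. Note the strong caveat: subtraction only cancels the sound if the occurrence in the recording is a sample-accurate copy of the template, which is rarely true for acoustic sounds.

```python
import math

def best_match(recording, template):
    """Return (offset, score) of the best normalised correlation, score in [-1, 1]."""
    m = len(template)
    t_energy = sum(v * v for v in template)
    best_offset, best_score = 0, -1.0
    for offset in range(len(recording) - m + 1):
        segment = recording[offset:offset + m]
        s_energy = sum(v * v for v in segment)
        denom = math.sqrt(s_energy * t_energy)
        score = sum(p * q for p, q in zip(segment, template)) / denom if denom > 0 else 0.0
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

def subtract_at(recording, template, offset):
    """Subtract the template from a copy of the recording at the given offset."""
    out = list(recording)
    for i, v in enumerate(template):
        out[offset + i] -= v
    return out

template = [0.5, 1.0, -1.0, 0.25]     # the short "noise" sample
recording = [0.0] * 20
recording[3] = 0.2                    # some unrelated sound that must survive
for i, v in enumerate(template):      # the noise occurs at sample 7
    recording[7 + i] += v

offset, score = best_match(recording, template)
cleaned = subtract_at(recording, template, offset) if score > 0.9 else list(recording)
print(offset, score, cleaned[5:12])
```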

One question that greatly concerns me because I am a dabbler and not a scientist: I know a bit about dithering but from what I've heard (and seen in dithered audio waveforms) I'm a little afraid to work with it because of the possibility of hearing damage. Can someone please correct me if I have this wrong: When dithering you're introducing random noise into an audio recording. This random noise is basically just white noise. Prolonged exposure to white noise can damage your hearing even at low volumes. So why wouldn't listening to noise-shaped/dithered audio recording also damage your hearing?

If anyone has a definitive answer on this I would greatly appreciate it. Thanks.

Zeke
Dynamic
There is dither in all CDs. The volume is not just low, it's almost certainly below the threshold of human hearing, even with flat dither (which should be just fine for your application).

Although flat dither is 96 dB below full scale, that's over the whole bandwidth. It is more like -120 dB per frequency bin below full scale if you assume about 512 frequency bins (1024 sample FFT) dividing up the power spectrum represents the cochlea reasonably well. In crude terms, that's roughly a 120 dB ratio (a million millions in terms of power, or a million-fold in terms of amplitude, such as voltage or mechanical displacement) between the dither noise's excitation of an individual cochlear nerve cell and that of a full-scale sinusoidal wave.
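
The arithmetic behind those figures, as a quick check. It assumes the noise power is spread evenly over 512 bins and uses the usual 20·log10(2^bits) rule of thumb for the 16-bit noise floor, so the results are ballpark values rather than exact dither levels.

```python
import math

bits = 16
bins = 512

# Level of one LSB step relative to full scale.
flat_dither_db = -20.0 * math.log10(2 ** bits)

# Spreading the same noise power evenly over 512 bins lowers each bin's share.
per_bin_db = flat_dither_db - 10.0 * math.log10(bins)

# What a 120 dB ratio means as power and as amplitude.
power_ratio = 10.0 ** (120.0 / 10.0)
amplitude_ratio = 10.0 ** (120.0 / 20.0)

# Power ratio between a 120 dB pain threshold and a 113 dB chainsaw.
pain_vs_chainsaw = 10.0 ** ((120.0 - 113.0) / 10.0)

print(round(flat_dither_db, 1), round(per_bin_db, 1))
print(power_ratio, amplitude_ratio, round(pain_vs_chainsaw, 2))
```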

120 dB is typically taken to be the ratio between the pain threshold at the ear's peak responsivity and the threshold of hearing. The professional-grade Makita chain saw I was using earlier indicates 113 dB user exposure, though this isn't a pure sinusoid, but I dare say that few people would listen to music whose peaks sound as loud as a chainsaw at full throttle when being used without ear defenders, let alone at over 4 times that power!

I'm sure the rule of thumb about prolonged exposure to white noise at low levels must be an imprecise rule of thumb applied to industrial safety. Only scientists consider inaudible stochastic fluctuations to be noise. The layperson considers a readily audible hiss to be noise. A low level is probably one that's not annoying, which would surely be a heck of a lot louder than dither. Doubtless there are guidelines for long-term human exposure to white noise (perhaps from the W.H.O.), but surely it's far in excess of dither (probably thousands to millions of times the power).
2Bdecided
Amazingly, an entire book about audio restoration from the key people involved with CEDAR is now available on-line...

http://www-sigproc.eng.cam.ac.uk/%7Esjg/springer/index.html

You can also search on one or both of the authors' names to find other papers in the field. Great stuff. Quite scary how little we mere mortals know!

Cheers,
David.
Zeke
QUOTE

There is dither in all CDs. The volume is not just low, it's almost certainly below the threshold of human hearing, even with flat dither (which should be just fine for your application).


Right, that is what I thought - when applying dither to 16 bit audio or higher the amount of noise added is going to essentially be 1/65536 of the maximum peak volume, which seems very safe.

However, I was once working with an answering machine that allowed 8 bit audio files to be "uploaded" to it. Dither can certainly help the clarity of 8-bit digital audio after quantizing from a 16 bit (or higher) source. In many attempts at my own dither algorithms and examining others I came across Naoki Shibata's SSRC which contains a dither algorithm that greatly rivals that of Cool Edit or Audacity. However, the thing that really turned me off of this project was the following quote from Shibata's "readme" file:

QUOTE

Dithered 8bit files contain strong supersonic, and listening to these files for long hours may damage your hearing.


...I hadn't even thought about it before but it makes sense. The noise from "dither" would be on the order of 1/256. Which should be detectable by the human ear. So, this left me a bit confused:

First, I created a 16bit/44.1 kHz sine wave with Cool Edit. Then I applied a fade-out to it so that it slowly goes from nearly full amplitude to silence. Then I converted this to an 8 bit wave file without applying any dither. When listening to the 8bit wave file as it fades out you'll notice a good amount of extra "noise" due to the quantization (16bit -> 8bit). However, after applying the SSRC dither the quantization noise is greatly reduced. The confusion this left me with was: "Where does the hearing damage come in?"
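
The tail end of that fade-out can be reproduced in a few lines. This sketch assumes flat triangular (TPDF) dither spanning ±1 LSB, not SSRC's actual noise-shaped dither, and a sine only 0.4 LSB high on an 8-bit grid. Without dither the sine rounds to pure silence; with dither the output is noisy but still contains the sine at its original level on average.

```python
import math
import random

def quantize(x, step, dither=False, rng=None):
    """Round x to the nearest multiple of step, optionally adding TPDF dither first."""
    d = (rng.random() - rng.random()) * step if dither else 0.0
    return step * round((x + d) / step)

def projection(signal, reference):
    """How much of `reference` is present in `signal` (1.0 means fully preserved)."""
    return sum(s * r for s, r in zip(signal, reference)) / sum(r * r for r in reference)

step = 2.0 / 256                  # one 8-bit LSB for a -1..+1 full scale
rng = random.Random(0)            # fixed seed so the run is repeatable
n = 4000
quiet = [0.4 * step * math.sin(2.0 * math.pi * i / 50) for i in range(n)]

plain = [quantize(v, step) for v in quiet]
dithered = [quantize(v, step, True, rng) for v in quiet]

print("undithered output is silent:", all(v == 0 for v in plain))
recovered = projection(dithered, quiet)
print("fraction of the sine recovered with dither:", recovered)
```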

Could hearing damage come from prolonged listening to the audio file after it has been converted from 16bit to 8bit without dithering? After all, there is now audible noise in the recording due to this conversion.

Or, does the hearing damage come from the prolonged listening to the audio file after the 8bit version has been dithered? After dithering "noise" is not very audible however if you look at the waveform it no longer has a smooth shape that a sine wave should have - it's very rough. Could hearing damage occur because of that?
cabbagerat
QUOTE(Zeke @ Feb 19 2007, 07:45)

Could hearing damage come from prolonged listening to the audio file after it has been converted from 16bit to 8bit without dithering? After all, there is now audible noise in the recording due to this conversion.

Or, does the hearing damage come from the prolonged listening to the audio file after the 8bit version has been dithered? After dithering "noise" is not very audible however if you look at the waveform it no longer has a smooth shape that a sine wave should have - it's very rough. Could hearing damage occur because of that?
I think you misunderstand the issue here. Simply put, dithering with noise shaping moves quantization noise out of the most audible band and into the upper part of the spectrum, which we hear poorly or not at all. For low resolution files (like 8 bit), the amount of inaudible high frequency noise could be quite large. With an answering machine speaker (or a telephone, where the bandwidth is limited), this isn't a problem, because the supersonics aren't reproduced. With a good speaker, though, you could get levels of inaudible noise which potentially could cause hearing damage.

Not being an expert in the area of hearing loss, I cannot comment on whether this could actually happen, but the high frequency, high amplitude signals will certainly be present.
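
How quantization noise ends up concentrated at high frequencies can be illustrated with the simplest possible noise shaper, a first-order error-feedback loop. This is an assumption-laden toy, not SSRC's filter (which is not described in this thread): each sample's quantization error is subtracted from the next input, so the error in the output becomes a first difference, which is weak at low frequencies and strong at high ones.

```python
import cmath
import math
import random

def shaped_quantize(samples, step, rng):
    """8-bit style quantizer with TPDF dither and first-order error feedback."""
    out, err = [], 0.0
    for x in samples:
        target = x - err                              # feed back the previous error
        dither = (rng.random() - rng.random()) * step
        y = step * round((target + dither) / step)
        err = y - target
        out.append(y)
    return out

def band_energy(x, k_lo, k_hi):
    """Energy in DFT bins k_lo..k_hi-1 (naive DFT, fine for short signals)."""
    n = len(x)
    total = 0.0
    for k in range(k_lo, k_hi):
        c = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        total += abs(c) ** 2
    return total

step = 2.0 / 256
rng = random.Random(1)                                # fixed seed for repeatability
n = 256
signal = [0.3 * math.sin(2.0 * math.pi * i / 64) for i in range(n)]
output = shaped_quantize(signal, step, rng)
error = [y - x for y, x in zip(output, signal)]

low = band_energy(error, 1, 33)       # lowest quarter of the spectrum
high = band_energy(error, 96, 128)    # highest quarter of the spectrum
print("error energy, low band vs high band:", low, high)
```

Stronger shaping filters push proportionally more of the error toward the top of the band, which is where the larger peak amplitudes mentioned in this thread come from.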
Jonah
QUOTE(Zeke @ Feb 16 2007, 05:08)

I'm by no means an expert here either, but I like working on these DSP algorithms. I'm kind of the DSP equivalent of a guy who monkeys with engines (and can fix them) but has not had a lot of formal training as a mechanic and would never refer to himself as a mechanic...

That's me too! (with engines as well)

I had the same idea as you a few years ago and wrote my own noise reduction program. I started out with the "spectral subtraction" method and then tried various refinements to improve the sound quality. I ended up using some simple psychoacoustic masking models, and imho it sounds pretty good (although I'm probably biased!)...

I haven't done any work on it for a couple of years now but I'm wondering whether to release the code into the public domain and let other people refine it. It's written in C but my programming skills aren't up to much so the user interface is very basic (command line). Would anyone be interested??

Dynamic, since you've obviously looked into this in some detail, I'd be really interested in your opinion on whether my program sounds any good compared to other algorithms you've tried. Would you be willing to do this and if so, could you upload a noisy clip (and a sample of the noise on its own) for me to try?

Thanks,
J.

Dynamic
With strong ATH noise shaped dither at 44.1 kSa/s you may well get peak-to-peak amplitudes of around 31 x LSB (from my experience with foobar2000 some time ago) compared to 1 or so with flat dither, but the perceived loudness might be 15 to 18 dB lower than with flat dither (if you make it loud enough to be perceived, such as by using 8-bit, or 10-bit for example).

In an 8-bit file, the peak amplitude might be around 32/256 = 1/8 of full scale, and most of the amplitude is spread around the higher frequencies. I'd imagine it's worth a note of caution that damage may be done even if you don't know for sure, particularly to avoid a lawsuit!

And, of course, if you have open source code where people can modify the noise shaping for themselves (as I've read that SSRC permits) they could possibly do damage even if your algorithm can't.

Anyhow, back to noise removal algorithms, I really can't imagine many (if any) circumstances when you'd try to remove noise then only save as 8-bit audio. With flat dither at 16 bits or more, excessive amplitude is surely unlikely to be a problem at any frequency, even if it's somehow conceivable with either 8 bits or with very strongly noise-shaped dither at 16 bits.
Dynamic
QUOTE(Jonah @ Feb 20 2007, 00:43)

Dynamic, since you've obviously looked into this in some detail, I'd be really interested in your opinion on whether my program sounds any good compared to other algorithms you've tried. Would you be willing to do this and if so, could you upload a noisy clip (and a sample of the noise on its own) for me to try?


Your program sounds really interesting, Jonah. I no longer have the samples I had for my old thread (as DickD) regarding blind testing of NR algorithms, but with a little time I could re-capture them if I dig out the CD I ripped it from and I know I have a Dolby-B cassette album with a good long silence at the end of side 2, which I probably used before (and in fact it appears that I still have a lossless file of about 28 seconds of noise from that).

EDIT: I've just checked the old files I thought were the plain noise, and they were from the start of side 2, which only has about 3 to 4 seconds of clean noise before the music starts. I'll need to hook up my old Walkman WM-36 with Dolby B and record the noise afresh.
Jonah
Sounds good - if you could mix your noise sample with a music clip of your choice and upload it, plus a few seconds of the noise on its own (which my program needs to get a noise profile), then I'll post up the processed result.
bug80
Recently I made a noise reduction algorithm in Matlab based on this article:

Lorber, M., Hoeldrich, R. (1997), "A combined approach for broadband noise reduction", Proc. IEEE Workshop on Audio and Acoustics, Oct. 1997

Here is the link, although I assume you can only download the PDF if you are a member, or browsing from a University network.

It is based on nonlinear spectral subtraction and it works quite well. The article also mentions an auditory masking threshold, but I did not implement that (yet).
Jonah
I've just dug out my NR program again so I thought I'd resurrect this thread. I've uploaded a sample before and after processing - see upload thread here:
http://www.hydrogenaudio.org/forums/index....showtopic=54793

Would be interested in comments on sound quality, noise reduction effectiveness, etc.
How does it compare with commercial or, dare I ask, professional NR algorithms??

If anyone else has samples they'd like me to try, I'd be happy to run them through my program. It works best with constant level broadband noise such as tape hiss.

Any feedback appreciated! :)
Dynamic
QUOTE(Jonah @ May 10 2007, 23:42)

Would be interested in comments on sound quality, noise reduction effectiveness, etc.
How does it compare with commercial or, dare I ask, professional NR algorithms??


It sounds remarkably good to me, including preservation of transients and lack of "musical noise" (the 1960's sci-fi computer sounds common to heavy NR). I thought I might have heard some reduction in the stereo space around the noises behind my head (ambient reflections of the guitar-picking clicks etc, I guess), so I thought I'd better try to ABX...

Although I'm tired and not good at ABX testing, I tried mix pasting the tape hiss 6 times onto the noise-reduced version, cropping to the same length, then listening to the original and NR+hiss versions. Unfortunately, I'm on the wrong PC and can't get the volume extra loud, but I might try again on my other PC when I'm not tired. I got my first two picks correct, then one wrong, one right, then another wrong, giving 3/5, a score that pure guessing would match or beat 50% of the time, so I gave up for now.
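
The 50% figure for 3 out of 5 follows from the binomial distribution, assuming each trial is an independent coin flip when guessing. A short sketch (the function name is made up for this example):

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

p_3_of_5 = abx_p_value(3, 5)
p_12_of_16 = abx_p_value(12, 16)
print(p_3_of_5, round(p_12_of_16, 4))
```

A longer run such as 12 out of 16 is needed before guessing becomes an unlikely explanation.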

My other problem may be that I'm listening to the wrong sections, so don't take my result as a ringing endorsement, but I'm certainly fairly impressed so far.

I would be interested in supplying independent samples of tape hiss from the same tape - one for analysis and one for addition to a clean CD recording (as I tried a few years ago when I was DickD on these boards) to try ABXing the NR against the clean CD, but I'm living in a building site so that may be some time.