declip.exe win32 (CuteStudio Ltd)

2007-07-28 13:15:54

Compiled with VS 2005. Not much tested.

Reply #1 – 2007-08-08 09:33:27

Hi People,

Thanks for uploading. Please always distribute any binaries together with the source: this is part of the GPL licensing terms, and makes sure that everyone who receives the program can also read, tinker with and modify it. The main CD declipping documentation, explanations, source and binaries for Windows, Mac OS X and Linux (static build) are here:

Declip 2.04i source + binaries

The discussion of CD clipping and how it looks in the waveform is here:

CD Clipping information - loudness war news, which in turn links to the 'charts' or halls of fame and shame - casualties and heroes of the CD loudness war. There is also related loudness war information here: Death of the CD, CDs are not hi-fi

Again, feel free to copy this program+source, incorporate it and play with it; it uses the GPL (GNU General Public License), which encourages you to do exactly that. This is how it repaired one track I have - in this case it copied clear channel information until only the both-channels-clipped section was left, and then it went looking left and right and stitched in good sections from there. So it is reconstructed and looks good, sounds fine, but of course it is always a guess as to what was really there before it was lost to clipping.
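As a rough illustration of the first step such a tool needs (finding where a channel is clipped), here is a minimal Python sketch. It is not taken from the Declip source; the function name `find_clipped_runs`, the 16-bit `full_scale` value and the `min_run` threshold are all assumptions made for the example.

```python
def find_clipped_runs(samples, full_scale=32767, min_run=3):
    """Return (start, end_exclusive) index pairs where |sample| sits at full scale.

    Hypothetical helper: a run counts as clipped only if at least `min_run`
    consecutive samples reach `full_scale` (an assumed, tunable threshold).
    """
    runs = []
    start = None
    for i, s in enumerate(samples):
        clipped = abs(s) >= full_scale
        if clipped and start is None:
            start = i                      # a flat-topped run begins here
        elif not clipped and start is not None:
            if i - start >= min_run:
                runs.append((start, i))    # run was long enough to report
            start = None
    # Handle a run that lasts until the end of the buffer
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


# Made-up 16-bit samples for one channel: two flat-topped sections
left = [1000, 32767, 32767, 32767, 32767, 2000, -32768, -32768, -32768, 0]
runs = find_clipped_runs(left)
print(runs)
```

Running the same check on the other channel would show which sections are clipped in one channel only and which are clipped in both, which is the distinction the description above relies on.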

Please let me know if you noticed an improvement in the sound of your CD tracks - I'd love to know that people are getting results!!

Graham

Reply #2 – 2007-08-09 01:17:45

A number of things in your Loudness Wars article are inaccurate, and I quote a few below. If you were trying to sell something I'd be disinclined to believe you.

I wish you well in your attempts to reconstruct what has been discarded by overcompression and clipping, though.

Quote

...to determine the waveform shape at 22.05kHz - this is a 'hard limit', there are only 2 sample points to determine this note, and of course the phase is fixed to the digital sample clock. At 5kHz you get about 8 points per cycle, it's pretty approximate stuff.

This is why analog can sound better, and why pro-audio generally runs at 96kHz, the waveforms are much more accurate.

It will be analogue by the time you hear it, and with a proper reconstruction filter it will be just as accurate as the 96 kHz waveform (even showing up nanosecond-scale phase shifts) as can be mathematically proven. This is true only for frequencies below the Nyquist limit, which is essentially all of human-perceptible sound, especially in the context of music rather than isolated test tones.

Quote

The CD dynamic range covers around 96dB, but the digital scale is linear - and the ear hears logarithmically. At low sound levels each numerical level of digital output is significant.

Pro-audio uses 24bits, to fit the tune onto a 16bit CD, you lose the least significant 8bits (256 discrete detail levels), which causes a grainy sound.

Dynamic range is different to Signal-to-Noise Ratio, and perceived dynamic range for 44.1 kHz material is approximated rather well by a 1024-point FFT with suitable windowing (e.g. Blackman). A post of mine (username DickD at the time) from some time ago shows this graphically and gives enough info to create the same test signals yourself to amplify and hear the sine wave or noise modulation that is well over 96 dB below full-scale. The post uses flat dither then noise-shaped dither (the type that gives around 15 dB extra perceived dynamic range) to demonstrate it.

What's particularly objectionable is that I know of no blind-tested evidence of 'grainy sound', whatever that is exactly, at normal listening levels from 16-bit CDs mastered with proper respect for Nyquist and flat or shaped dither. Undithered (rounded or truncated) test signals could appear grainy but what kind of mastering DAW would do that since the early 80s? If you know of any blind-tested evidence, please point us to it.

Quote

Mastering time. Uh ohh.. Whop!, there goes 8 bits. Hack,fizzle, there goes between 8kHz and 52kHz of sampling accuracy.

I sort of get the 8-bits being lost as a crude analogy, though in reality it doesn't add the same hiss as 8-bit PCM would provide even with really disgusting mastering practices. The loss of sampling accuracy in downsampling to 44.1 kHz should be completely inaudible, so although it is indeed lost, knowing that I have a reconstruction filter and possibly an oversampling DAC to improve the reconstruction further, I'm quite unperturbed by the resampling alone.

I agree with most of your points and much of the rest of the technical details (odd harmonics etc.) and how bad mastering is affecting CD sales. It certainly affects me, and on iTunes too I've chosen not to buy because of bad CD mastering. I played a sample of the only release of Dark Side Of The Moon on iTunes UK (Money, available in 256 kbps unprotected AAC thanks to EMI) and was simply disappointed by the thud of the percussion which should have had so much more punch. The most recent remastered CD version was the only one available, so I didn't buy it. I'm with you on most points, and I'm right behind your cause.

This point however, has some dubious claims... (emphasis mine)

Quote

People share MP3s rather than WAV CD tracks. They do this because MP3s are smaller, it's a convenience thing. The IPod is doing well with MP3s too. MP3s however have lost the low level detail of the original CD, and have compressed dynamics. So guess what happens when you create each new CD where the CD is so poorly mastered? Yes - people can't tell the difference between the MP3 and the CD.

MP3 encoders do not perform dynamic compression. I know of none that do this. Only decoders running compressor DSPs will do this. Even the tiny amount of effective compression caused by clipping in the decoder (if you don't use Replaygain peak information to scale it down) can't seriously be called compressed dynamics.

Most music can be rendered to an MP3 that is impossible to ABX against the original. LAME -V2 is quite enough for the vast majority of music. Only a few killer samples (and special areas like harpsichord music) are readily spotted under double-blind conditions (often for pre-echo). OK, many lay people use only moderately good encoding methods for MP3, and their deficiencies may pale in comparison to the poor mastering of the CD.

Aside from these misleading assertions, I applaud your effort and look forward to trying out a few samples.

Regards,

Dynamic

Reply #3 – 2007-08-09 09:32:05

Thanks Dynamic, it's very useful to have some feedback on this article as it is quite near 'written once' status.

Nyquist
This may be down to my ignorance, but this problem still puzzles me:
22.05kHz, assuming we have a perfect clock and perfect sinusoidal waveform, how do we represent a (for instance) 1024 level waveform? It peaks at +512 and -512, which is fine.

But what if that signal is really a slightly phase shifted one (30 degrees perhaps) which IIRC was originally at +1024 and -1024 peaks. In the digital domain it may look like a perfect 22.05kHz signal but who is to say it is not a different amplitude at a slightly different phase?

Going from 96kHz to 44.1 (eg) has to deal with this, integrating may simply create an average amplitude and phase which will still not be the same as the original.

Grainy sound
The Dynamic range is extremely bad wording on my part, what I'm trying to say here is that the linear and log scale conversion causes problems with the 16bit CDs for quiet passages (not an issue with modern pop!).

For instance, if you have a trailing cymbal sound that goes to 50dB below the normal music level then that sound will use 0.003162 of the DAC scale. If the RMS loudness of the music was 6dB below full scale (0dB) anyway (i.e. the CD is not clipped) then the actual 16bit level will be -56dB or 0.001585 of 65536 levels = 103.87456 levels. This corresponds to slightly less than 7bits of resolution; in the 24bit world you'd still have almost 15bits at this level.
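The arithmetic in that paragraph can be checked with a few lines of Python. This is only a sketch of the level-counting argument as stated (it says nothing about dither); the helper names are made up for the example.

```python
import math


def fraction_from_db(db):
    """Convert a level in dB (relative to full scale) to a linear amplitude fraction."""
    return 10.0 ** (db / 20.0)


def levels_and_bits(db_below_full_scale, bit_depth):
    """Number of quantisation levels spanned at the given level, and its log2."""
    levels = fraction_from_db(db_below_full_scale) * (2 ** bit_depth)
    return levels, math.log2(levels)


# -50 dB cymbal tail on music sitting at -6 dB gives -56 dB relative to full scale
levels_16, bits_16 = levels_and_bits(-56.0, 16)
levels_24, bits_24 = levels_and_bits(-56.0, 24)

print(f"16-bit: {levels_16:.1f} levels, about {bits_16:.1f} bits")
print(f"24-bit: {levels_24:.1f} levels, about {bits_24:.1f} bits")
```

The small difference from the 103.87456 figure comes from rounding the fraction to 0.001585 before multiplying.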

Maybe grainy is the wrong word - but you run out of bits very fast on quiet areas, I know dither can help in the midrange but 16bits is still rather tight compared to analog. Maybe it's just the vinyl hiss but for me vinyl always sounds 'softer' or 'sweeter' on very quiet parts than the CD.

Down-converting
I agree that the music can still be respectable here, certainly CDs made by Hi-Fi manufacturers seem to be pretty good, although I'm not a personal fan of dither as I can't see it doing much to help the high frequencies.

Dark side of the music Industry
I'm disappointed that Dark Side of The Moon has been squashed - if Floyd gets squashed on the new 'high quality' iTunes then that's game over for everything. Most annoying.

MP3 compression
MP3s and compression - don't know how I said dynamic compression for MP3, typo/editing artifact - corrected! It probably came from the view that if a CD is compressed and clipped anyway, an MP3 will hardly sound any worse.

Thanks for the comments, I'll amend the page accordingly (still not sure about the phase differentiation of a 22.05kHz tone though)

Reply #4 – 2007-08-09 16:19:08

CuteStudio, this is certainly an interesting project, and I am looking forward to its continued development.

Question: How difficult would it be to compile your source for use as a foobar2000 DSP plugin? This would effectively give foobar2000 the ability to play audio using your declipping algorithms, and process source material for "lossless" (post-declip lossless, that is) archival using the "Convert" settings.

Given the prevalence of foobar2000 enthusiasts, this seems a logical course for encouraging public experimentation.

- M.

Reply #5 – 2007-08-09 16:59:27

I guess it would be wiser to use replaygain values to decide whether declipping processing is needed or not.

Reply #6 – 2007-08-09 19:59:57

Hi M,

Say hi to James for me!
I'm going to look at the foobar2000 project hopefully this weekend to determine if it needs a linked-in plugin, or if it is compiled separately.

Foobar2000 is I understand under the BSD license whereas DeClip is under the GPL, so linking the two may cause Foobar2000 GPLification which in this case is not what we want.

BSD and GPL are both free open source software licenses. BSD is more free in one way - you can use the source and change it, and you do not have to show anyone your improvements.

The GPL extracts a cost - the cost of the free source is that you make your source changes available to help other people. In some ways however the GPL is more beneficial - for instance if my code was BSD, anyone from Microsoft to Meridian could just take it for themselves and use it, which is a little bit of a one way street, whereas with the GPL they would have to make their usage of it either a) available to the public, or b) come to me for a different license.

Anyhow - to actually answer your question I think it would be a good fit and if I can work out a way to do it then I will. I might not do the actual coding though, I seem to have 101 things on currently, and they are not Dalmatians...

G

Reply #7 – 2007-08-09 20:26:35

foobar2000 is not open source at all.

What is open source (for obvious reasons) is the API to make plugins for it.

So it becomes your GPL program using BSD-licensed code, and not a BSD program using your GPL-licensed code.
In other words, perfectly legal.

Reply #8 – 2007-08-09 21:44:49

Quote from: CuteStudio on 2007-08-09 19:59:57

Hi M,

Say hi to James for me!

If I can ever get him away from the Baccarat tables, I'll be sure to do so.

As [JAZ] said, it is also my understanding that building a DSP plugin for foobar2000 to utilize your declipping algorithms would harm neither software nor license of either entity.

Thank you!

- M.

Reply #9 – 2007-08-10 12:02:56

Quote from: CuteStudio on 2007-08-09 09:32:05

Nyquist
This may be down to my ignorance, but this problem still puzzles me:
22.05kHz, assuming we have a perfect clock and perfect sinusoidal waveform, how do we represent a (for instance) 1024 level waveform? It peaks at +512 and -512, which is fine.

Or it could peak at 0 and 0, which isn't! Actually, anything below 22.05 kHz can be perfectly reconstructed in theory, but not 22.05 kHz itself. 22.049 kHz, for example, would have a number of points near zero for a while, then a number of points near +512 and -512 for a while, then back again over about one second. It looks almost like a beat pattern - and a very slow one because we're so close to the Nyquist limit. A sufficiently long sinc-function reconstruction filter would extract the 1024-level amplitude from this properly, including phase down to accuracy far below the sampling period.
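A small Python sketch of the ambiguity being discussed, under assumed test values: exactly at 22.05 kHz, a cosine of peak 512 at 0 degrees and one of peak 1024 at 60 degrees produce the same samples, whereas 1 Hz lower (22.049 kHz) the two sample sequences clearly differ over one second. The amplitudes, phases and helper names are chosen only for illustration.

```python
import math

FS = 44100.0  # CD sample rate in Hz


def sample_cosine(freq_hz, amplitude, phase_deg, n_samples):
    """Sample amplitude*cos(2*pi*f*t + phase) at FS for n_samples points."""
    phase = math.radians(phase_deg)
    return [amplitude * math.cos(2.0 * math.pi * freq_hz * n / FS + phase)
            for n in range(n_samples)]


def max_abs_difference(a, b):
    """Largest sample-by-sample difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Exactly at the Nyquist frequency: every sample is +/- A*cos(phase),
# so 512 at 0 degrees and 1024 at 60 degrees cannot be told apart.
diff_at_nyquist = max_abs_difference(
    sample_cosine(FS / 2, 512.0, 0.0, 64),
    sample_cosine(FS / 2, 1024.0, 60.0, 64),
)

# 1 Hz below Nyquist, observed for one second: the slow beat pattern
# exposes the difference in amplitude and phase.
diff_below_nyquist = max_abs_difference(
    sample_cosine(FS / 2 - 1.0, 512.0, 0.0, 44100),
    sample_cosine(FS / 2 - 1.0, 1024.0, 60.0, 44100),
)

print(f"max difference at 22.05 kHz:  {diff_at_nyquist:.3e}")
print(f"max difference at 22.049 kHz: {diff_below_nyquist:.1f}")
```

The second figure only becomes large because a full second of samples is examined, which matches the point that tones very close to the limit need a very long filter.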

In practice, we don't actually try to represent signals so close to the bleeding edge of the Nyquist limit, so the apparent beat pattern is faster and shorter reconstruction filters are more than adequate.

For CDs, frequencies over 20 kHz are usually filtered out, and in blind listening tests on real music (rather than artificial test signals of extreme loudness that might fry your tweeters), a lowpass removing all frequencies above about 18 or 19 kHz is indistinguishable, so there's ample room for relatively crude reconstruction filters to render transparent analogue sound from CD.

Quote

Going from 96kHz to 44.1 (eg) has to deal with this, integrating may simply create an average amplitude and phase which will still not be the same as the original.

You would filter out frequencies of 22.05 kHz and ever-so-slightly below in the 96 kHz domain as your anti-alias filter prior to downsampling. Then there's no problem.

Quote

Grainy sound
The Dynamic range is extremely bad wording on my part, what I'm trying to say here is that the linear and log scale conversion causes problems with the 16bit CDs for quiet passages (not an issue with modern pop!).

For instance, if you have a trailing cymbal sound that goes to 50dB below the normal music level then that sound will use 0.003162 of the DAC scale. If the RMS loudness of the music was 6dB below full scale (0dB) anyway (i.e. the CD is not clipped) then the actual 16bit level will be -56dB or 0.001585 of 65536 levels = 103.87456 levels. This corresponds to slightly less than 7bits of resolution; in the 24bit world you'd still have almost 15bits at this level.

Maybe grainy is the wrong word - but you run out of bits very fast on quiet areas, I know dither can help in the midrange but 16bits is still rather tight compared to analog. Maybe it's just the vinyl hiss but for me vinyl always sounds 'softer' or 'sweeter' on very quiet parts than the CD.

Your last comment about vinyl can apply equally to dither (and for now, we'll talk about flat dither - 1 bit, not frequency-shaped dither). With a statistically sufficient amount of dither, there is guaranteed to be no truncation distortion and the signal sinks gracefully below the noise with no harmonic overtones being added.

You can fade from a 4-bit signal to a 0.001-bit signal (which you calculated at higher resolution or floating point before converting to those bits, of course) and it will slide gracefully below the dither noise and become imperceptible just as gradually as a signal fading below the much louder noise of vinyl.
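To make the sub-bit claim concrete, here is a minimal Python sketch under assumed parameters: a 1 kHz sine with a peak of only 0.25 LSB is quantised to whole LSBs, once with plain rounding and once with dither added first. Triangular (TPDF) dither spanning +/-1 LSB is an assumption of this example; the post only says "sufficient" flat dither. The tone level is then estimated by correlating one second of output against the known sine.

```python
import math
import random

FS = 44100        # sample rate, Hz
N = FS            # one second of samples (exactly 1000 cycles of the tone)
FREQ = 1000.0     # test tone frequency, Hz (assumed)
AMP_LSB = 0.25    # tone peak amplitude in LSBs, well below one quantisation step


def tone(n):
    """Ideal high-resolution tone value at sample n, in units of one LSB."""
    return AMP_LSB * math.sin(2.0 * math.pi * FREQ * n / FS)


def estimate_amplitude(samples):
    """Estimate the tone's peak amplitude by correlating with the reference sine."""
    acc = sum(s * math.sin(2.0 * math.pi * FREQ * n / FS)
              for n, s in enumerate(samples))
    return 2.0 * acc / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Plain rounding: a 0.25 LSB tone never reaches the next level, so it vanishes.
undithered = [round(tone(n)) for n in range(N)]

# TPDF dither (difference of two uniform values, +/-1 LSB) added before rounding.
dithered = [round(tone(n) + rng.random() - rng.random()) for n in range(N)]

undithered_estimate = estimate_amplitude(undithered)
dithered_estimate = estimate_amplitude(dithered)

print(f"without dither: {undithered_estimate:.4f} LSB")
print(f"with dither:    {dithered_estimate:.4f} LSB")
```

Without dither the tone is simply gone; with dither the output is noisy but the tone is still present at close to its true level.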

This works because the ear doesn't hear the sample-values, but essentially hears the frequency spectrum over roughly 20-40 milliseconds time resolution (something like 1024 sample FFT at 44100 Hz is a good approximation). It also perceives much smaller inter-aural timing differences for stereo localization, and the pinnae of the ear cause frequency-filtering (EQ) dependent on the height of the sound source.

Over 1024 samples duration, a signal component at a given frequency, even if its amplitude is much smaller than 1 bit, causes small perturbations to be added to the dither noise (which is calculated at sub-bit levels then rounded to bit levels) and this causes the frequency spectrum to accurately show the amplitude of the sub-bit signal. Also, you'll notice that over the 1024-sample averaging of the FFT, the flat dither noise attributable to each frequency-bin amounts to about -120 dB, but if you add the power (convert out of dB to add) over all the bins (then convert back to dB) you come to -93 dB (see this post for the calculation and graphs).
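The per-bin versus total noise figures can be reproduced with a short calculation. This sketch assumes roughly 512 usable bins for a 1024-point FFT of real audio; that bin count is an assumption of the example, not a figure from the post.

```python
import math

PER_BIN_DB = -120.0   # approximate dither noise level in each FFT bin
NUM_BINS = 512        # assumed number of bins for a 1024-point FFT of a real signal

# Convert out of dB to a power ratio, add the power of all bins, convert back.
per_bin_power = 10.0 ** (PER_BIN_DB / 10.0)
total_power = per_bin_power * NUM_BINS
total_db = 10.0 * math.log10(total_power)

print(f"total noise over {NUM_BINS} bins: {total_db:.1f} dB")
```

Equivalently, the total sits 10*log10(512), about 27 dB, above the per-bin level.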

Of course vinyl has not only constant white noise but crackles - a sort of impulsive white-spectrum noise - which does differ from CD's constant dither. We might like that sound, like I like the sound of a log fire crackling, and it may disguise the noise, but properly flat-dithered 16-bit PCM should not sound grainy or un-natural any more than a top grade low noise studio analogue tape reel would sound grainy.

Quote

Down-converting
I agree that the music can still be respectable here, certainly CDs made by Hi-Fi manufacturers seem to be pretty good, although I'm not a personal fan of dither as I can't see it doing much to help the high frequencies.

Dither is different from noise shaping. Flat dither makes all frequencies behave the same. In fact, for normal listening volumes, it's inaudible at 16-bit unless you crank the volume up dangerously to hear some details in the fade-out of a track.

The confusion arises because noise shaping is often applied at the same time as dithering to reduce the noise floor at the most audible frequencies for humans below the -120 dB of flat dither, perhaps by around 15 dB in the 1-4 kHz range. Providing the dither remains sufficient, statistically speaking, this causes no truncation distortion and improves perceived sound. The extra noise at ultra-high frequencies can easily be kept way below the absolute threshold of hearing.

Quote

Dark side of the music Industry
I'm disappointed that Dark Side of The Moon has been squashed - if Floyd gets squashed on the new 'high quality' iTunes then that's game over for everything. Most annoying.

Yup, the CD remaster was squashed on the CD, so a perfectly transparent AAC encoding of it is going to represent those squashed dynamics perfectly. If iTunes had encoded the earlier CD release it would have been un-squashed, as would the properly mastered SACD side of the dual-disc version. It's not the data-compression that caused the loss of dynamics, but the dynamic-compression when mastering the CD.

DSoTM reissue isn't the worst example of dynamic compression but it's still a retrograde step and another thing that puts me off anything that says "Digitally Remastered" with an issue date after about 1996.

Quote

Thanks for the comments, I'll amend the page accordingly (still not sure about the phase differentiation of a 22.05kHz tone though)

No, you can't differentiate 22.05 kHz tones. Actually dead on the Nyquist limit is not possible, but 22.04999 kHz can be done, given a long-enough reconstruction filter (order-of-magnitude calculation: to reconstruct a tone 0.01 Hz below the Nyquist limit, you'd need a filter about 100 seconds long, i.e. about 4,410,000 samples at 44.1 kSa/s). To actually correctly down-sample a 22.04999 kHz sinusoid at 44.1 kSa/s, you'd have to have a brickwall anti-alias filter of sufficient length to have a 0.01 Hz transition band too! In reality nobody goes that close to the Nyquist limit for audio, but it's theoretically possible. You can get infinitely close to Nyquist, but you can never reach it.
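The order-of-magnitude estimate follows from taking the filter duration as roughly the reciprocal of the frequency gap to the Nyquist limit. A quick Python check under that rule-of-thumb assumption (the function name is made up for the example):

```python
FS = 44100.0             # sample rate, samples per second
NYQUIST_HZ = FS / 2.0    # 22050 Hz


def rough_filter_length(tone_hz):
    """Rule-of-thumb filter length: duration ~ 1 / (gap to Nyquist), in s and samples."""
    gap_hz = NYQUIST_HZ - tone_hz
    duration_s = 1.0 / gap_hz
    return duration_s, duration_s * FS


duration_s, length_samples = rough_filter_length(22049.99)
print(f"about {duration_s:.0f} s, about {length_samples:,.0f} samples")
```

A practical design would need some multiple of this, so treat the result as a lower bound on the order of magnitude rather than a specification.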

A good illustration of reconstruction filters is available in even old copies of Cool Edit like shareware CoolEdit96 (Cool Edit Pro then became Adobe Audition). If you generate a 22.0 kHz tone, you can zoom into a few samples across the whole screen and see the sample values and the reconstructed waveform that passes through those points, complete with peaks above and below the amplitude of the sample points. Equally, you could upsample to an incredibly high sample rate, apply a tiny time-shift, much less than 1/44100 second to left or right channel and see the reconstructed 22.0 kHz wave is correctly time-shifted.
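The same experiment can be sketched in plain Python without an audio editor. This is not how Cool Edit does it: to keep the example exact and short it swaps in a periodic sinc (Dirichlet kernel) interpolator over one block of 441 samples, in which a 22.0 kHz tone completes exactly 220 cycles. The block length, the 1 microsecond delay and all function names are assumptions of the example.

```python
import math

FS = 44100.0     # sample rate, Hz
N = 441          # odd block length: 10 ms, in which 22.0 kHz fits exactly 220 cycles
FREQ = 22000.0   # tone frequency, Hz
DELAY = 1e-6     # 1 microsecond shift, far less than the 22.7 microsecond sample period


def shifted_tone(t_samples, delay_s=0.0):
    """Ideal continuous tone at a time given in sample periods, optionally delayed."""
    return math.cos(2.0 * math.pi * FREQ * (t_samples / FS - delay_s))


def periodic_sinc(u):
    """Dirichlet kernel: exact interpolator for an N-periodic band-limited signal (N odd)."""
    denom = N * math.sin(math.pi * u / N)
    if abs(denom) < 1e-12:
        return 1.0
    return math.sin(math.pi * u) / denom


def reconstruct(samples, t_samples):
    """Evaluate the band-limited waveform passing through the samples at any time."""
    return sum(s * periodic_sinc(t_samples - n) for n, s in enumerate(samples))


# Look between samples 215 and 225, where the sample values are all close to zero.
grid = [k / 20.0 for k in range(215 * 20, 225 * 20 + 1)]

samples = [shifted_tone(n) for n in range(N)]
largest_sample = max(abs(samples[n]) for n in range(215, 226))
largest_reconstructed = max(abs(reconstruct(samples, t)) for t in grid)

# Sample a slightly delayed tone and check the reconstruction carries the delay.
delayed_samples = [shifted_tone(n, DELAY) for n in range(N)]
worst_error = max(abs(reconstruct(delayed_samples, t) - shifted_tone(t, DELAY))
                  for t in grid)

print(f"largest sample in window:        {largest_sample:.3f}")
print(f"largest reconstructed value:     {largest_reconstructed:.3f}")
print(f"worst error for delayed tone:    {worst_error:.2e}")
```

In the chosen window the stored samples are tiny while the reconstructed wave still swings to nearly full amplitude between them, and the delayed tone is recovered with its sub-sample shift intact.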

The rules of digital audio:

Nyquist theorem - filter out all frequencies above Fs/2 before sampling and you avoid aliasing.

Dither - apply sufficient flat dither (or sufficient white noise to your analogue signal) instead of rounding or truncation and you get no low-level digital distortions when you quantize to however many bits you choose.

Reconstruction filter (Nyquist part 2) - after you convert from digital to analogue for output you have a staircase waveform (containing frequencies above the Nyquist limit of your true audio), so you must filter out Fs/2 and all frequencies above it to reconstruct the true analogue signal. This is normally integrated into the DAC, so it's nothing for consumers to worry about. Sometimes, to get a brick-wall filter, the digital audio is upsampled with a digital brick-wall filter before the DAC which then runs at a higher sampling rate, so it's only necessary to have a very simple analogue lowpass filter to remove content above that higher Nyquist limit, and far above audible frequencies.

People have been known to bend the rules, mainly in applying insufficient dither or none at all. That can cause truncation distortions which could well be a cause of graininess or "digititis" in early CD releases.
