halb27
Sep 19 2007, 12:25
Sorry, I abxed Atem-lied 9/10 using v0.1.6 -s.
BTW this does not mean this version is worse than the last one. I use my very open Alessandro MS-2 when my wife is not around, otherwise I use my canal phones ultimate ears super.fi 5 pro. They are both very good but sound very different. But even with the same headphones I'm sure there are differences for me from day to day in being able to hear such subtle problems.
Just for comparison and to exclude the possibility that something may be slightly wrong with the Delphi implementation: is there a chance to get an analogous MATLAB-generated version of Atem-lied?
Nick.C
Sep 19 2007, 13:28
QUOTE(halb27 @ Sep 19 2007, 19:25)

Sorry, I abxed Atem-lied 9/10 using v0.1.6 -s.
BTW this does not mean this version is worse than the last one. I use my very open Alessandro MS-2 when my wife is not around, otherwise I use my canal phones ultimate ears super.fi 5 pro. They are both very good but sound very different. But even with the same headphones I'm sure there are differences for me from day to day in being able to hear such subtle problems.
Just for comparison and to exclude the possibility that something may be slightly wrong with the Delphi implementation: is there a chance to get an analogous MATLAB-generated version of Atem-lied?
The Matlab script has not been keeping pace with the Delphi in this instance. I will attempt to incorporate the skewing function into Matlab and post the result.
Currently trying to include a Mersenne Twister random number generator instead of the Delphi standard version - just to see if it makes a difference.
In terms of input / output file naming, the drive / directory name (if any) in the input filename will now be stripped and the output file will default to the current directory, unless the -o parameter is used to indicate an alternative output directory. I am thinking of changing the -o parameter to require the whole output filepath / filename to be specified.
I will post 0.1.6b soon which will not reduce the noise_threshold_shift at all when the skewing is switched on. However, this may result in bitrate bloat of the resultant FLAC file, as David suggested.
Or, are we trying to redesign the whole method around one problem sample? I don't know which way to go right now. Have you tried Atem_lied at quality -1? If so, is it better and if better can you ABX it?
halb27
Sep 19 2007, 14:24
I also see it as an option to ignore Atem-lied some day especially as the problem is extremely subtle (to me).
At the moment however there may be a small chance that the problem is due to a Delphi implementation error and we shouldn't give away the chance to find it.
Is the method of the current Delphi version when not using skewing identical to that of the Matlab script? Then it makes sense to try the Matlab version.
I hope I have fixed the wavIO problem and will test it tomorrow (I'm too tired now). I'll also try quality -1 tomorrow.
Nick.C
Sep 19 2007, 15:23
QUOTE(halb27 @ Sep 19 2007, 21:24)

I also see it as an option to ignore Atem-lied some day especially as the problem is extremely subtle (to me).
At the moment however there may be a small chance that the problem is due to a Delphi implementation error and we shouldn't give away the chance to find it.
Is the method of the current Delphi version when not using skewing identical to that of the Matlab script? Then it makes sense to try the Matlab version.
I hope I have fixed the wavIO problem and will test it tomorrow (I'm too tired now). I'll also try quality -1 tomorrow.
I have implemented the debug mode to allow the bits_to_remove for each codec_block to be examined on a block-by-block basis, i.e. to check if the Matlab and Delphi output is the same. As the average bits to remove for the Matlab version seem to be higher than the Delphi version (see comparison txt files above) I feel that the Delphi version is *slightly* more conservative than the Matlab version - why, I don't quite know - but I sense another debug session tomorrow night.........
I agree that we shouldn't ignore the possibility that there's an error in implementation!
Attached the bits_to_remove data, block-by-block for atem_lied, no skewing, 3 analyses (10,8,6 bit), triangular dither, noise_threshold_shift=-3. As can be seen, mainly the same, only a few differences.
I will go through the maths regarding the determination of the sub-blocks for analysis again tomorrow and see if the result improves.
One problem I am having is re-creating the noise analysis for creating the reference_threshold and threshold_index values - currently I am using the pre-processed constants to re-create the surface (fft bits x bits_to_remove), accurate to <0.2dB.
halb27
Sep 20 2007, 00:58
Thanks for the tables.
Judging from that I don't think the Atem-lied problem is a problem of the Delphi implementation.
In the critical region ~ 6 bits are removed (with both the Delphi and the Matlab version), and I think this is really not appropriate in this situation.
So I think we have two choices:
1) try more variants, for instance averaging over 3 bins instead of 4.
2) if things don't essentially improve accept that Atem-lied is a (very minor) problem sample we get fully transparent only with best quality setting (will try it tonight if the current best quality setting does it).
In this case however more listening experience than just mine is most welcome (not only true in this case).
Nick.C
Sep 20 2007, 01:24
QUOTE(halb27 @ Sep 20 2007, 07:58)

Thanks for the tables.
Judging from that I don't think the Atem-lied problem is a problem of the Delphi implementation.
In the critical region ~ 6 bits are removed (with both the Delphi and the Matlab version), and I think this is really not appropriate in this situation.
So I think we have two choices:
1) try more variants, for instance averaging over 3 bins instead of 4.
2) if things don't essentially improve accept that Atem-lied is a (very minor) problem sample we get fully transparent only with best quality setting (will try it tonight if the current best quality setting does it).
In this case however more listening experience than just mine is most welcome (not only true in this case).
Last night, I got two out of three of the noise analysis calculations to give output which agrees with the matlab output - unfortunately they were the no dither and rectangular dither calculations - triangular dither still gives about 1.5dB less than it should, it's as if the whole surface has been shifted down by that amount.
I will add a "-a" parameter to allow the spreading function length to be reduced from 4 to 3.
2Bdecided
Sep 20 2007, 05:02
If you want to tackle atem_lied (or any other sample) I propose a very simple procedure...
Use the default original code, and change the noise threshold only. ABX these various different versions, and find out where the problem is solved.
I suggest obscuring the actual noise threshold shift from the listener - so if you're going to pass files to halb27, randomise the order and re-name them A, B, C etc. You can losslessly re-encode to FLAC at different (random) block sizes to hide the real bitrate too, since block size impacts efficiency - more so when it doesn't match that used by the pre-processor, which should be left at default.
As a result of this ABXing, you know (by checking) how many bits can be removed before a problem appears.
Then you can mess around with any options you want, and look at the number of bits removed. If it's more than the known good figure, it probably doesn't solve the problem. You can do all this playing without constant ABX testing. When you have something that seems to work numerically, then ABX to make sure that the bits are actually being removed at the correct time in the file.
Hope this suggestion makes sense.
Cheers,
David.
Nick.C
Sep 20 2007, 05:17
David,
Thanks for the valued input - you make it sound so easy! (well it is, but we hadn't come up with it, and you came up with the original script......)
I will try this out and post, say 3 samples tonight.
My other main debug is to make *very* sure that I'm using the correct reference_threshold values for triangular dither - the fact that no dither and rectangular dither calculate correctly has me a bit worried about the triangular dither calculation in Delphi and Matlab until I can get them to match up.
It would be *really* nice if I could get the Delphi to spit out exactly what the Matlab script produces in terms of WAV output (no dither, as you said previously). My take on the sub-blocks for calculation and end overlap may not be exactly the same as yours, so I'll see what the differences are - I re-invented the wheel a bit on that element of the coding.
Could I be having problems with the Delphi Random Number Generator? I have tried (a little bit) to find another that I could just plug in to the delphi code, but I haven't found a suitable candidate (yet......).
[edit] It was the <censored> random number generator - at least indirectly. In a discussion in another thread it was intimated that one way to carry out triangular dither is to generate one random number per cycle, and subtract the previous random number from the new one. This was the basis upon which my triangular dither *was* coded. Unfortunately, this (in Delphi at least) does not give the same result as in the Matlab code. By switching to generating two random numbers per cycle and subtracting one from the other the problem of not matching the Matlab calculated values is solved!
Thinking it through, if there was a problem with the triangular dither in my noise_analysis code, there must also be in the bit-reducing code - therefore both have been "fixed". I will post v0.1.7 this evening. [/edit]
Nick.
bryant
Sep 20 2007, 06:44
Something that struck me in looking at the spectrum of Atem_lied is that the lowest bins through the critical section are around 200 Hz, which I don't remember as that common. At those frequencies, averaging over 4 bins might get too much from higher frequencies because each bin covers such a wide range down there. Perhaps averaging fewer bins at lower frequencies might help (a little like the Bark scale).
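As a rough illustration of how wide the bins are, here is a small Python sketch. It assumes 44.1 kHz audio (consistent with the 1378 Hz bin spacing quoted later in the thread for a 32 sample FFT) and the 10, 8 and 6 bit fft lengths mentioned earlier; the function name is made up for this example.

```python
SAMPLE_RATE = 44100  # assumption: CD-rate audio


def bin_width_hz(fft_bits, sample_rate=SAMPLE_RATE):
    """Spacing between adjacent FFT bins for an FFT of 2**fft_bits samples."""
    return sample_rate / (1 << fft_bits)


for fft_bits in (10, 8, 6):
    width = bin_width_hz(fft_bits)
    # A 4-bin average spans roughly four bin widths of spectrum.
    print(f"{1 << fft_bits:5d} samples: bin width {width:8.2f} Hz, "
          f"4-bin average spans about {4 * width:8.2f} Hz")
```

With the shortest of these FFTs a 4-bin average starting near 200 Hz reaches well into the kHz range, which is the concern raised above.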
Nick.C
Sep 20 2007, 07:07
Nice find Bryant! Where might the crossover be say, from 3 to 4 bins or 2 to 3 to 4 bins, in frequency terms? It took me long enough with the latter (i.e. working) version of CONV
2Bdecided
Sep 20 2007, 08:02
QUOTE(Nick.C @ Sep 20 2007, 12:17)

[edit] It was the <censored> random number generator - at least indirectly. In a discussion in another thread it was intimated that one way to carry out triangular dither is to generate one random number per cycle, and subtract the previous random number from the new one.
Don't do that here, at least not yet. It gives you high pass filtered dither noise. There are clear advantages to that, but the threshold calculations assume a flat noise floor.
btw, if you were using that, and it was working properly, and atem_lied still sounds bad, then you/we have worse problems than we thought - since the noise was already a few dB lower at the critical frequencies for that sample.
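To illustrate the difference between the two ways of generating triangular dither discussed here, a minimal Python sketch follows. It is not the Delphi or Matlab code, and the function names are made up for illustration. Both variants produce triangularly distributed values, but the one that re-uses the previous random number has a lag-1 autocorrelation near -0.5 (i.e. it is high pass filtered), while the one that draws two fresh numbers per sample is essentially uncorrelated (flat).

```python
import random


def tpdf_independent(n, rng):
    """Two fresh uniform numbers per sample, one subtracted from the other."""
    return [rng.random() - rng.random() for _ in range(n)]


def tpdf_differenced(n, rng):
    """One fresh uniform number per sample, minus the previous one."""
    out = []
    prev = rng.random()
    for _ in range(n):
        cur = rng.random()
        out.append(cur - prev)
        prev = cur
    return out


def lag1_autocorrelation(x):
    """Normalised correlation between each value and the next one."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


rng = random.Random(2007)  # Python's generator is a Mersenne Twister
flat = tpdf_independent(100_000, rng)
filtered = tpdf_differenced(100_000, rng)
print("two numbers per sample :", round(lag1_autocorrelation(flat), 3))
print("reuse previous number  :", round(lag1_autocorrelation(filtered), 3))
```

A lag-1 correlation of about -0.5 means less noise power at low frequencies and more at high frequencies, which is why thresholds computed for a flat noise floor do not match it.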
Cheers,
David.
Nick.C
Sep 20 2007, 08:23
QUOTE(2Bdecided @ Sep 20 2007, 15:02)

Don't do that here, at least not yet. It gives you high pass filtered dither noise. There are clear advantages to that, but the threshold calculations assume a flat noise floor.
btw, if you were using that, and it was working properly, and atem_lied still sounds bad, then you/we have worse problems than we thought - since the noise was already a few dB lower at the critical frequencies for that sample.
Cheers,
David.
Wouldn't it sort of cancel itself out? But..... I was using the pre-calculated constants from Matlab, i.e. the ones I've just now managed to duplicate, so the constants *were* correct and I was using the filtered triangular dither method, so maybe there is indeed a problem.
As Bryant pointed out there may be merit in changing the CONV routine to average over fewer samples at low frequencies. Maybe institute a mid_frequency_bin for each analysis and average across fewer samples between lfb and mfb then across more samples between mfb and hfb? As seen from one of my previous posts, reducing the number of bins being averaged reduced the bits_to_remove value. In the same way, could the number of bins averaged be *increased* above a certain threshold?
2Bdecided
Sep 20 2007, 08:26
QUOTE(bryant @ Sep 20 2007, 13:44)

Something that struck me in looking at the spectrum of Atem_lied is that the lowest bins through the critical section are around 200 Hz, which I don't remember as that common. At those frequencies, averaging over 4 bins might get too much from higher frequencies because each bin covers such a wide range down there. Perhaps averaging fewer bins at lower frequencies might help (a little like the Bark scale).
Thank you David. Can you help me think this through...
You could do psychoacoustically sensible (or at least slightly more like psychoacoustically sensible) spreading - but that's not what I put the spreading there for. Simply, if you FFT something, some of the bins are going to get very little energy just out of coincidence. Move the window by a few samples, and it'll be different bins which get very little energy.
Those minima are pretty much irrelevant. The spreading function is there to smooth them out, otherwise they'll be chosen as the noise floor and the bit rate reduction will be very low. (You could do more FFTs, greatly overlapped, and average in time to achieve a similar thing, but that would really slow things down).
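A toy Python sketch of the idea, assuming a plain uniform average over adjacent bins (the actual weights and units used in the script are not given here, so treat both as assumptions; the spectrum values are invented):

```python
def spread(bins, length=4):
    """Average every run of `length` adjacent bins (uniform weights assumed)."""
    return [sum(bins[i:i + length]) / length
            for i in range(len(bins) - length + 1)]


# Invented spectrum with one chance dip in the third bin.
spectrum = [40, 41, 12, 39, 40, 42, 41, 40]

print("raw minimum       :", min(spectrum))
print("4-bin average min :", min(spread(spectrum, 4)))
print("3-bin average min :", min(spread(spectrum, 3)))
```

The isolated dip no longer sets the noise floor on its own, and a shorter average lets it pull the floor down further, which fits the observation in this thread that averaging over fewer bins removes fewer bits.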
You're right that, at low frequencies, this convolution might be smoothing over important dips. I tested this originally (around 1kHz, not 200Hz) and found it was OK. If you have an extreme dip of about 4 bins or less in width, then it does get partly filled with noise, but not enough to be audible (to me). The only issue was if it was narrow and short - with contrived signals, you can get something that's (just) audibly overlooked, so I included the optional 5ms FFT to catch that.
However, it seems to me in my experiments with the noise shaping version, that SebG is exactly right (and even basic masking models from decades ago support this): you need a greater SNR at lower frequencies than higher frequencies.
So I don't know why atem_lied is failing - is it because a LF dip is smoothed too much, or is it because LFs simply require a higher SNR? Either a variable length conv, or a low frequency threshold skew, can solve this problem. The question is which is correct (and does it matter)?
Obviously I favour the LF skew because it's much easier to implement! But if it's just a "bodge" then it may leave another problem lurking elsewhere. A correct psychoacoustic-ish spreading function might make it more efficient as well as more careful.
If you really want to go mad, for both the standard version, and the noise shaping version (unreleased), you could replace the current model (which is basically "find the noise floor, keep the noise below it") with a psychoacoustic model from your favourite lossy encoder. I don't know how useful this would be, and I don't recommend it - but it'll be a nice project for someone when everything else is finished.
Cheers,
David.
halb27
Sep 20 2007, 13:06
I tried Atem-lied using v0.1.6 -1 -s and couldn't abx it.
I tried my other samples using default quality and -s and could not abx any of them (trumpet however may be a bit on the edge and I can imagine it can be abxed by someone with better hearing - but that's speculation).
-s seems to have a tendency to bring the number of removed bits down a bit.
I didn't really get what the skewing option does. Can you explain it please?
Now that I use LossyWav from the commandline I see the average number of bits removed. I was quite astonished to see for instance herding_calls having only 0.1150 bits removed on average. With such a sample I would have expected to see more bits removed.
It's not just with herding_calls. From that I think a little bit more bits can be removed on average. But of course in this case there should be some method to cover problems like Atem-lied.
I'd really like to see the behavior when averaging over 3 bins. Maybe it's possible to raise the noise threshold this way compensating for the impact of averaging over 3 instead of 4 bins.
To me David Bryant's idea is plausible, and maybe a rough approximation is already valuable like averaging over 4 bins in the frequency range beyond 1.5 kHz and 3 bins below that.
IMO it's worth trying.
Added:
I looked up earlier in this thread where I could not abx a MATLAB Atemlied version based on 3 analyses, noise_threshold_shift=-3, triangular_dither.
Where are we with the current Delphi version in terms of these parameters?
Nick.C
Sep 20 2007, 14:44
QUOTE(halb27 @ Sep 20 2007, 20:06)

I didn't really get what the skewing option does. Can you explain it please?
I'd really like to see the behavior when averaging over 3 bins. Maybe it's possible to raise the noise threshold this way compensating for the impact of averaging over 3 instead of 4 bins.
To me David Bryant's idea is plausible, and maybe a rough approximation is already valuable like averaging over 4 bins in the frequency range beyond 1.5 kHz and 3 bins below that.
IMO it's worth trying.
Added:
I looked up earlier in this thread where I could not abx a MATLAB Atemlied version based on 3 analyses, noise_threshold_shift=-3, triangular_dither.
Where are we with the current Delphi version in terms of these parameters?
Skewing lowers the outputs in bins at the low end of the FFT by up to 6dB, with no reduction at the high frequency bin (16kHz); the curve has a 1-cos shape.
New option -t (3 bin average) added.
Files attached - 2 Matlab, 2 lossyWAV. Same analyses, noise threshold shift and dither - 2 are 576 sample blocks and 2 are 1024 sample blocks. Removed - flawed processing.
Also attached - I was playing with the random number generator and tried rectangular, triangular, (triangular + triangular)/2 [Tr2] and Tr3 - results attached as Dither.txt. Tr2 and Tr3 seem to have a gaussian shape to them - is this something which might be of use?
Also, looking at the frequency coverage of each bin at varying fft lengths I started something which may end up being the basis for variable bin number averaging - see Bins.txt
I'm tidying up v0.1.7 just now.
halb27
Sep 20 2007, 15:09
A short intermediate result:
I abxed the w version 6/6 and ended up 9/10.
I abxed the x version 6/6 and ended up 7/10.
I don't think w is easier to abx for me. I'm just tired now (especially of listening to Atem-lied), and I think it's better to continue with the remaining versions tomorrow (maybe tomorrow morning before going to work if I have sufficient time).
Nick, I've emailed you the corrected version of the wavIO unit. Should be ok now.
Nick.C
Sep 20 2007, 15:39
QUOTE(halb27 @ Sep 20 2007, 22:09)

A short intermediate result:
I abxed the w version 6/6 and ended up 9/10.
I abxed the x version 6/6 and ended up 7/10.
I don't think w is easier to abx for me. I'm just tired now (especially of listening to Atem-lied), and I think it's better to continue with the remaining versions tomorrow (maybe tomorrow morning before going to work if I have sufficient time).
Nick, I've emailed you the corrected version of the wavIO unit. Should be ok now.
Thanks very much for your "ear-time" - it's much appreciated, as is the work put into the wavIO unit!
Attached is v0.1.7: - Superseded by v0.1.8
-t parameter added : sets spreading_function_length to 3 rather than 4.
-s parameter modified : skewing function amended, no longer changes noise_threshold_shift value.
-f parameter added : sets fft_overlap to 1/n * fft_length samples, i.e. 1/4 = 5 analyses in 2 fft_lengths, 1/8 = 9 analyses in 2 fft_lengths.
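Going by the counts quoted for -f, consecutive analyses appear to start fft_length/n samples apart. That reading is an assumption, as is the helper name below; the sketch only reproduces the "5 analyses" and "9 analyses" figures.

```python
def analysis_starts(fft_length, n, span_lengths=2):
    """Start offsets of FFT analyses stepped by fft_length // n samples.

    Only analyses that fit entirely inside span_lengths * fft_length
    samples are listed.
    """
    step = fft_length // n
    last_start = span_lengths * fft_length - fft_length
    return list(range(0, last_start + 1, step))


for n in (2, 4, 8):
    starts = analysis_starts(1024, n)
    print(f"1/{n}: {len(starts)} analyses in 2 fft_lengths, starting at {starts}")
```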
I will have a think about the mechanism whereby variable spreading_function_length can be applied in the CONV function, using a 3.5kHz transition. Is there any merit in thinking of the frequency range in octaves, i.e. spreading_function_length increases exponentially as frequency increases?
halb27
Sep 21 2007, 00:04
In short my result for atem_lied.lossy.y.flac: 2/2 -> 5/7 -> 6/10, so I couldn't abx it.
I do think it's better than the w and x version, though I don't think it's transparent. Maybe it was not a good idea to test it this morning as I'm a bit pressed to go to work now.
I'll redo the test this evening together with the z version.
Nick.C
Sep 21 2007, 05:33
Horst's updated wavIO unit incorporated into code;
Small error in the CONV routine (yet again!) fixed;
Skewing now follows a (sin-1) [sin(pi/2*min(hfb-lfb,max(0,this_bin-lfb))/(hfb-lfb))-1] shape rather than (1-cos), now 9dB amplitude rather than 6dB;
Small error in calculation of Average Bits Removed fixed;
Small error in individual fft_analysis result calculation fixed.
[edit] Superseded - v0.2.0 [/edit]
SebastianG
Sep 21 2007, 06:06
Perhaps we should split this thread somehow in one that contains bug reports and announcements of versions (which I'm not interested in) and a technical one where strategies and techniques are discussed.
QUOTE(Nick.C @ Sep 21 2007, 13:33)

Skewing now follows a (sin-1) shape rather than (1-cos), now 9dB rather than 6dB;
Forgive my ignorance but what exactly is "skewing"?
Is there a relation to noise shaping?
What's the current state strategy on selecting the 'wasted_bits' count / noise shaping filters (if any) ?
Cheers!
SG
Nick.C
Sep 21 2007, 06:19
QUOTE(SebastianG @ Sep 21 2007, 13:06)

Perhaps we should split this thread somehow in one that contains bug reports and announcements of versions (which I'm not interested in) and a technical one where strategies and techniques are discussed.
QUOTE(Nick.C @ Sep 21 2007, 13:33)

Skewing now follows a (sin-1) shape rather than (1-cos), now 9dB rather than 6dB;
Forgive my ignorance but what exactly is "skewing"?
Is there a relation to noise shaping?
What's the current state strategy on selecting the 'wasted_bits' count / noise shaping filters (if any) ?
Cheers!
SG
The original thread is still running in the FLAC forum - maybe technical discussion should move to that?
Skewing in this instance artificially lowers the fft bin values, in this case at the lower end of the fft results.
As applied in the code, at the low_frequency_bin the dB reduction is by the full amplitude of the selected reduction amount. At the high_frequency_bin there is no reduction at all. The shape of the dB reduction curve is a scaled (1-sin[value]) curve where value is 0 at the lfb (or lower) and pi/2 at the hfb (or higher). For a 32 sample fft_length, lfb=2, hfb=11:
CODE
Bin  Freq. (Hz)   sin-1    dB reduction
00        0      -1.000      -9.031
01     1378      -1.000      -9.031
02     2756      -1.000      -9.031
03     4134      -0.826      -7.463
04     5513      -0.658      -5.942
05     6891      -0.500      -4.515
06     8269      -0.357      -3.226
07     9647      -0.234      -2.113
08    11025      -0.134      -1.210
09    12403      -0.060      -0.545
10    13781      -0.015      -0.137
11    15159       0.000       0.000
12    16538       0.000       0.000
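The table can be reproduced with a few lines of Python. This is only a sketch of the formula as described (clamped bin position, sin-1 shape, 9.031dB amplitude), assuming a 44.1kHz sample rate; the function and constant names are illustrative and this is not the Delphi source.

```python
import math

SAMPLE_RATE = 44100   # assumption: matches the 1378 Hz bin spacing in the table
FFT_LENGTH = 32
AMPLITUDE_DB = 9.031  # full skew amplitude taken from the table


def skew_db(this_bin, lfb, hfb, amplitude_db=AMPLITUDE_DB):
    """dB offset applied to one bin: -amplitude at lfb (or lower), 0 at hfb (or higher)."""
    position = min(hfb - lfb, max(0, this_bin - lfb)) / (hfb - lfb)
    return amplitude_db * (math.sin(math.pi / 2 * position) - 1.0)


for b in range(13):
    freq = b * SAMPLE_RATE / FFT_LENGTH
    shape = skew_db(b, 2, 11, 1.0)  # unit amplitude gives the sin-1 column
    print(f"{b:02d} {freq:6.0f} {shape:7.3f} {skew_db(b, 2, 11):7.3f}")
```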
From the discussions previously, maybe the zero-point in this reduction should be the nearest bin to 3.5kHz, and maybe the amplitude of the skew should be more extreme.
The bits_to_remove value for each codec block is the threshold_index corresponding to the dB of the lowest (CONV'd) bin in any of the analyses carried out on that codec block. The threshold index is determined by calculating the dithered bit reduction noise dB for each bit_to_remove for each fft length.
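A very rough Python sketch of that lookup, to make the description concrete. The threshold values below are invented (a made-up base level plus about 6.02dB per bit); the real reference_threshold surface comes from the noise analysis. Taking the smallest result across analyses is one reading of "lowest bin in any of the analyses", and the function names are hypothetical.

```python
def bits_for_analysis(min_bin_db, reference_threshold_db):
    """Largest bits_to_remove whose dithered noise level stays at or below the lowest bin."""
    bits = 0
    for candidate, noise_db in enumerate(reference_threshold_db):
        if noise_db <= min_bin_db:
            bits = candidate
    return bits


def bits_to_remove(analyses):
    """analyses: list of (lowest CONV'd bin in dB, threshold table) pairs, one per FFT analysis."""
    return min(bits_for_analysis(m, table) for m, table in analyses)


# Hypothetical table: index = bits removed, value = resulting noise level in dB.
HYPOTHETICAL_THRESHOLDS = [float("-inf")] + [10 + 6.02 * b for b in range(1, 17)]

block = [(48.0, HYPOTHETICAL_THRESHOLDS), (30.0, HYPOTHETICAL_THRESHOLDS)]
print("bits_to_remove for this block:", bits_to_remove(block))
```

In this toy version the analysis with the lowest bin decides the result for the whole codec block.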
halb27
Sep 21 2007, 09:54
I tried atem_lied.lossy.z.flac and couldn't abx it (5/10).
I also retried atem_lied.lossy.y.flac and didn't arrive at a better result than this morning.
Nick.C
Sep 21 2007, 10:00
QUOTE(halb27 @ Sep 21 2007, 16:54)

I tried atem_lied.lossy.z.flac and couldn't abx it (5/10).
I also retried atem_lied.lossy.y.flac and didn't arrive at a better result than this morning.
The problem I found in lossyWAV today will probably be what caused "w" and "x" to be so poor - "y" and "z" were Matlab versions, the only difference being the codec_block_size (x&y=1024, w&z=576). This leads me to think that the fft_overlap will help this sample - i.e. more analyses of the same length over the same data.
I will re-process and post "w" and "x" tonight - using all the same parameters as before, as well as the same with 9dB skewing (20Hz to 3.7kHz).
guruboolez
Sep 21 2007, 12:37
Does LossyWav remove some noise?
I ABXed (with pain) sample No. E12 on 01.00 - 01.50 range encoded with 0.18. Issue: there's more noise... on reference file.
On the other hand, very low volume samples played at high gain (sample No. V03 for example) have more noise (obviously) after LossyWav processing (bitrate is higher too). I guess it's expected.
EDIT: ABX log for E12:
CODE
foo_abx v1.2 report
foobar2000 v0.8.3
2007/09/21 20:26:28
File A: file://C:\150 samples\E12_MODERN_CHAMBER_L_piano_flute.flac
File B: file://C:\150 samples\E12_MODERN_CHAMBER_L_piano_flute.lossy.flac
20:26:28 : Test started.
20:26:50 : 01/01 50.0%
20:26:58 : 01/02 75.0%
20:27:01 : 01/03 87.5%
20:27:04 : 02/04 68.8%
20:27:09 : 03/05 50.0%
20:27:16 : 04/06 34.4%
20:27:20 : 05/07 22.7%
20:27:43 : 06/08 14.5%
20:27:49 : 07/09 9.0%
20:28:07 : 07/10 17.2%
20:28:46 : 08/11 11.3%
20:28:59 : 09/12 7.3%
20:29:07 : 10/13 4.6%
20:29:12 : 11/14 2.9%
20:29:54 : 11/15 5.9%
20:30:38 : 12/16 3.8%
20:30:44 : Test finished.
----------
Total: 12/16 (3.8%)
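The percentages in the log are consistent with the chance of getting at least that many trials right by guessing (a one-sided binomial tail with p = 0.5). A small Python sketch under that assumption; the function name is made up and this is not foo_abx's own code:

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of scoring `correct` or more out of `trials` by pure guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


for correct, trials in ((12, 16), (9, 10), (7, 10), (5, 10)):
    print(f"{correct}/{trials}: {100 * abx_p_value(correct, trials):.1f}%")
```

Applied to scores reported earlier in the thread, 9/10 is very unlikely to be guessing, while 7/10 and 5/10 are not convincing.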
Nick.C
Sep 21 2007, 13:36
Thanks for the input Guru - where the bits_to_remove is zero, lossyWAV will still dither the samples because there's an automatic anti-clipping amplitude reduction to 95.28% (30.49/32, i.e. 32 -1 (triangular_dither amplitude) -0.5 (normal rounding) -0.01 (Nick.C's error margin)) as the file is processed, so dither is still required.
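The arithmetic behind that 95.28% figure, as a small Python check. The parameter names are illustrative and the values are simply the ones quoted above, in the same units as the 32.

```python
import math


def anti_clip_scale(full_scale=32.0, dither_amplitude=1.0,
                    rounding=0.5, margin=0.01):
    """Fraction of full scale left after reserving room for dither, rounding and a margin."""
    return (full_scale - dither_amplitude - rounding - margin) / full_scale


scale = anti_clip_scale()
print(f"scale factor: {scale:.4%}")
print(f"level change: {20 * math.log10(scale):.2f} dB")
```

So the always-on reduction is a bit under half a dB.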
Maybe a follow on batch file to detect which files have become bigger and annihilate them?
A bit surprised about E12 - and glad that your trained ears are not shuddering at the output from lossyWAV.
From my own processing testing I am getting ever closer to the Matlab output in terms of matching bits_to_remove on a block-by-block basis - the latest build has only 6 instances of 1 bit difference between the processors for Atem_lied, on 524 blocks. Oddly enough they cancel each other out as 3 are +1 and 3 are -1. I am going to use the same reference threshold surface on both processors to bottom that out and then can move on, confident that the output of lossyWAV is the same as that of Matlab.
When I processed your 150 samples with and without skewing there was only about 50kB difference over the whole sample set between skewed and non-skewed (i.e. not a lot of minimum bins between 20Hz and 3.7kHz.....)
Dynamic
Sep 21 2007, 14:38
This is an interesting thread that I keep getting back to between being busy and I wish you all well. This sounds promising. It's likely that predictor-based lossy coders (like Wavpack lossy, as I'm sure Bryant is thinking) could use the same sort of analysis for a safe VBR lossy mode with the additional advantage of setting the amount of permissible prediction error to match the noise-floor relevant to that instant regardless of the bit-depth and block length in use.
It's interesting to see how one must look for the true noise floor and ignore the small chance troughs in the power spectrum that tend to vanish if we shift the transform window slightly, as 2Bdecided pointed out, and it's great to see the problem-solving at work, such as Bryant's recognition of the unusually low frequency of the quietest frequency bin during the atem-lied problem moments.
It occurs to me that there might be ways to optimise the computation of multiple overlapping FFTs (or any roughly equivalent transform) to attempt to set the noise floor more accurately without rogue troughs, though I can't get past the fact that one has to pre-multiply the samples in each analysis segment by a windowing function centred on that segment, thus making it difficult to efficiently re-use the results from any part of the analysis in calculating an overlapping FFT without compromising the smoothness of the windowing function, so I guess the averaging solution, while more temporally spread out/needing skewing adjustments to make it unABXable for atem-lied, is the most computationally-viable option.
There's probably little to be gained, but I presume it's rare that anything above, say 18 kHz is the lowest power bin in the power spectrum, but presumably it's pretty safe to ignore any bins above 18 kHz if this analysis should happen to yield a higher noise floor. It should be safe enough given that people have such difficulty ABXing music lowpassed at 18 kHz. And of course LossyFLAC preprocessing wouldn't actually lowpass anything in this scenario, it would just be capable of ignoring the noisefloor in any frequencies above 16, 17 or 18 kHz for example in its calculation of the noisefloor for the whole block, while still passing all frequencies unaltered except for the bit-depth and hence the exact pattern of the noise, which one could ABX as inaudible.
Anyway, loving your work, guys. I'm not averse to pre-scaling (and dithering) my audio with Album Gain, which saves many percent in lossless for anything over-loudly mastered and considering it an excellent source for encoding into lossy (which I tend to do with Album Gain pre-applied or supplied via a --scale switch where convenient in any case), and I'd equally consider a safe lossy mode based on robust noise-floor calculations like this and no other psychoacoustics to be an excellent storage medium for sound reproduction, including heavy EQ, processing and the like, and of course, as a pretty-darned robust source for transcoding to conventional lossy formats or indeed something like resampled-to-32 kHz wavpack lossy.
Nick.C
Sep 21 2007, 15:14
Thanks Dynamic, the appreciation is appreciated!
Attached is v0.2.0 : - Superseded by v0.2.1.
Revised skewing function - skews below 3.7kHz (gleaned from ReplayGain technical data - equal loudness curves) by up to 9dB (0dB at 3.7kHz, -9dB at 20Hz);
Tidied up code, revised quality -1.
As said previously, when testing Guru's sample set there is only a difference of 50kB in 101.12MB between skewed and non-skewed (skewed bigger, as hoped). On my 50 sample set skewing increases the size of the FLAC'ed set by 203kB in 44.65MB.
I haven't created variants of Atem_lied to upload for ABX, but I'm confident now that if any are created they should be pretty good. Late now, will read bug reports tomorrow....
halb27
Sep 21 2007, 16:03
I tried Atem-lied with v0.2.0 using -s, -3 -s -t, and -3 -t, and the results were all transparent to me.
From feeling I'd call -3 -t a tiny bit worse than -s.
Good work, Nick.
If I see it correctly lowering the fft bin values as you do with skewing can be used for seamlessly adjusting the noise threshold.
So maybe something like quality -3, -s, additional lowering the fft bin values, and averaging over 3 bins instead of 4 below say 3.5 kHz may save more bits on average while keeping quality at the level attained by -2 -s.
bryant
Sep 22 2007, 00:33
QUOTE(2Bdecided @ Sep 20 2007, 07:26)

You're right that, at low frequencies, this convolution might be smoothing over important dips. I tested this originally (around 1kHz, not 200Hz) and found it was OK. If you have an extreme dip of about 4 bins or less in width, then it does get partly filled with noise, but not enough to be audible (to me). The only issue was if it was narrow and short - with contrived signals, you can get something that's (just) audibly overlooked, so I included the optional 5ms FFT to catch that.
However, it seems to me in my experiments with the noise shaping version, that SebG is exactly right (and even basic masking models from decades ago support this): you need a greater SNR at lower frequencies than higher frequencies.
Hi David,
It's interesting that these low frequency bands would be an issue here. I'm sure that for conventional codecs it's a non-issue because those bands take so little data to encode accurately it's probably not worth worrying about. But here we're adding white noise, so no frequency gets a free ride...
I haven't gotten too far on my implementation yet, but I was thinking of doing the convolution in both the time and frequency domain, perhaps with a 3x3 kernel, and perhaps not uniformly weighted. This was not based on any experimentation but just because I find it more elegant, although now that I see how wide the bins are at low frequencies I like it even more. I also don't care for the idea of a filter that varies with frequency, but it might be necessary if nothing fixed will work well.
It looks like the LF skew is working well and since most material has a lot of LF energy I can see why it doesn't have a large effect on bitrate for most samples. However, once you start shaping the noise it's going to make a much bigger difference, so it's probably a good idea to get it accurate now.
One thing I really don't care for is the level shift always being on. I think some very low level samples (like Guru's) are only going to be transparent if unmodified. You can easily imagine the case where a sample has over 16 bits effective resolution (at some frequencies) due to noise shaping and this would be destroyed by just about any modification, especially dithering. I don't have a solution, but perhaps there would be a way to have the shift be level dependent? I realize that this would introduce dynamic compression and maybe a little harmonic distortion depending on how it was done, but I suspect both might be okay. Of course, one could argue that very low level samples should be played at very low levels, but I'm not sure everyone will buy this...

Anyway, it's certainly looking very promising at this point.

David
Nick.C
Sep 22 2007, 01:25
@Halb27 : I'm glad that Atem_Lied is better. It's interesting that -s works better than -3 -t. This demonstrates the value of more fft_lengths in the analysis process. From memory, -3 -t -s produces a bigger FLAC file than -s, so this would not be a more attractive option - quicker maybe, but not better.
@Bryant: My conditional clipping reduction in the Matlab script analysed all the blocks, determined all the bits_to_remove and at the same time noted the peak amplitude for each block. The clipping reduction value for the whole file was then calculated taking into account actual bits_to_remove and block peak value for each block and taking the lowest value.
This resulted in much less level reduction (a surprising number of files did not require to be amplitude reduced). However it requires two passes through the blocks - something I was initially unwilling to do, however it's probably not *that* time consuming for the analysis. I will take that on as my next modification to the code.
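A minimal Python sketch of the two-pass idea described above, for illustration only. It assumes 16-bit samples and a per-block worst-case overshoot of 1.51 quantisation steps (dither 1 + rounding 0.5 + margin 0.01, the figures quoted elsewhere in this thread); the function names and the exact headroom formula are hypothetical and are not taken from the Matlab or Delphi code.

```python
FULL_SCALE = 32767     # largest positive 16-bit sample value
MARGIN_STEPS = 1.51    # assumed worst-case overshoot, in quantisation steps

def block_scale(peak, bits_to_remove):
    """Largest gain (<= 1.0) that keeps one block clear of clipping."""
    if bits_to_remove == 0 or peak == 0:
        return 1.0                      # block is left untouched, no headroom needed
    step = 1 << bits_to_remove          # quantisation step after bit removal
    allowed_peak = FULL_SCALE - MARGIN_STEPS * step
    return min(1.0, allowed_peak / peak)

def file_clipping_reduction(blocks):
    """Second pass: take the lowest per-block gain over the whole file.

    `blocks` is an iterable of (peak_amplitude, bits_to_remove) pairs
    collected during the first (analysis) pass.
    """
    return min((block_scale(p, b) for p, b in blocks), default=1.0)

# Example: only the near-full-scale block with bits removed forces a reduction.
print(file_clipping_reduction([(20000, 6), (32767, 0), (32767, 4)]))
```

Under these assumptions a file whose loud blocks have no bits removed, or whose peaks sit comfortably below full scale, comes out with a reduction of exactly 1.0, which would be consistent with the observation that many files needed no amplitude reduction.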
bryant
Sep 22 2007, 07:01
QUOTE(Nick.C @ Sep 22 2007, 00:25)

@Bryant: My conditional clipping reduction in the Matlab script analysed all the blocks, determined all the bits_to_remove and at the same time noted the peak amplitude for each block. The clipping reduction value for the whole file was then calculated taking into account actual bits_to_remove and block peak value for each block and taking the lowest value.
This resulted in much less level reduction (a surprising number of files did not require to be amplitude reduced). However it requires two passes through the blocks - something I was initially unwilling to do, however it's probably not *that* time consuming for the analysis. I will take that on as my next modification to the code.
Well, if I understand you correctly that means that a single wild sample in a file would alter the way the whole file was processed. That's a little weird too, but it might be a reasonable compromise.
I didn't mean to imply with my original post that I thought this was a critical issue, but it is something that might make a lot of samples possible to ABX under the right circumstances. I'm kind of glad I don't need to deal with it for a WavPack version...

BTW, hats off to you and halb27 for getting this going so quickly!
David
halb27
Sep 22 2007, 11:38
Now that we've reached the point where we can use lossyWav for practical purposes (though a lot more listening experience is most welcome) I wonder which block size to use.
A blocksize of 576 samples is attractive to use thinking of FLAC performance. However 2Bdecided worried about blocksizes below 1024 samples for the lossyWav procedure.
With a blocksize of 1024 which blocksize should be used with FLAC? If I interpret the FLAC documentation correctly blocksize must be a multiple of 576. Is it wise to use a lossyWav blocksize of 1024 and a FLAC blocksize of 576?
EDIT:
I just tried, and FLAC is working with a blocksize of 1024.
Or should I use a lossyWav and FLAC blocksize of 1152?
Another question: I was quite happy using wavPack lossy with a sample rate of 32 kHz though 32 kHz is a bit too low. I'd like to use a sample frequency of 35 kHz which I can do with my DAP using FLAC.
Can I consider it a safe procedure to a) resample to 35 kHz b) apply lossyWav c) apply FLAC, that is: can I consider the current lossyWav procedure applicable to 35 kHz sampled tracks?
halb27
Sep 22 2007, 14:16
I tried all my usual samples with v0.2.0 using -s and couldn't abx any difference to the original.
As before I have a suspicion that trumpet isn't totally fine (sec. 0.6 ... 2.6). However I am not the one who can abx it (my best approximation towards a difference was 5/7, and I ended up 5/10).
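For context on scores like these, here is a small Python sketch of the usual way an ABX result is read: the probability of getting at least that many trials right by pure guessing (a one-sided binomial tail with p = 0.5). The function name is hypothetical.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

for correct, trials in [(5, 7), (5, 10)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.3f}")
```

Neither 5/7 nor 5/10 comes anywhere near the commonly used 0.05 level, which fits the description of a suspicion that could not be confirmed.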
halb27
Sep 23 2007, 01:52
QUOTE(bryant @ Sep 22 2007, 08:33)

... One thing I really don't care for is the level shift always being on. I think some very low level samples (like Guru's) are only going to be transparent if unmodified. ...
Maybe a 'worth while' consideration on a per block basis would be of some help.
In case only say 1 bit is removed in the block (or maybe 2 bits) the block remains untouched, and this can easily be restricted to blocks with an RMS below a certain threshold to address low volume blocks.
In this case the machinery isn't worth while and has a tendency to give a bad SNR, be it only due to dithering.
collector
Sep 23 2007, 03:45
QUOTE(halb27 @ Sep 22 2007, 09:38)

With a blocksize of 1024 which blocksize should be used with FLAC? If I interpret the FLAC documentation correctly blocksize must be a multiple of 576. Is it wise to use a lossyWav blocksize of 1024 and a FLAC blocksize of 576?
Very interesting thread. Flac itself is using 4096 default which isn't a multiple 576 either.
CODE
-b, --blocksize=# Specify the blocksize in samples; the default is
1152 for -l 0, else 4096; must be one of 192,
576, 1152, 2304, 4608, 256, 512, 1024, 2048,
4096 (and 8192 or 16384 if the sample rate is
>48kHz) for Subset streams.
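The rule in that help text can be written as a small check. This is only a Python sketch of the list quoted above; the function and constant names are made up for illustration and are not part of FLAC.

```python
# Blocksizes the quoted help text allows for Subset streams at any sample rate.
SUBSET_BLOCKSIZES = {192, 576, 1152, 2304, 4608, 256, 512, 1024, 2048, 4096}
# Additionally allowed only when the sample rate is above 48 kHz.
HIGH_RATE_BLOCKSIZES = {8192, 16384}

def is_subset_blocksize(blocksize: int, sample_rate_hz: int) -> bool:
    """True if `blocksize` is on the quoted Subset list for this sample rate."""
    if blocksize in SUBSET_BLOCKSIZES:
        return True
    return sample_rate_hz > 48000 and blocksize in HIGH_RATE_BLOCKSIZES

for size in (576, 1024, 1152, 4096):
    print(size, is_subset_blocksize(size, 44100))
```

So 576, 1024 and 1152 are all acceptable at 44.1 kHz; the list contains both the multiples of 576 and the powers of two.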
halb27
Sep 23 2007, 16:17
I wanted to see the behavior of lossyWav together with FLAC for a set of 50 full tracks which is typical of the kind of music I usually love to listen to (pop music and singer/songwriter music).
All values given are according to what foobar says when looking at the properties of the 50 selected songs:
Original ape files (extra high mode): 703 kbps
flac (--best -e) files: 744 kbps
lossyWav (-s), followed by flac (--best -e -b 1024): 507 kbps
lossyWav (-s -c 576), followed by flac (--best -e -b 576): 503 kbps
ssrc_hp (--rate 35000 --twopass --dither 0 --bits 16), lossyWav (-s), followed by flac (--best -e -b 1024): 453 kbps.
So with this kind of music staying with a blocksize of 1024 is fine, and the average bitrate is roughly 500 kbps.
Pre-resampling to 35 kHz saves some filesize though it is a bit disappointing (data flow is ~20% lower, but for the file size it's only ~10%). Quality is fine judging from listening without abxing and taking samples from this 50 track set. However abxing problem samples is required which I haven't done so far.
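The percentages above can be checked with a few lines of Python. This sketch assumes the source tracks are 44.1 kHz CD audio (not stated explicitly above); the bitrates are the foobar figures from the list.

```python
CD_RATE_HZ = 44100         # assumption: the originals are 44.1 kHz
RESAMPLED_RATE_HZ = 35000  # target rate given to ssrc_hp

def percent_saved(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (1.0 - after / before)

sample_flow_saving = percent_saved(CD_RATE_HZ, RESAMPLED_RATE_HZ)  # fewer samples per second
resample_bitrate_saving = percent_saved(507, 453)  # lossyWav+FLAC, 44.1 kHz vs 35 kHz
lossy_vs_lossless_saving = percent_saved(744, 507) # plain FLAC vs lossyWav+FLAC

print(f"samples per second: {sample_flow_saving:.1f}% lower")
print(f"bitrate after resampling: {resample_bitrate_saving:.1f}% lower")
print(f"lossyWav -s vs plain FLAC: {lossy_vs_lossless_saving:.1f}% lower")
```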
Remarkable was Simon & Garfunkel's short and calm Bookend Theme: the 1024 sample-block lossy.flac version was 1668 KB in size, which is more than the original ape file size (1535 KB) and only slightly less than the lossless FLAC version (1699 KB).
Using debug mode I saw there wasn't any block with bits removed so it was only the dithering which changed the file. I think samples like this are a good argument not to change a block at all when it's not worth while.
The question is of course: when is it worth while, but I think when 0 or 1 bit is removed it is not, independently of volume as measured by RMS. I also think with low-volume blocks it's not worth while (and a bit dangerous) in case 2 bits should be removed.
This per block consideration can be dragged also to a total file consideration: if the average number of bits removed is below a threshold of say 1 bit the lossy.wav output should be identical to the wav input.
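As a rough Python sketch of this 'worth while' rule (not anything lossyWav currently does): the RMS threshold value and all names below are hypothetical, chosen only to make the idea concrete.

```python
from math import sqrt

LOW_VOLUME_RMS = 500.0  # hypothetical threshold for a "low-volume" block

def block_rms(samples):
    """Root-mean-square level of one block of integer samples."""
    if not samples:
        return 0.0
    return sqrt(sum(s * s for s in samples) / len(samples))

def block_worth_processing(bits_to_remove, samples, low_rms=LOW_VOLUME_RMS):
    """False means: copy the block through unchanged (no dither, no rounding)."""
    if bits_to_remove <= 1:
        return False                       # 0 or 1 bit: never worth it
    if bits_to_remove == 2 and block_rms(samples) < low_rms:
        return False                       # 2 bits on a quiet block: too risky
    return True

def file_worth_processing(bits_per_block, min_average=1.0):
    """False means: write a lossy.wav identical to the input wav."""
    if not bits_per_block:
        return False
    return sum(bits_per_block) / len(bits_per_block) >= min_average

print(block_worth_processing(2, [100, -120, 90, -80]))   # quiet block, 2 bits
print(file_worth_processing([0, 0, 1, 0]))               # average 0.25 bits
```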
Nick.C
Sep 23 2007, 16:34
If I was to implement a two-pass version of lossyWAV then where clipping_reduction=1 (i.e. file left at 100% amplitude) and bits_to_remove=0 then no dither, block output = block input and compression of that block should be identical to the original.
I have been developing a variable spreading_function_length dependent on fft_length, i.e. larger fft_length = larger spreading_function_length, switched on using the -v parameter. The -t parameter is now obsolete.
Enabling variable spreading_function_length also reduces skewing_amplitude from 9dB to 6dB and changes noise_threshold_shift to -1.5.
v0.2.1 attached. Superseded.
bryant
Sep 23 2007, 16:54
QUOTE(Nick.C @ Sep 23 2007, 15:34)

v0.2.1 attached.
Hey Nick,
I've noticed that you have been deleting the previous version every time you upload a new one. I understand why you wouldn't want people to use obsolete versions, but halb27 just did a whole bunch of testing with a specific version (v0.2.0) and I wanted to download it for a reference and it was already gone!
Perhaps you could set up a place where previous versions are archived, or maybe just put a note indicating a version is obsolete without actually deleting the download link?
Either that, or I'll write a script to download each one as it appears before you can delete it!

Thanks,
David
Nick.C
Sep 24 2007, 01:49
QUOTE(bryant @ Sep 23 2007, 23:54)

QUOTE(Nick.C @ Sep 23 2007, 15:34)

v0.2.1 attached.
Hey Nick,
I've noticed that you have been deleting the previous version every time you upload a new one. I understand why you wouldn't want people to use obsolete versions, but halb27 just did a whole bunch of testing with a specific version (v0.2.0) and I wanted to download it for a reference and it was already gone!
Perhaps you could set up a place where previous versions are archived, or maybe just put a note indicating a version is obsolete without actually deleting the download link?
Either that, or I'll write a script to download each one as it appears before you can delete it!

Thanks,
David
Apologies - I'll upload v0.2.0 tonight and in future merely indicate obsolescence rather than remove the file.
Command line parameters are being re-written at the moment to allow more sensible naming of parameters and inclusion of "-nts" to force noise_threshold_shift to a specific value, among others.
halb27
Sep 24 2007, 01:52
QUOTE(Nick.C @ Sep 24 2007, 00:34)

... If I was to implement a two-pass version of lossyWAV then where clipping_reduction=1 (i.e.file left at 100% amplitude) and bits_to_remove=0 then no dither, block output = block input and compression of that block should be identical to the original. ...
I welcome very much such a two-pass version but keeping block and/or track output = input is independent of two-pass processing.
'bits_to_remove=0 then no dither' is logical but why do you want to restrict it to the 'bits_to_remove=0' case?
It seems obvious to me that adding noise in the 'bits_to_remove=1' case has a bad advantage/disadvantage relation, and this is especially true for low-volume spots where S/N ratio is bad anyway.
Nick.C
Sep 24 2007, 02:00
QUOTE(halb27 @ Sep 24 2007, 08:52)

I welcome very much such a two-pass version but keeping block and/or track output = input is independent of two-pass processing.
'bits_to_remove=0 then no dither' is logical but why do you want to restrict it to the 'bits_to_remove=0' case?
It seems obvious to me that adding noise in the 'bits_to_remove=1' case has a bad advantage/disadvantage relation, and this is especially true for low-volume samples where S/N ratio is bad anyway.
If the amplitude-reduced block is not dithered then there is a strong chance of unwanted noise - all blocks are automatically reduced in amplitude to prevent potential clipping based on minimum_bits_to_keep=5, so they are also automatically dithered.
halb27
Sep 24 2007, 02:09
I see.
Maybe an approach like this can help:
- a priori think of the track not having to be reduced in amplitude and use output block = input block wherever it's not worth while, or where it's dangerous, to apply the lossyWav mechanism.
- as soon as you find amplitude reduction has to be done restart the procedure for the whole track using amplitude reduction.
Would be advantageous especially for those tracks which at the moment seem to be the most critical ones for the lossyWav procedure: tracks with low volume spots in them.
Hello,
QUOTE(Nick.C @ Sep 24 2007, 00:34)

I have been developing a variable spreading_function_length dependent on fft_length, i.e. larger fft_length = larger spreading_function_length, switched on using the -v parameter. The -t parameter is now obsolete.
Sorry to bother you, but... Is it possible for a lazy man like me that don't want to dive into Matlab and technical stuff to have a LossyWav.txt in the archive that simply (to some extent) explain the parameters ?
I know that this app is in its early stages and clearly developers & golden-ears-gurus oriented but think about the future documentation of this great tool !
Have a nice day,
AiZ
2Bdecided
Sep 24 2007, 03:34
QUOTE(Nick.C @ Sep 21 2007, 20:36)

Thanks for the input Guru - where the bits_to_remove is zero, lossyWAV will still dither the samples because there's an automatic anti-clipping amplitude reduction to 95.28% (30.49/32, i.e. 32 -1 (triangular_dither amplitude) -0.5 (normal rounding) -0.01 (Nick.C's error margin)) as the file is processed, so dither is still required.
Hang on a second. You shouldn't be changing the gain (even by 0.42dB) if you're getting people to ABX. This is especially important when there's virtually no other audible difference between the files.
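For anyone checking the 0.42dB figure, it follows directly from the 30.49/32 scaling quoted above. A short Python sketch of the arithmetic:

```python
from math import log10

# 32 - 1 (triangular dither amplitude) - 0.5 (rounding) - 0.01 (error margin)
scale = (32 - 1 - 0.5 - 0.01) / 32

# Convert the amplitude ratio to decibels.
gain_db = 20 * log10(scale)

print(f"scale = {scale:.4f}, gain = {gain_db:.2f} dB")
```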
With the gain change disabled, you shouldn't dither when no bits are removed. I know it's in my code as an option, but I don't think I enabled it even when I was changing the gain. In theory you should, but in practice I wouldn't.
Hope this helps.
EDIT: Now I've read the rest of the thread...
Remember the gain/declipping is only for efficiency, not sound quality. The "clipping" is only ever by 1LSB, and only happens when lossyFLAC has determined that several LSBs can be removed. In other words, the clipping will only be audible if the lossyFLAC algorithm itself is broken (and if it is, we can all go home anyway). Furthermore, the "clipping" will move the sample value closer to its original value (because it happens when lossyFLAC wants to increase it, but can't).
So I would not "leave the gain adjustment enabled unless there might be sound quality issues". I would "leave the gain adjustment disabled unless there are so many "clipped" samples that it reduces efficiency".
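A tiny Python sketch of the "clipping" case described above, under simplifying assumptions: plain round-to-nearest to a multiple of 2^bits_to_remove with no dither, and 16-bit samples. The function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def round_to_step(sample: int, bits_to_remove: int) -> int:
    """Round to the nearest multiple of 2**bits_to_remove (no dither)."""
    step = 1 << bits_to_remove
    return ((sample + step // 2) // step) * step

def clip(sample: int) -> int:
    """Clamp to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, sample))

original = 32766
rounded = round_to_step(original, 3)   # wants 32768, which does not fit in 16 bits
clipped = clip(rounded)                # 32767: clipped by exactly 1 LSB

print(original, rounded, clipped)
print("error if it could be stored:", abs(rounded - original))
print("error after clipping:", abs(clipped - original))
print("still a multiple of 8:", clipped % 8 == 0)
```

In this example the clipped value ends up closer to the original than the rounded one, but it is no longer a multiple of the step, so its low bits are not all zero. That is presumably where the efficiency cost comes from, rather than any audible damage.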
For my personal use, I would disable the lossyFLAC gain adjustment entirely. Instead, I'd run a ReplayGain album analysis, and apply only the negative ones, before using lossyFLAC. I'm guessing lots of people wouldn't like this idea though.
Cheers,
David.
2Bdecided
Sep 24 2007, 03:55
QUOTE(AiZ @ Sep 24 2007, 09:25)

Sorry to bother you, but... Is it possible for a lazy man like me that don't want to dive into Matlab and technical stuff to have a LossyWav.txt in the archive that simply (to some extent) explain the parameters ?
I know that this app is in its early stages and clearly developers & golden-ears-gurus oriented but think about the future documentation of this great tool !
AiZ,
If it was down to me, when it was finished, it wouldn't have enough command-line options to need a manual! Just compact/default/overkill modes.
The problem is figuring out what these should be, hence all the current playing around.
If the "final version" still needs all these tweaks, then IMO we've failed!
Cheers,
David.
Nick.C
Sep 24 2007, 04:06
As David said, it is his intention to limit user choice to the 3 stated quality levels. During the development phase of the command-line version, I have enabled an increasing number of command-line parameters to allow users to "tweak" settings in search of transparency.
I will attempt to more clearly illustrate what each one does in the command-line reference within lossyWAV.
gaekwad2
Sep 24 2007, 04:49
QUOTE(halb27 @ Sep 24 2007, 00:17)

I wanted to see the behavior of lossyWav together with FLAC for a set of 50 full tracks which is typical of the kind of music I usually love to listen to (pop music and singer/songwriter music).
All values given are according to what foobar says when looking at the properties of the 50 selected songs:
Original ape files (extra high mode): 703 kbps
flac (--best -e) files: 744 kbps
lossyWav (-s), followed by flac (--best -e -b 1024): 507 kbps
lossyWav (-s -c 576), followed by flac (--best -e -b 576): 503 kbps
Perhaps a bit late, but I also ran lossyWav 0.20 (using -3 instead of -s though) and FLAC 1.21 on a selection more or less representative of my cd collection (at least in terms of bitrate when compressed with TAK) and got pretty much the same result. Overall 576 produces slightly smaller files, but for classical or generally highly compressible music 1024 is better.
Hi again,
QUOTE(2Bdecided @ Sep 24 2007, 11:55)

If it was down to me, when it was finished, it wouldn't have enough command-line options to need a manual! Just compact/default/overkill modes.
The problem is figuring out what these should be, hence all the current playing around.
QUOTE(Nick.C @ Sep 24 2007, 12:06)

As David said, it is his intention to limit user choice to the 3 stated quality levels. During the development phase of the command-line version, I have enabled an increasing number of command-line parameters to allow users to "tweak" settings in search of transparency.
I will attempt to more clearly illustrate what each one does in the command-line reference within lossyWAV.
On one hand, it's OK, I get the point. Sure, only one or two parameters are way better for the final release, no need for a manual.
But on the other hand, if someone who discovers the project today decides to help you, it would be fine for him not to search through this (long) post what all these changing parameters mean ; hence, a little doc up-to-date accompanying the executable would be perfect.
I stop here my off-topic posts and thank you for your dedication in better and clever audio.
AiZ
halb27
Sep 24 2007, 12:11
Atem-lied with v0.2.1 -v -s:
I got to 6/6, but managed to finish up with 6/10.
Generally speaking I have a tendency to do a bad job with the second half of my guesses.
Looking at the debug results I guess too many blocks have 6 bits removed again in the critical area, and I think that's too much.
Nick.C
Sep 24 2007, 13:11
I found a bug in the -v parameter - it was picking the wrong spreading_function_length. I will post v0.2.2 tonight. For the moment, and by special request : v0.2.0 for Bryant. Definitely superseded now.....