lossyWAV alpha v0.3.4 attached. Superseded by alpha v0.3.5.
new parameter -info to show WAV file rate, channels, bps and length;
code tidy up and speed up.
Have fun!
lossyWAV alpha v0.3.5 attached. Superseded by alpha v0.3.6.
new parameter -spread, replaces -vsfl. An experimental take on spreading.
code tidy up and (quite significant) speed up.
Have fun!
QUOTE(guruboolez @ Sep 29 2007, 16:53)

Lossywav is impressive on harpsichord recordings.
...
There was no noise, no artefact, but something hard to define (an audiophile would call it "lack of soundstage" or something similar).
Do you mind trying -cbs 1024? 2Bdecided once mentioned that he created the procedure with such a blocksize in mind and was a bit unsure about the outcome of shorter block sizes. Resulting FLAC filesize should be roughly the same according to my experience.
If this isn't sufficient can you please try -nts x as suggested by Nick.C or maybe also -skew y and -spread?
Sorry I can't do it myself as I'm not able to abx your provided samples.
QUOTE(Nick.C @ Oct 5 2007, 10:11)

... -spread, replaces -vsfl. An experimental take on spreading. ...
Sorry, but I'm not sure whether it's a promising procedure to try out different weights in building the average of 3 or 4 bins. My feeling is that in the overall view that's not significant variation and may produce better results in one case and worse in other ones.
I'm still a bit worried about David Bryant's comment on the spreading function: that the critical bands have different widths, with corner frequencies according to the Bark scale of 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500 Hz.
So to me it's plausible to vary the number of bins over which to build the average not only according to the fft_length, as you already did with the previous -vsfl option, but also according to the frequency range the corresponding bin belongs to. Taking it into account roughly may be sufficient (for instance averaging over 2 bins in the range up to 1720 Hz, 3 bins in the range up to 3700 Hz and 4 bins otherwise).
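The rough scheme suggested above could be sketched as follows. This is only an illustration, not code from lossyWAV: the function names `bin_frequency` and `spreading_length`, the 44.1 kHz default sample rate and the choice to treat the thresholds as inclusive are all assumptions.

```python
def bin_frequency(bin_index, fft_length, sample_rate=44100):
    """Frequency in Hz of an FFT bin (bins are linearly spaced)."""
    return bin_index * sample_rate / fft_length


def spreading_length(bin_index, fft_length, sample_rate=44100):
    """Number of bins to average for this bin, using the rough
    thresholds from the post (1720 Hz and 3700 Hz).
    Assumption: the thresholds are inclusive."""
    freq = bin_frequency(bin_index, fft_length, sample_rate)
    if freq <= 1720:
        return 2
    if freq <= 3700:
        return 3
    return 4


if __name__ == "__main__":
    for k in (10, 50, 200):
        print(k, round(bin_frequency(k, 1024)), spreading_length(k, 1024))
```

A dependence on fft_length, as in the earlier -vsfl option, could be layered on top of this by scaling the returned value.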
halb27
Oct 10 2007, 03:20
@ Nick.C:
I also feel a bit uncomfortable about the many options. It is not inviting for potential listeners who have a hard job as quality is already very good, and a sufficient amount of listening experience is what is missing most at the moment.
It's all a matter of taste, but I think it will be good to return back more to the essentials of primary quality settings.
As for additional options I think it's good to have the dithering and the clipping option (default: no dithering and no anti-clipping strategy).
But other than that my feeling is that everything should go into -1, -2, -3.
Moreover we should concentrate on getting an extremely good quality in the ~500 kbps range. I think current experience is enough to show that achieving significantly lower bitrate while keeping up excellent quality is not possible with the current approach without additions like those proposed by SebastianG.
So I think we should leave the -3 option behind until more details about such an approach are available.
Concentrating on -1 and -2, I think we should target -2 at a level that makes any known sample transparent to any listener, and keep -1 only slightly above that quality-wise. This gives any listener the chance to switch from -2 to -1 in case he has a sample which is not transparent.
As a consequence what was -1 should then become -2 in the next version (or a small promising variant of -1), and a new -1 should be created.
Suggestion for -2: what is -2 right now with no skewing, and a spreading function which does just arithmetic averaging, but with the number of bins participating in averaging depending on bin frequency as described in my last post, and also depending on fft length as you did already. Moreover I'd welcome a blocksize of 1024 instead of 576. No serious disadvantage in resulting bitrate but more secure.
Suggestion for -1: specifics of -1 like in the existing version, other details like with -2 but a tiny bit more demanding, for instance a slightly lowered noise threshold and a small skewing factor (as the first trial - can be increased if necessary).
2Bdecided
Oct 10 2007, 03:34
You must leave noise threshold shift in as a command line option.
Either the frequency skewing or the variable spreading length appear to be needed to make it work properly.
I agree that lots of options are confusing, but I thought they were only in there for testing. There will eventually be no direct user control of any of them, I hope, because someone will figure out the optimal settings and default them.
Cheers,
David.
halb27
Oct 10 2007, 05:17
QUOTE(2Bdecided @ Oct 10 2007, 11:34)

You must leave noise threshold shift in as a command line option.
Sure, that is an advantage per se. But on the other hand it breaks the simple division between a single quality parameter and options targeting more or less additional features like dithering. Moreover, differences in threshold shift are incorporated in the difference between -2 and -1.
QUOTE
Either the frequency skewing or the variable spreading length appear to be needed to make it work properly.
This is the case with my suggestion for -1 and -2.
QUOTE
I agree that lots of options are confusing, but I thought they were only in there for testing. There will eventually be no direct user control of any of them, I hope, because someone will figure out the optimal settings and default them.
Sure, but I'm afraid the fact that we don't have a lot of listeners in the testing phase is not only due to the difficulties in abxing samples at the high quality already achieved but to some extent also to the amount of options not everybody knows what they are good for.
Looking at guruboolez (certainly the most welcome tester) it looks like he doesn't want to play around with options.
Sure these things are also related to my personal opinion that varying spreading function by varying weights in the average formula is not worth while. Variable spreading length however is promising IMO.
It's also related to my belief that a significant saving in bitrate is not possible with the current approach, and I don't care much about whether it's finally 530 kbps or 480 kbps on average. After all we're targeting a significantly lower bitrate than going lossless, while keeping up transparency to a high degree of security. The latter part is what I care about most, and IMO we should do everything to encourage testers.
shadowking
Oct 10 2007, 05:48
I would like people to feed in all their transform problem samples and start testing lossyWAV. The problem is that hybrids make easy work of most transform problems. It would still be useful, I think, even though I don't think we will see a good ABX result even for -2 (hopefully).
Nick.C
Oct 10 2007, 06:14
I hear what's being said, but my ears / listening environment are not up to finalising the settings by myself.
The current (unreleased alpha v0.3.6) command line parameter list is as follows:
CODE
lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org
Usage: lossyWAV <input wav file> <options>
Options:
-1, -2 or -3   quality level (1:overkill, 2:default, 3:compact)
-o <folder>    destination folder for the output file
-force         forcibly over-write output file if it exists.
-cbs <n>       analysis codec_block_size (512<=n<=4608, default=576 samples)
               (should match codec block size used in target compression codec)
-nts <n>       noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
               (reduces overall bits to remove)
-spread        select variable spreading functions.(incompatible with -weight)
-weight        select weighted spreading functions.(incompatible with -spread)
               (weighted average of fft bins during convolution of fft results
               weighted towards lower frequency fft bins, 5/8:3/8)
-skew <n>      skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
               with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
               (artificially decrease low frequency bins to take into account
               higher SNR requirements at low frequencies)
-dither <n>    dither selection, 0<=n<=2, default=0
               (0=no dither; 1=rectangular dither; 2=triangular dither)
-clipping <n>  clipping prevention selection, 0<=n<=1, default=0. 0=none;
               1=fixed clipping prevention amplitude reduction, taking into
               account dither amplitude (if any).
-overlap <n>   fft_overlap = fft_length/n (2<=n<=8, default=2)
               (increases number of fft analyses per codec block)
-quiet         significantly reduce screen output
-nowarn        suppress lossyWAV warnings
-detail        enable detailled output mode
-info          display WAV file information
-below         set process priority to below normal.
-low           set process priority to low.
Options not yet implemented:
-bitdepth <n>  forced output bitdepth (16 or 24)
-flac          optimizations for use with FLAC
-wv            optimizations for use with wavPack
-tak           optimizations for use with TAK
However, I think that it may be beneficial to reduce this to
CODE
lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org
Usage: lossyWAV <input wav file> <options>
Options:
-1, -2 or -3   Classic quality level (1:overkill, 2:default, 3:compact)
-nts <n>       noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
               (reduces overall bits to remove)
-o <folder>    destination folder for the output file
-force         forcibly over-write output file if it exists.
-dither        dither output using triangular dither, default=off
-clipping <n>  clipping prevention selection, 0<=n<=1, default=0. 0=none;
               1=fixed clipping prevention amplitude reduction, taking into
               account dither amplitude (if any).
-quiet         significantly reduce screen output
-nowarn        suppress lossyWAV warnings
-below         set process priority to below normal.
-low           set process priority to low.
and tweak the parameters implicit in -1,-2 & -3. Possibly implement additional test settings to see whether a listener prefers -2 or -20? Codec block size needs to be stated for each quality setting or the user will not know how to optimally compress the output.
As an aside, I used v0.3.5 to compress 30GB of FLAC files at quality -2 and got 15.2GB out - average bitrate approx 420kbps.
As there are no real process developments (other than code optimisation) in v0.3.6, I will defer release until a way forward is agreed on internal quality settings development.
Nick.
QUOTE(halb27 @ Oct 9 2007, 23:01)

QUOTE(Nick.C @ Oct 5 2007, 10:11)

... -spread, replaces -vsfl. An experimental take on spreading. ...
Sorry, but I'm not sure whether it's a promising procedure to try out different weights in building the average of 3 or 4 bins. My feeling is that in the overall view that's not significant variation and may produce better results in one case and worse in other ones.
I'm still a bit worried about David Bryant's comment on the spreading function: that the critical bands have different widths, with corner frequencies according to the Bark scale of 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500 Hz.
So to me it's plausible to vary the number of bins over which to build the average not only according to the fft_length, as you already did with the previous -vsfl option, but also according to the frequency range the corresponding bin belongs to. Taking it into account roughly may be sufficient (for instance averaging over 2 bins in the range up to 1720 Hz, 3 bins in the range up to 3700 Hz and 4 bins otherwise).
The most recent experimental take on spreading (in the original thread) uses simple 3 bin average at short FFT lengths (2 to 64 samples) and shifts gradually to max of adjacent bins and current bin (a simple attempt at masking) at long FFT lengths (1024 to 32768 samples). If anyone has any algorithmic ideas with regard to spreading, then please let me know. Bear in mind that the default quality settings have always used 4 bin averaging (-2 & -3) and 3 bin averaging (-1).
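One way to picture the "3 bin average shifting gradually to 3 bin max" idea is the sketch below. It is not the code fragment from the original thread: the names `blend_weight` and `spread`, the linear-in-log2 interpolation between 64 and 1024 samples, and the shortened window at the two ends of the spectrum are all assumptions made for illustration.

```python
import math


def blend_weight(fft_length, short=64, long=1024):
    """0.0 at or below `short`, 1.0 at or above `long`.
    Assumption: the shift in between is linear in log2(fft_length)."""
    if fft_length <= short:
        return 0.0
    if fft_length >= long:
        return 1.0
    span = math.log2(long) - math.log2(short)
    return (math.log2(fft_length) - math.log2(short)) / span


def spread(magnitudes, fft_length):
    """Blend a 3-bin average with a 3-bin maximum for every bin.
    Assumption: at the edges the window is simply shortened."""
    w = blend_weight(fft_length)
    n = len(magnitudes)
    out = []
    for i in range(n):
        window = magnitudes[max(0, i - 1):min(n, i + 2)]
        avg = sum(window) / len(window)
        out.append((1.0 - w) * avg + w * max(window))
    return out


if __name__ == "__main__":
    print(spread([1.0, 4.0, 1.0], 64))
    print(spread([1.0, 4.0, 1.0], 1024))
```

With this weighting a 64-sample FFT gets a pure average and a 1024-sample FFT a pure maximum, which is the more conservative choice since fewer bits are removed where a strong neighbouring bin exists.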
2Bdecided
Oct 10 2007, 07:31
QUOTE(halb27 @ Oct 10 2007, 12:17)

QUOTE(2Bdecided @ Oct 10 2007, 11:34)

You must leave noise threshold shift in as a command line option.
Sure, that is an advantage per se. But on the other hand it breaks the simple division between a single quality parameter and options targeting more or less additional features like dithering. Moreover, differences in threshold shift are incorporated in the difference between -2 and -1.
I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.
Also, if anyone does find a problem sample, the obvious question is how far must the threshold shift before it's solved. If you remove the switch, no one can answer this!
Besides, the noise threshold shift is the most fundamental parameter in lossyFLAC. It was probably the first line of code that I coded! I wrote threshold_shift=0; with the assumption that really it shouldn't be zero and I'd figure it out later!

Cheers,
David.
Nick.C
Oct 10 2007, 07:38
QUOTE(2Bdecided @ Oct 10 2007, 14:31)

I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.
This doesn't appear to correlate with my findings with multi-generational processing - with no dither the output matches the input after about 4 or 5 generations - beyond that, generation n = generation n-1.
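A toy model may help to see why undithered re-processing can settle. The sketch below only rounds integer samples to a multiple of 2**bits; the function name `remove_bits` and the fixed number of bits are assumptions. In lossyWAV itself the number of bits is chosen per codec block by the analysis, which would explain why it can take a few generations rather than one before nothing changes any more.

```python
def remove_bits(samples, bits):
    """Round each integer sample to the nearest multiple of 2**bits
    (no dither). Toy stand-in for bit depth reduction."""
    step = 1 << bits
    return [((s + step // 2) // step) * step for s in samples]


if __name__ == "__main__":
    original = [1000, -37, 12345, 7]
    first = remove_bits(original, 3)
    second = remove_bits(first, 3)
    # Once the low bits are zero, rounding again changes nothing.
    print(first == second)
```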
halb27
Oct 10 2007, 07:39
QUOTE(Nick.C @ Oct 10 2007, 14:14)

...However, I think that it may be beneficial to reduce this to
CODE
lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org
Usage: lossyWAV <input wav file> <options>
Options:
-1, -2 or -3   Classic quality level (1:overkill, 2:default, 3:compact)
-nts <n>       noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
               (reduces overall bits to remove)
-o <folder>    destination folder for the output file
-force         forcibly over-write output file if it exists.
-dither        dither output using triangular dither, default=off
-clipping <n>  clipping prevention selection, 0<=n<=1, default=0. 0=none;
               1=fixed clipping prevention amplitude reduction, taking into
               account dither amplitude (if any).
-quiet         significantly reduce screen output
-nowarn        suppress lossyWAV warnings
-below         set process priority to below normal.
-low           set process priority to low.
and tweak the parameters implicit in -1,-2 & -3. Possibly implement additional test settings to see whether a listener prefers -2 or -20? Codec block size needs to be stated for each quality setting or the user will not know how to optimally compress the output. ...
I welcome such an approach very much.
As for codec block size sure it must be known. At the moment I think it's best to concentrate on FLAC and use a blocksize of 1024 (with any quality setting).
Whenever the demand comes for other lossless codecs I think it's best to bring the -tak etc. parameters to life and use a codec-specific blocksize. Or maybe bring them to life immediately with a promising blocksize.
QUOTE
The most recent experimental take on spreading (in the original thread) uses simple 3 bin average at short FFT lengths (2 to 64 samples) and shifts gradually to max of adjacent bins and current bin (a simple attempt at masking) at long FFT lengths (1024 to 32768 samples). If anyone has any algorithmic ideas with regard to spreading, then please let me know. Bear in mind that the default quality settings have always used 4 bin averaging (-2 & -3) and 3 bin averaging (-1).
To me this sounds plausible and IMO should be incorporated into the quality settings for -2 and (slightly more demanding) for -1. A rigid justification for such a procedure isn't necessary IMO.
You don't write about David Bryant's concern about the varying width of the critical bands. Isn't it plausible to you? Are there problems with implementation?
2Bdecided
Oct 10 2007, 10:10
QUOTE(Nick.C @ Oct 10 2007, 14:38)

QUOTE(2Bdecided @ Oct 10 2007, 14:31)

I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.
This doesn't appear to correlate with my findings with multi-generational processing - with no dither the output matches the input after about 4 or 5 generations - beyond that, generation n = generation n-1.
I didn't know that! Still, I can see why it might be true. Not sure I'm certain it's "proven" behaviour yet.
Still, useful multi-generational encoding means that you're actually going to do something with the audio between encodes. So the audio will keep being changed, and then re-quantised by lossyFLAC/WAV. When I tested this (early on) I ended up with 12dB more noise than I wanted after 50 iterations (which is quite amazingly good, because standard 16-bit dither can be audible after 50 iterations!). Lowering the noise threshold shift will solve this, though I should check that with the current version I guess.
Cheers,
David.
Nick.C
Oct 10 2007, 13:16
QUOTE(halb27 @ Oct 10 2007, 14:39)

You don't write about David Bryant's concern about the varying width of the critical bands. Isn't it plausible to you? Are there problems with implementation?
I take the point that critical bands have varying widths - but as someone who has only become recently aware of most of the concepts being used in the method, I am a bit at a loss as to how to proceed with implementing an element of the method which would take this into account.
One thought that has just occurred to me:
Is there any merit in averaging / taking the minimum of FFT results across analyses carried out for each FFT length for a codec block rather than or as well as along the FFT analysis results? Would this give some time spreading? Or have I just drunk too much coffee today?
Thinking about default settings:
-1 : codec block size=2304 samples; 4 analyses; 64, 256, 1024 & 4096 sample FFT lengths; noise_threshold_shift=-3.0; spreading_function_length=3;
-2 : codec block size=1152 samples; 3 analyses; 64, 256 & 1024 sample FFT lengths; noise_threshold_shift=-1.5; spreading_function_length=4;
-3 : codec block size=576 samples; 2 analyses; 64 & 1024 sample FFT lengths; noise_threshold_shift=-1.0; spreading_function_length=4;
or, should the spreading_function_length=n be replaced by the experimental 3 bin average to 3 bin max spreading?
I am stripping excess command line parameters out and will play with the temporal fft averaging / minimum algorithm.
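The three proposed defaults can be written down as a small lookup table, which makes it easy to see what each quality level implies. The values are taken from the list above; the names `PRESETS` and `settings_for` are hypothetical and not part of lossyWAV.

```python
# Proposed defaults per quality level, as listed in the post.
PRESETS = {
    1: {"codec_block_size": 2304,
        "fft_lengths": (64, 256, 1024, 4096),
        "noise_threshold_shift": -3.0,
        "spreading_function_length": 3},
    2: {"codec_block_size": 1152,
        "fft_lengths": (64, 256, 1024),
        "noise_threshold_shift": -1.5,
        "spreading_function_length": 4},
    3: {"codec_block_size": 576,
        "fft_lengths": (64, 1024),
        "noise_threshold_shift": -1.0,
        "spreading_function_length": 4},
}


def settings_for(quality=2):
    """Return a copy of the settings for -1, -2 or -3 (default -2)."""
    return dict(PRESETS[quality])


if __name__ == "__main__":
    for q in (1, 2, 3):
        s = settings_for(q)
        print(q, s["codec_block_size"], len(s["fft_lengths"]), "analyses")
```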
GeSomeone
Oct 10 2007, 16:19
QUOTE(halb27 @ Oct 10 2007, 12:17)

.. I'm afraid the fact that we don't have a lot of listeners in the testing phase is not only due to the difficulties in abxing samples at the high quality already achieved but to some extent also to the amount of options not everybody knows what they are good for.
Looking from the sideline I add my 2 cents.
The only thing that really matters is: the default should be "the Right Thing".
Having a scale like -1 -2 -3 also helps to appear simple, but first you must have -2 (the default), before worrying about the user interface. The others should be sufficiently different in size (or maybe speed) with a tradeoff in quality.
Right now you need the options to find out what the best strategy is. (e.g. why remove -skew when it has proven useful?)
There could be another reason, the concept of lossylossless might not appeal to many and is certainly hard to ABX once you reach a certain low noise level.
BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. I played around with replaygain a bit and (in a very sub-optimal listening environment, not even through a headphone) somewhere at about +40 dB replaygain the noise was masked by harddisk and fan noise. Not very useful I'm sure.
QUOTE(Nick.C @ Aug 11 2007, 08:29)

However, a foobar2000 DSP plugin has to be at the top of my wishlist - it would make it all *so* much easier, and would more easily preserve tagging information.
I was wondering if that would work, as in the foobar2000 0.9 DSP pipeline everything is passed as 32 bit floats? It might be no problem to remove bits though.
Nick.C
Oct 10 2007, 16:25
QUOTE(GeSomeone @ Oct 10 2007, 23:19)

...the default should be "the Right Thing".
I wholeheartedly agree!
QUOTE(GeSomeone @ Oct 10 2007, 23:19)

Having a scale like -1 -2 -3 also helps to appear simple, but first you must have -2 (the default), before worrying about the user interface. The others should be sufficiently different in size (or maybe speed) with a tradeoff in quality.
That's what we've tried to do; the settings in v0.3.5 are close to those arrived at with Halb27 and Wombat in this thread.
QUOTE(GeSomeone @ Oct 10 2007, 23:19)

Right now you need the options to find out what the best strategy is. (e.g. why remove -skew when it has proven useful?)
Point taken. Coincidentally, it hasn't been removed yet, and I won't remove it for now.
QUOTE(GeSomeone @ Oct 10 2007, 23:19)

BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. I played around with replaygain a bit and (in a very sub-optimal listening environment, not even through a headphone) somewhere at about +40 dB replaygain the noise was masked by harddisk and fan noise. Not very useful I'm sure.
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Thanks for the input!
Nick.
Nick.C
Oct 10 2007, 17:01
lossyWAV alpha v0.3.6 attached. Superseded, see later.
halb27
Oct 11 2007, 01:11
QUOTE(Nick.C @ Oct 10 2007, 21:16)

I take the point that critical bands have varying widths - but as someone who has only become recently aware of most of the concepts being used in the method, I am a bit at a loss as to how to proceed with implementing an element of the method which would take this into account.
Don't the coefficients returned by the FFT relate to frequencies which equidistantly cover the frequency range (linear partitioning)?
That's my maybe naive imagination.
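For a standard FFT the bins are indeed linearly spaced: bin k sits at k * sample_rate / fft_length. Because the Bark bands widen with frequency, the number of bins per band grows quickly towards the top. The sketch below counts them using the corner frequencies quoted earlier in the thread; the function names and the 44.1 kHz default are assumptions.

```python
# Bark corner frequencies in Hz, as quoted earlier in the thread.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700,
                 4400, 5300, 6400, 7700, 9500, 12000, 15500]


def bin_spacing(fft_length, sample_rate=44100):
    """Distance in Hz between neighbouring FFT bins."""
    return sample_rate / fft_length


def bins_per_band(fft_length, sample_rate=44100):
    """Count the FFT bins whose frequency falls in each [low, high) band."""
    counts = [0] * (len(BARK_EDGES_HZ) - 1)
    spacing = bin_spacing(fft_length, sample_rate)
    for k in range(fft_length // 2 + 1):
        freq = k * spacing
        for band in range(len(counts)):
            if BARK_EDGES_HZ[band] <= freq < BARK_EDGES_HZ[band + 1]:
                counts[band] += 1
                break
    return counts


if __name__ == "__main__":
    print(bin_spacing(1024))
    print(bins_per_band(1024))
```

For a 1024-sample FFT at 44.1 kHz the lowest band (0 to 100 Hz) holds 3 bins while the band from 12000 to 15500 Hz holds 81.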
QUOTE(Nick.C @ Oct 11 2007, 01:01)

lossyWAV alpha v0.3.6 attached. ...
Thank you.
a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
b) The spreading_length of -1/-2/-3 is now 3/4/4 and simple averaging is done in the spreading function when not using advanced option -spread?
c) Noise threshold shift default of -1/-2/-3 is now -3.0/-1.5/-1.0?
What does -spread do?
Nick.C
Oct 11 2007, 02:03
QUOTE(halb27 @ Oct 11 2007, 08:11)

a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
b) The spreading_length of -1/-2/-3 is now 3/4/4 and simple averaging is done in the spreading function when not using advanced option -spread?
c) Noise threshold shift default of -1/-2/-3 is now -3.0/-1.5/-1.0?
What does -spread do?
a) Yes;
b) Yes;
c) Yes;
-spread carries out the spreading which varies with fft length. See code fragment in original thread.
halb27
Oct 11 2007, 04:56
Wonderful.
So at the moment we're left with guruboolez' problem where he could abx a harpsichord sample.
@guruboolez: Are you out there?
It would be great if you could give your sample another try with this new version.
Nick.C
Oct 11 2007, 05:17
@Halb27: Taking on board what you were saying about using Bark band width to determine how many bins to average, I will start to work out a new spreading option which does (inspired by one of J.M.Valin's papers).
2Bdecided
Oct 11 2007, 05:29
I think you're heading down a slippery slope here!
First you'll find yourself averaging over 100 bins at the highest frequency, and before you know it you'll be implementing a proper psychoacoustic model to sort it all out!
Cheers,
David.
halb27
Oct 11 2007, 05:42
At least I don't think of a sophisticated implementation of this principle.
For the extreme cases maybe a spreading_length of 1 at the low end, and a spreading_length of 5 at the high end, or something like that, or maybe even less variation. Depending on fft_length. The principle may be worth implementing for a low or moderate fft_length.
For quality reasons (my main concern at the moment) the low end is the critical range, as a spreading_length of 4 or even 3 may not be appropriate here in some cases. So taking this into account may be essential.
Allowing a very large spreading_length for the high frequency range is another story and might allow for a lower bitrate on average while keeping up excellent quality. At the moment however I see this rather as an option for the future.
GeSomeone
Oct 11 2007, 10:45
QUOTE(Nick.C @ Oct 10 2007, 23:25)

QUOTE(GeSomeone @ Oct 10 2007, 23:19)

BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.
It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem though; it is more like a side effect in combination with replaygain. But it seems to prove that even from silence bits can be removed.
Update: I am now convinced the dithering from the foobar2000 converter was to blame. Even though it was set to "only dither lossy sources" it seems to have kicked in somewhere. (I marked lossyFlac as lossy destination.) Retesting with the setting "Never Dither" was OK. No extra noise.
2Bdecided
Oct 11 2007, 11:47
QUOTE(GeSomeone @ Oct 11 2007, 17:45)

QUOTE(Nick.C @ Oct 10 2007, 23:25)

QUOTE(GeSomeone @ Oct 10 2007, 23:19)

BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.
It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem though; it is more like a side effect in combination with replaygain. But it seems to prove that even from silence bits can be removed.

It's not the normal dither - silence and near-silence should be (and with the MATLAB script, are) transparent irrespective of system gain or dither chosen, because lossyFLAC won't touch silence - it won't even re-dither it.
Nick, did you have "always dither" set to on in that version?
Cheers,
David.
Nick.C
Oct 11 2007, 12:07
QUOTE(2Bdecided @ Oct 11 2007, 18:47)

QUOTE(GeSomeone @ Oct 11 2007, 17:45)

QUOTE(Nick.C @ Oct 10 2007, 23:25)

QUOTE(GeSomeone @ Oct 10 2007, 23:19)

BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.
It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem though; it is more like a side effect in combination with replaygain. But it seems to prove that even from silence bits can be removed.

It's not the normal dither - silence and near-silence should be (and with the MATLAB script, are) transparent irrespective of system gain or dither chosen, because lossyFLAC won't touch silence - it won't even re-dither it.
Nick, did you have "always dither" set to on in that version?
Cheers,
David.
There shouldn't have been - it was removed at about v0.3.2.
Nick.C
Oct 12 2007, 02:07
lossyWAV alpha v0.3.7 attached. Removed due to suspect spreading function and superseded by alpha v0.3.8 below.
"-spread" parameter now enables Bark spreading function rather than previous experimental 3 bin average to 3 bin max spreading function.
As stated in the original thread, for my 52 sample set:
WAV : 121.5MB;
FLAC : 68.2MB;
lossyWAV -2 : 39.5MB;
lossyWAV -2 -spread : 35.3MB;
The reassuring thing about the new spreading function is that those files that you would expect (from simple 3 or 4 bin averaging) very few bits to be removed still have very few bits removed.
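The sizes quoted above are easier to compare as ratios. The sketch below only does the arithmetic; the name `SIZES_MB` and both helper functions are illustrative, and the bitrate estimate assumes the 52 samples are 16-bit / 44.1 kHz stereo (1411.2 kbps as PCM), which the post does not state.

```python
# Sizes in MB for the 52 sample set, as quoted in the post.
SIZES_MB = {
    "WAV": 121.5,
    "FLAC": 68.2,
    "lossyWAV -2": 39.5,
    "lossyWAV -2 -spread": 35.3,
}


def percent_of(name, reference="WAV"):
    """Size of `name` as a percentage of `reference`."""
    return 100.0 * SIZES_MB[name] / SIZES_MB[reference]


def approx_kbps(name, pcm_kbps=1411.2):
    """Rough average bitrate.
    Assumption: the source is 16-bit / 44.1 kHz stereo PCM."""
    return pcm_kbps * SIZES_MB[name] / SIZES_MB["WAV"]


if __name__ == "__main__":
    for name in SIZES_MB:
        print(f"{name:22s} {percent_of(name):5.1f}% of WAV, "
              f"~{approx_kbps(name):4.0f} kbps")
```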
halb27
Oct 12 2007, 02:32
I don't believe it: Nick, did you work throughout the night? How do you manage to be so fast?
A big, big thank you to you!
And the result looks very, very promising.
Sure I'll try my usual test samples with this new version using -spread.
QUOTE(Nick.C @ Oct 11 2007, 09:03)

QUOTE(halb27 @ Oct 11 2007, 08:11)

a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
a) Yes;
Could you please add a switch to set the block sizes to 2048/1024/512? I would like to evaluate the optimum encoder settings for TAK. Unfortunately TAK currently only supports block sizes which are powers of 2...
I am very impressed by your (and 2BDecided's) work! For me LossyFlac is an exciting new option. Thanks also to the hard working testers.
Thomas
Nick.C
Oct 12 2007, 06:17
QUOTE(TBeck @ Oct 12 2007, 13:05)

Could you please add a switch to set the block sizes to 2048/1024/512? I would like to evaluate the optimum encoder settings for TAK. Unfortunately TAK currently only supports block sizes which are powers of 2...
Thomas, I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.
I would also welcome any feedback whatsoever regarding my Bark spreading function - I can't hear anything wrong with the output, but I want independent critical input to determine whether it's worth keeping, needs work, or just needs to be trashed.
Nick.
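The planned "-flac" and "-tak" behaviour amounts to a lookup from codec and quality level to codec_block_size. A minimal sketch, with hypothetical names (`CODEC_BLOCK_SIZES`, `codec_block_size`, `is_power_of_two`) and only the sizes mentioned above:

```python
# Block sizes per codec and quality level, as stated in the post.
CODEC_BLOCK_SIZES = {
    "flac": {1: 2304, 2: 1152, 3: 576},
    "tak": {1: 2048, 2: 1024, 3: 512},
}


def codec_block_size(codec, quality=2):
    """Look up the codec_block_size for a codec and quality level."""
    return CODEC_BLOCK_SIZES[codec][quality]


def is_power_of_two(n):
    """True if n is a positive power of two (what TAK needs)."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    for codec in CODEC_BLOCK_SIZES:
        for q in (1, 2, 3):
            size = codec_block_size(codec, q)
            print(codec, q, size, is_power_of_two(size))
```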
GeSomeone
Oct 12 2007, 11:26
QUOTE(Nick.C @ Oct 12 2007, 13:17)

I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.
But the blocksizes of -tak could also be used with FLAC.
Maybe it's somewhere in this thread, but where did the 576 size come from again?
halb27
Oct 12 2007, 11:50
Sorry, but -spread as of this version isn't so good.
I got used to only produce the .lossy.wavs via the command interpreter and watched the messages lossyWav produced, and I was very astonished about the rather high bits removed average of Atem-lied and keys_1644ds. So I was very curious about the audio quality.
Atem-lied is relatively good with so many bits removed (acceptable for -3 IMO), but I could abx it 9/10.
keys_1644ds however is bad (no abxing required).
So I guess the current implementation is a bit aggressive.
Nick: For experimentation maybe you can provide a parameter for the -spread option.
Something like:
One of the parameter values represents a spreading_length of 1 for low frequencies and a short or moderate fft_length, as well as a strong overall restriction like 4 to any spreading_length.
Another parameter value represents a spreading_length of 1 for low frequencies and a short fft_length, a spreading_length of 2 for low frequencies and a moderate fft_length, as well as a rather strong overall restriction like 6 to any spreading_length, but switches to a spreading_length of 6 only when fft_length is long.
These parameter values have quality in mind. More parameter values are welcome of course switching gradually from the pure quality target towards the efficiency target.
Nick.C
Oct 12 2007, 12:44
QUOTE(halb27 @ Oct 12 2007, 18:50)

Sorry, but -spread as of this version isn't so good.
I got used to only produce the .lossy.wavs via the command interpreter and watched the messages lossyWav produced, and I was very astonished about the rather high bits removed average of Atem-lied and keys_1644ds. So I was very curious about the audio quality.
Atem-lied is relatively good with so many bits removed (acceptable for -3 IMO), but I could abx it 9/10.
keys_1644ds however is bad (no abxing required).
So I guess the current implementation is a bit aggressive.
Nick: For experimentation maybe you can provide a parameter for the -spread option.
Something like:
One of the parameter values represents a spreading_length of 1 for low frequencies and a short or moderate fft_length, as well as a strong overall restriction like 4 to any spreading_length.
Another parameter value represents a spreading_length of 1 for low frequencies and a short fft_length, a spreading_length of 2 for low frequencies and a moderate fft_length, as well as a rather strong overall restriction like 6 to any spreading_length, but switches to a spreading_length of 6 only when fft_length is long.
These parameter values have quality in mind. More parameter values are welcome of course switching gradually from the pure quality target towards the efficiency target.
Before abandoning the Bark averaging method, I think that it should be expanded. At the moment each of the first 25 Bark ranges (0 to 24) is averaged, then the minimum average value is taken as the value from which to calculate bits to remove. I feel that this is too coarse and the granularity should be made finer by using half or even quarter Bark averaging. I will have a think about this and post v0.3.8 soon.
The -spread in v0.3.6 used 3 bin averaging at short FFT lengths (<=64 samples) and gradually changed to 3 bin maximum at long FFT lengths (>=1024 samples). This seems to be closer to what you mention above (although not exactly).
Thanks for the listening time!
QUOTE(GeSomeone @ Oct 12 2007, 18:26)

QUOTE(Nick.C @ Oct 12 2007, 13:17)

I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.
But the blocksizes of -tak could also be used with FLAC.
Maybe it's somewhere in this thread, but where did the 576 size come from again?
Maybe what is required is a CD sector related codec_block_size (2304/1152/576 samples) or a power of two equivalent (2048/1024/512 samples). This could be easily implemented by a "-cd" or "-CD" switch to change from power of two blocks to CD sector multiple blocks. I will incorporate this for v0.3.8.
Thanks for the input!
halb27
Oct 12 2007, 13:08
I see: you immediately did the whole thing and averaged over an entire critical band.
Not exactly what I have in mind.
I wouldn't bring the critical band as such so much into focus. Guess that's what 2Bdecided is afraid of.
I'd rather have the original averaging in primary focus, but with (cautious) corrections according to the widths of the critical bands.
Qualitywise I think it is essential to concentrate on the lower spectrum and use the critical band idea to hold the spreading_length very small when only one or few bins fall into a critical band.
With this it's not even necessary to look at every single critical band, but just do the averaging differently within larger frequency ranges (for instance for low to moderate fft_length use a spreading_length of 1 below ~ 800 Hz, 2 in the ~ 800-2000 Hz range, 3 in the ~ 2-8 kHz range, and 4 for the ~ 8+ kHz range, and increase these spreading_lengths very softly with increasing fft_length).
This is all with quality in mind.
Once high quality is settled (we still have an open problem with guruboolez' sample) we might become less cautious and try a bit more adventurous tactics.
halb27
Oct 12 2007, 13:21
QUOTE(Nick.C @ Oct 12 2007, 20:44)

QUOTE(GeSomeone @ Oct 12 2007, 18:26)

...Maybe it's somewhere in this thread, but where did the 576 size come from again?
Maybe what is required is a CD sector related codec_block_size (2304/1152/576 samples) or a power of two equivalent (2048/1024/512 samples). This could be easily implemented by a "-cd" or "-CD" switch to change from power of two blocks to CD sector multiple blocks. I will incorporate this for v0.3.8.
This would break the idea of -flac, -tak, etc. as targeting specific lossless encoders. Why do you want to do that?
-tak does everything that is needed.
I think GeSomeone's question is aimed at why the blocksizes are 2304/1152/576 at the moment, and maybe why they should be like that for -flac.
According to the FLAC documentation it looks like the FLAC blocksize should be a multiple of 576, but this is not so, as I did use FLAC with a blocksize of 1024. That is why I suggested a default blocksize of 1024 with -1, -2, and -3 when not using -flac, -tak, etc., especially as my experiments didn't show a significant saving in bitrate when using 576 instead of 1024.
Anyway I welcome the activation of -tak, -flac, etc.
Nick.C
Oct 12 2007, 13:30
From the FLAC format page:
CODE
Block size in inter-channel samples:
* 0000 : reserved
* 0001 : 192 samples
* 0010-0101 : 576 * (2^(n-2)) samples, i.e. 576/1152/2304/4608
* 0110 : get 8 bit (blocksize-1) from end of header
* 0111 : get 16 bit (blocksize-1) from end of header
* 1000-1111 : 256 * (2^(n-8)) samples, i.e. 256/512/1024/2048/4096/8192/16384/32768
I like 576 because it increases the bits_to_remove by processing over a shorter time frame. If the consensus is that standard codec_block_size should be 1024 samples, then so be it.
The reason that -flac and -tak have not yet been activated is that, basically, there are no codec specific settings yet. The only reason to implement them now would be because of the codec_block_size issue.
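For anyone who wants to check which block sizes the 4-bit code in the table above can express, here is a minimal Python sketch. The function name flac_block_size is made up for illustration and is not part of FLAC or lossyWAV; codes 0110 and 0111 are returned as None because the table says the size is then read from the end of the frame header rather than implied by the code.

```python
def flac_block_size(code):
    """Map the 4-bit block size code from the FLAC format table to a sample count.

    Returns None for the reserved code (0000) and for the two codes whose
    block size is stored explicitly at the end of the header (0110, 0111).
    Hypothetical helper for illustration only.
    """
    if not 0 <= code <= 15:
        raise ValueError("block size code must fit in 4 bits")
    if code == 0:
        return None                    # 0000 : reserved
    if code == 1:
        return 192                     # 0001 : 192 samples
    if 2 <= code <= 5:
        return 576 << (code - 2)       # 576 * 2^(n-2): 576/1152/2304/4608
    if code in (6, 7):
        return None                    # size is read from the end of the header
    return 256 << (code - 8)           # 256 * 2^(n-8): 256 ... 32768


if __name__ == "__main__":
    for n in range(16):
        print(format(n, "04b"), flac_block_size(n))
```

So both the 576-based sizes and the power-of-two sizes have short codes of their own, and any other size up to 65536 samples can still be signalled via the explicit 16-bit form.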
halb27
Oct 12 2007, 13:53
QUOTE(Nick.C @ Oct 12 2007, 21:30)

The reason that -flac and -tak have not yet been activated is that, basically, there are no codec specific settings yet. The only reason to implement them now would be because of the codec_block_size issue.
Yes, but it brings already certainty to any user whatever lossless codec he uses. By using -tak a TAK user knows lossyWav will work fine with TAK. No need IMO to think of -tak etc. as of a super-optimized version for the specific codec. Things start with codec blocksize.
As for the blocksize without a target codec option I still think it's good to default it to 1024 universally. Clear thing, easy to memorize, and should also do it efficiently in any situation known so far. Optimizing blocksize is then the clear task of -flac, etc. However it's not really of primary concern. To me it's fine also with the way it is.
Nick.C
Oct 12 2007, 14:13
Default to 1024 for all quality settings will be implemented in v0.3.8
halb27
Oct 12 2007, 14:29
Thank you.
Nick.C
Oct 12 2007, 15:21
QUOTE(halb27 @ Oct 12 2007, 20:08)

I see: you immediately did the whole thing and averaged over an entire critical band.
Not exactly what I have in mind.
I wouldn't bring the critical band as such so much into focus. Guess that's what 2Bdecided is afraid of.
I'd rather have the original averaging in primary focus, but with (cautious) corrections according to the widths of the critical bands.
Qualitywise I think it is essential to concentrate on the lower spectrum and use the critical band idea to hold the spreading_length very small when only one or few bins fall into a critical band.
With this it's not even necessary to look at every single critical band, but just do the averaging differently within larger frequency ranges (for instance for low to moderate fft_length use a spreading_length of 1 below ~ 800 Hz, 2 in the ~ 800-2000 Hz range, 3 in the ~ 2-8 kHz range, and 4 for the ~ 8+ kHz range, and increase these spreading_lengths very softly with increasing fft_length).
This is all with quality in mind.
Once high quality is settled (we still have an open problem with guruboolez' sample) we might become less cautious and try a bit more adventurous tactics.
Oops - missed this post entirely. I'm getting disillusioned with my approach to Bark averaging - will park it and start on something akin to what you've just mentioned, i.e. spreading_function_lengths increase with both frequency and fft_length. Looking at the geometric fft_length increase, should the spreading_function_length also increase in that manner, i.e. sfl[n+1]:=sfl[n]*2; or should it increase more slowly?
halb27
Oct 12 2007, 17:44
More slowly. At the moment I think it would be good to keep spreading_length pretty much in the region we're used to even for long fft lengths. Spreading length must not increase with each increase of fft length.
As with frequency dependency for the spreading length I am thinking of only a very rough dependency on fft_length.
Something like: use something like the frequency dependency I mentioned (spreading length 1 to 4 according to a rough frequency classification - let's call this the basic frequency dependency rule) for a fft length <= 256, add 1 to the spreading length of the basic frequency dependency rule for a fft length > 256 but <= 1024, and add 2 to the spreading length of the basic frequency rule for a fft length > 1024. Maybe add 3 to the spreading length of the basic frequency dependency rule for extremely long ffts.
You see: even with highest frequency and longest fft length a spreading length of 6 or 7 as a maximum.
I guess this is a bit too conservative, but as long as we don't know it's better to play it safe. Variations can be done later (or by means of a -spread parameter value).
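To make the proposed rule concrete, it could be sketched in Python roughly as below. This is only an illustration of the wording above, not lossyWAV code: the function names are invented, the band edges are the approximate ~800 Hz / ~2 kHz / ~8 kHz values mentioned earlier, and the threshold for an "extremely long" fft (extreme_fft_length) is a pure assumption that is switched off by default.

```python
def basic_spreading_length(freq_hz):
    """Basic frequency dependency rule: 1 to 4 bins by rough frequency class."""
    if freq_hz < 800:
        return 1
    if freq_hz < 2000:
        return 2
    if freq_hz < 8000:
        return 3
    return 4


def spreading_length(freq_hz, fft_length, extreme_fft_length=None):
    """Add a small fft_length dependent bonus to the basic rule.

    extreme_fft_length is a hypothetical threshold for the optional "+3"
    case; None disables it, giving a maximum spreading length of 6.
    """
    base = basic_spreading_length(freq_hz)
    if fft_length <= 256:
        bonus = 0
    elif fft_length <= 1024:
        bonus = 1
    elif extreme_fft_length is None or fft_length < extreme_fft_length:
        bonus = 2
    else:
        bonus = 3
    return base + bonus


if __name__ == "__main__":
    for fft_length in (64, 256, 1024, 2048):
        print(fft_length, [spreading_length(f, fft_length)
                           for f in (500, 1000, 5000, 10000)])
```

Note how the spreading length grows by at most one step per fft_length class instead of doubling with each doubling of fft_length.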
Nick.C
Oct 14 2007, 15:45
lossyWAV alpha v0.3.8 attached. Superseded.
Having made an abortive attempt at Bark related bit reduction determination, I have been changing the spreading method a bit, firstly having reverted to the original FFT bin averaging (3 or 4 bins dependent on quality level). As can be seen below, I have introduced two elements to the method: firstly, average 3 bins below 3.7kHz and 4 bins above; secondly, use the "square mean root" value as a slightly more conservative result (compared to simple averaging).
Reducing to very few bins (i.e. 1 or 2) drastically reduces the bits_to_remove and has not been implemented.
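A small Python sketch of why "square mean root" comes out more conservative than simple averaging. It assumes that "square mean root" means the square of the mean of the square roots of the bin values; that reading, the function names and the example values are assumptions for illustration, not taken from the lossyWAV source.

```python
import math


def mean(values):
    """Plain arithmetic mean of the bin values."""
    return sum(values) / len(values)


def square_mean_root(values):
    """Square of the mean of the square roots (values must be non-negative)."""
    return mean([math.sqrt(v) for v in values]) ** 2


if __name__ == "__main__":
    bins = [1.0, 4.0, 9.0, 16.0]      # hypothetical values of 4 adjacent fft bins
    print(mean(bins))                 # 7.5
    print(square_mean_root(bins))     # 6.25
```

For non-negative values the square mean root is never larger than the arithmetic mean, so a spread result computed this way sits slightly lower, which should lead to slightly fewer bits being removed.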
CODE
lossyWAV alpha v0.3.8 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org
Usage : lossyWAV <input wav file> <options>
Example : lossyWAV musicfile.wav
Options:
-1, -2 or -3 quality level (1:overkill, 2:default, 3:compact)
-nts <n> noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
(reduces overall bits to remove, -1 bit = -6.0206dB)
-o <folder> destination folder for the output file
-force forcibly over-write output file if it exists.
Advanced Options:
-spread <n> select spreading method : 0<=n<=3; default=0
0 = fft bin averaging : 3 or 4 bins, (original method);
1 = fft bin averaging : 3 bins below 3.7kHz, 4 bins above;
2 = fft bin square mean root : 4 bins;
3 = fft bin square mean root : 3 bins below 3.7kHz, 4 bins above
-skew <n> skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
(artificially decrease low frequency bins to take into account
higher SNR requirements at low frequencies)
-dither dither output using triangular dither; default=off
-noclip clipping prevention amplitude reduction; default=off
-quiet significantly reduce screen output
-nowarn suppress lossyWAV warnings
-detail enable detailled output mode
-below set process priority to below normal.
-low set process priority to low.
bryant
Oct 14 2007, 22:57
I'm not sure how useful this is, or whether it makes any sense to integrate into lossyWAV, but I have created a “smart” normalization program that I think might fix one of the troubling issues of lossyWAV (at least for me). It might even work well in other situations where normalization is desired, although I don't know enough about those to say.
Most normalization programs work by applying a scaling factor on every audio sample such that a maximum value sample (i.e., -32768/+32767) is reduced to some desired lower value. After applying the scale factor, they may or may not apply dither and noise-shaping (they probably should, but most I've seen don't). This works great at normal audio levels, but can cause trouble at very low levels. The problem is that by using various forms of noise shaping, well produced CDs contain information below the LSB. To preserve this information (and the characteristics of the original noise floor spectrum) it is important to preserve the exact sample values at low levels.
This suggests an alternative algorithm that maps low-level samples to the output exactly, but then goes non-linear at higher values to ensure that the desired peak limit is not exceeded (this is sometimes called soft clipping). This fixes the low-level sample problem, however soft-clipping introduces unacceptably high levels of harmonic distortion in full-scale signals.
The algorithm I chose for this program combines the two methods by calculating a running RMS level (with attack and decay) and using that to determine the ideal transfer function. At low levels it maps samples without modification to the output (with rogue high samples being softly clipped). At high levels it uses the simple scaling factor (where there's enough signal that dither and noise-shaping are not needed). In between the high and low level areas is a 12 dB transition zone where the program linearly interpolates between the two methods based on the position in the zone. In this transition zone a small amount of odd harmonic distortion is added to the signal, but it's very low in level.
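For readers without the attached source, the idea described in the paragraph above can be sketched in Python as follows. This is not the attached program: samples are assumed to be floats in -1.0..+1.0, and the tanh based soft clipper, the position of the transition zone (zone_bottom_db), the attack and decay coefficients and all names are assumptions chosen only to illustrate the blending of the two methods.

```python
import math


def soft_clip(x, limit, knee_ratio=0.5):
    """Pass samples below the knee unchanged, bend larger ones towards +/-limit."""
    knee = knee_ratio * limit
    mag = abs(x)
    if mag <= knee:
        return x
    headroom = limit - knee
    return math.copysign(knee + headroom * math.tanh((mag - knee) / headroom), x)


def smart_normalize(samples, peak_limit=0.5, zone_bottom_db=-30.0,
                    zone_width_db=12.0, attack=0.1, decay=0.001):
    """Blend exact mapping (low level) and plain scaling (high level)."""
    scale = peak_limit               # full scale is 1.0, so this is the gain
    mean_square = 0.0                # running mean square with attack and decay
    out = []
    for x in samples:
        power = x * x
        coeff = attack if power > mean_square else decay
        mean_square += coeff * (power - mean_square)
        level_db = 10.0 * math.log10(max(mean_square, 1e-12))
        # Position inside the transition zone: 0 = low level, 1 = high level.
        w = min(1.0, max(0.0, (level_db - zone_bottom_db) / zone_width_db))
        out.append((1.0 - w) * soft_clip(x, peak_limit) + w * scale * x)
    return out


if __name__ == "__main__":
    quiet = [0.001] * 200
    loud = [1.0, -1.0] * 500
    print(smart_normalize(quiet)[:3])                    # unchanged samples
    print(max(abs(v) for v in smart_normalize(loud)))    # stays at or below 0.5
```

With these assumed settings a quiet passage passes through sample for sample, while a sustained full-scale passage ends up simply scaled to the peak limit.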
I am attaching a zip file with the program source and a Windows executable (the program compiles fine on Ubuntu Linux and probably most others). This has not been tested too much (especially in error conditions) so be careful!
David
Nick.C
Oct 15 2007, 01:01
Thanks for the code - I will certainly have a look at it to see how you did it!
On amplitude reduction, lossyWAV no longer reduces amplitude by default - the user has to specify the "-noclip" parameter.
Many thanks,
Nick.
halb27
Oct 15 2007, 01:59
QUOTE(Nick.C @ Oct 14 2007, 23:45)

Reducing to very few bins (i.e. 1 or 2) drastically reduces the bits_to_remove and has not been implemented.
Thank you for the new version. Will try it out as soon as possible.
If averaging over 1 or 2 bins yields inappropriate bitrates, your current approach is the most appropriate one, I think.
Just for clarity:
a) is bits_to_remove too low also when applied to very short fft lengths when averaging over 1 or 2 bins in the frequency range below ~ 700 Hz?
In the end you must have done something like that when averaging over entire critical bands - bits_to_remove was not too low then.
b) is it also not worth while averaging over say 2 bins in the low frequency range with very short fft lengths when considering it being applied to quality mode -1?
Another question as -tak etc. is not enabled yet:
Is codec blocksize now a constant 1024 with any quality mode?
BTW as you are doing the hard work: Please remove me from the author list of lossyWav.exe. It's not appropriate. I'm glad I could contribute a bit with the wavIO unit, but in the end it's an absolutely minor contribution. Of course I will continue to maintain wavIO, so feel free to tell me about any changes you would like to have realised.
Nick.C
Oct 15 2007, 05:09
"-tak" is not yet enabled, default codec_block_size is 1024 samples for all quality levels as previously discussed.
I am looking at other permutations of spreading, including one which has 3 intermediate frequency splits and averages as follows:
20Hz to 800Hz : 2 bins;
800Hz to 3.7kHz : 3 bins;
3.7kHz to 8kHz : 4 bins;
8kHz to 16kHz : 5 bins;
I'll let you know how this one works out.
Nick.
halb27
Oct 15 2007, 05:35
Sounds good.
Please don't see it as a bad thing in case bits_to_remove should go down a bit.
After all we are still left with guruboolez' sample he could abx.
Nick.C
Oct 15 2007, 07:27
lossyWAV alpha v0.3.9 attached. Superseded.
Default spreading method made slightly more conservative;
Code rationalised for spreading methods 1 to 3;
Spreading method 4 introduced, 2 fft bin averaging 20Hz to 800Hz; 3 fft bin averaging 800Hz to 3.7kHz; 4 bin averaging 3.7kHz to 16kHz. (5 fft bin averaging 8kHz to 16kHz was not successful - too many bits removed).
CODE
lossyWAV alpha v0.3.9 : .....WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C from a Matlab script, www.hydrogenaudio.org
Usage : lossyWAV <input wav file> <options>
Example : lossyWAV musicfile.wav
Options:
-1, -2 or -3 quality level (1:overkill, 2:default, 3:compact)
-nts <n> noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
(reduces overall bits to remove, -1 bit = -6.0206dB)
-o <folder> destination folder for the output file
-force forcibly over-write output file if it exists.
Advanced Options:
-spread <n> select spreading method : 0<=n<=4; default=0
0 = fft bin averaging : 3 or 4 bins, (less agressive than orig.);
1 = fft bin averaging : 3 bins below 3.7kHz, 4 bins above;
2 = fft bin square mean root : 4 bins;
3 = fft bin square mean root : 3 bins below 3.7kHz, 4 bins above
4 = fft bin averaging : 2 bins from 20Hz to 800Hz; 3 bins from
800Hz to 3.7kHz; 4 bins from 3.7kHz to 16kHz.
-skew <n> skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
(artificially decrease low frequency bins to take into account
higher SNR requirements at low frequencies)
-dither dither output using triangular dither; default=off
-noclip clipping prevention amplitude reduction; default=off
-quiet significantly reduce screen output
-nowarn suppress lossyWAV warnings
-detail enable detailled output mode
-below set process priority to below normal.
-low set process priority to low.
Special thanks:
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
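As an illustration of what -spread 4 describes, here is a rough Python sketch of band dependent fft bin averaging. It is not taken from the lossyWAV source: the function names, the way each averaging window starts at the current bin, and the idea of feeding plain per-bin values are all assumptions. Only the band edges and bin counts (2 bins from 20Hz to 800Hz, 3 bins from 800Hz to 3.7kHz, 4 bins from 3.7kHz to 16kHz) come from the option text above.

```python
def bins_to_average(freq_hz):
    """Number of adjacent fft bins to average at a given frequency (-spread 4)."""
    if freq_hz < 800:
        return 2
    if freq_hz < 3700:
        return 3
    return 4


def spread(spectrum, sample_rate, fft_length, low_hz=20.0, high_hz=16000.0):
    """Average each bin with its upper neighbours, band by band.

    spectrum holds one non-negative value per fft bin (bin k sits at
    k * sample_rate / fft_length Hz). Returns the averaged values for
    all bins between low_hz and high_hz.
    """
    bin_width = sample_rate / fft_length
    results = []
    for k in range(len(spectrum)):
        freq = k * bin_width
        if freq < low_hz or freq > high_hz:
            continue
        n = bins_to_average(freq)
        window = spectrum[k:k + n]
        if len(window) < n:
            break                     # not enough bins left for a full window
        results.append(sum(window) / n)
    return results


if __name__ == "__main__":
    flat = [1.0] * 33                 # 64 sample fft -> 33 bins, flat spectrum
    averaged = spread(flat, 44100, 64)
    print(len(averaged), min(averaged))
```

Taking the minimum of the averaged values as the figure that limits bits_to_remove would follow what was described earlier for the Bark averaging; whether -spread 4 does exactly that is an assumption here.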
halb27
Oct 15 2007, 07:31
Wonderful. Thanks a lot.
Nick.C
Oct 15 2007, 14:45
I have been processing permutations with v0.3.10 (which differs from v0.3.9 only in being faster) and -spread 4 seems to be a candidate for the default spreading function. However, I feel that the 800Hz / 3.7kHz / 8kHz intermediate steps might need to be moved to more suitable points in the frequency range between 20Hz and 16kHz.
Another thing I need advice with is licensing - portions of the code are (heavily modified) LGPL, so LGPL seems to be the way to go; however, I don't know exactly what I need to add to the .exe or license.txt file to enact it. As well as that, the method is David Robinson's implementation of an idea - all I have done is transcode and tweak a bit.