lossyWAV Development

Reply #250 – 2007-10-03 19:49:46

~~lossyWAV alpha v0.3.4 attached.~~ Superseded by alpha v0.3.5.

new parameter -info to show WAV file rate, channels, bps and length;
code tidy up and speed up.

Have fun!

Reply #251 – 2007-10-05 09:11:50

~~lossyWAV alpha v0.3.5 attached.~~ Superseded by alpha v0.3.6

new parameter -spread, replaces -vsfl. An experimental take on spreading.
code tidy up and (quite significant) speed up.

Have fun!

Reply #252 – 2007-10-09 22:40:35

Quote from: guruboolez on 2007-09-29 15:53:39

Lossywav is impressive on harpsichord recordings.
...
There was no noise and no artefact, but something hard to define (audiophiles would call it "lack of soundstage" or something similar).

Would you mind trying -cbs 1024? 2Bdecided once mentioned that he created the procedure with such a blocksize in mind and was a bit unsure about the outcome of shorter block sizes. In my experience the resulting FLAC file size should be roughly the same.

If this isn't sufficient, could you please try -nts x as suggested by Nick.C, or maybe also -skew y and -spread?

Sorry I can't do it myself, but I'm not able to ABX the samples you provided.

Reply #253 – 2007-10-09 23:01:54

Quote from: Nick.C on 2007-10-05 09:11:50

... -spread, replaces -vsfl. An experimental take on spreading. ...

Sorry, but I'm not sure it's a promising approach to try out different weights when averaging 3 or 4 bins. My feeling is that, overall, this is not a significant variation, and it may produce better results in some cases and worse in others.

I'm still a bit worried about David Bryant's comment on the spreading function: the critical bands have different widths, with corner frequencies according to the Bark scale of 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500 Hz.

So to me it's plausible to vary the number of bins over which the average is built not only according to the fft_length, as you already did with the previous -vsfl option, but also according to the frequency range the corresponding bin belongs to. Taking this into account roughly may be sufficient (for instance averaging over 2 bins up to 1720 Hz, 3 bins up to 3700 Hz and 4 bins otherwise).
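A minimal sketch of the frequency-dependent averaging suggested above, assuming the 2/3/4-bin thresholds at 1720 Hz and 3700 Hz, a real FFT whose bin k sits at k * sample_rate / fft_length, and that each bin is averaged with the bins just above it. The function names are hypothetical, it ignores the additional fft_length dependence, and it is not the lossyWAV code:

```python
import numpy as np

def spreading_length(freq_hz):
    """Number of bins to average, using the rough thresholds suggested above."""
    if freq_hz <= 1720.0:
        return 2
    if freq_hz <= 3700.0:
        return 3
    return 4

def spread(values, sample_rate):
    """Frequency-dependent arithmetic averaging of one FFT analysis.

    values: per-bin results of a real FFT of length 2 * (len(values) - 1).
    ASSUMPTION: each bin is averaged with the next (width - 1) bins; near the
    top of the spectrum the window is simply truncated.
    """
    values = np.asarray(values, dtype=float)
    n_bins = len(values)
    fft_length = 2 * (n_bins - 1)
    out = np.empty(n_bins)
    for k in range(n_bins):
        freq = k * sample_rate / fft_length   # centre frequency of bin k
        width = spreading_length(freq)
        out[k] = values[k:k + width].mean()
    return out
```

With a 16-sample FFT at 44.1 kHz the bin spacing is 2756.25 Hz, so only the DC bin gets 2-bin averaging and bin 1 gets 3 bins. At the longer FFT lengths far more bins fall below 1720 Hz, which is where the narrower average would matter.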

Reply #254 – 2007-10-10 10:20:05

@ Nick.C:

I also feel a bit uncomfortable about the many options. They don't invite potential listeners, who already have a hard job because the quality is very good, and a sufficient amount of listening experience is what is missing most at the moment.

It's all a matter of taste, but I think it would be good to return to the essentials: the primary quality settings.
As for additional options, I think it's good to have the dithering and the clipping option (default: no dithering and no anti-clipping strategy).
But other than that my feeling is that everything should go into -1, -2, -3.
Moreover we should concentrate on getting an extremely good quality in the ~500 kbps range. I think current experience is enough to show that achieving significantly lower bitrate while keeping up excellent quality is not possible with the current approach without additions like those proposed by SebastianG.
So I think we should leave the -3 option behind until more details about such an approach are available.

Concentrating on -1 and -2, I think we should target -2 at a level that makes any known sample transparent to any listener, and keep -1 only slightly above that quality-wise. This gives any listener the chance to switch from -2 to -1 in case he has a sample which is not transparent.
As a consequence what was -1 should then become -2 in the next version (or a small promising variant of -1), and a new -1 should be created.

Suggestion for -2: what is -2 right now with no skewing, and a spreading function which does just arithmetic averaging, but with the number of bins participating in averaging depending on bin frequency as described in my last post, and also depending on fft length as you did already. Moreover I'd welcome a blocksize of 1024 instead of 576. No serious disadvantage in resulting bitrate but more secure.

Suggestion for -1: specifics of -1 like in the existing version, other details like with -2 but a tiny bit more demanding, for instance a slightly lowered noise threshold and a small skewing factor (as the first trial - can be increased if necessary).

Reply #255 – 2007-10-10 10:34:43

You must leave noise threshold shift in as a command line option.

Either the frequency skewing or the variable spreading length appear to be needed to make it work properly.

I agree that lots of options are confusing, but I thought they were only in there for testing. There will eventually be no direct user control of any of them, I hope, because someone will figure out the optimal settings and default them.

Cheers,
David.

Reply #256 – 2007-10-10 12:17:26

Quote from: 2Bdecided on 2007-10-10 10:34:43

You must leave noise threshold shift in as a command line option.

Sure, that is an advantage per se. But on the other hand it breaks the simple split between one quality parameter and options for additional features like dithering. Moreover, differences in threshold shift are already incorporated in the difference between -2 and -1.

Quote

Either the frequency skewing or the variable spreading length appear to be needed to make it work properly.

This is the case with my suggestion for -1 and -2.

Quote

I agree that lots of options are confusing, but I thought they were only in there for testing. There will eventually be no direct user control of any of them, I hope, because someone will figure out the optimal settings and default them.

Sure, but I'm afraid the lack of listeners in the testing phase is not only due to the difficulty of ABXing samples at the high quality already achieved, but to some extent also to the number of options, whose purpose not everybody understands.
Guruboolez (certainly the most welcome tester), for one, doesn't seem to want to play around with options.

Of course these points also reflect my personal opinion that varying the spreading function by varying the weights in the averaging formula is not worthwhile. Variable spreading length, however, is promising IMO.
They also reflect my belief that a significant saving in bitrate is not possible with the current approach, and I don't care much whether it's finally 530 kbps or 480 kbps on average. After all, we're targeting a significantly lower bitrate than lossless while keeping transparency with a high degree of security. The latter is what I care about most, and IMO we should do everything to encourage testers.

Reply #257 – 2007-10-10 12:48:52

I would like people to feed in all their transform-codec problem samples and start testing lossyWAV. The problem is that hybrids make easy work of most transform problems. It would still be useful, I think, even though I don't think we will see a good ABX result even for -2 (hopefully).

Reply #258 – 2007-10-10 13:14:53

I hear what's being said, but my ears / listening environment are not up to finalising the settings by myself.

The current (unreleased alpha v0.3.6) command line parameter list is as follows:

Code:

lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage: lossyWAV <input wav file> <options>

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

-cbs <n>      analysis codec_block_size (512<=n<=4608, default=576 samples)
              (should match codec block size used in target compression codec)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-spread       select variable spreading functions.(incompatible with -weight)
-weight       select weighted spreading functions.(incompatible with -spread)
              (weighted average of fft bins during convolution of fft results
              weighted towards lower frequency fft bins, 5/8:3/8)
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither <n>   dither selection, 0<=n<=2, default=0
              (0=no dither; 1=rectangular dither; 2=triangular dither)
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).
-overlap <n>  fft_overlap = fft_length/n (2<=n<=8, default=2)
              (increases number of fft analyses per codec block)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode
-info         display WAV file information
-below        set process priority to below normal.
-low          set process priority to low.

Options not yet implemented:

-bitdepth <n> forced output bitdepth (16 or 24)
-flac         optimizations for use with FLAC
-wv           optimizations for use with wavPack
-tak          optimizations for use with TAK

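The -skew help text above only says that FFT results are skewed by n dB "with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz". A minimal sketch of one reading of that, assuming the position within 20 Hz to 3.7 kHz is measured on a log-frequency axis and that the offset is n * (sin(pi/2 * x) - 1). Both the axis choice and the function name are assumptions, not taken from the lossyWAV source:

```python
import numpy as np

def skew_offset_db(freq_hz, skew_db, f_lo=20.0, f_hi=3700.0):
    """dB offset added to FFT results: -skew_db at or below f_lo, rising along a
    quarter sine (the "sin - 1" shape) to 0 dB at f_hi and above.
    ASSUMPTION: position in the range is measured on a log-frequency axis."""
    f = np.maximum(np.asarray(freq_hz, dtype=float), f_lo)   # also avoids log10(0) at DC
    x = (np.log10(f) - np.log10(f_lo)) / (np.log10(f_hi) - np.log10(f_lo))
    x = np.clip(x, 0.0, 1.0)
    return skew_db * (np.sin(0.5 * np.pi * x) - 1.0)
```

Adding this offset to the per-bin dB values lowers the low-frequency bins artificially. Because the lowest values drive the number of bits removed, the effect is that fewer bits are removed when the low end is quiet, which matches the stated intent of taking higher low-frequency SNR requirements into account.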

However, I think that it may be beneficial to reduce this to

Code:

lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage: lossyWAV <input wav file> <options>

Options:

-1, -2 or -3  Classic quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

-dither       dither output using triangular dither, default=off
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-below        set process priority to below normal.
-low          set process priority to low.

and tweak the parameters implicit in -1, -2 and -3. Possibly implement additional test settings to see whether a listener prefers -2 or -20? The codec block size needs to be stated for each quality setting, or the user will not know how to compress the output optimally.

As an aside, I used v0.3.5 to compress 30GB of FLAC files at quality -2 and got 15.2GB out - average bitrate approx 420kbps.

As there are no real process developments (other than code optimisation) in v0.3.6, I will defer release until a way forward is agreed on internal quality settings development.

Nick.

Quote from: halb27 on 2007-10-09 23:01:54

Quote from: Nick.C on 2007-10-05 09:11:50

... -spread, replaces -vsfl. An experimental take on spreading. ...

Sorry, but I'm not sure it's a promising approach to try out different weights when averaging 3 or 4 bins. My feeling is that, overall, this is not a significant variation, and it may produce better results in some cases and worse in others.

I'm still a bit worried about David Bryant's comment on the spreading function: the critical bands have different widths, with corner frequencies according to the Bark scale of 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500 Hz.

So to me it's plausible to vary the number of bins over which the average is built not only according to the fft_length, as you already did with the previous -vsfl option, but also according to the frequency range the corresponding bin belongs to. Taking this into account roughly may be sufficient (for instance averaging over 2 bins up to 1720 Hz, 3 bins up to 3700 Hz and 4 bins otherwise).


The most recent experimental take on spreading (in the original thread) uses a simple 3-bin average at short FFT lengths (2 to 64 samples) and shifts gradually to the maximum of the adjacent bins and the current bin (a simple attempt at masking) at long FFT lengths (1024 to 32768 samples). If anyone has any algorithmic ideas with regard to spreading, please let me know. Bear in mind that the default quality settings have always used 4-bin averaging (-2 & -3) and 3-bin averaging (-1).
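The avg-to-max idea described above can be sketched as a blend between a 3-bin mean and a 3-bin maximum. This is a minimal sketch, not the code fragment from the original thread: it assumes the "gradual shift" is linear in log2(fft_length) between 64 and 1024 samples, and that edge bins are padded by repeating them. Both choices, and the function names, are hypothetical:

```python
import numpy as np

def blend_weight(fft_length):
    """0.0 -> pure 3-bin average, 1.0 -> pure 3-bin max.
    ASSUMPTION: linear in log2(fft_length) between 64 and 1024 samples."""
    lo, hi = np.log2(64), np.log2(1024)
    t = (np.log2(fft_length) - lo) / (hi - lo)
    return float(np.clip(t, 0.0, 1.0))

def spread_avg_to_max(values, fft_length):
    """Blend of the mean and the max over each bin and its two neighbours."""
    values = np.asarray(values, dtype=float)
    padded = np.pad(values, 1, mode="edge")   # repeat edge bins so every bin has two neighbours
    neighbours = np.stack([padded[:-2], padded[1:-1], padded[2:]])
    avg = neighbours.mean(axis=0)
    mx = neighbours.max(axis=0)
    w = blend_weight(fft_length)
    return (1.0 - w) * avg + w * mx
```

At 256 samples this gives an equal mix of the two, so a single loud bin lifts its neighbours halfway between "smeared" and "fully masked".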

Reply #259 – 2007-10-10 14:31:40

Quote from: halb27 on 2007-10-10 12:17:26

Quote from: 2Bdecided on 2007-10-10 10:34:43

You must leave noise threshold shift in as a command line option.

Sure, that is an advantage per se. But on the other hand it breaks the simple split between one quality parameter and options for additional features like dithering. Moreover, differences in threshold shift are already incorporated in the difference between -2 and -1.

I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.

Also, if anyone does find a problem sample, the obvious question is how far must the threshold shift before it's solved. If you remove the switch, no one can answer this!

Besides, the noise threshold shift is the most fundamental parameter in lossyFLAC. It was probably the first line of code that I coded! I wrote threshold_shift=0; with the assumption that really it shouldn't be zero and I'd figure it out later!

Cheers,
David.

Reply #260 – 2007-10-10 14:38:58

Quote from: 2Bdecided on 2007-10-10 14:31:40

I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.

This doesn't appear to correlate with my findings with multi-generational processing - with no dither the output matches the input after about 4 or 5 generations - beyond that, generation n = generation n-1.
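One reason undithered re-processing can settle is that rounding to a multiple of 2**bits is idempotent. Once a sample sits on that grid, rounding it again to the same or a coarser-by-fewer-bits grid leaves it unchanged. A minimal sketch of just the quantiser, assuming 16-bit integer samples and round-to-nearest. The function name is hypothetical, and it does not model lossyWAV's analysis, which may pick a different bit count in later generations:

```python
import numpy as np

def remove_bits(samples, bits):
    """Undithered bit removal on 16-bit integer samples: round each sample to
    the nearest multiple of 2**bits (NumPy rounds exact halves to even)."""
    samples = np.asarray(samples, dtype=np.int64)
    if bits == 0:
        return samples.copy()
    step = 1 << bits
    q = np.round(samples / step) * step
    # Keep the result on the grid and inside the 16-bit range.
    q = np.clip(q, -32768, 32768 - step)
    return q.astype(np.int64)
```

A later generation only changes the audio if its analysis decides to remove more bits than before. That is consistent with, though it does not prove, generation n eventually matching generation n-1.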

Reply #261 – 2007-10-10 14:39:10

Quote from: Nick.C on 2007-10-10 13:14:53

...However, I think that it may be beneficial to reduce this to

Code:

lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage: lossyWAV <input wav file> <options>

Options:

-1, -2 or -3  Classic quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

-dither       dither output using triangular dither, default=off
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-below        set process priority to below normal.
-low          set process priority to low.

I welcome such an approach very much.
As for codec block size, it certainly must be known. At the moment I think it's best to concentrate on FLAC and use a blocksize of 1024 (with any quality setting).
Whenever demand comes for other lossless codecs, I think it's best to bring the -tak etc. parameters to life and use a codec-specific blocksize. Or maybe bring them to life immediately with a promising blocksize.

Quote

The most recent experimental take on spreading (in the original thread) uses a simple 3-bin average at short FFT lengths (2 to 64 samples) and shifts gradually to the maximum of the adjacent bins and the current bin (a simple attempt at masking) at long FFT lengths (1024 to 32768 samples). If anyone has any algorithmic ideas with regard to spreading, please let me know. Bear in mind that the default quality settings have always used 4-bin averaging (-2 & -3) and 3-bin averaging (-1).

To me this sounds plausible and IMO should be incorporated into the quality settings for -2 and (slightly more demanding) for -1. A rigid justification for such a procedure isn't necessary IMO.
You don't write about David Bryant's concern about the varying width of the critical bands. Isn't it plausible to you? Are there problems with implementation?

Reply #262 – 2007-10-10 17:10:17

Quote from: Nick.C on 2007-10-10 14:38:58

Quote from: 2Bdecided on 2007-10-10 14:31:40
I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.
This doesn't appear to correlate with my findings with multi-generational processing - with no dither the output matches the input after about 4 or 5 generations - beyond that, generation n = generation n-1.

I didn't know that! Still, I can see why it might be true. Not sure I'm certain it's "proven" behaviour yet.

Still, useful multi-generational encoding means that you're actually going to do something with the audio between encodes. So the audio will keep being changed, and then re-quantised by lossyFLAC/WAV. When I tested this (early on) I ended up with 12dB more noise than I wanted after 50 iterations (which is quite amazingly good, because standard 16-bit dither can be audible after 50 iterations!). Lowering the noise threshold shift will solve this, though I should check that with the current version I guess.

Cheers,
David.

Reply #263 – 2007-10-10 20:16:59

Quote from: halb27 on 2007-10-10 14:39:10

You don't write about David Bryant's concern about the varying width of the critical bands. Isn't it plausible to you? Are there problems with implementation?

I take the point that critical bands have varying widths - but as someone who has only become recently aware of most of the concepts being used in the method, I am a bit at a loss as to how to proceed with implementing an element of the method which would take this into account.

One thought that has just occurred to me:

Is there any merit in averaging / taking the minimum of FFT results across analyses carried out for each FFT length for a codec block rather than or as well as along the FFT analysis results? Would this give some time spreading? Or have I just drunk too much coffee today?
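The question above, about combining results across the overlapping analyses of one FFT length instead of (or as well as) along frequency, can be sketched as a per-bin reduction over the time axis. This is a minimal sketch, assuming the analyses for one FFT length within a codec block are stacked as rows of a (n_analyses, n_bins) array of dB values. The function name and layout are hypothetical:

```python
import numpy as np

def combine_analyses(spectra, mode="min"):
    """Combine overlapping FFT analyses of the same length across time.

    spectra: shape (n_analyses, n_bins), one row per analysis within a codec
    block, values in dB. Returns one value per bin.
    """
    spectra = np.asarray(spectra, dtype=float)
    if mode == "min":
        return spectra.min(axis=0)    # per-bin minimum across time (conservative)
    if mode == "mean":
        return spectra.mean(axis=0)   # per-bin average across time (a crude time spreading)
    raise ValueError(f"unknown mode: {mode}")
```

Since the lowest spectral values drive how many bits can be removed, taking the per-bin minimum over time can only lower the result and so removes the same or fewer bits. Averaging instead smooths short dips and would tend to remove more.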

Thinking about default settings:

-1 : codec block size=2304 samples; 4 analyses; 64, 256, 1024 & 4096 sample FFT lengths; noise_threshold_shift=-3.0; spreading_function_length=3;

-2 : codec block size=1152 samples; 3 analyses; 64, 256 & 1024 sample FFT lengths; noise_threshold_shift=-1.5; spreading_function_length=4;

-3 : codec block size=576 samples; 2 analyses; 64 & 1024 sample FFT lengths; noise_threshold_shift=-1.0; spreading_function_length=4;

or, should the spreading_function_length=n be replaced by the experimental 3 bin average to 3 bin max spreading?
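For reference, the proposed defaults above can be written as data. This is a minimal sketch. The class and field names are hypothetical, and the values are copied from the proposal rather than from a released version:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityPreset:
    codec_block_size: int            # samples
    fft_lengths: tuple               # samples, one analysis per length
    noise_threshold_shift: float     # dB
    spreading_function_length: int   # bins

PRESETS = {
    1: QualityPreset(2304, (64, 256, 1024, 4096), -3.0, 3),
    2: QualityPreset(1152, (64, 256, 1024), -1.5, 4),
    3: QualityPreset(576, (64, 1024), -1.0, 4),
}

def describe(level):
    """One-line summary of a preset, e.g. for a -detail style printout."""
    p = PRESETS[level]
    return (f"-{level}: cbs={p.codec_block_size}, {len(p.fft_lengths)} analyses, "
            f"fft={'/'.join(map(str, p.fft_lengths))}, nts={p.noise_threshold_shift}, "
            f"spread={p.spreading_function_length}")
```

Keeping the codec block size inside each preset also addresses the earlier point that the block size must be stated per quality setting so the output can be compressed optimally.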

I am stripping excess command line parameters out and will play with the temporal fft averaging / minimum algorithm.

Reply #264 – 2007-10-10 23:19:02

Quote from: halb27 on 2007-10-10 12:17:26

.. I'm afraid the fact that we don't have a lot of listeners in the testing phase is not only due to the difficulties in abxing samples at the high quality already achieved but to some extent also to the amount of options not everybody knows what they are good for.

Looking from the sideline I add my 2 cents.
The only thing that really matters is: the default should be "the Right Thing™".

Having a scale like -1 -2 -3 also helps to appear simple, but first you must have -2 (the default), before worrying about the user interface. The others should be sufficiently different in size (or maybe speed) with a tradeoff in quality.

Right now you need the options to find out what the best strategy is. (e.g. why remove -skew when it has proven useful?)

There could be another reason, the concept of lossylossless might not appeal to many and is certainly hard to ABX once you reach a certain low noise level.

BTW, just a little anecdote. By accident I lossyFLAC-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. I played around with replaygain a bit and (in a very sub-optimal listening environment, not even through headphones) somewhere at about +40 dB replaygain the noise was masked by hard disk and fan noise. Not very useful, I'm sure.

Quote from: Nick.C on 2007-08-11 08:29:15

However, a foobar2000 DSP plugin has to be at the top of my wishlist - it would make it all *so* much easier, and would more easily preserve tagging information.

I was wondering whether that would work, since in the foobar2000 0.9 DSP pipeline everything is passed as 32-bit floats. It might be no problem to remove bits, though.
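Removing bits in a float pipeline is possible in principle, because float32 represents every 16-bit sample value divided by 32768 exactly. This is a minimal sketch, assuming samples normalised so that 1.0 is full scale and a 16-bit target grid. The function is hypothetical and is not the foobar2000 SDK API. The result only survives if nothing later in the chain (volume, ReplayGain, a dithering converter) alters it:

```python
import numpy as np

def remove_bits_float(samples, bits, full_scale=32768):
    """samples: float32 array in [-1.0, 1.0), as in a float DSP chain.
    Returns float32 values lying exactly on the 16-bit grid with the lowest
    `bits` bits zero (no dither)."""
    step = float(1 << bits)
    ints = np.round(samples.astype(np.float64) * full_scale / step) * step
    ints = np.clip(ints, -full_scale, full_scale - step)   # stay on the grid, inside 16 bits
    return (ints / full_scale).astype(np.float32)
```

Converting the output back to 16-bit integers then gives samples whose lowest `bits` bits are zero, which is what the lossless encoder needs to see.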

Reply #265 – 2007-10-10 23:25:20

Quote from: GeSomeone on 2007-10-10 23:19:02

...the default should be "the Right Thing™".

I wholeheartedly agree!

Quote from: GeSomeone on 2007-10-10 23:19:02

Having a scale like -1 -2 -3 also helps to appear simple, but first you must have -2 (the default), before worrying about the user interface. The others should be sufficiently different in size (or maybe speed) with a tradeoff in quality.

That's what we've tried to do; the settings in v0.3.5 are close to those arrived at with Halb27 and Wombat in this thread.

Quote from: GeSomeone on 2007-10-10 23:19:02

Right now you need the options to find out what the best strategy is. (e.g. why remove -skew when it has proven useful?)

Point taken. Coincidentally, it hasn't been removed yet, and I won't remove it for now.

Quote from: GeSomeone on 2007-10-10 23:19:02

BTW, just a little anecdote. By accident I lossyFLAC-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. I played around with replaygain a bit and (in a very sub-optimal listening environment, not even through headphones) somewhere at about +40 dB replaygain the noise was masked by hard disk and fan noise. Not very useful, I'm sure.

Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.

Thanks for the input!

Nick.

Reply #266 – 2007-10-11 00:01:37

~~lossyWAV alpha v0.3.6 attached.~~ Superseded, see later.

Reply #267 – 2007-10-11 08:11:17

Quote from: Nick.C on 2007-10-10 20:16:59

I take the point that critical bands have varying widths - but as someone who has only become recently aware of most of the concepts being used in the method, I am a bit at a loss as to how to proceed with implementing an element of the method which would take this into account.

Don't the coefficients returned by the FFT relate to frequencies which equidistantly cover the frequency range (linear partitioning)?
That's my perhaps naive understanding.
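That understanding matches the standard DFT: bin k of an N-point FFT sits at k * fs / N, so bins are equally spaced in Hz while critical bands widen with frequency. A minimal sketch that counts how many FFT bins fall into each band, using the corner frequencies quoted earlier in the thread. The names are hypothetical:

```python
import numpy as np

# Critical band corner frequencies (Hz) as quoted earlier in the thread.
BARK_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
              1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
              9500, 12000, 15500]

def bin_frequencies(fft_length, sample_rate):
    """Bin k of an N-point real FFT sits at k * fs / N: linear spacing."""
    return np.arange(fft_length // 2 + 1) * sample_rate / fft_length

def bins_per_band(fft_length, sample_rate):
    """Number of FFT bins whose centre frequency lies in each critical band."""
    freqs = bin_frequencies(fft_length, sample_rate)
    counts, _ = np.histogram(freqs, bins=BARK_EDGES)
    return counts
```

For a 1024-sample FFT at 44.1 kHz (about 43 Hz per bin), the 0 to 100 Hz band holds 3 bins and the 100 to 200 Hz band 2, while the 12000 to 15500 Hz band holds 81. A fixed 3- or 4-bin average is therefore wider than a critical band at the bottom of the spectrum and much narrower than one at the top.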

Quote from: Nick.C on 2007-10-11 00:01:37

lossyWAV alpha v0.3.6 attached. ...

Thank you.

a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
b) The spreading_length of -1/-2/-3 is now 3/4/4
and simple averaging is done in the spreading function when not using advanced option -spread?
c) Noise threshold shift default of -1/-2/-3 is now -3.0/-1.5/-1.0?

What does -spread do?

Reply #268 – 2007-10-11 09:03:56

Quote from: halb27 on 2007-10-11 08:11:17

a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
b) The spreading_length of -1/-2/-3 is now 3/4/4
and simple averaging is done in the spreading function when not using advanced option -spread?
c) Noise threshold shift default of -1/-2/-3 is now -3.0/-1.5/-1.0?

What does -spread do?

a) Yes;
b) Yes;
c) Yes;

-spread carries out the spreading which varies with fft length. See code fragment in original thread.

Reply #269 – 2007-10-11 11:56:27

Wonderful.

So at the moment we're left with guruboolez' problem where he could abx a harpsichord sample.

@guruboolez: Are you out there?
It would be great if you could give your sample another try with this new version.

Reply #270 – 2007-10-11 12:17:18

@Halb27: Taking on board what you were saying about using Bark band width to determine how many bins to average, I will start to work out a new spreading option which does this (inspired by one of J.M. Valin's papers).

Reply #271 – 2007-10-11 12:29:58

I think you're heading down a slippery slope here!

First you'll find yourself averaging over 100 bins at the highest frequency, and before you know it you'll be implementing a proper psychoacoustic model to sort it all out!

Cheers,
David.

Reply #272 – 2007-10-11 12:42:59

At least, I'm not thinking of a sophisticated implementation of this principle.

For the extreme cases, maybe a spreading_length of 1 at the low end and a spreading_length of 5 at the high end, or something like that, or maybe even less variation, depending on fft_length. The principle may be worth implementing for a low or moderate fft_length.

For quality reasons (my main concern at the moment) the low end is the critical range, as a spreading_length of 4 or even 3 may not be appropriate there in some cases. So taking this into account may be essential.

Allowing a very large spreading_length for the high frequency range is another story and might allow for a lower bitrate on average while keeping up excellent quality. At the moment however I see this rather as an option for the future.

Reply #273 – 2007-10-11 17:45:47

Quote from: Nick.C on 2007-10-10 23:25:20

Quote from: GeSomeone on 2007-10-10 23:19:02
BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.

It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem, though; it is more a side effect in combination with replaygain. But it seems to prove that bits can be removed even from silence.

Update: I am now convinced the dithering from the foobar2000 converter was to blame. Even though it was set to "only dither lossy sources", it seems to have kicked in somewhere (I marked lossyFLAC as a lossy destination). Retesting with the setting "Never Dither" was OK: no extra noise.

Reply #274 – 2007-10-11 18:47:00

Quote from: GeSomeone on 2007-10-11 17:45:47

Quote from: Nick.C on 2007-10-10 23:25:20
Quote from: GeSomeone on 2007-10-10 23:19:02
BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.

It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem, though; it is more a side effect in combination with replaygain. But it seems to prove that bits can be removed even from silence.

It's not the normal dither - silence and near-silence should be (and with the MATLAB script, are) transparent irrespective of system gain or dither chosen, because lossyFLAC won't touch silence - it won't even re-dither it.

Nick, did you have "always dither" set to on in that version?

Cheers,
David.