Full Version: lossyWAV Development
halb27
QUOTE(Nick.C @ Oct 22 2007, 13:59) *

I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as if FFT_length<256 then spreading_function_length=1. if 256 or 512 then 1,2,3,4. if 1024 or above then 2,3,4,5.

Wonderful, thank you. In case this brings bits to remove down too much, there's still room for compromise, especially for FFT_length < 256. I guess for the high frequency range spreading_length need not be 1 even with short FFT lengths.
Nick.C
QUOTE(halb27 @ Oct 22 2007, 14:39) *

QUOTE(Nick.C @ Oct 22 2007, 13:59) *

I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as if FFT_length<256 then spreading_function_length=1. if 256 or 512 then 1,2,3,4. if 1024 or above then 2,3,4,5.

Wonderful, thank you. In case this brings bits to remove too much down there's still room for compromise especially for FFT_length < 256. Guess for the high frequency range spreading_length needs not be 1 even with short FFT lengths.
I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.
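A minimal sketch of that rule in Python, for anyone who wants to try it outside the spreadsheet. Only the formula max(1,int(log2(...))) comes from the post; the function name and the example bin counts are made up for illustration.

```python
import math

def starting_spreading_length(bins_in_critical_band):
    """Starting point from the rule max(1, int(log2(bins_in_critical_band)))."""
    if bins_in_critical_band <= 0:
        raise ValueError("number of bins must be positive")
    # int() truncates toward zero, so widths below 2 bins give 0 or a
    # negative value, which max() then lifts to 1.
    return max(1, int(math.log2(bins_in_critical_band)))

# Hypothetical critical band widths, expressed in FFT bins.
for bins in (0.5, 1.0, 3.0, 8.0, 40.0):
    print(bins, starting_spreading_length(bins))
```

Any critical band narrower than 4 bins ends up with a spreading length of 1 under this rule.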
halb27
QUOTE(Nick.C @ Oct 22 2007, 15:56) *

I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.

Fine, this table shows under what circumstances 'Width of Critical Band Width in FFT Bins' is < 1, which is most critical IMO. IMO it should be > 1 (better: >= 2), or else spreading_length should be 1 wherever 'Width of Critical Band Width in FFT Bins > 1' cannot be achieved.
This is with respect to where these requirements are not fulfilled at the moment. I'm not talking about making spreading length larger than 5 in the high frequency area with long FFTs, though to a cautiously chosen extent this may be possible, especially for -2 and more so -3. This is something that can be considered later.
Nick.C
QUOTE(halb27 @ Oct 22 2007, 15:25) *
QUOTE(Nick.C @ Oct 22 2007, 15:56) *
I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.

Fine, this table shows under what circumstances Width of Critical Band Width in FFT Bins is < 1 which is most critical IMO. IMO it should be >1 (better: >= 2), resp. spreading_length should be 1 in case 'Width of Critical Band Width in FFT Bins > 1' cannot be achieved.
This is with respect to where these requirements are not fulfilled at the moment. I'm not talking about making spreading length larger than 5 in the high frequency area with long FFTs though to a cautiously chosen extent this may be possible - especially for -2 and more so -3. This is something that can be considered later.
I see where you're coming from.

With respect to the matrix calculation mentioned earlier, please note the average bitrates for my 52 sample set, processed at quality level -2 with -SNR and -SKEW as the only other parameters.
CODE
BitRate   SNR=00  SNR=03  SNR=06  SNR=09  SNR=12  SNR=15  SNR=18  SNR=21  SNR=24  SNR=27  SNR=30
SKEW=00   468.4   468.4   468.4   468.4   468.4   469.2   471.4   476.2   483.2   494.7   508.7
SKEW=03   468.7   468.7   468.7   468.7   468.8   469.8   472.0   477.3   484.9   497.3   512.1
SKEW=06   468.9   468.9   468.9   468.9   469.0   470.3   472.8   478.5   486.9   499.9   515.5
SKEW=09   469.5   469.5   469.5   469.5   469.6   471.0   473.8   479.9   488.9   502.4   518.7
SKEW=12   470.1   470.1   470.1   470.1   470.2   471.8   474.9   481.4   491.1   505.1   522.1
SKEW=15   470.9   470.9   470.9   470.9   471.1   472.7   476.2   483.1   493.5   507.7   525.4
SKEW=18   471.9   471.9   471.9   471.9   472.1   473.9   477.6   484.8   495.9   510.2   528.7
SKEW=21   473.3   473.3   473.3   473.3   473.5   475.3   479.2   486.7   498.3   513.0   531.9
SKEW=24   475.2   475.2   475.2   475.2   475.4   477.0   481.3   488.9   500.9   515.6   535.1
SKEW=27   477.5   477.5   477.5   477.5   477.7   479.2   483.6   491.2   503.7   518.6   538.3
SKEW=30   480.5   480.5   480.5   480.5   480.6   482.0   486.4   494.0   506.6   521.7   541.6
halb27
So judging from this table, a higher skew value than we've used so far isn't critical as long as the snr value isn't chosen very high.
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
So values up to say skew=21 or 24 and snr=18 are quite acceptable IMO for -1, judging from your table.
(Of course, I have in mind headroom for the variable spreading function modifications.)
Nick.C
QUOTE(halb27 @ Oct 22 2007, 17:09) *

So from this table a higher value of skew than usual so far isn't critical as long as the snr value isn't chosen very high.
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
So values up to say skew=21 or 24 and snr=18 are well acceptable IMO for -1 judging from your table.
(Sure I have headroom in mind for the variable spreading function modifications).
I think that the higher skew values increase bitrate on some samples, but not all, e.g. Atem_Lied.

I have re-written the spread procedure and it is now prepared to accept spreading_function_lengths which vary with fft_length, although I have not yet nailed down the exact relationship between fft_length / bin frequency and spreading_function_length - that's a job for tomorrow. The price of the re-write is about 5% added to the process time.
halb27
I wouldn't care about the 5% added processing time.

Sure, everybody is different, but as a first approximation I guess anybody who accepts the file size increase from ~ 200 kbps of a transform codec to ~ 450 kbps of this approach in favor of an expected extremely high quality doesn't care very much about encoding speed (which is a two stage process here anyway).
Though more speed is welcome, everything is fine as long as processing time doesn't really hurt.
Nick.C
I've tried a first attempt at spreading which varies with every fft_length. Reference: FLAC=788.6kbps / 67.91MB

1st iteration: no averaging at 64 sample fft_length, -2 yields 619.6kbps / 53.36MB (64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5).

2nd iteration : less conservative version, -2 yields 485.8kbps / 41.84MB (64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5).

3rd iteration (64:2,2,2,2,2; 256:2,2,2,3,3; 1024:2,3,3,4,5) yields 510.3kbps / 43.95MB. This same iteration with "-nts 0" yields 491.7kbps / 42.35MB.

This in comparison with the current fixed spreading yields 470.2 kbps / 40.49MB.

I've decided to release the 3rd iteration as alpha v0.3.16 - attached. Superseded.
halb27
QUOTE(Nick.C @ Oct 23 2007, 21:01) *

I've tried a first attempt at spreading which varies with every fft_length. Reference: FLAC=788.6kbps / 67.91MB

When there is no averaging at 64 sample fft_length, -2 yields 619.6kbps / 53.36MB (64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5).

A less conservative version (still more conservative than previous 2,3,3,4,5 for all fft_lengths) yields 485.8kbps / 41.84MB (64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5).

Another iteration (64:2,2,2,2,2; 256:2,2,2,3,3; 1024:2,3,3,4,5) yields 510.3kbps / 43.95MB

This in comparison with the current fixed spreading yields 470.2 kbps / 40.49MB.

Thank you.
IMO this shows the routes that are not promising and those that are:

(64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5): far too conservative. Probably due to spreading_length being too short in the mid and high frequency range.

(64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5): this or a variation of this is a promising candidate IMO for a -1 spreading length strategy.

Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.

Just a question: What's your sample set? If it's regular music we should try to hold bitrate down. If it's problem samples we shouldn't care about bitrate going up. Ideally bitrate is kept rather low with regular music and increases significantly with problem samples (not necessarily individually but as classes of well- and bad-behaving samples).
Nick.C
QUOTE(halb27 @ Oct 23 2007, 20:46) *
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.

Just a question: What's your sample set? If it's regular music we should try to hold bitrate down. If it's problem samples we shouldn't care about bitrate going up. Ideally bitrate is kept rather low with regular music and increases significantly with problem samples (not necessarily individually but as classes of well- and bad-behaving samples).
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB. Superseded.

My sample set is:
CODE
04 - Black Sabbath - Iron Man.wav
06_florida_seq.wav
10 - Dungeon - The Birth- The Trauma Begins.wav
14_Track03beginning.wav
16_Track03entreaty.wav
18_Track04cakewithtea.wav
34_Gabriela_Robin___Cats_on_Mars.wav
41_30sec.wav
A02_metamorphose.wav
A03_emese.wav
Angelic.wav
annoyingloudsong.wav
aps_Killer_sample.wav
Atem_lied.wav
ATrain.wav
Bachpsichord.wav
badvilbel.wav
bibilolo.wav
BigYellow.wav
birds.wav
bruhns.wav
cricket__insect___edit_.wav
dither_noise_test.wav
E50_PERIOD_ORCHESTRAL_E_trombone_strings.wav
eig.wav
Furious.wav
glass_short.wav
harp40_1.wav
herding_calls.wav
jump_long.wav
keys_1644ds.wav
ladidada_10s.wav
Liebe_so_gut_es_ging.wav
Moon_short.wav
Poets_of_the_fall___Shallow.wav
rach_original.wav
rawhide.wav
Rush___Hold_Your_Fire___Turn_the_Page.wav
S13_KEYBOARD_Harpsichord_C.wav
S30_OTHERS_Accordion_A.wav
S34_OTHERS_GlassHarmonica_A.wav
S35_OTHERS_Maracas_A.wav
S53_WIND_Saxophone_A.wav
SeriousTrouble.wav
swarm_of_wasps__edit_.wav
thewayitis.wav
the_product.wav
triangle.wav
triangle_2_1644ds.wav
trumpet.wav
VELVET.wav
wait.wav
If you're worried about the low frequency range, use more -skew.....
halb27
QUOTE(Nick.C @ Oct 23 2007, 21:53) *

QUOTE(halb27 @ Oct 23 2007, 20:46) *
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.
...
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB.

Thank you. So as 494.2kbps is the result of (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5) I think that's very, very promising, and this is especially true as your sample set consists more or less of short problem samples.
With this in mind I guess it's even acceptable to go a bit more conservative (as a target for -1 when we're done), something like
(64:1,1,1,2,4; 256:1,1,2,3,4; 1024:1,3,3,4,5) - looking at your wonderful 'Width of Critical Band Width in FFT Bins' table more closely.

I'd love to go through my 51 regular song collection I used before with this setting, if you can provide such a version. BTW default for -skew and -snr is still 12 for each of these options?
Nick.C
QUOTE(halb27 @ Oct 23 2007, 21:23) *
QUOTE(Nick.C @ Oct 23 2007, 21:53) *
QUOTE(halb27 @ Oct 23 2007, 20:46) *
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.
...
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB.
Thank you. So as 494.2kbps is the result of (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5) I think that's very, very promising, and this is especially true as your sample set consists more or less of short problem samples.
With this in mind I guess it's even acceptable to go a bit more conservative (as a target for -1 when we're done), something like
(64:1,1,1,2,4; 256:1,1,2,3,4; 1024:1,3,3,4,5) - looking at your wonderful 'Width of Critical Band Width in FFT Bins' table more closely.

I'd love to go through my 51 regular song collection I used before with this setting, if you can provide such a version. BTW default for -skew and -snr is still 12 for each of these options?
Your wish is my command...... lossyWAV alpha v0.3.16c attached : 536.5 kbps / 46.20MB Superseded. Yes, -snr 12 -skew 12 is the default for all options. The spreading table is fixed (currently, this will change) for all quality levels.

Looking at the quality levels more carefully, maybe all 3 should use the 64/256/1024 sample fft analyses that -2 uses and the only other variables would be -snr, -skew, codec_block_size (512 samples for -3) and -nts.

Oh, and I realise that I was taking min(min_result,average_result-snr) and then adding the (negative) noise threshold shift to that. I've changed this to min(min_result+noise_threshold_shift,average_result-snr), which will reduce bitrate slightly but not carelessly. lossyWAV alpha v0.3.16d attached : 527.8kbps / 45.45MB. Superseded, although default spreading is the same in v0.3.18.
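To make the difference between the two orderings concrete, here is a small Python sketch. The two expressions are the ones quoted in the post; the function names and the dB values in the example are hypothetical and not taken from lossyWAV itself.

```python
def threshold_before_change(min_result, average_result, snr, nts):
    # Earlier behaviour as described: pick the lower value, then apply the
    # (negative) noise threshold shift to whichever one was picked.
    return min(min_result, average_result - snr) + nts

def threshold_after_change(min_result, average_result, snr, nts):
    # v0.3.16d as described: shift only the minimum, then pick the lower value.
    return min(min_result + nts, average_result - snr)

# Hypothetical values in dB; nts is negative (default -1.5).
for min_result, average_result in ((30.0, 40.0), (20.0, 40.0)):
    old = threshold_before_change(min_result, average_result, 12.0, -1.5)
    new = threshold_after_change(min_result, average_result, 12.0, -1.5)
    print(min_result, average_result, old, new)
```

With a negative shift the new form can never come out lower than the old one, and it only differs when the average-minus-snr term is the one that would have been shifted, which fits a slight drop in bitrate.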
halb27
QUOTE(Nick.C @ Oct 23 2007, 22:40) *

... lossyWAV alpha v0.3.16c attached : 536.5 kbps / 46.20MB. ...

Thank you. Appropriate result for your more-or-less problem sample set IMO. But behavior on regular music is important. I'll run this version on my regular music sample set tonight and will report tomorrow.

halb27
Results (average bitrate according to foobar) for my 50 (51 was wrong) regular song collection:

a) prior result I had a few weeks ago (don't remember the version but certainly with a fixed spreading_length of 4): 507 kbps.

b) result of 0.3.16d: 475 kbps.

c) For comparison result of 0.3.15: 425 kbps.

No special options specified.


So I think for -1 this is an adequate spreading length strategy (more precisely: a good start; fine tuning is necessary).

A closer look at fiocco, the sample that guruboolez was on the edge of being able to ABX (his result was at least good enough to show that the fiocco quality should improve, though it was very good already):
guruboolez' version (I guess it was 0.3.1, but certainly a version with a fixed spreading_length of 4) result: 436 kbps.
0.3.16d result: 507 kbps. So it is reasonable to expect that the small remaining problem is gone this way. Of course it would be most welcome if guruboolez could confirm.
0.3.15 result for comparison: 472 kbps. Already a very good step in the right direction. Very remarkable, moreover, as average bitrate in general came down with the switch from a fixed spreading length of 4 to the variable spreading length.

As for fine tuning:
Judging from what we got so far:
- if the aim is to keep average bitrate low, it is essential to keep spreading length relatively long at the high frequency edge. Luckily this can easily be done while still respecting the heuristic requirement that several bins (at least 1) should fall into each critical band.
- if the aim is to uphold the heuristic requirement that several bins should fall into each critical band (as far as that is possible at all), it's essential to keep spreading length low (usually 1) at the low frequency edge. Luckily, if done carefully, this doesn't seem to have an unacceptable impact on average bitrate.

So fine tuning (finding promising compromises) can be done with these considerations in mind considering the extreme ends, and especially with respect to the target that average bitrate of regular samples should be held low while it's welcome if it goes up with problem samples. Sure everything within the restricted possibilities we have.

I very much welcome your idea to have a fixed fft analysis strategy (fft length of 64, 256, 1024) for any quality setting (as done with -2 so far).
Sufficient IMO, and it makes fine tuning a lot easier:
For fine tuning purposes can you provide spreading length options of the kind:
-spreading64 11234
-spreading256 12334
-spreading1024 23345
or similar.
This way anybody can try to find a promising spreading length strategy.
I'd love to search for such strategies for -1, -2, -3, and I wouldn't have to bother you with building new lossyWav versions for whatever comes to my mind.
Nick.C
QUOTE(halb27 @ Oct 24 2007, 07:07) *
I welcome most your idea to have a fixed fft analysis strategy (fft length of 64, 256, 1024) for any quality setting (as done with -2 so far).
Sufficient IMO and makes fine tuning a lot more easy:
For fine tuning purposes can you provide spreading length options of the kind:
-spreading64 11234
-spreading256 12334
-spreading1024 23345
or similar.
This way anybody can try to find a promising spreading length strategy.
I'd love to search for such strategies for -1, -2, -3, and I wouldn't have to bother you with building new lossyWav versions for whatever comes to my mind.
I was thinking about this early this morning: it might be easier to implement a -spread parameter that takes a 15 character hexadecimal numeric input (would we ever exceed spreading_function_length=15?) and puts the results in the spreading_function table for each analysis length. This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). I'm very glad that the problem samples are improving while the average bitrate is not growing too much.

So, expect a new build with the possibility to use "-spread 112341233423345" to control the spreading function. Now, where's the cliParameter unit, I must rip it apart and rebuild it....... :)

Okay, cliParameter unit duly ripped and rebuilt. There is an unexplained considerable slowdown of processing, but for evaluation of spreading functions it should be okay. lossyWAV alpha v0.3.17 attached. Superseded, slowdown "cured".

CODE
lossyWAV alpha v0.3.17 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB, default=12dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-spf <15hex>  manually input the 3 spreading functions as 3 x 5 hex characters;
              e.g. 444444444444444, default=111241123423345; Hex characters
              must be one of 1,2,3,4,5,6,7,8,9,A,B,C,D,E,F (zero excluded).
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
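As a side note on the -nts help text: the 6.0206dB figure is 20*log10(2), i.e. the level change that corresponds to one bit. A short Python sketch of the arithmetic follows; the function name is made up for illustration, and how lossyWAV rounds the result internally is not stated here.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.0206 dB per bit

def bits_given_up(nts_db):
    """Reduction in bits to remove for a noise_threshold_shift of nts_db (<= 0)."""
    return -nts_db / DB_PER_BIT

# Default shift, one-bit shift and the lower limit from the help text.
for nts_db in (-1.5, -6.0206, -15.0):
    print(nts_db, round(bits_given_up(nts_db), 3))
```

So the default of -1.5dB costs roughly a quarter of a bit, and the limit of -15dB roughly two and a half bits.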
halb27
Excellent. Thank you.
Means I will have a lot of (interesting) work this evening.

QUOTE(Nick.C @ Oct 24 2007, 08:54) *

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...

Sorry, I don't understand this. Can you please explain it a bit?
Nick.C
QUOTE(halb27 @ Oct 24 2007, 09:20) *
QUOTE(Nick.C @ Oct 24 2007, 08:54) *

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.
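A rough Python sketch of the mapping just described, assuming one hex character per spreading length and five characters per analysis. The function name and error messages are hypothetical; the real code is Delphi and may well be organised differently.

```python
def parse_spread_15(spread):
    """Map a 15 character hex string onto spreading functions per fft_length."""
    if len(spread) != 15:
        raise ValueError("expected exactly 15 hex characters")
    values = [int(ch, 16) for ch in spread]  # ValueError on non-hex input
    if 0 in values:
        raise ValueError("zero is not a valid spreading_function_length")
    table = {}
    # Each group of 5 goes to 64, 256 and 1024, then is copied to its neighbour.
    for i, (fft_length, copy_to) in enumerate(((64, 128), (256, 512), (1024, 2048))):
        group = values[5 * i:5 * i + 5]
        table[fft_length] = group
        table[copy_to] = list(group)
    return table

table = parse_spread_15("112341233423345")
for fft_length in sorted(table):
    print(fft_length, table[fft_length])
```

The copies are what make the 128, 512 and 2048 analyses follow their shorter neighbours, whatever the quality level.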
halb27
QUOTE(Nick.C @ Oct 24 2007, 13:24) *

QUOTE(halb27 @ Oct 24 2007, 09:20) *
QUOTE(Nick.C @ Oct 24 2007, 08:54) *

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.

I imagined it to be like that - just wanted to make sure.
In this case the user doesn't have full control of the spreading length for every fft length.
If for instance it turns out to be important for the 1024 bin fft that there is a 1 in the spreading like in (1,3,3,4,5), it would be so for a 2048 bin fft as well and might have a negative impact on bitrate.
There are dependencies which I'd prefer to see avoided.

I thought you wanted to be content with 3 analyses. So are you still thinking of using an FFT length of 2048 for -1?
If yes, I'd prefer a 20 character hex string covering all fft lengths used (64, 256, 1024, 2048) in this order, where you ignore the 256 and 2048 parts if -3 is used, and ignore the 2048 part if -2 is used.
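To illustrate the proposal, a small Python sketch of a 20 character string with one group of five per fft length, where the quality level decides which groups are actually used. The names are hypothetical, optional dashes between groups are accepted purely as an assumption for readability, and the per-quality fft lengths are the ones Nick.C listed above.

```python
FFT_LENGTHS = (64, 256, 1024, 2048)
# fft lengths analysed per quality level, as listed earlier in the thread.
USED_BY_QUALITY = {1: (64, 256, 1024, 2048), 2: (64, 256, 1024), 3: (64, 1024)}

def spreading_for_quality(hex20, quality):
    """Return only the spreading functions that the quality level uses."""
    digits = hex20.replace("-", "")
    if len(digits) != 20:
        raise ValueError("expected 20 hex characters")
    values = [int(ch, 16) for ch in digits]  # ValueError on non-hex input
    if 0 in values:
        raise ValueError("zero is not a valid spreading length")
    groups = {n: values[5 * i:5 * i + 5] for i, n in enumerate(FFT_LENGTHS)}
    return {n: groups[n] for n in USED_BY_QUALITY[quality]}

print(spreading_for_quality("11124112342334534456", 3))
```

This way every fft length keeps its own five values, and unused groups are simply ignored rather than copied.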
Nick.C
QUOTE(halb27 @ Oct 24 2007, 12:56) *
QUOTE(Nick.C @ Oct 24 2007, 13:24) *
QUOTE(halb27 @ Oct 24 2007, 09:20) *
QUOTE(Nick.C @ Oct 24 2007, 08:54) *
... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.
I imagined it to be like that - just wanted to make sure.
In this case the user doesn't have full control of the spreading length for every fft length.
If for instance it turns out to be important for the 1024 bin fft that there is a 1 in the spreading like in (1,3,3,4,5), it would be so for a 2048 bin fft as well and might have a negative impact on bitrate.
There are dependancies which I'd prefer to see avoided.

I thought you wanted to be content with 3 analyses. So do you still think of using a fft length of 2048 for -1?
If yes I'd prefer a 20 character hex string covering all fft lengths used (64, 256, 1024, 2048) in this order, and you ignore the 256 and 2048 part if -3 is used resp. you ignore the 2048 part if -2 is used.
Changed to 20 hexchar string, 128 & 512 fft_length removed. I do want to move to only 3 analyses, just don't want to upset anybody..... lossyWAV alpha v0.3.18 attached. Superseded.
CODE
lossyWAV alpha v0.3.18 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB, default=12dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-spf <4x5hex> manually input the 4 spreading functions as 4 x 5 hex characters;
              e.g. 44444-44444-44444-44444, default=11124-11234-23345-34456;
              Hex characters must be one of 1 to 9 and A to F (zero excluded).
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Test results for v0.3.18:

My 52 sample set: WAV: 121.53MB; FLAC: 68.2MB / 792.0kbps; -1: 46.42MB / 539.0kbps; -2: 45.45MB / 527.8kbps; -3: 38.88MB / 451.5kbps.

Guru's 150 sample set: WAV: 252.36MB; FLAC: 122.17MB / 683.2kbps; -1: 95.95MB / 536.5kbps; -2: 93.81MB / 524.6kbps; -3: 84.96MB / 475.1kbps.
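For anyone wanting to cross-check the size and bitrate pairs: the sketch below assumes the source WAVs are 16-bit / 44.1kHz / stereo (1411.2kbps), which is not stated in the post, and it ignores file headers. The function name is made up for illustration.

```python
CD_WAV_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbps, assuming CD-format WAV

def average_kbps(encoded_mb, wav_mb):
    """Average bitrate implied by the ratio of encoded size to WAV size."""
    return encoded_mb / wav_mb * CD_WAV_KBPS

# Sizes in MB as posted for the 52 sample set (WAV: 121.53MB).
for label, size_mb in (("FLAC", 68.2), ("-1", 46.42), ("-2", 45.45), ("-3", 38.88)):
    print(label, round(average_kbps(size_mb, 121.53), 1))
```

Under that assumption the computed values land within a fraction of a kbps of the posted figures for both sample sets.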
halb27
Wonderful. Thank you.
GeSomeone
QUOTE(halb27 @ Oct 22 2007, 17:09) *

We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.

As I understood it, SKEW is an "offset" to SNR to give the low freqs (where we would more easily discern noise) a better snr. (At a stretch you could call it a form of noise shaping.)

So if you change SNR, this will impact the values where SKEW is applied too.

If I'm correct the effect on quality (==snr?) would be
- when you raise SKEW you (only) give better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.

So choose where you want the extra quality... :) or just vary the SNR.

BTW. Has anybody found that SKEW above 9 improves a problem sample?
halb27
QUOTE(GeSomeone @ Oct 24 2007, 20:24) *

QUOTE(halb27 @ Oct 22 2007, 17:09) *

We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.

What I understood what SKEW was for, it is an "offset" to SNR to give the low freqs (where we would more easily discern noise) a better snr. (with a stretch you could call it a form of noise shaping)

So if you change SNR, this will impact the values where SKEW is applied too.

If I'm correct the effect on quality (==snr?) would be
- when you raise SKEW you (only) give better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.

So choose where you want the extra quality... :) or just vary the SNR.

BTW. Has anybody found that SKEW above 9 improves a problem sample?

Well, the skew option is more meaningful to me than the snr option just because I have a mental picture of the effect of skew (though I don't really know how useful it is), but I personally don't really understand the idea behind snr. Maybe Nick can help.
I personally accept that we are partially doing a bit of rather wild experimenting, as long as this is done in a pretty conservative way that preserves the very good quality already achieved.
I have liked the idea of skew because, IMO, I have always seen too much averaging at the low frequency edge. Now that this is going to change due to variable spreading, maybe the skew option will partially lose its usefulness. For being conservative, especially with -1, skew may still be welcome, however.
I also see snr as favoring conservatism, but because I lack insight so far, my heart is more with skew.
Let's see what will come out.
Nick.C
QUOTE(halb27 @ Oct 24 2007, 20:28) *
QUOTE(GeSomeone @ Oct 24 2007, 20:24) *
QUOTE(halb27 @ Oct 22 2007, 17:09) *
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
What I understood what SKEW was for, it is an "offset" to SNR to give the low freqs (where we would more easily discern noise) a better snr. (with a stretch you could call it a form of noise shaping)

So if you change SNR, this will impact the values where SKEW is applied too.

If I'm correct the effect on quality (==snr?) would be
- when you raise SKEW you (only) give better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.

So choose where you want the extra quality... :) or just vary the SNR.

BTW. Has anybody found that SKEW above 9 improves a problem sample?
Well, the skew option is more meaningful to me than the snr option just because I have an imagination about the effect of skew (though I don't really know how useful it is), but I personally don't really understand the idea behind snr. Maybe Nick can help.
I personally accept that we are partially doing a bit of rather wild experimenting as long as this is done in a pretty conservative way that makes sure the very good quality already achieved.
To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used. It is easily disabled with -snr 0.
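Put as a few lines of Python, the safety net looks roughly like this. It is a sketch of the description above only: the function name is made up, the bin values are hypothetical dB figures, and taking a plain arithmetic mean of them is an assumption.

```python
def snr_safety_net(bin_results_db, snr=12.0):
    """Lower of the minimum bin result and (average of the bins - snr)."""
    average = sum(bin_results_db) / len(bin_results_db)
    threshold = average - snr
    return min(min(bin_results_db), threshold)

# One low bin: the minimum (30) is already below the threshold (33).
print(snr_safety_net([30.0, 50.0, 52.0, 48.0]))
# Flat spectrum: the threshold (36) undercuts the minimum (40).
print(snr_safety_net([40.0, 50.0, 52.0, 50.0]))
# -snr 0: the threshold equals the average, so the minimum always wins.
print(snr_safety_net([40.0, 50.0, 52.0, 50.0], snr=0.0))
```

Since the average can never be below the minimum, -snr 0 leaves the minimum untouched, which is why it acts as an off switch.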
GeSomeone
QUOTE(Nick.C @ Oct 24 2007, 20:36) *

To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used.

If that's all then they are not related, and I was wrong. I must be mixing up -SNR with some other noise threshold.
Nick.C
QUOTE(GeSomeone @ Oct 24 2007, 22:14) *
QUOTE(Nick.C @ Oct 24 2007, 20:36) *
To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used.
If that's all then they are not related, and I was wrong. I must be mixing up -SNR with some other noise threshold.
If you introduce a large -skew value then the minimum *may* be affected, but the average will definitely be affected as the fft results are skewed before the spreading / averaging is done.
Josef Pohm
Comparison of 0.3.18 and 0.3.15 on my SetF.

Bits to remove table.

CODE

------- -------------------- -------------------- --------------------
|      |       0.3.15       |       0.3.18       |      18 vs 15      |
|       ------ ------ ------ ------ ------ ------ ------ ------ ------
|      |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------
|  512 | 5,13 | 5,64 | 5,93 | 5,22 | 5,36 | 5,88 |  ,09 | -,28 | -,05 |
| 1024 | 4,88 | 5,25 | 5,48 | 4,84 | 4,93 | 5,44 | -,04 | -,32 | -,04 |
| 2048 | 4,48 | 4,93 | 5,17 | 4,40 | 4,52 | 5,11 | -,08 | -,41 | -,06 |
| 4096 | 4,11 | 4,55 | 4,78 | 3,91 | 4,05 | 4,71 | -,20 | -,50 | -,07 |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------


TAK 1.0.2b1 -p3m bitrate table (lossless 862kbps).
CODE

------- ----------------- ----------------- -----------------
|      |  TAK on 0.3.15  |  TAK on 0.3.18  |    18 vs 15     |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 465 | 426 | 405 | 458 | 447 | 409 | - 7 |  21 |   4 |
| 1024 | 470 | 441 | 424 | 472 | 465 | 426 |   2 |  24 |   2 |
| 2048 | 492 | 457 | 439 | 498 | 488 | 443 |   6 |  31 |   4 |
| 4096 | 517 | 482 | 465 | 532 | 521 | 470 |  15 |  39 |   5 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
halb27
Well, I've got a lot of work left (I'll have to do it the day after tomorrow as I'll be busy tomorrow), but I can report my first findings, which I think largely show the way to go.

I took 12 full tracks of regular music and 8 samples that are suspected to be problematic for lossyWAV and went through a lot of tests. Here's an extract which shows the way to go fairly consistently:

I used only the new -spf parameter, so I drop the spreading values for 2048 here.

a) I started with 23345-23345-23345 as this was the reference setting for a long time yielding good results. => regular tracks: 425 kbps on average, problem tracks: 481 kbps.
Quite a good differentiation already.

b) With the critical band approach it's most vital to have a spreading length of 1 at the low frequency edge.
13345-13345-13345 => 428 kbps (regular) vs. 499 kbps (problems).
So obeying the critical band principle is nearly free here, and we get an improved differentiation between regular and problem tracks.

c) Looking at the next frequency range at the low edge spreading length should be 1 for FFT length=64 and 256, and can be up to 4 with a 1024 bin FFT.
11345-11345-13345 => 434 kbps (regular) vs. 512 kbps (problems).
A pretty acceptable bitrate increase IMO and an improved spread between regular and problematic tracks.
Using 2 instead of 3 for the 1024 bin FFT provides nearly the same result (435 vs. 512 kbps).

d) For an FFT length of 64, spreading length should be 1 for the next-lowest frequency range. This however increases bitrate significantly, so it should only be done with -1 IMO.
So let's make a compromise and use a spreading length of 2 here. With an FFT length of 64, spreading length should be 3 in the next frequency range. So we get 11235 for the 64 bin FFT if we stick with 5 for the spreading length at the HF end.
With 256 FFT bins spreading length should be 3 for the 3.4...8.3 kHz range. With anything else left we arrive at 11235-11345-13345, and this yields 437 kbps (regular) vs. 515 kbps (problems).
Corresponds closely to c).

e) With those digits in spreading formula d) that are not bold we are free to do some variations on them trying for instance cautiously a rather long spreading length, especially at the high frequency edge, but - more cautiously - also with the 8.3..12.4 kHz range.
Remember changing spreading strategy to 23345 brought down bitrate significantly due to this small increase in spreading length at the HF end.
I'm just trying these things and will report about them. I think this is a good area of differentiation among the different quality modes.
Just a promising candidate for -2: 11235-11357-13379 => 416 kbps (regular music) vs. 512 kbps (problems).

Not bad, is it? I'll try to make it a bit more defensive for -2 while largely keeping these good properties.



halb27
Spreading strategy for -2 settled:

11235-11336-1234D for 64-256-1024 FFT length.

Yields 420 kbps (regular music) resp. 514 kbps (problem samples) on average with my sample sets.

This is roughly the same bitrate as that of v0.3.15 (using 23345-23345-23345), but with a significantly improved security against problems.
Up to 12.4 kHz the spreading length is lower than or equal to that of the v0.3.15 strategy.
For the 12.4+ kHz range and an FFT length >=256 the longer spreading length shouldn't be an issue with so many bins in this range (each of the averages covers only a small frequency range). Moreover our ears' sensitivity drops quickly in this area, and this is especially true for our sensitivity towards noise, which peaks at around 6 kHz.

Just a bit strange looking at an FFT length of 1024 and the 12.4+ kHz range:
If I lower the 'D' to '5', average bitrate for regular music increases to 436 kbps.
So noise behavior in the 12.4+ kHz range has an influence on deciding between '5' and 'D'.
But that's a bit of a contradiction towards the fact that the 23345 setting yields a bitrate of 425 kbps.
I know these things can happen in a world of averages, but this behavior is a bit strong and I wonder whether there may be another issue causing it.

Anyway I suggest using 11235-11336-1234D-1245F as an internal default (with 1245F having to be refined later).

I'll find a spreading strategy for -3 next.

Thanks, Nick, for your wisdom of using HEX values for the spreading length. When I was thinking about a spreading length parameter I had only spreading lengths of up to 9 in mind.

@Josef Pohm: Do you mind trying your setF with option -spf 11235-11336-1234D-1245F (just quality -2)?

@Nick: As I'll be working with -3 for the first time, the codec block size question comes up for me. Judging from your and Josef Pohm's results for -3, I guess it makes sense to use a smaller codec blocksize. As I'm using FLAC, I guess 576 is the most suitable blocksize for -3.
How can I achieve this?
This brings back the question of how to control the blocksize. For experimenting, your former codec blocksize option wasn't bad.
But we can go a bit more into the final direction I think. Could be something like:
Default blocksize without special codec options (like -tak): 1024 as a general default (current behavior).
-tak behavior: 1024 for -1 and -2, 512 for -3.
-flac behavior: 1024 for -1 and -2, 576 for -3.
-wv behavior: IIRC David Bryant said wavPack doesn't like small blocksizes. Ideally he can say what's best. Right now I think we should just stick to the default blocksize of 1024.
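To make the proposal concrete, the blocksize rules above could be written as a small lookup table. This is just a sketch of the suggestion in this post, not of any existing lossyWAV behaviour; the names `CODEC_BLOCKSIZES` and `codec_blocksize` are made up for illustration, and the -wv row simply repeats the general default until David Bryant says what is best.

```python
# General default when no codec-specific option (like -tak) is given.
DEFAULT_BLOCKSIZE = 1024

# Proposed codec blocksize per codec option and quality preset (-1, -2, -3).
CODEC_BLOCKSIZES = {
    "tak": {1: 1024, 2: 1024, 3: 512},
    "flac": {1: 1024, 2: 1024, 3: 576},
    "wv": {1: 1024, 2: 1024, 3: 1024},  # placeholder: stick to the default
}


def codec_blocksize(quality, codec=None):
    """Return the proposed codec blocksize for a quality preset and codec."""
    if quality not in (1, 2, 3):
        raise ValueError("quality must be 1, 2 or 3")
    if codec is None:
        return DEFAULT_BLOCKSIZE
    return CODEC_BLOCKSIZES[codec][quality]


if __name__ == "__main__":
    print(codec_blocksize(3, "flac"))  # proposed FLAC blocksize for -3
    print(codec_blocksize(2))          # general default
```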
2Bdecided
In regular lossless coding, the choice of most efficient block length depends on the content.

The same is true of lossy/lossless coding, but the sweet spot is probably a shorter block length.

Unless some adaptive block length switching is employed (I don't suggest this!) then the optimum block length should be judged on a wide range of content, and possibly judged on different genres separately and the results published.

With the block length tied to the encoding quality pre-set, you risk the bizarre (though possibly inevitable) situation of certain content giving a higher bitrate at lower quality, because the short block length is inappropriate for that content.

Just a thought. I don't have an answer!

Cheers,
David.
halb27
QUOTE(2Bdecided @ Oct 26 2007, 11:50) *

... With the block length tied to the encoding quality pre-set, you risk the bizarre (though possibly inevitable) situation of certain content giving a higher bitrate at lower quality, because the short block length is inappropriate for that content. ...

Hopefully there will be progress for a long time, covering more and more special situations, but at the moment I'll be very content if we arrive at a very good average bitrate.
I wouldn't care much about certain 'unnatural' bitrate increases as long as they're limited and as long as the average bitrate is good.
I am more worried about 'bizarre' quality drops, that's why I didn't want to consider a lower block size than 1024 until recently. But I think with -3 it's okay. On one hand I don't see a real a priori danger that we'll run into trouble, and on the other hand -3 users do want relatively low bitrate while keeping up excellent quality - but they accept that they expose their encodings a bit more to the risk that quality is suboptimal. Within this framework to me it's okay to use a blocksize in the 5xx range for -3.
Josef Pohm
QUOTE

@Josef Pohm: Do you mind trying your setF with option -spf 11235-11336-1234D-1245F (just quality -2)?

Great work Halb! Comparison of 0.3.18-Halb and 0.3.18 on my SetF. While I was at it, I tried your settings also on -1 and -3.

Bits to remove table.
CODE

------- -------------------- -------------------- --------------------
|      |       0.3.18H      |       0.3.18       |     18H vs 18      |
|       ------ ------ ------ ------ ------ ------ ------ ------ ------
|      |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------
|  512 | 5,69 | 5,92 | 6,21 | 5,22 | 5,36 | 5,88 | 0,47 | 0,56 | 0,33 |
| 1024 | 5,40 | 5,55 | 5,80 | 4,84 | 4,93 | 5,44 | 0,56 | 0,62 | 0,36 |
| 2048 | 4,99 | 5,19 | 5,46 | 4,40 | 4,52 | 5,11 | 0,59 | 0,67 | 0,35 |
| 4096 | 4,57 | 4,78 | 5,07 | 3,91 | 4,05 | 4,71 | 0,66 | 0,73 | 0,36 |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------


TAK 1.0.2b1 -p3m bitrate table (lossless 862kbps).
CODE

------- ----------------- ----------------- -----------------
|      |  TAK on 0.3.18H |  TAK on 0.3.18  |    18H vs 18    |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 423 | 406 | 385 | 458 | 447 | 409 | -35 | -41 | -24 |
| 1024 | 430 | 418 | 401 | 472 | 465 | 426 | -42 | -47 | -25 |
| 2048 | 453 | 437 | 418 | 498 | 488 | 443 | -45 | -51 | -25 |
| 4096 | 481 | 465 | 443 | 532 | 521 | 470 | -51 | -56 | -27 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----


QUOTE(2Bdecided @ Oct 26 2007, 11:50) *

... With the block length tied to the encoding quality pre-set, you risk the bizarre (though possibly inevitable) situation of certain content giving a higher bitrate at lower quality, because the short block length is inappropriate for that content. ...


I had a short test session on that matter in the early days of LossyFLAC.

From my post here a frame size of 512 (and also 256) SEEMS to offer better compression ratios for all codecs but wavpack. That said, WavPack SEEMS to work well with a frame size of 1024, where it performs, in any case, slightly better than Flac.

Frame size of 128, on the other hand, SEEMS to result in generalized loss of compression performance for all codecs.

Moreover, David Bryant unveiled here a couple of quite promising pieces of news for possible further optimizations...

So I agree on using a frame size 1024/512 for TAK, 1152/576 for FLAC (though -1:1024/-2:512/-3:256 may also be tempting, even if in the past I heard use of smaller frames is not considered completely safe) and to better clarify WV status.
halb27
Thanks a lot. Looks good.

So you suggest a codec blocksize of 1152/576 for FLAC.
Does anybody see a problem in that this does not correspond to the FFT lengths?
Josef Pohm
QUOTE(halb27 @ Oct 26 2007, 18:28) *

...So you suggest a codec blocksize of 1152/576 for FLAC...

No, sorry I wasn't clear enough, but I didn't mean that. Actually I don't have a final opinion on whether to go for <1152;576>, <1024;512> or a mixed solution, concerning FLAC.

I only meant I agree that [1152 (or 1024) for <-1;-2>] and [576 (or 512) for <-3>], should be okay for FLAC. I wanted to keep it simple and ended up being inaccurate. Sorry again.
halb27
No problem, thank you for clarification.
halb27
Spreading strategy for -3 settled:

11236-1246E for 64-1024 FFT length.

Yields 390 kbps (regular music) resp. 493 kbps (problem samples) on average with my sample sets (using FLAC with a blocksize of 1024).

Quite remarkable is the difference of 103 kbps between regular and problematic samples.
This is more than the 94 kbps difference of the -2 setting I found. So maybe in combination with a more defensive value for -nts or another option this setting may be a better basis for -2. Will try later when I have found out more about the other options.

Everybody who wants to try this -3 setting may use the options: -3 -spf 11236-FFFFF-1246E-FFFFF.

Before finding an adequate setting for -1 I will try to analyze the effects of -skew and -snr.
My regular and problematic sample sets seem to be quite adequate to find out about differentiating behavior of option values in this respect.

As I said before, my heart is pretty much with skew, but after having thought about your remark, Nick, that -snr strengthens the effect of skew, I'm curious to learn about the behavior of both of these options.
Mitch 1 2
Out of curiosity, I processed a whole album with lossyWAV, and encoded it to Windows Media Audio 9.2 Lossless (WMALSL).
For comparison, I used FLAC -5 (default), and used lossyWAV -spf 11235-11336-1234D-1245F with both codecs.

CODE
Comparison of FLAC and WMALSL, with and without lossyWAV pre-processing

Format      | Total Size        | % of WAV Size | % of Unpreprocessed Size | Avg. Bitrate
        WAV | 691 905 184 bytes | 100.00        |                          | 1411 kbps
       FLAC | 384 957 786 bytes | 55.64         |                          |  785 kbps
  lossyFLAC | 233 040 957 bytes | 33.68         | 60.54                    |  475 kbps
     WMALSL | 373 569 806 bytes | 53.99         |                          |  822 kbps
lossyWMALSL | 208 287 236 bytes | 30.10         | 55.76                    |  448 kbps


Clearly, WMA Lossless benefits significantly from lossyWAV pre-processing.
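For anyone who wants to repeat the comparison with another album, the percentage columns can be reproduced from the byte counts alone. A minimal sketch follows; the sizes are the ones from the table above, while the helper name `percent` is just an illustrative choice.

```python
# Total sizes in bytes, copied from the table above.
SIZES = {
    "WAV": 691_905_184,
    "FLAC": 384_957_786,
    "lossyFLAC": 233_040_957,
    "WMALSL": 373_569_806,
    "lossyWMALSL": 208_287_236,
}


def percent(part, whole):
    """Size of `part` as a percentage of `whole`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name:>11}: {percent(size, SIZES['WAV']):6.2f} % of WAV size")
    # Pre-processed size relative to the same codec without lossyWAV.
    print("lossyFLAC vs FLAC:", percent(SIZES["lossyFLAC"], SIZES["FLAC"]))
    print("lossyWMALSL vs WMALSL:", percent(SIZES["lossyWMALSL"], SIZES["WMALSL"]))
```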
shadowking
QUOTE(Mitch 1 2 @ Oct 27 2007, 17:03) *

Out of curiosity, I processed a whole album with lossyWAV, and encoded it to Windows Media Audio 9.2 Lossless (WMALSL).
For comparison, I used FLAC -5 (default), and used lossyWAV -spf 11235-11336-1234D-1245F with both codecs.

CODE
Comparison of FLAC and WMALSL, with and without lossyWAV pre-processing

Format      | Total Size        | % of WAV Size | % of Unpreprocessed Size | Avg. Bitrate
        WAV | 691 905 184 bytes | 100.00        |                          | 1411 kbps
       FLAC | 384 957 786 bytes | 55.64         |                          |  785 kbps
  lossyFLAC | 233 040 957 bytes | 33.68         | 60.54                    |  475 kbps
     WMALSL | 373 569 806 bytes | 53.99         |                          |  822 kbps
lossyWMALSL | 208 287 236 bytes | 30.10         | 55.76                    |  448 kbps



Clearly, WMA Lossless benefits significantly from lossyWAV pre-processing.


Nice. Good work Nick.C, halb27, 2Bdecided and everyone else involved.
halb27
I finished my analysis of -skew and -snr:

Results for -3 -spf 11236-FFFFF-1246E-FFFFF -skew x -snr y (encoded with FLAC using a blocksize of 1024)
Results are given as bitrate in kbps for regular / for problematic tracks:

CODE

        | -skew 0 | -skew 12| -skew 20| -skew 24| -skew 36
-snr 0  |389 / 483|390 / 493|393 / 504|398 / 514|435 / 551
-snr 12 |         |390 / 493|         |398 / 514|435 / 551
-snr 24 |         |397 / 500|         |407 / 524|439 / 560

Looking at the first row (-snr 0):
Nick's default -skew 12 yields a significant security margin practically for free!
-skew 20 increases it, and it's still more or less for free.
From roughly -skew 24 on there's a price to pay: bitrate of the problematic samples increases, but so does the bitrate for regular music. The relation is still favorable at around -skew 24, but we're starting getting diminishing returns concerning the relation of the bitrate increase of problematic vs. regular tracks.
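The "price" discussed for the first row can be made explicit by computing the bitrate increase over -skew 0 for both sample sets. This is only a small sketch using the -snr 0 numbers from the table above; the names `SNR0_ROW` and `increase_over_baseline` are illustrative.

```python
# -snr 0 row of the table: skew value -> (regular kbps, problem kbps).
SNR0_ROW = {
    0: (389, 483),
    12: (390, 493),
    20: (393, 504),
    24: (398, 514),
    36: (435, 551),
}


def increase_over_baseline(row, baseline_skew=0):
    """Bitrate increase (regular, problem) of each skew value over the baseline."""
    base_regular, base_problem = row[baseline_skew]
    return {
        skew: (regular - base_regular, problem - base_problem)
        for skew, (regular, problem) in row.items()
    }


if __name__ == "__main__":
    for skew, (d_regular, d_problem) in increase_over_baseline(SNR0_ROW).items():
        print(f"-skew {skew:2d}: regular +{d_regular} kbps, problem +{d_problem} kbps")
```

With these numbers, -skew 24 costs 9 kbps on regular music for 31 kbps more on the problem samples, while -skew 36 costs 46 kbps for 68 kbps.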

Looking at the other rows:
-snr 12 yields the same results as -snr 0 and thus is not interesting.
Looking at -snr 24 there is something going on. Roughly speaking however it's more of a general bitrate increase, as can be achieved more directly via -nts. This is not exactly true with -skew 36 -snr 24, where the bitrate increase is higher for the problematic tracks, but -skew 36 isn't very interesting (see below).

-skew has an astonishing effect on security, and it's more or less for free (for -skew <~ 24).
However we have to face the fact that it covers improved security only in the frequency range below 3.5 kHz (and most of the effect goes into the 1.5- kHz region).
So IMO it wouldn't be a balanced strategy to use a large -skew value. We would pay for benefits restricted to this frequency area. It's okay to pay a little bit, but if we want to pay much, IMO we should do it more generally (use a more defensive -nts value).

I thought -3 is based on a codec blocksize of 1024 but I was wrong: it's 512. So it's wise to use this blocksize with FLAC as well.
For -2 of course I used FLAC with a blocksize of 1024.

So my final settings for -2 and -3 and the results for my test sets are:

-3 -spf 11236-FFFFF-1246E-FFFFF -skew 24 -snr 0 => 386 kbps (regular music) / 508 kbps (problem samples)

-2 -spf 11235-11336-1234D-FFFFF -skew 24 -snr 0 => 426 kbps (regular music) / 534 kbps (problem samples)

Now that we've reached the 3xx kbps region hopefully some nice guys come up and do some listening tests.
It's not about just problem samples, also regular music may be harmed by our rather simple method when in the 3xx kbps range.

I'll work on the -1 setting within the next few days, but first I'll give myself some rest.

Edited: -skew 20 changed to -skew 24 in the final setting. IMO that's a better trade-off between security and price.
halb27
Spreading strategy for -1 settled.

To put everything in one place:

-1 -spf 11124-11225-11236-12347 -skew 24 -snr 0 (yielding 488 / 560 kbps on avg. for my regular resp. problem sample set)
-2 -spf 11235-11336-1234D-FFFFF -skew 24 -snr 0 (yielding 426 / 534 kbps on avg. for my regular resp. problem sample set)
-3 -spf 11236-FFFFF-1246E-FFFFF -skew 24 -snr 0 (yielding 386 / 508 kbps on avg. for my regular resp. problem sample set)

Even -1 yields a bitrate below 500 kbps (with my set).

Looks pretty well graduated with respect to resulting bitrate as well as the degree to which average building is done defensively within the 5 frequency regions obeying the critical band criterion.
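As a quick check of the graduation, the three settings and their results can be put into one table and the margin between problem samples and regular music computed for each. This is only a sketch with the numbers listed above; the dictionary layout and the name `problem_margin` are illustrative.

```python
# Settings and average bitrates (kbps) for the three quality presets above.
PRESETS = {
    "-1": {"spf": "11124-11225-11236-12347", "skew": 24, "snr": 0,
           "regular_kbps": 488, "problem_kbps": 560},
    "-2": {"spf": "11235-11336-1234D-FFFFF", "skew": 24, "snr": 0,
           "regular_kbps": 426, "problem_kbps": 534},
    "-3": {"spf": "11236-FFFFF-1246E-FFFFF", "skew": 24, "snr": 0,
           "regular_kbps": 386, "problem_kbps": 508},
}


def problem_margin(preset):
    """Extra bitrate (kbps) spent on problem samples compared to regular music."""
    return preset["problem_kbps"] - preset["regular_kbps"]


if __name__ == "__main__":
    for name, preset in PRESETS.items():
        print(f"{name}: {preset['regular_kbps']} / {preset['problem_kbps']} kbps, "
              f"margin {problem_margin(preset)} kbps")
```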

Nick, what do you think about putting this into fixed software and leave the -nts option as the only quality related option for the user? Not right now but after a certain time giving room for improvement.
Nick.C
QUOTE(halb27 @ Oct 27 2007, 22:23) *

Spreading strategy for -1 settled.

To put everything in one place:

-1 -spf 11124-11225-11236-12347 -skew 24 -snr 0 (yielding 488 / 560 kbps on avg. for my regular resp. problem sample set)
-2 -spf 11235-11336-1234D-FFFFF -skew 24 -snr 0 (yielding 426 / 534 kbps on avg. for my regular resp. problem sample set)
-3 -spf 11236-FFFFF-1246E-FFFFF -skew 24 -snr 0 (yielding 386 / 508 kbps on avg. for my regular resp. problem sample set)

Even -1 yields a bitrate below 500 kbps (with my set).

Looks pretty well graduated with respect to resulting bitrate as well as the degree to which average building is done defensively within the 5 frequency regions obeying the critical band criterion.

Nick, what do you think about putting this into fixed software and leave the -nts option as the only quality related option for the user? Not right now but after a certain time giving room for improvement.
I don't know - my home broadband goes down on Friday morning, I have no access to the internet for 3 days - and all hell breaks loose on the thread!!! ;)

@Halb27 - Many thanks for the *extensive* testing to get the spreading function parameters fixed. I will implement your latest as default (including -skew 24).

@Mitch 1 2 - Excellent find! Should extend the userbase of David's method......

As an aside, I noticed a bug in v0.3.18: -snr was not working correctly. I have amended and will post.

In the interim, I've been playing with assembler and have optimised the code somewhat, so it should run faster. I only have Intel C2D platforms for testing, so (selfishly?) the optimisations are for this chip.
halb27
Hi Nick,

I was really worried about what had happened to you, as you've always been so busy with this thread and we hadn't heard from you for so long. Just an internet breakdown - not too bad, and it leaves room for other things to do.

Well, as -snr wasn't working correctly in v0.3.18 I'll try the -skew/-snr combinations again as soon as you can provide a fixed version.
Nick.C
QUOTE(halb27 @ Oct 29 2007, 09:03) *
Hi Nick,

I was really worried about what had happened to you, as you've always been so busy with this thread and we hadn't heard from you for so long. Just an internet breakdown - not too bad, and it leaves room for other things to do.

Well, as -snr wasn't working correctly in v0.3.18 I'll try the -skew/-snr combinations again as soon as you can provide a fixed version.
lossyWAV alpha v0.3.19 attached: Superseded; faster, -snr now working "correctly", -spf now allows 1..9;A..Z input to allow up to 35 bin averaging(!).

Having no broadband at home is really tedious......

[edit] My 52 sample set (default parameters other than -1, -2 & -3): WAV: 121.53MB; FLAC: 68.2MB / 791.9kbps; -1: 50.15MB / 582.3kbps; -2: 44.09MB / 512kbps; -3: 39.5MB / 458.7kbps [/edit]
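For reference, a -spf string with the extended 1..9;A..Z digits could be decoded as sketched below. This is not the lossyWAV parser, only an illustration under some assumptions drawn from the thread: one group per FFT length (taken here as 64, 256, 1024 and 2048), five digits per group for the five frequency regions, and digits 1..9 followed by A..Z meaning spreading lengths 1..35. The names `SPF_DIGITS` and `parse_spf` are made up.

```python
# Digit alphabet: '1'..'9' map to 1..9, 'A'..'Z' map to 10..35.
SPF_DIGITS = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def parse_spf(spec, fft_lengths=(64, 256, 1024, 2048), regions=5):
    """Decode an -spf string into {fft_length: [spreading length per region]}."""
    groups = spec.upper().split("-")
    if len(groups) != len(fft_lengths):
        raise ValueError(f"expected {len(fft_lengths)} groups, got {len(groups)}")
    table = {}
    for fft_length, group in zip(fft_lengths, groups):
        if len(group) != regions:
            raise ValueError(f"group {group!r} must have {regions} digits")
        lengths = []
        for ch in group:
            position = SPF_DIGITS.find(ch)
            if position < 0:
                raise ValueError(f"invalid spreading digit {ch!r}")
            lengths.append(position + 1)
        table[fft_length] = lengths
    return table


if __name__ == "__main__":
    for fft_length, lengths in parse_spf("11235-11336-1234D-FFFFF").items():
        print(fft_length, lengths)
```

In this sketch 'D' decodes to 13 and 'F' to 15, and '0' is rejected since a spreading length of zero bins makes no sense.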
halb27
Looking at your result I guess you include already the -spf values for -1, -2, -3 which I found.

Your sample set is to a large extent a set of hard samples. I think for bitrate considerations it is good to have a hopefully representative set of full length tracks from your collection on one hand, and a set of sample tracks supposed to require a very high bitrate on the other hand.
As you can consider your 52 sample set to be more or less a set of the second kind, an additional set with regular music would be most welcome IMO. Bitrate of this set will be considerably lower.
Nick.C
QUOTE(halb27 @ Oct 29 2007, 11:44) *
Looking at your result I guess you include already the -spf values for -1, -2, -3 which I found.

Your sample set is to a large extent a set of hard samples. I think for bitrate considerations it is good to have a hopefully representative set of full length tracks from your collection on one hand, and a set of sample tracks supposed to require a very high bitrate on the other hand.
As you can consider your 52 sample set to be more or less a set of the second kind, an additional set with regular music would be most welcome IMO. Bitrate of this set will be considerably lower.
Yes, I forgot to mention that the revised -spf defaults are per your testing. I will start to transcode a selection from my archive and revert.
Nick.C
Following testing of alpha v0.3.19 on a few albums:
CODE
lossyWAV alpha v0.3.19

| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |

| AC-DC - Dirty Deeds Done Dirt Cheap               |  220MB / 781 kbps |  122MB / 435 kbps |  112MB / 399 kbps |
| B-52's - Good Stuff                               |  398MB / 993 kbps |  184MB / 459 kbps |  169MB / 423 kbps |
| China Crisis - Flaunt The Imperfection            |  238MB / 774 kbps |  132MB / 431 kbps |  121MB / 394 kbps |
| Chris Isaak - Chris Isaak                         |  227MB / 878 kbps |  114MB / 441 kbps |  104MB / 404 kbps |
| Climie Fisher - Everything                        |  308MB / 910 kbps |  149MB / 440 kbps |  137MB / 406 kbps |
| Dave Stewart and the Spiritual Cowboys - Honest   |  330MB / 835 kbps |  172MB / 436 kbps |  157MB / 397 kbps |
| Fish - From The Mirror                            |  274MB / 854 kbps |  136MB / 425 kbps |  125MB / 390 kbps |
| Gary Moore - Out In The Fields (The Very Best Of) |  495MB / 976 kbps |  226MB / 447 kbps |  208MB / 412 kbps |
| Gerry Rafferty - City To City                     |  307MB / 802 kbps |  165MB / 431 kbps |  150MB / 392 kbps |
| Iron Maiden - Can I Play With Madness             |  206MB / 784 kbps |  118MB / 450 kbps |  110MB / 419 kbps |
| Jean Michel Jarre - Oxygene                       |  219MB / 773 kbps |  143MB / 506 kbps |  130MB / 459 kbps |
| Marillion - Real to Reel (Live)                   |  305MB / 821 kbps |  172MB / 464 kbps |  158MB / 425 kbps |
| Mike Oldfield - Discovery                         |  237MB / 804 kbps |  129MB / 438 kbps |  118MB / 399 kbps |
| Mike Oldfield - QE2                               |  243MB / 855 kbps |  133MB / 469 kbps |  121MB / 425 kbps |
| Scorpions - Best of Rockers'N'Ballads             |  451MB / 922 kbps |  225MB / 460 kbps |  209MB / 428 kbps |
| The Shamen - Boss Drum                            |  433MB / 922 kbps |  220MB / 470 kbps |  202MB / 431 kbps |
| Van Morrison - Astral Weeks                       |  255MB / 757 kbps |  148MB / 440 kbps |  136MB / 404 kbps |
| Voice of the Beehive - Honey Lingers              |  213MB / 938 kbps |   99MB / 434 kbps |   92MB / 402 kbps |

| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2567MB / 412 kbps |
user
Congratulations on this great development to all the nice people involved!

Cool to see these results and the team spirit!

(Though on a side note, I wonder whether I will try it out in the future if I get a FLAC-capable device, and whether many people will use it rather than just a few technically experienced HA members.
It would mean another transcoding or parallel encoding step and additional storage space, since the true lossless copy is kept anyway; I think (for myself as well) that people interested enough in quality to consider using 400 kbit/s are interested in true lossless anyway.
Though it is now possible to lower lossless bitrates by about 50%, e.g. from 860 kbit/s down to 400-450 kbit/s, the result isn't lossless anymore, and it is still way above e.g. the "standard" 320 kbit/s bitrate, which most consider either overkill or already nearly transparent in most cases, depending on the codec and the point of view. <-- uu, long sentence.
I think lossy WavPack, for example, could have similar bitrates and probably the same transparency at these bitrates (lossy WavPack has been tested down to only ca. 200 kbit/s). Nevertheless only technically experienced people, and only a few even at HA, use lossy WavPack (or other modern codecs at their highest quality settings; consider Ogg, MPC, AAC at such bitrates).
Of course, for FLAC it is interesting, given increasing hardware/portable support, to offer a bitrate solution oriented towards limited space. (Well, still, who seriously uses e.g. 320k MP3 for portable use?)
For home hi-fi use you have nearly unlimited space thanks to big and quite cheap HDs, or DVD+-R as even cheaper storage, so it doesn't really matter whether the bitrate is 400-450 or averages 800-1000 as for lossless (FLAC).)
halb27
QUOTE(Nick.C @ Oct 29 2007, 19:04) *

Following testing of alpha v0.3.19 on a few albums:
CODE
lossyWAV alpha v0.3.19
...
| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |
| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2567MB / 412 kbps |


Nice results - though I'm a bit disappointed by the -3 result, which I had expected to have a lower bitrate.

Should we try to go a bit deeper in bitrate with -3?
But maybe, having been so busy with -3, I'm too keen on achieving a bitrate below 400 kbps on average with regular music.

Feedback welcome.
halb27
QUOTE(user @ Oct 29 2007, 19:25) *

... It would mean another transcoding or parallel encoding step and additional storage space, since the true lossless copy is kept anyway; I think (for myself as well) that people interested enough in quality to consider using 400 kbit/s are interested in true lossless anyway. ...

The practical use of this procedure is certainly limited to rather few people. mp3, vorbis, aac or mpc make nearly everybody happy for portable use or even for home hifi use. For archiving purposes storage technology is such that most of us can afford archiving lossless. But there are niches where people might find it useful. I personally want to use it as a space saving alternative to a lossless codec on my DAP. With my 40 GB DAP and selective collection I can afford using a codec which requires an average bitrate in the 400 kbps range. Using it this way can be done right now. Another interest may be to use the -1 quality level for archiving, which can be useful even today for owners of huge music collections. In this case however it may be wise to wait until some more feedback is available regarding quality.

As for that everybody is highly welcome to share practical experience. Using -3 I guess there is a chance to prove the current state of this approach as worth improving.
Nick.C
QUOTE(halb27 @ Oct 29 2007, 19:05) *
QUOTE(Nick.C @ Oct 29 2007, 19:04) *
Following testing of alpha v0.3.19 on a few albums:
CODE
lossyWAV alpha v0.3.19
...
| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |
| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2567MB / 412 kbps |

Nice results - though I'm a bit disappointed by the -3 result, which I had expected to have a lower bitrate.

Should we try to go a bit deeper in bitrate with -3?
But maybe, having been so busy with -3, I'm too keen on achieving a bitrate below 400 kbps on average with regular music.

Feedback welcome.
User error I'm afraid - I forgot to FLAC recode at -b 512...... amended results as follows:
CODE

lossyWAV alpha v0.3.19

| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |

| AC-DC - Dirty Deeds Done Dirt Cheap               |  220MB / 781 kbps |  122MB / 435 kbps |  110MB / 391 kbps |
| B-52's - Good Stuff                               |  398MB / 993 kbps |  184MB / 459 kbps |  162MB / 404 kbps |
| China Crisis - Flaunt The Imperfection            |  238MB / 774 kbps |  132MB / 431 kbps |  117MB / 382 kbps |
| Chris Isaak - Chris Isaak                         |  227MB / 878 kbps |  114MB / 441 kbps |  101MB / 392 kbps |
| Climie Fisher - Everything                        |  308MB / 910 kbps |  149MB / 440 kbps |  131MB / 387 kbps |
| Dave Stewart and the Spiritual Cowboys - Honest   |  330MB / 835 kbps |  172MB / 436 kbps |  152MB / 385 kbps |
| Fish - From The Mirror                            |  274MB / 854 kbps |  136MB / 425 kbps |  120MB / 377 kbps |
| Gary Moore - Out In The Fields (The Very Best Of) |  495MB / 976 kbps |  226MB / 447 kbps |  202MB / 400 kbps |
| Gerry Rafferty - City To City                     |  307MB / 802 kbps |  165MB / 431 kbps |  147MB / 383 kbps |
| Iron Maiden - Can I Play With Madness             |  206MB / 784 kbps |  118MB / 450 kbps |  106MB / 405 kbps |
| Jean Michel Jarre - Oxygene                       |  219MB / 773 kbps |  143MB / 506 kbps |  127MB / 450 kbps |
| Marillion - Real to Reel (Live)                   |  305MB / 821 kbps |  172MB / 464 kbps |  154MB / 414 kbps |
| Mike Oldfield - Discovery                         |  237MB / 804 kbps |  129MB / 438 kbps |  115MB / 390 kbps |
| Mike Oldfield - QE2                               |  243MB / 855 kbps |  133MB / 469 kbps |  118MB / 416 kbps |
| Scorpions - Best of Rockers'N'Ballads             |  451MB / 922 kbps |  225MB / 460 kbps |  203MB / 415 kbps |
| The Shamen - Boss Drum                            |  433MB / 922 kbps |  220MB / 470 kbps |  190MB / 405 kbps |
| Van Morrison - Astral Weeks                       |  255MB / 757 kbps |  148MB / 440 kbps |  133MB / 395 kbps |
| Voice of the Beehive - Honey Lingers              |  213MB / 938 kbps |   99MB / 434 kbps |   88MB / 385 kbps |

| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2484MB / 399 kbps |
halb27
Wonderful. Something like this is what I expected.