Full Version: lossyWAV Development
halb27
QUOTE(Nick.C @ Jan 10 2008, 21:47) *

... at present only 2 [edit2] 1024 sample FFT[/edit2] analyses are carried out on a 512 sample codec_block -512:511 and 0:1023. This gives 50% overlap over the length of the file. ...

As we once decided to have an overlap of more than 50% of the window length, it would be good to have an improvement here. I remember my proposal of using these windows: -448:575, -64:959.
I think the overlap is good, as is the coverage of the edges. What do you think?
Nick.C
QUOTE(halb27 @ Jan 10 2008, 21:54) *
QUOTE(Nick.C @ Jan 10 2008, 21:47) *
... at present only 2 [edit2] 1024 sample FFT[/edit2] analyses are carried out on a 512 sample codec_block -512:511 and 0:1023. This gives 50% overlap over the length of the file. ...
As we once decided to have an overlap of more than 50% of the window length, it would be good to have an improvement here. I remember my proposal of using these windows: -448:575, -64:959.
I think the overlap is good, as is the coverage of the edges. What do you think?
My most recent speedup is reliant on 50% overlap either side of the codec_block. Adding in the extra analysis gives: -512:511; -256:767 & 0:1023 - at no speed penalty compared to v0.6.4. Any other overlap would not give even coverage - look at what happens with adjacent codec_blocks and plot the FFT lengths....
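
The point about even coverage can be checked numerically. The following is only an illustrative sketch, not lossyWAV code: the function names are hypothetical, and it assumes every 512 sample codec_block uses the same 1024 sample window offsets. It lists where the FFT windows start over adjacent codec_blocks and reports the distinct hop sizes between consecutive windows.

```python
def window_starts(offsets, block_size=512, n_blocks=4):
    """Absolute start positions of all FFT windows over n_blocks adjacent codec_blocks."""
    starts = {b * block_size + off for b in range(n_blocks) for off in offsets}
    return sorted(starts)


def hop_sizes(offsets, block_size=512, n_blocks=4):
    """Distinct distances between consecutive window starts (a single value means even coverage)."""
    starts = window_starts(offsets, block_size, n_blocks)
    return sorted({b - a for a, b in zip(starts, starts[1:])})


print(hop_sizes((-512, 0)))        # two analyses per block: -512:511 and 0:1023
print(hop_sizes((-512, -256, 0)))  # with the extra -256:767 analysis
print(hop_sizes((-448, -64)))      # the -448:575 / -64:959 proposal
```

Under these assumptions the first two schemes each give a single hop size (512 and 256 samples), while the -448 / -64 scheme alternates between hops of 128 and 384 samples.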
Alex B
Let's see if I can do some testing tomorrow. As we know, trying to test codecs at this quality level is exhausting.

Hearing a small pitch change is like a visual experience. One tiny bit of sound ends at a bit higher "position" than the other. If you lose concentration for a second you are out and it may take a while before you can hear the difference again.

As far as I understand, the problem may well be caused by small differences in the reproduction of the highest harmonics.

Edit: typo
shadowking
This track is compressed rock. It's very strange that one could ABX it, because all the instruments and vocals are going at it at once. With WavPack (it should be similar with other hybrids) those were the hardest to ABX.
halb27
The problem may be small but so far we should consider it a pitch problem to be solved.
I couldn't sleep last night, so I had time to think about it a lot.
I constructed the error file last night and listened to it (and looked at it with a wave editor).
I am convinced that the primary problem isn't caused by the noise level being too high. When listening to the error file, what's most annoying is not the noise itself but the fluctuation in the noise. Especially at the blocks' edges this fluctuation can form a strong transient.

I was a bit sceptical before about this abrupt noise level change with respect to the anti-clipping strategy. But that's too short-sighted. We have this potential problem whenever there's a strong change in bits to remove from one block to the next.

To work against this we should take care that bits to remove changes by only 1 bit at the blocks' edges. If bits to remove is 1 for a sequence of 10 blocks and 8 for the next 10 blocks, we should not go immediately from 1 to 8 but do it gradually, so that bits to remove over the 20 blocks is 1,1,1,1,1,1,1,1,1,1,2,3,4,5,6,7,8,8,8,8. If bits to remove is 8 for the first 10 blocks and 1 for the next 10 blocks, it should be 8,8,8,8,7,6,5,4,3,2,1,1,1,1,1,1,1,1,1,1. Unfortunately this potentially means working on past blocks, which means buffering and deferred output.

I think we should do it this way for -2 and -1.
For -3 the number of intermediate steps, with their restricted benefit from the removed bits, should be lowered IMO. For -3 I think we can allow a step size of 2 bits to remove when going from one block to the next. But we should do it in such a way that the error level never changes by 2 bits at once. We can easily do this by changing bits to remove by 1 bit for the first 256 samples in the block and by another 1 bit for the last 256 samples. Looking at just 1 block, this doesn't bring a compression improvement compared to changing bits to remove by 1 for the entire block. The advantage is that we need roughly half as many intermediate blocks. So going from 1 bit to remove to 8 bits to remove as in the example above looks like this: 1,1,1,1,1,1,1,1,1,1, 2 resp. 3, 4 resp. 5, 6 resp. 7, 8,8,8,8,8,8,8.
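
The 1 bit per block ramp described above can be sketched as two passes over the per-block values. This is a minimal illustration under the assumption that the smoothed value must never exceed the value the analysis computed; the function name is hypothetical and nothing here is taken from the lossyWAV source.

```python
def smooth_bits_to_remove(btr, max_step=1):
    """Limit block-to-block changes to max_step without ever exceeding the computed value."""
    out = list(btr)
    # Forward pass: a rise is spread over the following blocks.
    for i in range(1, len(out)):
        out[i] = min(out[i], out[i - 1] + max_step)
    # Backward pass: a fall is anticipated in the preceding blocks (this needs look-ahead).
    for i in range(len(out) - 2, -1, -1):
        out[i] = min(out[i], out[i + 1] + max_step)
    return out


rising = smooth_bits_to_remove([1] * 10 + [8] * 10)
falling = smooth_bits_to_remove([8] * 10 + [1] * 10)
print(rising)
print(falling)
```

Both results match the two 20 block sequences written out in the post. The backward pass is the part that requires buffering, because earlier blocks are lowered once a later, smaller value is known.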
Nick.C
QUOTE(halb27 @ Jan 11 2008, 08:16) *
The problem may be small but so far we should consider it a pitch problem to be solved.
I couldn't sleep this night and so I could think about it a lot.
I constructed the error file last night and listened to it (and looked at it with a wave editor).
I am convinced that the primary problem isn't caused by the noise level being too high. When listening to the error file what's most annoying is not the noise itself but the fluctuation in noise. Especially at the blocks' edges this fluctuation can form a strong transient.

I was a bit sceptical before about this abrupt noise level change with respect to the anti-clipping strategy. But that's too short-sighted. We have this potential problem whenever there's a strong change in bits to remove from one block to the next.

To work against this we should take care that bits to remove changes only 1 bit at the blocks' edges. If for a sequence of 10 blocks bits to remove is 1, and for the next 10 blocks bits to remove is 8, we should not immediately go from 1 bits to remove to 8 bits to remove, but do it gradually, so the bits to remove in the 20 blocks is 1,1,1,1,1,1,1,1,1,1,2,3,4,5,6,7,8,8,8,8. If bits to remove of the first 10 blocks is 8 and the next 10 blocks is 1, bits to remove should be 8,8,8,8,7,6,5,4,3,2,1,1,1,1,1,1,1,1,1,1. Unfortunately this means having potentially to work on past blocks so this means buffering and deferred output.

I think we should do it this way for -2 and -1.
For -3 the number of intermediate steps with their restricted advantage of the removed bits should be lowered IMO. For -3 I think we can allow for a stepsize of 2 bits to remove when going from one block to the next. But we should do it in a way that the error level never has an immediate change of 2 bits to remove. We can easily do this by changing bits to remove by 1 bit for the first 256 samples in the block and another 1 bit for the last 256 samples. By just looking at 1 block this doesn't bring a compression improvement compared to change bits to remove by 1 for the entire block. The advantage is in the fact that we have roughly half of the intermediate blocks. So going from 1 bit to remove to 8 bits to remove as in the sample above looks like this: 1,1,1,1,1,1,1,1,1,1,2 resp. 3,4 resp. 5,6 resp. 7, 8,8,8,8,8,8,8.
Given the way that lossyWAV adds noise / reduces bits, I do not understand how pitch can be changed.

It would be relatively simple to ensure that each codec_block will have no more than 1 more bit removed than the last codec_block. To go the other way as well would be a large amount of coding.

I think that one initial approach would be to re-examine the -spf 22224 / 2246C for 64 / 1024 samples to see if the problem can be eradicated. I will re-post beta v0.6.2 to allow manipulation of those parameters removed at v0.6.4 RC1. I will also post beta v0.6.5 which incorporates the speedup and the extra 1024 sample FFT analysis per block.

[edit] Right, beta v0.6.5 appended to post #1 of this thread along with beta v0.6.2 as mentioned previously. Beta v0.6.5 limits the increase in bits_to_remove between blocks to 1 bit and incorporates the 3 1024 sample FFT analyses amendment. For my 53 sample set, beta v0.6.5 -3 / flac -5 produces 445.2kbps; -2 / flac -5 produces 508.7kbps and -1 / flac -5 produces 559.5kbps. [/edit]
halb27
QUOTE(Nick.C @ Jan 11 2008, 10:31) *

Given the way that lossyWAV adds noise / reduces bits, I do not understand how pitch can be changed.

It would be relatively simple to ensure that each codec_block will have no more than 1 more bit removed than the last codec_block. To go the other way as well would be a large amount of coding.

I think that one initial approach would be to re-examine the -spf 22224 / 2246C for 64 / 1024 samples to see if the problem can be eradicated. I will re-post beta v0.6.2 to allow manipulation of those parameters removed at v0.6.4 RC1. I will also post beta v0.6.5 which incorporates the speedup and the extra 1024 sample FFT analysis per block.

Pitch of the original signal can't change, of course, but the way we add noise can give the impression that pitch has changed. I had this very impression in earlier listening tests. And I'm absolutely convinced it's not the noise due to bits to remove but the modulation of the noise due to the abrupt noise level changes. The way we implement 2Bdecided's basic principles at the moment causes this particular problem. We take good care of the low to medium frequency range when doing the bits to remove analysis, but we add a significant amount of noise there afterwards because of the noise modulation side effect.

You may convince yourself by first looking at the error signal with a wave editor. See how artificial and strange this signal looks because of the abrupt changes in noise level. Then listen to it within the wave editor. You can hear the noise as such, but what's really annoying isn't the noise itself, it's the noise modulation due to abrupt changes in level.

Sorry that working backwards causes you a lot of trouble, and I can understand that you'd like to have another solution. But I definitely don't see any sense in giving the -spf setting a higher sensitivity for the HF range. I guess it's already unnecessarily high there (maybe the last change in this respect, which was prompted by problems with eig, wasn't a good choice, because maybe that problem was caused by the very problem we're talking about). Maybe gradually changing bits to remove gives room for being less defensive in the -spf and -nts settings with -3, thus giving the chance to arrive at a lower average bitrate. Just speculation of course, but what I want to say is that there's no way around tackling the real problem. I think if you look at and listen to the error signal you can understand.
Of course we can always bring bits to remove down and thus reduce the problem. But I think that's not the way to go.

I've thought about the working backward procedure. It's not nice of course, but I think the amount of effort necessary isn't extremely high. Whenever you output a block right now you can just write it to a buffer containing 16 blocks. You also record the current state of the number of bits to remove for the block and add this to the buffer space provided for the block. So whenever you have to work backwards you just address the bits to remove state of the blocks in the buffer.
The buffer is organized as a ring. So before putting the current block into the buffer you first output the block that has been in the buffer the longest.
Sure, the ring buffer has to be managed, but I think that's not very difficult. Of course it's easy for me to talk about it while you would have to do it, in case you want to. Sorry about that.
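
The buffering idea above could look roughly like the sketch below. It is a hypothetical illustration only: the class name, the list standing in for the output file and the choice to store just the bits to remove value per block are all assumptions, not the lossyWAV implementation.

```python
from collections import deque


class DeferredBlockWriter:
    """Hold the most recent blocks so their bits_to_remove can still be lowered."""

    def __init__(self, capacity=16, max_step=1):
        self.capacity = capacity
        self.max_step = max_step
        self.ring = deque()   # entries are [block_index, bits_to_remove]
        self.written = []     # stands in for the real output file

    def push(self, block_index, bits_to_remove):
        # Rising edge: never jump more than max_step above the previous block.
        if self.ring:
            bits_to_remove = min(bits_to_remove, self.ring[-1][1] + self.max_step)
        # Make room first: the oldest block leaves the ring and becomes final.
        if len(self.ring) == self.capacity:
            self.written.append(tuple(self.ring.popleft()))
        self.ring.append([block_index, bits_to_remove])
        # Falling edge: walk backwards and pull earlier buffered blocks down.
        for j in range(len(self.ring) - 2, -1, -1):
            limit = self.ring[j + 1][1] + self.max_step
            if self.ring[j][1] <= limit:
                break  # everything older already satisfies the limit
            self.ring[j][1] = limit

    def flush(self):
        """Write out whatever is still buffered and return everything written."""
        while self.ring:
            self.written.append(tuple(self.ring.popleft()))
        return self.written


writer = DeferredBlockWriter()
for index, btr in enumerate([8] * 10 + [1] * 10):
    writer.push(index, btr)
result = [btr for _, btr in writer.flush()]
print(result)
```

In this sketch a block that has already left the ring can no longer be lowered, so a downward ramp is only fully smoothed when it fits inside the buffered blocks.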
Nick.C
I've incorporated the bits_to_remove delta limit = +1 for subsequent codec_blocks in beta v0.6.5 - I think that it would be worth listening to to see if we are more sensitive to increases in noise rather than decreases in noise - this version limits the increase in noise to 6dB per codec_block. [edit] The extra 1024 sample FFT analysis is also incorporated. [/edit]

I will think on your method of looping the blocks to be written and revert.
halb27
Thanks a lot.
Alex B
halb27,

I created a few smaller clips of the original and the lossy version. I tried to isolate the possible problems. Maybe these help in confirming that I have heard something. The clips should be cut accurately (I used exact numerical values when creating the selections). While cutting these I inspected the difference signal (invert-mix-paste) in Audition. I saw the abruptly changing noise you explained. In addition, the Spectral Phase and Pan displays show small differences when the original and lossy version are compared.

I have yet to try to ABX them, except the first snare drum hit (00000_00595ms) which I already did. I think I can hear similar differences in the other clips, but ABXing them is more difficult.

For example, the cymbal crash in the 09400_10400ms clip may be slightly altered. I am not saying that the actual pitch has changed, but the crash may be a bit brighter in one of the clips, which creates the impression of changed tuning.

The new lossyWAV clips are cut directly from my first (-3) lossyWAV sample. I think it would be useful if someone else could hear one or more differences before trying other settings.

[8 attachments]
Nick.C
QUOTE(Alex B @ Jan 11 2008, 11:06) *

Halb27,

I created a few smaller clips of the original and the lossy version. I tried to isolate the possible problems. Maybe these help in confirming that I have heard something. The clips should be cut accurately (I used exact numerical values when creating the selections). While cutting these I inspected the difference signal (invert-mix-paste) in Audition. I saw the abruptly changing noise you explained. In addition, the Spectral Phase and Pan displays show small differences when the original and lossy version are compared.

I have yet to try to ABX them, except the first snare drum hit (00000_00595ms) which I already did. I think I can hear similar differences in the other clips, but ABXing them is more difficult.

For example, the cymbal crash in the 09400_10400ms clip could be slightly altered. I am not saying that the actual pitch has changed, but the crash may be a bit brighter in one of the clips, which creates the impression of changed tuning.

These are all from my first (-3) lossy sample. I think it would be useful if someone else could hear one or more differences before trying other settings.


I processed the sample in v0.6.2 and v0.6.5 (-detail re-enabled...) and got the following:
CODE
lossyWAV beta v0.6.2 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
%lossyWAV Warning% : Quality level 3 selected.
%lossyWAV Warning% : Forcibly over-write output file if it exists.
%lossyWAV Warning% : Detailled output mode enabled
Processing : livin_in_the_future.wav
Format     : 44.10kHz; 2 ch.; 16 bit.
Progress   :
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8   8
   16    0.19s.  8  9  9 10 10  9  7  7  7  7  6  6  8  8  8  8 127
   32    0.37s.  9  9  9  7  7  7  7  5  5  8  7  7  9  8  8  8 120
   48    0.56s.  8  8  7  7  8  7  7  8  8  6  6  8  9  7  9 10 123
   64    0.74s. 10  9 10 10 10 10 10 10 10  0 10 10 10 10  8 10 147
   80    0.93s.  0 10 10 10  0 10  0 10 10 10  0  9 10 10 10 10 119
   96    1.11s. 10 10  9 10 10 10 10 10  9  9 10 10 10  9  9 10 155
  112    1.30s.  9 10 10  0 10 10 10 10 10 10 10 10  8 10 10  9 146
  128    1.49s. 10 10  9  9  9  9 10  9  9 10 10 10  9  9  9  9 150
  144    1.67s.  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9 10 145
  160    1.86s.  0 10 10 10  8  9 10 10  9  9 10  9 10  9  9  9 141
  176    2.04s.  9  9  9  9  9  9  9  9 10 10 10 10 10  9  9  9 149
====================================================================
Average    : 8.7384; bits; [22580/2584; 22.65x; CBS=512]
%lossyWAV Warning% : 666 bits not removed due to clipping.

lossyWAV beta v0.6.5, Copyright (C) 2007,2008 Nick Currie. Portions (C) 1996
Don Cross. lossyWAV is issued with NO WARRANTY WHATSOEVER and is free software.
%lossyWAV Warning% : Detailled output mode enabled
Processing : livin_in_the_future.wav
Format     : 44.10kHz; 2 ch.; 16 bit.
Progress   :
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1   1
   16    0.19s.  2  3  4  5  6  7  7  7  7  7  6  6  7  8  7  8  97
   32    0.37s.  8  9  7  7  7  7  7  5  5  6  7  7  8  8  8  8 114
   48    0.56s.  8  8  7  7  8  7  7  8  8  6  6  7  8  7  8  9 119
   64    0.74s. 10  9 10 10 10 10 10 10 10  0  1  2  3  4  5  6 110
   80    0.93s.  0  1  2  3  0  1  0  1  2  3  0  1  2  3  4  5  28
   96    1.11s.  6  7  8  9 10 10 10 10  9  9 10 10 10  9  9 10 146
  112    1.30s.  9 10 10  0  1  2  3  4  5  6  7  8  8  9 10  9 101
  128    1.49s.  9  7  8  9  8  9 10  9  9 10 10 10  9  9  9  9 144
  144    1.67s.  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9 10 145
  160    1.86s.  0  1  2  3  4  5  6  7  8  9 10  9  9  9  9  9 100
  176    2.04s.  9  9  9  9  9  9  9  9 10 10 10 10 10  9  9  9 149
  ...    ......  ..................................................
====================================================================
Average    : 8.0232 bits; [20732/2584; 20.17x; CBS=512]
%lossyWAV Warning% : 0.1947 bits not removed due to clipping.


Alex B, if you have time, could you try the sample with beta v0.6.5?
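
As a cross-check of the two listings, a forward-only limit of +1 bit per codec_block can be applied to the v0.6.2 values. The sketch below is hypothetical (the function name is made up) and covers only blocks 64-95; beta v0.6.5 also adds the extra FFT analysis, so not every row is expected to follow from this rule alone.

```python
def limit_increase(btr, previous, max_increase=1):
    """Forward-only rule: each block may remove at most max_increase more bits than the last."""
    out = []
    for value in btr:
        previous = min(value, previous + max_increase)
        out.append(previous)
    return out


# bits_to_remove for blocks 64-95 as listed for beta v0.6.2 above.
v062 = [10, 9, 10, 10, 10, 10, 10, 10, 10, 0, 10, 10, 10, 10, 8, 10,
        0, 10, 10, 10, 0, 10, 0, 10, 10, 10, 0, 9, 10, 10, 10, 10]
# Block 63 is 9 in the beta v0.6.5 listing.
limited = limit_increase(v062, previous=9)
print(limited[:16], sum(limited[:16]))
print(limited[16:], sum(limited[16:]))
```

For these two rows the result agrees with the beta v0.6.5 listing, including the row totals of 110 and 28.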
2Bdecided
Be careful with restricting the deltas. It could increase the bitrate quite a lot for (as yet) no proven gain.

I was worried by the abrupt changes in noise to start with, and had strategies for cross fading block boundaries in the dithered and noise shaped versions. I didn't bother with the non-dithered version, but it would be possible here too by adding extra noise briefly and fading it out/in.

I couldn't find any situation where this cross fading was needed, so I dumped it.


If lossyWAV goes from no noise to 48dB of noise (8-bits) in a single block, that's because it believes that the audio in the entirety of that block (and slightly either side - remember overlap!) can take it.
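
The 48dB figure follows from the usual rule that each removed bit doubles the quantisation step, which raises the noise level by about 6dB. A small sketch of that arithmetic (the function name is only illustrative):

```python
import math


def noise_change_db(bits):
    """Level change of quantisation noise when `bits` more low-order bits are removed."""
    return 20 * math.log10(2 ** bits)


print(round(noise_change_db(1), 2))  # one bit
print(round(noise_change_db(8), 2))  # eight bits
```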


Psychoacoustically, there are different thresholds for constant noise vs modulated noise, though I'm not sure if anyone has tested switched noise. I guess it too could be fractionally more audible.

There were almost no psychoacoustics in lossyFLAC, but my intention was to keep the noise well below both these thresholds (if they are indeed different). However, if it's below the threshold for constant noise, and above the threshold for modulated noise, then of course smoothing transitions or restricting deltas will help.

However, if the noise is simply too high in a given block because the calculations are wrong, and you introduce restricted deltas which happen to drag it down in that block, then of course you will stop the noise being audible, but you won't know if restricted deltas were really needed to solve it. Single block unlimited deltas (as now) with a slightly lower noise for that block might be the "better" solution.


I fear that raises more questions than it answers. Sorry!

Cheers,
David.
2Bdecided
Hang on a moment though - I think you guys are overreacting.

Isn't this what you designed the setting "-3" for? Probably transparent almost all the time. If someone can ABX something, does that mean that setting wants changing?

If it can be ABXed at -2, then you have work to do! ;)


I can't ABX it at -3, but I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k? Or not?

Cheers,
David.
halb27
So you too see the switch noise as a potential problem.
So why not try to avoid it? Sure average bitrate may come down significantly, but we don't know in advance. Moreover, even in this case we have the option to gradually change the number of bits removed within a block, as suggested for -3, to minimize the number of intermediate blocks while still smoothing the error level.

I don't see it as a viable argument that this procedure would hide other problems. In principle this can be an unwanted side effect of any quality-improving action. With this very action I think it's rather the other way around: decreasing bits to remove by increasing -nts, -snr or whatever may well hide this very problem. If there is a problem with the decision about how many bits to remove due to the input analysis, it is expected to show up sooner or later even when using this smoothing strategy.
halb27
QUOTE(2Bdecided @ Jan 11 2008, 13:49) *

Isn't this what you designed the setting "-3" for? Probably transparent almost all the time. If someone can ABX something, does that mean that setting wants changing?

If it can be ABXed at -2, then you have work to do! ;) ....

You are right, but unfortunately we haven't had a lot of testing so far. I guess it was me who has done most of the testing so far, especially in recent months, and my 58 year old ears aren't very good witnesses.
We are very thankful for Alex B's testing, especially as his hearing seems to be excellent.
So we should take any reported issue seriously and look for improvement. This does not necessarily mean that something is changed in the end.
The problem in this case is that Nick would have to do a lot of work if he follows my suggestions, and it cannot be ruled out that it is good for nothing.
QUOTE(2Bdecided @ Jan 11 2008, 13:49) *

... I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k? Or not? ....

Well that's an important finding. So maybe a higher -nts value is the solution. But it's still an open question to what extent the noise level in the 10+ kHz region is generated by the switch noise. Do you mind trying -3 -nts 0 and -3 -nts 3? In case the switch noise participates in the problem the SNR in the 10+ kHz region is not expected to improve very much.
Alex B
QUOTE(Nick.C @ Jan 11 2008, 13:11) *
Alex B, if you have time, could you try the sample with beta v0.6.5?


It's better. I tried it with the 00000_00595ms sample. I couldn't reliably ABX it.

In addition I compared 0.64rc vs 0.65b. The ABX result was 9/10.

The bitrate increased from 494 to 555 kbps
(using FLAC -8 --padding 80. The small padding block is for the replay gain tag. foobar seems to take the tags into account when it calculates bitrates.)
Nick.C
QUOTE(2Bdecided @ Jan 11 2008, 11:49) *
I can't ABX it at -3, but I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k?
The cutoff is 16kHz - however, I already suggested changing 2246C to 22469 for the 1024 sample FFT - this brings bits_to_remove down a bit by reducing the spreading at high frequencies.

As an aside, is it better to carry out 2 FFTs (-512:511; 0:1023) or 1 (-256:767) at 1024 samples? The thinking behind the single FFT is that it is centred on the codec_block in question and is still overlapped 50% with the next FFT.
Alex B
QUOTE(2Bdecided @ Jan 11 2008, 13:49) *

Hang on a moment though - I think you guys are over reacting.

Isn't this what you designed the setting "-3" for? Probably transparent almost all the time. If someone can ABX something, does that mean that setting wants changing?

If it can be ABXed at -2, then you have work to do! ;)

Those are my thoughts too. Unless my finding gets backup from others and several similar samples are found I don't think you need to worry too much.

QUOTE(2Bdecided @ Jan 11 2008, 13:49) *
I can't ABX it at -3, but I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k? Or not?

Perhaps a young tester who can easily hear up to 20 kHz or more would find it easier to ABX this. My practical limit is about 17-18 kHz, I think.

QUOTE(halb27 @ Jan 11 2008, 14:11) *

I guess it was me who has done most of the testing so far, especially in the recent months, and my 58 year old ears aren't very good witnesses.
We are very thankful for Alex B's testing, especially as his hearing seems to be excellent. ...

I think we all hear things a bit differently. You have often pinpointed things that I might not have noticed. I may be sensitive to this kind of problem which sounds like a slight pitch change to me. I heard a similar effect in your "French lady" LAME -V0 sample, if you remember.


Edit: a typo again
Nick.C
QUOTE(Alex B @ Jan 11 2008, 12:49) *
Perhaps a young tester who can easily hear up to 20 kHz or more would find it easier to ABX this. My practical limit is about 17-18 kHz, I think.

QUOTE(halb27 @ Jan 11 2008, 14:11) *
I guess it was me who has done most of the testing so far, especially in the recent months, and my 58 year old ears aren't very good witnesses.
We are very thankful for Alex B's testing, especially as his hearing seems to be excellent. ...
I think we all hear things a bit differently. You have often pinpointed things that I might not have noticed. I may be sensitive to this kind of problem which sounds like a slight pitch change to me. I heard a similar effect in your "French lady" LAME -V0 sample, if you remember.
I'd like to reiterate halb27's thanks to Alex B for initially identifying the problem and subsequently carrying out the ABX tests.

Thinking about the problem, it seems that the drop from 10 to 0 and back to 10 at codec_block 72/73/74 is due to clipping prevention rather than low minimum signal.

I agree that 10/0/10 is a bit of a steep change, but is a restricted_delta of +1 a bit conservative? Would +2 or +3 be acceptable? The higher the restricted_delta value, the fewer subsequent codec_blocks are required to get back to the actual calculated value rather than sequential last_btr+restricted_delta values, e.g. with restricted_delta=2, 10,0,10,10,10,10,10 becomes 10,0,2,4,6,8,10.
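
To compare restricted_delta values, the number of codec_blocks needed to climb back from 0 to 10 can be counted. This is a small hypothetical helper for illustration, not code from lossyWAV:

```python
def recovery_sequence(start, target, restricted_delta):
    """bits_to_remove of the blocks after a drop to `start`, until `target` is reached again."""
    if restricted_delta <= 0:
        raise ValueError("restricted_delta must be positive")
    seq = []
    value = start
    while value < target:
        value = min(target, value + restricted_delta)
        seq.append(value)
    return seq


for delta in (1, 2, 3):
    seq = recovery_sequence(0, 10, delta)
    print(delta, seq, len(seq))
```

With a drop to 0 and a calculated value of 10, the climb takes 10 blocks at +1, 5 blocks at +2 and 4 blocks at +3.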
2Bdecided
QUOTE(halb27 @ Jan 11 2008, 12:11) *
But it's still an open question to what extent the noise level in the 10+ kHz region is generated by the switch noise.
The switching doesn't "generate" noise. With white noise, the transient at the start is exactly as "loud" (if you want to put it that way) as the noise itself - no more or less.

It's not like a tone, where an instant start could be perceived as a click.

Cheers,
David.
2Bdecided
QUOTE(halb27 @ Jan 11 2008, 11:54) *
I don't see it as a viable argument that this procedure would hide other problems. In principle this can be an unwanted side effect of any quality-improving action. With this very action I think it's rather the other way around: decreasing bits to remove by increasing -nts, -snr or whatever may well hide this very problem. If there is a problem with the decision about how many bits to remove due to the input analysis, it is expected to show up sooner or later even when using this smoothing strategy.
Of course either approach can be the wrong one, yet appear to solve the problem.

All I was pointing out is that, for this reason, you really need to figure out a way of finding out which is right, but this is necessarily difficult.

My bet would be that it has nothing to do with switching transients, and everything to do with a simple nts.

At worst, it might be that the nts is "more wrong" for noise-like signals than tone-like signals - and that, specifically, it needs to find the peaks in the spectrum (as well as the troughs) and ensure that the noise is always at least 25dB (say) below them. Noise 18dB down from a peak can change the peak by 1dB, noise 25dB down can change it by 0.5dB. For most signals, the added noise is already much lower than 25dB below the spectral peak, but for signals which are originally noise-like anyway, it can currently get close to this limit.
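
The 1dB and 0.5dB figures are consistent with a worst case in which the noise adds to the peak fully in phase, so that amplitudes rather than powers add. That reading is an assumption made for this sketch, and the function name is hypothetical:

```python
import math


def worst_case_peak_change_db(noise_below_peak_db):
    """Level change of a spectral peak if noise this far below it adds fully in phase."""
    ratio = 10 ** (-noise_below_peak_db / 20)  # noise amplitude relative to the peak
    return 20 * math.log10(1 + ratio)


print(round(worst_case_peak_change_db(18), 2))
print(round(worst_case_peak_change_db(25), 2))
```

Under this assumption noise 18dB below the peak changes it by about 1.03dB and noise 25dB below by about 0.48dB.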

Just a thought - IIRC you might well have (something like) this in there already!

Cheers,
David.
halb27
QUOTE(Nick.C @ Jan 11 2008, 15:08) *

.... I agree that 10/0/10 is a bit of a steep change, but is a restricted_delta of +1 a bit conservative? Would +2 or +3 be acceptable? ...

Maybe this is the best way out. Within the intermediate block(s) the total change can still be done in 1 bit steps, the way I suggested for -3. That gives only a few intermediate blocks and still a smoothly changing resolution. The resolution can change by 1 bit every 128 samples, for instance, which allows a total resolution change of 4 bits from block to block.
We can even adapt the analysis to this 128 sample subblock scheme and let only those FFT results influence the bits to remove calculation which really relate to the 128 sample subblock in question. This makes the analysis more exact and has the potential to lower the average bitrate.
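
The subblock idea can be sketched by applying a 1 bit limit at subblock granularity instead of per codec_block. The code is a hypothetical illustration: it assumes the per-block value is simply repeated for each subblock before smoothing, and it ignores the adapted FFT analysis mentioned above.

```python
def smooth(values, max_step=1):
    """Limit neighbouring changes to max_step without exceeding any computed value."""
    out = list(values)
    for i in range(1, len(out)):
        out[i] = min(out[i], out[i - 1] + max_step)
    for i in range(len(out) - 2, -1, -1):
        out[i] = min(out[i], out[i + 1] + max_step)
    return out


def smooth_in_subblocks(btr_per_block, subblocks_per_block=4):
    """Return one list of per-subblock bits_to_remove for each codec_block."""
    expanded = [b for b in btr_per_block for _ in range(subblocks_per_block)]
    smoothed = smooth(expanded)
    n = subblocks_per_block
    return [smoothed[i:i + n] for i in range(0, len(smoothed), n)]


quarters = smooth_in_subblocks([1, 1, 8, 8, 8])                   # 128 sample subblocks
halves = smooth_in_subblocks([1] * 10 + [8] * 10, subblocks_per_block=2)  # 256 sample subblocks
print(quarters)
print(halves[10:14])
```

With 128 sample subblocks the step from 1 to 8 is completed within two codec_blocks; with 256 sample subblocks the intermediate blocks come out as 2 then 3, 4 then 5, and 6 then 7.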
GeSomeone
QUOTE(Nick.C @ Jan 11 2008, 14:08) *

Thinking about the problem, it seems that the drop from 10 to 0 and back to 10 at codec_block 72/73/74 is due to clipping prevention rather than low minimum signal.
But then again, wouldn't this be around a peak value, where masking (of the noise or the change thereof) would work optimally?

QUOTE(Nick.C @ Jan 11 2008, 14:08) *
I agree that 10/0/10 is a bit of a steep change, but is a restricted_delta of +1 a bit conservative? Would +2 or +3 be acceptable?

Those are good questions, first it has to be determined if switching the noise is the problem, secondly, if so, what to do to make it not a problem.
The whole method is based on modulating noise. Even with restricted delta, the noise is still modulated, only in a different way which might cause different side effects (maybe lower frequency artifacts?).

Sorry I can just think a little bit with you about the theory but not really help with abx-ing all these possibilities.
Nick.C
lossyWAV beta v0.6.6 attached to first post of this thread.
Bourne
Can we expect full transparency when it reaches 1.0 final? This is pretty cool.
Nick.C
QUOTE(Bourne @ Jan 12 2008, 15:50) *
Can we expect full transparency when it reaches 1.0 final? This is pretty cool.
The stated aim is full transparency for -3, with -2 and -1 being more conservative options for the user. At present -3 is pretty near transparent (to my ears it is transparent, but my ears certainly aren't the best on the planet....), but we're trying to iron out the problem in v0.6.4 RC1 with Bruce Springsteen's Livin In The Future identified by Alex B. Beta v0.6.5 was a pretty good comeback as Alex B's ABX'ing was inconclusive (I take it that means somewhere between being able to and not being able to ABX the resulting WAV file).

With more finely tuned ears listening out for artefacts we'll get closer and closer to transparent (though to get there absolutely would probably take an infinite amount of time).

As I said, -3 is currently transparent for me, but transparency is in the ear of the beholder....
halb27
QUOTE(Alex B @ Jan 11 2008, 13:06) *

...
Your samples 00000_00595ms, 09400_10400ms, 19800_21000ms, 21600_23100ms
...

Finally I found the time to ABX your samples (I had a lot of trouble trying to bring my system to an up-to-date state - now I'm back to my old configuration).

With your 00000_00595 samples I got to 6/7, which in the end was 7/10.
With 19800_21000 I also have the suspicion that something's wrong but could not ABX it.
With 21600_23100 I got to 6/8 and ended up at 6/10.

Though these aren't good results I think it's enough for a confirmation.

I tried 0.6.6 on your samples. The results are better, but with 00000_00595 I got to 7/9 and ended up at 7/10.
So the problem is still there.

I went back to 0.6.4RC1 and used a setting of -3 -nts 0.
Now I can't abx the problem any more.

So this is evidence that 2Bdecided is right and it's just a -nts problem.

Given this, I suggest we default -3 to -nts 0, -2 to -nts -2 and -1 to -nts -4, and keep -spf the way it was done with 0.6.4RC1 (IMO the high frequency range is already well covered by the short FFT with its low spreading value).

I still feel uncomfortable with abrupt noise level changes, but maybe this is a wrong idea. At least it's not backed up by this sample.

Average bitrate will increase again - something which isn't liked, especially with -3. The wiki already encourages people who prefer a smaller file size and accept minor errors to use a higher -nts value than the default. Maybe we should find a wording which reinforces this encouragement.
Nick.C
QUOTE(halb27 @ Jan 12 2008, 21:46) *

QUOTE(Alex B @ Jan 11 2008, 13:06) *

...
Your samples 00000_00595ms, 09400_10400ms, 19800_21000ms, 21600_23100ms
...

Finally I found the time to abx your samples (I had a lot of trouble trying to bring my system to an up-to-date state - now I'm back to my old configuration).

With your 00000_00595 samples I got to 6/7, which in the end was 7/10.
With 19800_21000 I also have the suspicion that something's wrong but could not abx it.
With 21600_23100 I got to 6/8 and ended up at 6/10.

Though these aren't good results I think it's enough for a confirmation.

I tried 0.6.6 on your samples. The results are better, but with 00000_00595 I got to 7/9 and ended up at 7/10.
So the problem is still there.

I went back to 0.6.4RC1 and used a setting of -3 -nts 0.
Now I can't abx the problem any more.

So this is evidence that 2Bdecided is right and it's just a -nts problem.

Given this, I suggest we default -3 to -nts 0, -2 to -nts -2 and -1 to -nts -4, and keep -spf the way it was done with 0.6.4RC1 (IMO the high frequency range is already well covered by the short FFT with its low spreading value).

I still feel uncomfortable with abrupt noise level changes, but maybe this is a wrong idea. At least it's not backed up by this sample.

Average bitrate will increase again - something which isn't liked, especially with -3. The wiki already encourages people who prefer a smaller file size and accept minor errors to use a higher -nts value than the default. Maybe we should find a wording which reinforces this encouragement.
[Vino Rosso]Meh - oh well, just back from my company's Christmas party to a variation order for lossyWAV - no problem..... On the plus side, if v0.6.4 RC1 with -3 -nts 0 solves the problem then we will all benefit from the 50% speedup found when I started investigating Alex B's problem and potential solutions. Not the end of the world then - just a few kbps extra.....

On the face of it, maybe -nts 0 is the only acceptable starting point for the lowest quality option - so -nts -2 for -2 and -nts -4 for -1?

Ouch - 462kbps for my 53 sample set (40.98MB). But, we want transparency at all quality presets - so be it.[/Vino Rosso]
halb27
I've tried 0.6.4.RC1 -3 -nts 0 on my small regular track sample set, which has nevertheless proven to be pretty representative of regular music. The average bitrate is 402 kbps.

I was a little hasty with my conclusions last night, probably because I was so happy to have finally been able to abx the problem. What is missing at the moment IMO is AlexB's opinion on -3 -nts 0.
AlexB, do you mind trying 0.6.4.RC1 -3 -nts 0?

Nick.C
QUOTE(halb27 @ Jan 13 2008, 11:25) *
I've tried 0.6.4.RC1 -3 -nts 0 on my small regular track sample set, which has nevertheless proven to be pretty representative of regular music. The average bitrate is 402 kbps.

I was a little hasty with my conclusions last night, probably because I was so happy to have finally been able to abx the problem. What is missing at the moment IMO is AlexB's opinion on -3 -nts 0.
AlexB, do you mind trying 0.6.4.RC1 -3 -nts 0?
Spooky - my 10 album test set got 402kbps as well [edit] at -3 -nts 0; 450kbps at -2 -nts -2 and 494kbps at -1 -nts -4 [/edit] .....
halb27
This is an adequate and pretty evenly spread increase in bitrate to me for -3, -2, -1.
Nick.C
QUOTE(halb27 @ Jan 13 2008, 20:40) *
This is an adequate and pretty evenly spread increase in bitrate to me for -3, -2, -1.
Ok, I'll post v0.6.7 RC2 in the thread. You should notice a fairly impressive improvement in processing throughput.
halb27
Thank you, Nick.
Speed is very good.
Guess -nts defaults are -nts 0 for -3, -nts -2 for -2, and -nts -4 for -1. Right?
But what else is different compared to 0.6.4RC1? Average bitrate for my regular sample set is now 403 kbps.
Nick.C
QUOTE(halb27 @ Jan 13 2008, 22:02) *
Thank you, Nick.
Speed is very good.
Guess -nts defaults are -nts 0 for -3, -nts -2 for -2, and -nts -4 for -1. Right?
But what else is different compared to 0.6.4RC1? Average bitrate for my regular sample set is now 403 kbps.
If you were one of the first two to download v0.6.7 RC2 then you downloaded a version which still had the "maximum additional bits_to_remove increase per codec_block" mechanism active, with a delta of +2 bits. Sorry, I tried to remove it as quickly as I could - try re-downloading....
halb27
It looks fine now (I tried with AlexB's sample).

I'll change the wiki as I described the -nts defaults.
lexor
Sup all, I've got a couple of questions about lossyWAV:

1) The wiki is angling at standard lossless decoders (like flac, etc.) decoding lossy.flac/etc., but will a standard WAV decoder decode lossyWAV correctly?

2) if all level settings are aiming at transparency... why have level settings?
TBeck
QUOTE(lexor @ Jan 14 2008, 01:29) *

2) if all level settings are aiming at transparency... why have level settings?

I second this.

If -3 is being tuned to be transparent under any known condition, it would make sense to me to have one safer setting which handles possibly unknown problem files better. Being the more paranoid type, I would probably choose this (-2). But I would never want to go even higher (-1). For me there is also some kind of psychological barrier: for my taste, lossy (wave) files should not be more than half the size of lossless files (on average)...

But this is just my taste...
carpman
QUOTE(TBeck @ Jan 14 2008, 01:02) *

I second this.


I don't.

I hope you'll keep the 3 levels.

So far I've been using lossy.wav -2 then encoding to flac (testing with vinyl restoration projects and results are very good).

For me it's like this:

-1 when it HAS to be transparent (eg. if I'd spent many many hours working on a piece in whatever capacity)
-2 when I really want it to be transparent (and figure that only in extreme cases it won't be, -2 is the perfect setting between MP3 320 and Lossless, and for me preferable to WavPack Hybrid).
-3 when I'd like it to be transparent, but I'm not too fussed if it isn't (I've got plenty of music which springs to mind).

So please keep the 3 levels -- and thanks for all your hard work.

C.

By the way -- has anyone done listening tests to MP3s transcoded from lossy.wav versus .wav?

In theory should there be any perceptual difference?

C.
buktore
Since it is still "lossy", I think having options to choose from is the better way to go. I mean, lossless codecs do have options even though they would work just fine without them, or if the developer decided not to include them, and still we've got a lot of options anyway (which is good, BTW).

Oops, I nearly forgot what I'm here for. I dropped by to show my gratitude and encouragement to everyone involved in this (2Bdecided, Nick.C, halb27 and anyone else I haven't mentioned). Thanks for your time and effort.
Nick.C
QUOTE(lexor @ Jan 14 2008, 00:29) *
1) The wiki is angling at standard lossless decoders (like flac, etc.) decoding lossy.flac/etc., but will a standard WAV decoder decode lossyWAV correctly?
The WAV file is still a WAV file - there is no decoding to do, as the only difference between the original lossless WAV file and the lossyWAV file is that some LSBs are zero.
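
A small sketch of that point, in Python with only the standard library. Truncating with a shift is a simplification chosen for this illustration (lossyWAV rounds, and works out bits_to_remove per codec_block), and `zero_lsbs`, `write_wav` and `read_wav` are hypothetical helper names. It only aims to show that samples with zeroed LSBs pass through an ordinary WAV writer and reader unchanged.

```python
import io
import struct
import wave

def zero_lsbs(samples, bits_to_remove):
    """Zero the lowest bits of each 16-bit sample (simple truncation)."""
    return [(s >> bits_to_remove) << bits_to_remove for s in samples]

def write_wav(samples, rate=44100):
    """Write mono 16-bit PCM samples to an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

def read_wav(data):
    """Read the samples back with the plain stdlib WAV reader."""
    with wave.open(io.BytesIO(data), "rb") as w:
        raw = w.readframes(w.getnframes())
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))

original = [1000, -1234, 77, 32767]
processed = zero_lsbs(original, 3)           # low 3 bits are now zero
assert read_wav(write_wav(processed)) == processed
print(processed)
```

The saving only shows up once a lossless codec such as FLAC compresses the processed file; the WAV itself stays the same size.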
QUOTE(lexor @ Jan 14 2008, 00:29) *
2) if all level settings are aiming at transparency... why have level settings?
Every lossy codec I've come across has quality settings - all presets aim at transparency, some fail with some tracks, with reducing likelihood as output bitrate increases.
QUOTE(carpman @ Jan 14 2008, 02:12) *
So please keep the 3 levels -- and thanks for all your hard work.

By the way -- has anyone done listening tests to MP3s transcoded from lossy.wav versus .wav?

In theory should there be any perceptual difference?
I found a post on anythingbutipod.com which tends to suggest that an OGG file transcoded from lossyWAV was bigger than lossless > OGG. As to perceptual differences, I think that's a question for David....
QUOTE(TBeck @ Jan 14 2008, 01:02) *
I second this.

If -3 is being tuned to be transparent under any known condition, it would make sense to me to have one safer setting which handles possibly unknown problem files better. Being the more paranoid type, I would probably choose this (-2). But I would never want to go even higher (-1). For me there is also some kind of psychological barrier: for my taste, lossy (wave) files should not be more than half the size of lossless files (on average)...

But this is just my taste...
It is as you say, but -3 at v0.6.4 RC1 was proven *not* to be transparent within a couple of days of release. I can't say I was very happy, but I was delighted that Alex B's ears are so good that he was able to identify a problem with the track in question. So, -1 for the paranoid, -2 for most people and -3 for DAP users (my preference being -3).
QUOTE(buktore @ Jan 14 2008, 02:48) *
Since it is still "lossy", I think having options to choose from is the better way to go. I mean, lossless codecs do have options even though they would work just fine without them, or if the developer decided not to include them, and still we've got a lot of options anyway (which is good, BTW).

Oops, I nearly forgot what I'm here for. I dropped by to show my gratitude and encouragement to everyone involved in this (2Bdecided, Nick.C, halb27 and anyone else I haven't mentioned). Thanks for your time and effort.
Thanks for the appreciation - we've all had fun with this project!
2Bdecided
QUOTE(Nick.C @ Jan 14 2008, 07:43) *
I found a post on anythingbutipod.com which tends to suggest that an OGG file transcoded from lossyWAV was bigger than lossless > OGG. As to perceptual differences, I think that's a question for David....
I saw that too. It matches my early tests with mp3. It's not a big deal.

What is interesting is taking mp3 problem samples, and trying to ABX WAV>mp3 vs lossy.WAV>mp3. It would be nice if -1 (at least) could make that difference unABXable - but this might be unrealistic. I should get back to playing around with trumpet.wav or whatever it was called.

Cheers,
David.
GeSomeone
QUOTE(TBeck @ Jan 14 2008, 02:02) *
to my taste lossy (wave) files should not have more than half the size of lossless files (on average)..


[pedantic]I think you mean lossyFlac (or lossyTAK), as lossyWav files are the same size as the source wavs[/pedantic]

Yes, I would wish that too, but I found out that the nature of the source file makes a big difference.
just some examples:
a reasonably quiet track (a singer and a guitar) that rates 553 with FLAC -8 and 429 with lossyFlac -3 -nts 0
a lot louder track (another singer with just a guitar) rates 857 in FLAC -8 but 347 with lossyFlac -3 -nts 0

Go figure.
Nick.C
QUOTE(GeSomeone @ Jan 14 2008, 21:08) *
Yes, I would wish that too, but I found out that the nature of the source file makes a big difference.
just some examples:
a reasonably quiet track (a singer and a guitar) that rates 553 with FLAC -8 and 429 with lossyFlac -3 -nts 0
a lot louder track (another singer with just a guitar) rates 857 in FLAC -8 but 347 with lossyFlac -3 -nts 0
It seems counter-intuitive, but looking at the nearly 3700 tracks that I've processed, the higher the initial bitrate, the lower the processed bitrate and vice-versa (subject to the usual caveats about tracks which do not follow the generalisation) [both processed bitrates being less than the lossless bitrate].
halb27
QUOTE(GeSomeone @ Jan 14 2008, 23:08) *

... a reasonably quiet track (a singer and a guitar) that rates 553 with FLAC -8 and 429 with lossyFlac -3 -nts 0
a lot louder track (another singer with just a guitar) rates 857 in FLAC -8 but 347 with lossyFlac -3 -nts 0 ...

When there are only very few instruments, the probability is high that parts of the spectrum have low energy. The lossyWAV principle is based on preserving the low energy parts with reasonable accuracy. So 'simple' music needs more bits as a rule.
The more instruments there are, the more noise-like the music becomes - technically speaking - and the harder it gets for a lossless codec.

lossyWAV looks worst compared to pure lossless with quiet 'simple' music. lossyWAV has no chance to save a significant amount of bits in this case.

I see it in a positive way: in many cases lossyWAV saves a lot of bits compared to lossless. In those cases where the relation isn't so good it's for the most part because lossless is already very efficient.
Nick.C
Question: David mentioned in another thread about the number of actual bits remaining after rounding.

Is there any perceived benefit to be gained by implementing a(nother) safety net as follows:

When filling FFT array, OR a mask variable with the absolute value of each sample. This will allow the determination of the maximum set bit in the codec_block for that channel (max_bit).

Limit the bits_to_remove to the lower of the calculated value and Max(0,(max_bit-minimum_bits_to_keep)), thereby retaining at least minimum_bits_to_keep bits of actual resolution in that codec_block.

[edit] Also, if the number of clipped samples were restricted to, say, 5 per channel per codec_block (i.e. max of 10 for stereo, 0.977% of samples in the codec_block), would that seem reasonable? Even if they were all in series that would only be 0.1134 milliseconds. The reason I ask is that when I apply this to the livin_in_the_future problem track, although it clips, the bits_to_remove lost due to clipping is zero with only 196 clipping samples in the whole file (1323000 samples x 2 channels). [/edit]

[edit2] Say, -1 = 0 clips; -2 = 1 clip; -3 = 5 clips? [/edit2]
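
The safety net proposed above can be sketched roughly as follows. This is only an illustration in Python, not lossyWAV's actual code: the function names `max_set_bit` and `limit_bits_to_remove` are made up here, samples are assumed to be 16-bit signed integers, and `minimum_bits_to_keep=5` is just an example value. The last lines recheck the clipping arithmetic, assuming a 512-sample codec_block, stereo, and a 44.1 kHz sample rate.

```python
def max_set_bit(block):
    """1-based position of the highest set bit over the absolute sample values."""
    mask = 0
    for sample in block:
        mask |= abs(sample)       # OR together the magnitudes
    return mask.bit_length()      # 0 for an all-zero block

def limit_bits_to_remove(block, calculated, minimum_bits_to_keep=5):
    """Cap the calculated bits_to_remove so that at least
    minimum_bits_to_keep bits of actual resolution remain."""
    ceiling = max(0, max_set_bit(block) - minimum_bits_to_keep)
    return min(calculated, ceiling)

# Quiet block: the highest set bit is the 8th, so at most 8 - 5 = 3 bits go.
quiet = [100, -200, 37]
print(limit_bits_to_remove(quiet, calculated=6))

# Loud block: the cap (15 - 5 = 10) is above the calculated value of 4.
loud = [20000, -15000]
print(limit_bits_to_remove(loud, calculated=4))

# Clipping budget (assumed: 512-sample block, 2 channels, 44.1 kHz).
clip_fraction = 10 / (512 * 2) * 100    # percent of samples in the block
clip_duration_ms = 5 / 44100 * 1000     # 5 consecutive samples
print(round(clip_fraction, 3), round(clip_duration_ms, 4))
```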
halb27
I guess it doesn't hurt, but I also think it won't reduce bitrate in a significant way.
Nick.C
QUOTE(halb27 @ Jan 15 2008, 15:14) *
I guess it doesn't hurt, but I also think it won't reduce bitrate in a significant way.
The first will increase the bitrate, the second certainly reduces it. I will post beta v0.6.8 in the first post of this thread, using minimum_bits_to_keep=5 and maximum_clips = (0,1,5).
halb27
Sorry for not being clear. I only addressed your second suggestion.
As for your first, it's certainly another defensive action, but it looks a bit like not having confidence in the lossyWAV principle.
2Bdecided
Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.

I'm not sure what you're getting at with the clipping. If you let one sample clip in a block, then there are no wasted bits in that block, surely? The sample is 1111111111111111 so no zeros, so wasted_bits=0. Not sure how other codecs handle it - I remember Bryant saying wavpack was different.

Cheers,
David.
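
To make the wasted-bits point concrete, here is a rough Python sketch. It assumes the usual meaning of wasted bits, i.e. the number of least-significant zero bits shared by every sample in a block; the helper name `wasted_bits` is hypothetical and the code is not taken from any codec's source.

```python
def wasted_bits(block, bits_per_sample=16):
    """Count the low-order zero bits common to every sample in the block."""
    mask = 0
    for sample in block:
        mask |= sample & ((1 << bits_per_sample) - 1)   # two's-complement view
    if mask == 0:
        return bits_per_sample        # all-zero block: every bit is unused
    return (mask & -mask).bit_length() - 1   # index of the lowest set bit

rounded = [64, 128, -192, 4096]       # all multiples of 2**6
print(wasted_bits(rounded))           # 6
print(wasted_bits(rounded + [32767])) # 0: one full-scale sample spoils it
```

A single sample with its lowest bit set is enough to bring the count for the whole block down to zero, which is the concern raised above.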
Nick.C
QUOTE(2Bdecided @ Jan 15 2008, 16:59) *
Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.

I'm not sure what you're getting at with the clipping. If you let one sample clip in a block, then there are no wasted bits in that block, surely? The sample is 1111111111111111 so no zeros, so wasted_bits=0. Not sure how other codecs handle it - I remember Bryant saying wavpack was different.

Cheers,
David.
On the clipping front, if bits_to_remove=6 then what would have been 10000000 00000000 (assuming there was no sign bit - that bit is done with floats) would be clipped to 01111111 11000000, i.e. as if it had been rounded down not up.

On the minimum_bits_to_keep front, at present maximum_bits_to_remove=bits_per_sample-minimum_bits_to_keep = 16-5 = 11 for *all* codec_blocks. With the new proposal, if the highest filled bit (taking the ABS of -ve numbers first) is the 8th then at most 3 bits would be removed, regardless of what the algorithm produced.

I do have faith in the method, I just like belt, braces and hands in pockets keeping trousers up......
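
The clipping case can be sketched like this. It is an illustration only, assuming 16-bit signed samples and round-half-up on integers (the real code does this step with floats, as noted above, and its exact rounding rule is not given here); `round_and_clip` is a hypothetical name.

```python
def round_and_clip(sample, bits_to_remove, bits_per_sample=16):
    """Round to a multiple of 2**bits_to_remove; if that overflows the
    positive range, fall back to the largest representable multiple."""
    step = 1 << bits_to_remove
    rounded = ((sample + step // 2) >> bits_to_remove) << bits_to_remove
    full_scale = (1 << (bits_per_sample - 1)) - 1             # 32767
    top = (full_scale >> bits_to_remove) << bits_to_remove    # 32704 for 6 bits
    clipped = rounded > top
    return (top if clipped else rounded), clipped

# 32750 would round up to 32768 (10000000 00000000), which does not fit,
# so it ends up at 32704 instead, as if it had been rounded down.
value, clipped = round_and_clip(32750, 6)
print(format(value, "016b"), clipped)   # 0111111111000000 True
```

Counting how often `clipped` comes back true per channel per codec_block would be one way to apply a maximum_clips limit.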