halb27
Jan 16 2008, 02:01
I wasn't aware that there is already a minimum_bits_to_keep of 5 out of 16.
Based on this I second your suggestion and prefer a minimum_bits_to_keep of 5 out of number of bits used in the block.
Nick.C
Jan 16 2008, 02:25
QUOTE(halb27 @ Jan 16 2008, 08:01)

I wasn't aware that there is already a minimum_bits_to_keep of 5 out of 16.
Based on this I second your suggestion and prefer a minimum_bits_to_keep of 5 out of number of bits used in the block.
Thanks, I will keep this then - however, it may be that we would want to "tweak" the value for different quality presets, in the same way as has been done for allowable clips.
On the allowable clips front, if you were to process livin_in_the_future at -3 and ABX it, looking for clipping artefacts, I would be very grateful, as it did suffer a bit from lost bits due to clipping reduction with v0.6.7 RC2 and doesn't with beta v0.6.8.
QUOTE(2Bdecided @ Jan 15 2008, 17:59)

Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.
I agree.
QUOTE(Nick.C @ Jan 15 2008, 18:58)

On the minimum_bits_to_keep front, at present maximum_bits_to_remove=bits_per_sample-minimum_bits_to_keep = 16-5 = 11 for *all* codec_blocks. With the new proposal, if the highest filled bit (taking the ABS of -ve numbers first) is the 8th then at most 3 bits would be removed, regardless of what the algorithm produced.
I do have faith in the method, I just like belt, braces and hands in pockets keeping trousers up......
I don't...
If 2Bdecided's approach is working right, you will gain nothing except probably worse compression.
A long time ago I was using old-fashioned logarithmic quantization to compress audio: high resolution for low levels, little resolution for high levels. That's very similar to your proposal.
Unfortunately this often doesn't work well. Think about a combination of a low frequency signal with high amplitude (bass guitar) and a higher frequency signal with low volume. Your approach will calculate the bits_per_sample from the low frequency signal and introduce distortion for the high frequency signal.
It's also likely to fail with a pure low frequency signal of high amplitude. Here I always got annoying distortions with the logarithmic approach.
I don't think it would be a good idea to sacrifice compression for a very questionable improvement of the sound quality.
Sorry: bad explanation because of my very limited English...
QUOTE(Nick.C @ Jan 15 2008, 09:58)

QUOTE(2Bdecided @ Jan 15 2008, 16:59)

Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.
I'm not sure what you're getting at with the clipping. If you let one sample clip in a block, then there are no wasted bits in that block, surely? The sample is 1111111111111111 so no zeros, so wasted_bits=0. Not sure how other codecs handle it - I remember Bryant saying wavpack was different.
Cheers,
David.
On the clipping front, if bits_to_remove=6 then what would have been 10000000 00000000 (assuming there was no sign bit - that bit is done with floats) would be clipped to 01111111 11000000, i.e. as if it had been rounded down not up.
On the minimum_bits_to_keep front, at present maximum_bits_to_remove=bits_per_sample-minimum_bits_to_keep = 16-5 = 11 for *all* codec_blocks. With the new proposal, if the highest filled bit (taking the ABS of -ve numbers first) is the 8th then at most 3 bits would be removed, regardless of what the algorithm produced.
I do have faith in the method, I just like belt, braces and hands in pockets keeping trousers up......
Would that introduce a varying noise floor for same volume samples? If so, it may be detrimental to perceived audio quality.
2Bdecided
Jan 16 2008, 03:55
TBeck,
What Nick is proposing won't make anything sound worse, because it's just extending a safety net to lower amplitudes. It's never used to throw away more bits than the "find the noise floor and quantise below it" approach - only to keep at least 5 bits when fewer than 5 bits were going to be kept.
Nick: does this kick in very often?
Where I do agree with you TBeck is that it's not a great safety catch, for exactly the reasons you've explained. It needs to be done in a spectral domain, not the time domain.
Whether it's worth doing in either domain is open to question. It might be, but it's extra complexity. It's heading even further down the route of having a "psychoacoustic" model. There will come a point when it's better to "borrow" someone else's.
For now, I'm more inclined to be happy with what we have.
Cheers,
David.
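The safety net discussed above can be sketched in a few lines. This is only an illustration under stated assumptions, not lossyWAV's actual code: the function names are hypothetical, samples are taken to be 16-bit signed integers, and "bits used in the block" is taken to mean the position of the highest filled bit of the largest absolute sample value.

```python
def max_bits_to_remove_fixed(bits_per_sample=16, minimum_bits_to_keep=5):
    """Existing rule: the same limit for every codec_block (16 - 5 = 11)."""
    return bits_per_sample - minimum_bits_to_keep


def bits_used(block):
    """Position of the highest filled bit, taking the ABS of negative samples first."""
    peak = max(abs(sample) for sample in block)
    return peak.bit_length()


def limit_bits_to_remove(block, proposed, minimum_bits_to_keep=5):
    """Proposed rule (hypothetical helper): keep at least minimum_bits_to_keep
    of the bits actually used in this block. It can only lower the number of
    bits removed, never raise it."""
    limit = max(bits_used(block) - minimum_bits_to_keep, 0)
    return min(proposed, limit)


# A quiet block whose highest filled bit is the 8th: at most 3 bits are removed,
# regardless of what the noise-floor analysis proposed.
quiet_block = [200, -130, 5]
print(limit_bits_to_remove(quiet_block, proposed=7))
```

Because the result is always `min(proposed, limit)`, the check never throws away more bits than the noise-floor analysis asked for, which is the point 2Bdecided makes to TBeck.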
halb27
Jan 16 2008, 03:59
QUOTE(TBeck @ Jan 16 2008, 10:58)

.. Unfortunately this often doesn't work well. Think about a combination of a low frequency signal with high amplitude (bass guitar) and a higher frequency signal with low volume. ...
Yes, but it's just an additional safety action - not really necessary IMO, but as it's done already, Nick.C's new approach is just more consistent than the old approach.
It's low level signals that benefit from the new approach with respect to this safety action.
EDIT: 2Bdecided was faster.
QUOTE(2Bdecided @ Jan 16 2008, 10:55)

What Nick is proposing won't make anything sound worse, because it's just extending a safety net to lower amplitudes. It's never used to throw away more bits than the "find the noise floor and quantise below it" approach - only to keep at least 5 bits when fewer than 5 bits were going to be kept.
QUOTE(halb27 @ Jan 16 2008, 10:59)

Yes, but it's just an additional safety action - not really necessary IMO, but as it's done already, Nick.C's new approach is just more consistent than the old approach.
It's low level signals that benefit from the new approach with respect to this safety action.
I was aware of this but failed to express it right.
Nick.C
Jan 16 2008, 05:18
So, reading the last few posts:
I will remove the revised minimum_bits_to_keep method (no, it doesn't kick in very often at all, and it slows down the processing slightly).
Seeking consensus:
Should we retain the recent implementation of allowing a few clipped samples to be "rounded the other way"?
halb27
Jan 16 2008, 05:33
I don't care much about it with a slight negative feeling towards letting 5 samples clip though I don't think that would be audible.
I feel positive about letting isolated samples clip, but that's only because of AlexB's provided track where it happens that bits removed changed abruptly due to only 1 clipped sample.
For differentiating -3 from -2 my favorite is: let 1 sample per block clip for -2, let 2 samples clip for -3.
Nick.C
Jan 16 2008, 05:35
QUOTE(halb27 @ Jan 16 2008, 11:33)

I don't care much about it with a slight negative feeling towards letting 5 samples clip though I don't think that would be audible.
I feel positive about letting isolated samples clip, but that's only because of AlexB's provided track where it happens that bits removed changed abruptly due to only 1 clipped sample.
For differentiating -3 from -2 my favorite is: let 1 sample per block clip for -2, let 2 samples clip for -3.
I was thinking more: -1 = 0; -2 = 1; -3 = 5;
The code has sped up yet again - now approaching 50% faster than v0.6.4 RC1....
2Bdecided
Jan 16 2008, 07:10
QUOTE(Nick.C @ Jan 15 2008, 17:58)

On the clipping front, if bits_to_remove=6 then what would have been 10000000 00000000 (assuming there was no sign bit - that bit is done with floats) would be clipped to 01111111 11000000, i.e. as if it had been rounded down not up.
If you're normally rounding to the nearest number, then rounding down when you should be rounding up means you're jumping further away from the wanted value on this sample than on any other, doesn't it? I.e. you're adding more noise - potentially 50% more.
As it's level dependent, it's not strictly noise - it's distortion.
With my apologies in advance if I've misunderstood!
Cheers,
David.
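The "rounded the other way" behaviour and the extra error David describes can be illustrated as follows. This is a sketch under assumptions: `remove_bits` is a hypothetical name, samples are 16-bit signed integers, and rounding is to the nearest multiple of 2^bits_to_remove. The thread says the real code handles the sign bit with floats, which is not modelled here.

```python
def remove_bits(sample, bits_to_remove, bits_per_sample=16):
    """Round to the nearest multiple of 2**bits_to_remove.

    If rounding up would exceed the largest positive value, step back down
    by one multiple instead. Returns (new_value, was_clipped).
    """
    step = 1 << bits_to_remove
    max_value = (1 << (bits_per_sample - 1)) - 1  # 32767 for 16-bit audio
    rounded = ((sample + step // 2) // step) * step
    if rounded > max_value:
        return rounded - step, True  # rounded down, not up
    return rounded, False


value, clipped = remove_bits(32767, 6)
print(format(value, "016b"), clipped)  # bit pattern of the clipped sample
print(32767 - value)                   # error for this one sample
```

With bits_to_remove=6 the step is 64, so normal rounding never moves a sample by more than 32. The sample that is rounded down instead ends up 63 away from its original value, which is the level-dependent extra error under discussion.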
Nick.C
Jan 16 2008, 07:18
QUOTE(2Bdecided @ Jan 16 2008, 13:10)

If you're normally rounding to the nearest number, then rounding down when you should be rounding up means you're jumping further away from the wanted value on this sample than on any other, doesn't it? I.e. you're adding more noise - potentially 50% more.
As it's level dependent, it's not strictly noise - it's distortion.
With my apologies in advance if I've misunderstood!
Cheers,
David.
No, exactly right - but given the duration - does the benefit not outweigh the potential cost?
halb27
Jan 16 2008, 08:00
It would be good to see the resulting difference in bitrate due to allowing this kind of restricted clipping for entire tracks. I think the difference is small. I will try with the only seriously clipping album in my collection.
Nick.C
Jan 16 2008, 08:30
QUOTE(halb27 @ Jan 16 2008, 14:00)

It would be good to see the resulting difference in bitrate due to allowing this kind of restricted clipping for entire tracks. I think the difference is small. I will try with the only seriously clipping album in my collection.
Using livin_in_the_future, there is a 3.5% reduction at -2 (1 clip per channel per codec_block allowed) and 5% reduction at -3 (5 clips...)
lossyWAV beta v0.6.9 attached to post #1 in this thread.
halb27
Jan 16 2008, 10:43
I guess that's the result for the 30 sec part - but maybe it's representative for the entire track.
3.5% and 5% respectively isn't bad, but it also shows for this case that we get most of the effect with fewer than 5 samples allowed to clip, which is a more cautious approach. Do you mind trying 2 allowed clipped samples per block?
Nick.C
Jan 16 2008, 10:46
QUOTE(halb27 @ Jan 16 2008, 16:43)

I guess that's the result for the 30 sec part - but maybe it's representative for the entire track.
3.5% and 5% respectively isn't bad, but it also shows for this case that we get most of the effect with fewer than 5 samples allowed to clip, which is a more cautious approach. Do you mind trying 2 allowed clipped samples per block?
Tried -2 and -3 at 2 allowable clips.
-2 : the FLAC file decreases from 1715683 to 1712152 bytes (no clips 1777874) : -3.50% to -3.70%;
-3 : the FLAC file increases from 1523876 to 1524359 bytes (no clips 1603204) : -4.95% to -4.92%
I think that -1 = 0; -2 = 1; -3 = 2 may be optimal.
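For anyone wanting to re-check the percentages quoted above, the arithmetic is simply the change in FLAC size relative to the "no clips" file. The helper name below is illustrative only; the byte counts are the ones posted.

```python
def percent_change(new_size, reference_size):
    """Relative size change in percent; negative means the file got smaller."""
    return 100.0 * (new_size - reference_size) / reference_size


# Preset -2: reference is the "no clips" FLAC file (1777874 bytes).
print(round(percent_change(1715683, 1777874), 2))
print(round(percent_change(1712152, 1777874), 2))

# Preset -3: reference is the "no clips" FLAC file (1603204 bytes).
print(round(percent_change(1523876, 1603204), 2))
print(round(percent_change(1524359, 1603204), 2))
```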
halb27
Jan 16 2008, 12:30
Looks good. Looking at the -2 result there's only a negligible difference between allowing 1 or 2 samples to clip. So allowing for just 1 sample to clip may be preferable also with -3.
But that's only for AlexB's sample.
I just encoded 7 full length tracks (those in my selective collection) from Francoise Hardy's Album 'Le temps des souvenirs' which I know has a lot of clipping. 0.6.9 (5 allowed clipping samples) provides for a decrease in total filesize of -19,4% (against using 0.6.7RC2).
So for clipped tracks your suggestion yields a significant improvement.
I'd like to try these tracks with fewer samples allowed to clip. Can you provide such an experimental version, please?
Nick.C
Jan 16 2008, 12:52
Will do - should be up in an hour or so......
halb27
Jan 16 2008, 13:05
Meanwhile I can report about my listening test with 0.6.9 on AlexB's sample and 2 of my 7 tracks. It's alright to me - I could not ABX any of those spots where I had a suspicion that there's a slightly audible issue. What I considered 'wrong' in the encoding was also 'wrong' in the original - Francoise Hardy's album has a pretty bad quality.
So from this even 5 allowed clipped samples per block are ok to use. Anyway allowing for only a lower amount of clipped samples provides for a higher degree of safety and might be the better choice in case file size remains similar.
Nick.C
Jan 16 2008, 13:53
lossyWAV beta v0.7.0 attached to post #1 in this thread.
Very pleased with the speed now - beta v0.7.0 processes 125MB of WAV in 14 seconds (average 53.1x) on an Intel C2D E6600 @ 3.0GHz, 2 x 80GB HDD in RAID0, Windows XP SP2
halb27
Jan 16 2008, 14:55
My 0.7.0 result for AlexB's track:
-clips 3: -5,0%
-clips 2: -5,0% (file size with my FLAC setting: 1523674)
-clips 1: -4,7%
My 0.7.0 result for my 7 Francoise Hardy tracks:
-clips 3: -19,2%
-clips 2: -18,4%
-clips 1: -14,2%
So from this the essential reduction in filesize is achieved already with just 1 allowed sample per block to clip.
2 allowed samples to clip is attractive to some extent, but to a minor degree.
3 allowed samples to clip brings only an insignificant advantage and is not attractive.
More than 3 allowed samples to clip is useless in a practical sense.
So I think we have 4 useful choices:
a) 1 allowed clipped sample per block for -2, 2 allowed clipped samples per block for -3.
b) 1 allowed clipped sample per block for -2 and -3.
c) full clipping prevention with -2, 1 allowed clipped sample per block for -3.
d) full clipping prevention with -2, 2 allowed clipped samples per block for -3.
I personally don't care much about whether we should allow for 1 or 2 clipped samples with -3. I think both choices are fully in congruence with what we want to achieve with -3.
I'm more worried about whether or not we should allow for clipped samples with -2. I feel a bit uncomfortable due to the nature of -2 and the distortion 2Bdecided mentioned when allowing clipping to occur though I don't think that this can be audible.
Nick.C
Jan 16 2008, 15:03
QUOTE(halb27 @ Jan 16 2008, 20:55)

So I think we have 4 useful choices:
a) 1 allowed clipped sample per block for -2, 2 allowed clipped samples per block for -3.
b) 1 allowed clipped sample per block for -2 and -3.
c) full clipping prevention with -2, 1 allowed clipped sample per block for -3.
d) full clipping prevention with -2, 2 allowed clipped samples per block for -3.
I personally don't care much about whether we should allow for 1 or 2 clipped samples with -3. I think both choices are fully in congruence with what we want to achieve with -3.
I'm more worried about whether or not we should allow for clipped samples with -2. I feel a bit uncomfortable due to the nature of -2 and the distortion 2Bdecided mentioned when allowing clipping to occur though I don't think that this can be audible.
Is *one* "distorted" sample (22.68 microseconds) going to be of any real significance? Personally, unless over-ruled by someone with more expert knowledge, my preference is -1=0; -2=1; -3=2, i.e. as per beta v0.7.0.
I'm really pleased about the reduction on your badly clipping tracks : -18.4% is excellent!
This modification may bring down the average bitrate for -3 a bit to bring it a bit closer to that for v0.6.4 RC1.
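A rough sketch of how a per-preset clip allowance could interact with the choice of bits to remove. Everything here is an assumption made for illustration: the names are hypothetical, and the idea that clipping prevention works by lowering bits_to_remove until the number of clipped samples fits the allowance is inferred from the discussion, not taken from lossyWAV's source. The 44.1 kHz sample rate is likewise assumed, as it matches the 22.68 microsecond figure.

```python
ALLOWED_CLIPS = {1: 0, 2: 1, 3: 2}  # preset -> clips per channel per codec_block
SAMPLE_RATE = 44100                 # assumed CD-rate audio


def count_clips(block, bits_to_remove, bits_per_sample=16):
    """Count samples that would overflow when rounded to the nearest multiple."""
    step = 1 << bits_to_remove
    max_value = (1 << (bits_per_sample - 1)) - 1
    return sum(1 for s in block if ((s + step // 2) // step) * step > max_value)


def choose_bits_to_remove(block, proposed, preset):
    """Lower the proposed value until the clip count fits the preset's allowance."""
    bits = proposed
    while bits > 0 and count_clips(block, bits) > ALLOWED_CLIPS[preset]:
        bits -= 1
    return bits


block = [32767] + [1000] * 10  # one full-scale sample in an otherwise tame block
print(choose_bits_to_remove(block, proposed=6, preset=1))  # full clipping prevention
print(choose_bits_to_remove(block, proposed=6, preset=2))  # one clip tolerated
print(round(1e6 / SAMPLE_RATE, 2))                         # one sample, in microseconds
```

Under these assumptions a single full-scale sample is enough to pull bits_to_remove all the way down when no clips are allowed, which is the abrupt change halb27 mentions for AlexB's track.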
halb27
Jan 16 2008, 15:29
QUOTE(Nick.C @ Jan 16 2008, 23:03)

Is *one* "distorted" sample (22.68 microseconds) going to be of any real significance? Personally, unless over-ruled by someone with more expert knowledge, my preference is -1=0; -2=1; -3=2, i.e. as per beta v0.7.0.
So OK then.
QUOTE(Nick.C @ Jan 16 2008, 23:03)

This modification may bring down the average bitrate for -3 a bit to bring it a bit closer to that for v0.6.4 RC1.
This is another story and depends heavily on the degree of clipped tracks in the user's collection.
For a short impression I encoded the small regular track set I used so often to find out about the average bitrate for a lossyWAV version:
0.6.7RC2 -3: Total filesize: 141231970
0.7.0 -3 -clips 2: Total filesize: 141227370
Difference: -0,003%
Nick.C
Jan 16 2008, 15:38
QUOTE(halb27 @ Jan 16 2008, 21:29)

For a short impression I encoded the small regular track set I used so often to find out about the average bitrate for a lossyWAV version:
0.6.7RC2 -3: Total filesize: 141231970
0.7.0 -3 -clips 2: Total filesize: 141227370
Difference: -0,003%
Hehehe..... there is a small penalty for including the -clips parameter : " -clips n" is added to the parameter string in the "fact" chunk in the wav file.....
I just transcoded my Mike Oldfield collection (261 tracks, 24h30m12s) in 40m48s : an average throughput (FLAC [from NAS] > WAV [local]> lossyWAV > FLAC) of 36x - and lo and behold, there was *no* difference at all in the total filesize (from v0.6.7 RC2 and beta v0.7.0) that couldn't be explained by the extra 9 bytes per file!
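The throughput figure and the "9 bytes per file" remark can both be checked with a little arithmetic. The helper name is illustrative; the durations, track count and parameter string are the ones given in the post.

```python
def hms_to_seconds(hours, minutes, seconds):
    return 3600 * hours + 60 * minutes + seconds


audio_seconds = hms_to_seconds(24, 30, 12)    # total playing time of the collection
elapsed_seconds = hms_to_seconds(0, 40, 48)   # wall-clock time for the transcode
throughput = audio_seconds / elapsed_seconds  # multiple of real time

extra_per_file = len(" -clips n")             # text added to the "fact" chunk
extra_total = 261 * extra_per_file            # over the whole collection

print(round(throughput, 2), extra_per_file, extra_total)
```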
halb27
Jan 16 2008, 16:36
When we were about to introduce our current clipping prevention scheme I searched hard for clipped tracks, with the result that my collection has next to no clipping.
IMO we should stick to your suggestion for -3. After all clipping exists. But I think my clipping album is an extreme case of clipping, and AlexB's sample is more representative for clipped tracks.
Because of this and the fact that clipping is very rare I suggest to allow only 1 clipped sample in a block for -3, and keep the clipping prevention scheme in full action with -2.
Nick.C
Jan 19 2008, 09:20
QUOTE(halb27 @ Jan 16 2008, 22:36)

When we were about to introduce our current clipping prevention scheme I searched hard for clipped tracks, with the result that my collection has next to no clipping.
IMO we should stick to your suggestion for -3. After all clipping exists. But I think my clipping album is an extreme case of clipping, and AlexB's sample is more representative for clipped tracks.
Because of this and the fact that clipping is very rare I suggest to allow only 1 clipped sample in a block for -3, and keep the clipping prevention scheme in full action with -2.
I would certainly agree that -1 should have full clipping prevention. For -2, maybe 1 clip per channel per codec_block would be acceptable. For -3, 2 clips per channel per codec_block seems to work well. I will strip out the -clips parameter and post v0.7.1 RC3 sometime tomorrow.
I've been optimising in IA-32 / x87 again and the speed is getting marginally better.
Given that v0.6.7 RC2 has 95 downloads at the moment with no negative comments, I feel that we're *really* close to v1.0.0 final - we just have to agree amongst ourselves as to the exact number of (rounded down) clips acceptable for each quality preset.
halb27
Jan 19 2008, 15:58
QUOTE(Nick.C @ Jan 19 2008, 17:20)

I would certainly agree that -1 should have full clipping prevention. For -2, maybe 1 clip per channel per codec_block would be acceptable. For -3, 2 clips per channel per codec_block seems to work well ... we just have to agree amongst ourselves as to the exact number of (rounded down) clips acceptable for each quality preset.
Well to me the pretty rare event of clipping is an argument not to circumvent the clipping protection scheme with -2 and keep -2 in a 'pure' form. But we shouldn't continue this forever, so do go ahead with your favorite choice. I also don't think giving away clipping protection for 1 sample per channel and block will be audible.
Nick.C
Jan 22 2008, 15:13
QUOTE(halb27 @ Jan 19 2008, 21:58)

Well to me the pretty rare event of clipping is an argument not to circumvent the clipping protection scheme with -2 and keep -2 in a 'pure' form. But we shouldn't continue this forever, so do go ahead with your favorite choice. I also don't think giving away clipping protection for 1 sample per channel and block will be audible.
lossyWAV beta v0.7.1 attached to post #1 in this thread.
halb27
Jan 22 2008, 16:16
Thank you Nick, especially for -noclips.
Can you tell a bit about the new window function and the noise constants? Are these changes conservative, or is it necessary to do some testing?
Nick.C
Jan 22 2008, 16:24
QUOTE(halb27 @ Jan 22 2008, 22:16)

Thank you Nick, especially for -noclips.
Can you tell a bit about the new window function and the noise constants? Are these changes conservative, or is it necessary to do some testing?
I had a brief PM discussion with David and I realised that I was using a zero-ended window function - values 0.5 fft_length apart did not sum to 1. I modified this slightly and the values 0.5 fft_length apart now sum to 1. The noise constants were re-calculated and incorporated into the code.
The bitrate has come down slightly (462.22kbps [v0.6.7 RC2] to 461.54kbps [beta v0.7.1]) at -3 for my 53 sample set.
If you have the time, I would welcome validation of the new window function. However, I feel that all it does is to use more samples per codec_block (64 not 62, etc.) so it should not sacrifice quality.
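To make the window change concrete, here is a sketch that assumes a Hann-type (raised cosine) window. The thread does not name the window function, so both formulas below are assumptions chosen to reproduce the two properties mentioned: the zero-ended version has zeros at both ends (62 non-zero values out of 64), while the half-sample-shifted version has no zeros and its values 0.5 fft_length apart sum to 1.

```python
import math


def zero_ended_hann(n):
    """Symmetric Hann: first and last values are exactly zero."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def half_shifted_hann(n):
    """Hann sampled at half-sample offsets: no zero values at the ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / n) for i in range(n)]


def pair_sums(window):
    """Sum of each pair of values that lie half a window length apart."""
    half = len(window) // 2
    return [window[i] + window[i + half] for i in range(half)]


for make_window in (zero_ended_hann, half_shifted_hann):
    w = make_window(64)
    non_zero = sum(1 for value in w if value > 1e-12)
    worst = max(abs(s - 1.0) for s in pair_sums(w))
    print(make_window.__name__, non_zero, worst)
```

The last printed column is the largest deviation from 1 among the pair sums; for the shifted window it is zero to within floating-point rounding.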
Nick.C
Jan 23 2008, 15:43
I had a thought - at present -1 uses 4 FFT's (64, 128, 512 & 1024 samples); -2 uses 3 FFT's (64, 256 & 1024 samples) and -3 uses 2 FFT's (64 & 1024 samples).
I am thinking about implementing a "-extrafft" parameter to add an extra FFT analysis to the existing quality preset at the user's discretion, which will basically increase the processing time, but also increase the scope of the analysis.
In this way, -1 would use 5 FFT's (64, 128, 256, 512 & 1024 samples); -2 would use 4 FFT's (64, 128, 512 & 1024 samples) and -3 would use 3 FFT's (64, 256 & 1024 samples).
Thoughts, anyone?
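The proposal can be written down as a small lookup table. The dictionary and function names are hypothetical, and the -2 entry is read as the three lengths actually listed (64, 256 and 1024).

```python
# FFT lengths per quality preset, as listed in the post.
BASE_FFT_LENGTHS = {
    1: (64, 128, 512, 1024),
    2: (64, 256, 1024),
    3: (64, 1024),
}

# FFT lengths per quality preset with the proposed "-extrafft" parameter.
EXTRA_FFT_LENGTHS = {
    1: (64, 128, 256, 512, 1024),
    2: (64, 128, 512, 1024),
    3: (64, 256, 1024),
}


def fft_lengths(preset, extrafft=False):
    """Return the FFT lengths analysed for a preset (1, 2 or 3)."""
    table = EXTRA_FFT_LENGTHS if extrafft else BASE_FFT_LENGTHS
    return table[preset]


for preset in (1, 2, 3):
    print(preset, fft_lengths(preset), fft_lengths(preset, extrafft=True))
```

Laid out this way, -3 with the extra FFT uses the same lengths as plain -2, and -2 with the extra FFT uses the same lengths as plain -1. Only -1 gains a combination that does not already exist.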
halb27
Jan 24 2008, 09:06
From my understanding we need more than 1 FFT because for getting a good temporal resolution (catching transients) we need a rather short FFT, and for a good frequency resolution in the low to medium frequency range we need a long FFT (for the very high frequency range the short FFT should be sufficient).
From this and from practical results I don't see why we should have more than 2 FFTs. It's okay to have an additional FFT for safety reasons when going from -3 to -2, and from -2 to -1, but I don't see why we should go beyond that.
Sorry for being so negative towards your recent suggestions. I see you're eager to get further improvements one way or another.
To me personally I think things are very good and don't need refinement as long as no issues come up in practice.
The only thing I personally would like to have consideration one last time is the coverage of the FFT windows of the 1024 sample FFT over the block.
We see this differently, as you feel there needs to be a 50% overlap of FFT analyses between adjacent blocks. I don't see any overlapping necessary for the blocks, and I think your consideration is based on one of 2Bdecided's remarks, but I believe this is a misunderstanding. The bits to remove decision in my understanding is not a global decision, not a block-overlapping decision, IMO not even a block-oriented decision (but I think the latter has no practical impact). IMO we can assign to each single sample a number of bits to remove based on the analysis of those FFT windows to which the specific sample contributes.
Block consideration comes in because the lowest number of bits to remove (per sample) must be chosen in order to assign a bits to remove number to the block. Moreover it's useful to base the FFT window partitioning on the block. What's necessary is a good overlapping of the FFTs in the block under consideration. According to 2Bdecided the overlapping of the FFT windows (within the block!) should be 50% or more. We had a discussion before from which I thought we have an overlapping of 5/8, but IIRC this is not current practice.
My suggestion once was not to have the overlapping go far into neighboring blocks, as their samples have nothing to do with the block under consideration, so I suggested having the centre point of the outermost FFT windows a little bit within the block. 2Bdecided preferred the edge position because of a good temporal resolution at the very beginning and end of the block, but this cannot be an issue with the 1024 sample FFT, simply because this job is up to the short FFT.
So can you please reconsider using the following 1024 sample FFT windows: -448:575, -64:959. With this the center of the FFT window is just 64 samples (1/16 of the window length) away from the edges, and I think this isn't a problem for catching problems at the edges (temporal resolution issues are caught by the short FFT, not the 1024 sample FFT, and the 64 sample FFTs can be centered at the edges). The advantage is in the middle area of the block, as this area is now covered better by the 2 FFT windows. With the center points at the very edges we are already 50% away from the FFT centers at the middle of the block, and the samples there participate only partially in the FFT analyses. If you do a third long FFT centered at the block center the way you wrote about (but I'm not sure whether this is in action right now) things are alright of course, but at the cost of an additional FFT window.
Nick.C
Jan 24 2008, 13:50
QUOTE(halb27 @ Jan 24 2008, 15:06)

From my understanding we need more than 1 FFT because for getting a good temporal resolution (catching transients) we need a rather short FFT, and for a good frequency resolution in the low to medium frequency range we need a long FFT (for the very high frequency range the short FFT should be sufficient).
From this and from practical results I don't see why we should have more than 2 FFTs. It's okay to have an additional FFT for safety reasons when going from -3 to -2, and from -2 to -1, but I don't see why we should go beyond that.
Sorry for being so negative towards your recent suggestions. I see you're eager to get further improvements one way or another.
To me personally I think things are very good and don't need refinement as long as no issues come up in practice.
The only thing I personally would like to have consideration one last time is the coverage of the FFT windows of the 1024 sample FFT over the block.
We see this differently, as you feel there needs to be a 50% overlap of FFT analyses between adjacent blocks. I don't see any overlapping necessary for the blocks, and I think your consideration is based on one of 2Bdecided's remarks, but I believe this is a misunderstanding. The bits to remove decision in my understanding is not a global decision, not a block-overlapping decision, IMO not even a block-oriented decision (but I think the latter has no practical impact). IMO we can assign to each single sample a number of bits to remove based on the analysis of those FFT windows to which the specific sample contributes.
Block consideration comes in because the lowest number of bits to remove (per sample) must be chosen in order to assign a bits to remove number to the block. Moreover it's useful to base the FFT window partitioning on the block. What's necessary is a good overlapping of the FFTs in the block under consideration. According to 2Bdecided the overlapping of the FFT windows (within the block!) should be 50% or more. We had a discussion before from which I thought we have an overlapping of 5/8, but IIRC this is not current practice.
My suggestion once was not to have the overlapping go far into neighboring blocks, as their samples have nothing to do with the block under consideration, so I suggested having the centre point of the outermost FFT windows a little bit within the block. 2Bdecided preferred the edge position because of a good temporal resolution at the very beginning and end of the block, but this cannot be an issue with the 1024 sample FFT, simply because this job is up to the short FFT.
So can you please reconsider using the following 1024 sample FFT windows: -448:575, -64:959. With this the center of the FFT window is just 64 samples (1/16 of the window length) away from the edges, and I think this isn't a problem for catching problems at the edges (temporal resolution issues are caught by the short FFT, not the 1024 sample FFT, and the 64 sample FFTs can be centered at the edges). The advantage is in the middle area of the block, as this area is now covered better by the 2 FFT windows. With the center points at the very edges we are already 50% away from the FFT centers at the middle of the block, and the samples there participate only partially in the FFT analyses. If you do a third long FFT centered at the block center the way you wrote about (but I'm not sure whether this is in action right now) things are alright of course, but at the cost of an additional FFT window.
The -448/-64 method does not benefit from the code speedup as the 0:1023 existing FFT is recycled as the -512:511 in the next codec_block, neither does it benefit from a 50% overlap between FFT analyses.
I would rather go down the -256:767 route if we are going to deviate from the -512:511;0:1023 route. Someone with more knowledge than me should ultimately make the decision, but if the existing -512:511;0:1023 is not acceptable then my preference is clear.
halb27
Jan 24 2008, 16:35
QUOTE(Nick.C @ Jan 24 2008, 21:50)

The -448/-64 method does not benefit from the code speedup as the 0:1023 existing FFT is recycled as the -512:511 in the next codec_block ...
I see, this was the speedup trick. Cleverly done.
QUOTE(Nick.C @ Jan 24 2008, 21:50)

... neither does it benefit from a 50% overlap between FFT analyses. ....
I do not understand why you want a 50% FFT overlap for neighboring blocks. We have a per block analysis and determination of bits to remove. Ideally we don't consider samples at all from neighboring blocks, it is a negative side effect that we have to accept due to the nature of the FFT window. Sure we want accuracy at the block's edges so the FFT windows will reach into the neighboring block. But to do so to a smaller degree than 50% if we can allow is better than reaching 50% into the neighborhood.
But the speedup thing is valuable, especially for -3 with its excellent speed.
What do you think about the -448:575, -64:959 windows for -2, or at least for -1?
Nick.C
Jan 25 2008, 02:45
QUOTE(halb27 @ Jan 24 2008, 22:35)

QUOTE(Nick.C @ Jan 24 2008, 21:50)

The -448/-64 method does not benefit from the code speedup as the 0:1023 existing FFT is recycled as the -512:511 in the next codec_block ...
I see, this was the speedup trick. Cleverly done.
QUOTE(Nick.C @ Jan 24 2008, 21:50)

... neither does it benefit from a 50% overlap between FFT analyses. ....
I do not understand why you want a 50% FFT overlap for neighboring blocks. We have a per block analysis and determination of bits to remove. Ideally we don't consider samples at all from neighboring blocks, it is a negative side effect that we have to accept due to the nature of the FFT window. Sure we want accuracy at the block's edges so the FFT windows will reach into the neighboring block. But to do so to a smaller degree than 50% if we can allow is better than reaching 50% into the neighborhood.
But the speedup thing is valuable, especially for -3 with its excellent speed.
What do you think about the -448:575, -64:959 windows for -2, or at least for -1?
I hear what you say - all three options are now available in lossyWAV beta v0.7.2, attached to post #1 in this thread.
halb27
Jan 25 2008, 06:22
QUOTE(Nick.C @ Jan 25 2008, 10:45)

... I hear what you say - all three options are now available in lossyWAV beta v0.7.2, attached to post #1 in this thread.
Wonderful - you make me happy. Thank you very much.
2Bdecided
Jan 25 2008, 07:07
QUOTE(halb27 @ Jan 24 2008, 22:35)

Ideally we don't consider samples at all from neighboring blocks, it is a negative side effect that we have to accept due to the nature of the FFT window.
Not quite - the concepts of time and frequency are linked, and you can only have the frequency accuracy of a 1024-point FFT by looking at 1024 samples. If you want that accuracy (and I believe we do) you need that many samples. So no, even "ideally" we need to consider samples from neighboring blocks - just as, at the limit, a single sample tells you nothing.
As long as there is at least 50% overlap, and all of the block is covered by the 0.5-or-higher part of the window function, it really doesn't matter which of the two or three proposed schemes you use.
You're looking for the quietest part, and that could be anywhere in the block. Focusing on the start, middle, end, or any point(s) in between has no advantage in this respect.
What we do know is that something special can happen at block boundaries which cannot happen anywhere else (we introduce a transition), so focussing on these has some merit, but I wouldn't argue to the death for it!
The worst case scenario is this: you have a notch in the frequency spectrum that's narrow enough that you need a 1024 point FFT to catch it (otherwise the shorter FFT will catch it anyway, and the position of the 1024 point FFT doesn't matter!). Now, switch this notch in and out at block boundaries, so one block has it, and the next doesn't. If the notch is in white noise, we won't hear the switching transients, so we can switch in a single sample.
So, if you use a 64-point FFT, you can't see this notch - it's too narrow.
Yet if you use a 1024-point FFT, you'll hit your problem - the centred window sees more of the notch than the edge window.
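To put rough numbers on "too narrow", the bin spacing of each FFT length can be computed as below. This is only a sketch: the 44.1 kHz sample rate is an assumption (CD audio), and bin spacing is just a first-order measure of resolution, since the window function widens each bin's response further.

```python
SAMPLE_RATE = 44100  # Hz; assumed (CD audio), not stated in the post

def bin_spacing_hz(fft_length, sample_rate=SAMPLE_RATE):
    """Frequency distance between adjacent FFT bins."""
    return sample_rate / fft_length

for n in (64, 1024):
    print(n, "point FFT:", round(bin_spacing_hz(n), 1), "Hz per bin")
```

Under that assumption the 64-point FFT has bins several hundred Hz wide, while the 1024-point FFT has bins a few tens of Hz wide, so a notch much narrower than the former can only show up in the latter.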
Does it make any audible difference? I can't tell, but I've attached a sample if you want to check. It's only 1 second long, the first 1/2 has alternate filtered/not filtered 512-sample blocks. The second 1/2 is all unfiltered. You can clearly hear the difference between these two, but does lossywav processing change it at all, with either window position?
Cheers,
David.
Nick.C
Jan 25 2008, 07:22
Thanks David,
The idea of using the centred analysis, i.e. -256:767, has the whole codec_block in the 50% or higher zone and will also include 256 samples from the codec_blocks either side, although that does mean that the block edge samples are only at 50%.
However, prioritising the codec_block edges, the existing method (-512:511; 0:1023) has the samples at each end of the codec_block at 100% in one or other analysis.
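The 50% and 100% figures can be checked with a small sketch. It assumes a Hann window (the window shape is not named in this post) and a 512-sample codec_block at positions 0:511; the helper names are illustrative only.

```python
import math

FFT_LEN = 1024
BLOCK = 512

def hann(i, n=FFT_LEN):
    """Periodic Hann window value at index i (assumed window shape)."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))

def block_weights(rel_start):
    """Window weight seen by each of the 512 block samples for a window
    that starts rel_start samples relative to the block start."""
    return [hann(s - rel_start) for s in range(BLOCK)]

centred = block_weights(-256)                      # -256:767
edge_a = block_weights(-512)                       # -512:511
edge_b = block_weights(0)                          # 0:1023
best_edge = [max(a, b) for a, b in zip(edge_a, edge_b)]

print("centred, lowest weight in block:", min(centred))
print("edge pair, weight at block edges:", best_edge[0], best_edge[-1])
print("edge pair, lowest weight in block:", min(best_edge))
```

With these assumptions the centred window never drops below about 0.5 inside the block, while the edge pair gives roughly full weight at both block edges and falls to about 0.5 in the middle of the block.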
[edit] Your sample using v0.7.2, FLAC -5, -3: (e) 54031 bytes; -3 -overlap: (o) 54158 bytes; -3 -centre: (c) 53326 bytes. (attached) Will try listening to them. [/edit]
[edit2] All I'm getting is a slight difference in tone of a sub-frequency that isn't the noise itself..... Mind you, my ears have visited !loud! environments a few times too often. [/edit2]
2Bdecided
Jan 25 2008, 10:19
It was the lowest frequencies that I removed, but my ears at least can't hear that they're absent/present/absent/present every 512 samples in the original file - they just sound quieter overall for the first half of the file.
Looking at the bits removed, the different modes are doing what you'd expect: the centred mode clearly picks up the blocks with the notch filter and removes fewer bits (and removes more bits where there's pure white noise), the others don't really notice a difference.
However, during the second 1/2 of the file (which is just white noise), the centred mode jumps around a lot in bits removed even though there's no difference (other than the noise being random) between blocks. I've attached an image showing how the added noise jumps around (all three lossy-original=difference signals boosted by 42dB for display).
All three lossy versions sound the same as the original to me.
If anyone can think of a more critical test sample, please post.
Cheers,
David.
halb27
Jan 25 2008, 12:05
QUOTE(2Bdecided @ Jan 25 2008, 15:07)

QUOTE(halb27 @ Jan 24 2008, 22:35)

Ideally we don't consider samples at all from neighboring blocks, it is a negative side effect that we have to accept due to the nature of the FFT window.
Not quite - the concepts of time and frequency are linked, and you can only have the frequency accuracy of a 1024-point FFT by looking at 1024 samples. If you want that accuracy (and I believe we do) you need that many samples. So no, even "ideally" we need to consider samples from neighboring blocks - just as, at the limit, a single sample tells you nothing.
As long as there is at least 50% overlap, and all of the block is covered by the 0.5-or-higher part of the window function, it really doesn't matter which of the two or three proposed schemes you use.
You're looking for the quietest part, and that could be anywhere in the block. Focusing on the start, middle, end, or any point(s) in between has no advantage in this respect.
What we do know is that something special can happen at block boundaries which cannot happen anywhere else (we introduce a transition), so focussing on these has some merit, but I wouldn't argue to the death for it!
The worst case scenario is this: you have a notch in the frequency spectrum that's narrow enough that you need a 1024 point FFT to catch it (otherwise the shorter FFT will catch it anyway, and the position of the 1024 point FFT doesn't matter!). Now, switch this notch in and out at block boundaries, so one block has it, and the next doesn't. If the notch is in white noise, we won't hear the switching transients, so we can switch in a single sample. ...
I think it's all a misunderstanding, probably I didn't make my point clear enough.
Of course we want a 1024 sample FFT, and of course every sample in the 1024 sample window counts, and of course if we want accuracy at the edge any 1024 sample FFT window which takes good care of the edges stretches its samples in a significant way into the neighboring block.
I just call this a negative side effect as we do want to assign a number of bits to remove to the block under consideration, and in this respect it's a negative (though necessary) side effect in my understanding.
In the end: do you think with the two FFT windows -448:575, -64:959 for the 0:511 block the edges are not covered well by these?
As for your sample I didn't understand what you want to show other than that a good accuracy for the edge region is needed for the 1024 sample FFT. That's again the question to me: don't you think the -448:575, -64:959 windows are a good choice for preserving the accuracy of the 1024 FFT at the edges?
According to your graphs for your sample, BTW, noise is (slightly) lower with these windows than with the exactly edge-positioned ones.
I guess we have the same thing in mind: accuracy at the edges, but for that IMO the centre point needn't be exactly at the edge but can be a little bit interior to the block. The advantage is that with such a choice the centre region is taken better care of which is a bit underexposed with the center of the 2 FFT windows situated exactly at the edges.
halb27
Jan 25 2008, 16:57
As there were several changes since I tested lossyWAV the last time, I did it again (using -3 -noclips -overlap) and tried to abx my usual problem samples and 2 regular tracks with french female voices.
Everything's fine. The only slight suspicion was with badvilbel where I thought I could hear more noise than in the original. I arrived at 4/4 which turned into a 5/10 finally. So I can't abx it.
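For reference, the chance of reaching such scores by pure guessing can be sketched with a one-sided binomial calculation. The function name is made up for illustration; the only assumption is that each ABX trial is an independent 50/50 guess.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` answers right out of
    `trials` when every answer is an independent coin flip."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

print(abx_p_value(4, 4))    # 0.0625
print(abx_p_value(5, 10))   # 0.623046875
```

So 4/4 is still fairly easy to reach by luck, and 5/10 is entirely consistent with guessing.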
Nick.C
Jan 27 2008, 13:47
QUOTE(halb27 @ Jan 25 2008, 22:57)

As there were several changes since I tested lossyWAV the last time, I did it again (using -3 -noclips -overlap) and tried to abx my usual problem samples and 2 regular tracks with french female voices.
Everything's fine. The only slight suspicion was with badvilbel where I thought I could hear more noise than in the original. I arrived at 4/4 which turned into a 5/10 finally. So I can't abx it.
To allow better tuning of this particular variable, I'll revise the -overlap parameter to take a value (0..16) which will set the overlap of the 1024 sample FFT to 512-16*(overlap_value), i.e. 512..256 samples. I will revise the -centre parameter to add in a central 1024 sample FFT where overlap size>256.
lossyWAV beta v0.7.3 attached to post #1 in this thread.
halb27
Jan 27 2008, 15:36
Great, Nick! Thank you very much (because I was already thinking that it wouldn't be bad to have the centre of the 1024 sample windows a bit further inside the block).
Just so that I don't do something wrong, can you please confirm or tell me I'm wrong:
a) -overlap 6 means: we have 2 1024 sample FFT windows per block with the center of each being in the block and 96 samples away from the edges?
b) -centre means: we have 3 1024 sample FFT windows per block, 1 with the centre at the block's centre and 2 with the centre at the block's edges?
Nick.C
Jan 27 2008, 15:41
QUOTE(halb27 @ Jan 27 2008, 21:36)

Great, Nick! Thank you very much (because I was already thinking that it wouldn't be bad to have the centre of the 1024 sample windows a bit further inside the block).
Just so that I don't do something wrong, can you please confirm or tell me I'm wrong:
a) -overlap 6 means: we have 2 1024 sample FFT windows per block with the center of each being in the block and 96 samples away from the edges?
b) -centre means: we have 3 1024 sample FFT windows per block, 1 with the centre at the block's centre and 2 with the centre at the block's edges?
To summarise:
-overlap 0 := -512:511;0:1023;
-overlap 4 := -448:575;-64:959;
-overlap 8 := -384:639; -128:895;
-overlap 12 := -320:703; -192:831;
-overlap 16 := -256:767;
-centre := additional -256:767 (unless -overlap 16 has been specified, obviously).
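The table above follows a simple rule, which can be sketched in Python as follows. This is a reconstruction from the listed values only (each window moves 16 samples further into the block per overlap step), not lossyWAV's actual code, and the function name is hypothetical.

```python
FFT_LEN = 1024

def long_fft_windows(overlap_value, centre=False):
    """Return the 1024-sample analysis windows as (start, end) pairs,
    relative to the first sample of a 512-sample codec_block."""
    if not 0 <= overlap_value <= 16:
        raise ValueError("overlap_value must be in 0..16")
    shift = 16 * overlap_value            # samples moved into the block
    starts = [-512 + shift, -shift]
    if centre:
        starts.append(-256)               # additional centred analysis
    windows = []
    for start in starts:
        window = (start, start + FFT_LEN - 1)
        if window not in windows:         # -overlap 16 collapses to one window
            windows.append(window)
    return windows

for value in (0, 4, 8, 12, 16):
    print(value, long_fft_windows(value))
```

For -overlap 6 this gives window centres (start + 512) at samples 96 and 416 of the block, i.e. 96 samples in from each edge, as asked in (a).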
Have fun!
halb27
Jan 27 2008, 15:53
Thanks a lot!
Nick.C
Jan 27 2008, 16:06
Oops - immediate bug-fix (affects v0.7.2 and the first two downloaders of v0.7.3). The end_overlap of the second (possibly third) FFT analyses at 1024 sample length was being calculated incorrectly (it was still assuming end_overlap = fft_length div 2). Apologies for the error.
halb27
Jan 28 2008, 12:58
Using new v0.7.3 -3 -noclips -overlap 8 I tried my usual killer samples as well as some regular music again.
Everything's fine.
Average bitrate for my sample set of full length regular tracks is exactly 400 kbps.
Nick.C
Jan 28 2008, 13:22
QUOTE(halb27 @ Jan 28 2008, 18:58)

Using new v0.7.3 -3 -noclips -overlap 8 I tried my usual killer samples as well as some regular music again.
Everything's fine.
Average bitrate for my sample set of full length regular tracks is exactly 400 kbps.
Good to hear - I'll leave the -overlap parameter where it is at the moment.
I've added a few options to the quality presets:
-1 now has an additional variant -1a (1 added FFT length);
-2 now has 2 additional variants -2a & -2b (1 and 2 added FFT lengths respectively);
-3 now has 3 additional variants -3a, -3b & -3c (1, 2 and 3 added FFT lengths respectively);
In this way, the user can opt to spend a bit more time on the processing (if time is not an important factor) by carrying out FFT analyses at additional FFT lengths.
-extrafft parameter removed as superseded.
lossyWAV beta v0.7.4 attached to post #1 in this thread.
[edit] Immediate update required: I must have "broken" the 24-bit handling some time ago.... Now fixed at beta v0.7.5 in the usual place. [/edit]
silverfire
Jan 31 2008, 06:38
Not a big issue, but 0.7.5 beta still says 0.7.4

QUOTE
lossyWAV beta v0.7.4, Copyright © 2007,2008 Nick Currie.
Nick.C
Jan 31 2008, 07:01
QUOTE(silverfire @ Jan 31 2008, 12:38)

Not a big issue, but 0.7.5 beta still says 0.7.4

QUOTE
lossyWAV beta v0.7.4, Copyright © 2007,2008 Nick Currie.

Oops.... Will be corrected in beta v0.7.6 - I'm trying to implement the -merge parameter to revert lossy + lwcdf to lossless.