lossyWAV Development, WAV bit reduction by 2BDecided 
Oct 29 2007, 22:23
Post
#401


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
Wonderful. Something like this is what I expected. It might be worth being less conservative with the -nts parameter, i.e. try -nts 0 for -3 to see what that does to the bitrate. On my "problem" set:
-3 -nts -0.5 -skew 24 -snr 12 > 458.7kbps;
(default -3) -3 -nts -0.5 -skew 18 -snr 12 > 446.1kbps;
-3 -nts -0.5 -skew 18 -snr 18 > 448.3kbps;
-3 -nts 0 -skew 12 -snr 6 > 433.3kbps;
-3 -nts 0 -skew 12 -snr 12 > 433.3kbps;
-3 -nts 0 -skew 12 -snr 18 > 435.8kbps;
-3 -nts 0 -skew 18 -snr 12 > 440.2kbps;
-3 -nts 0 -skew 18 -snr 18 > 442.9kbps;
-3 -nts 0 -skew 24 -snr 12 > 452.9kbps.
lossyWAV q X a 4 feedback 4 FLAC 8 ~= 320kbps



Oct 29 2007, 22:51
Post
#402


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
Hi Nick,
I just started examining the behavior of -3 with respect to -skew and -snr. I only started using -snr because I think there's something wrong:
CODE
          skew 0     skew 12    skew 24    skew 36
snr 0     390 / 510  390 / 510  390 / 510  390 / 510
These values can't be identical to my former test, because I used FLAC -b 1024 then and FLAC -b 512 now. But I wonder what's wrong here: identical results with various skew values are not what I expected. 390/510 is a good result IMO, but I expected it to be achieved at around -skew 24.
This post has been edited by halb27: Oct 29 2007, 22:52
lame3100m bCVBR 300



Oct 29 2007, 23:13
Post
#403


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
QUOTE (halb27):
> Hi Nick,
> I just started examining the behavior of -3 with respect to -skew and -snr. I only started using -snr because I think there's something wrong:
> CODE
>           skew 0     skew 12    skew 24    skew 36
> snr 0     390 / 510  390 / 510  390 / 510  390 / 510
> These values can't be identical to my former test, because I used FLAC -b 1024 then and FLAC -b 512 now. But I wonder what's wrong here: identical results with various skew values are not what I expected. 390/510 is a good result IMO, but I expected it to be achieved at around -skew 24.

I've run some skew tests on my 52 sample set:
-3 -skew 0 -snr 0 > 433.0kbps;
-3 -skew 6 -snr 0 > 435.3kbps;
-3 -skew 12 -snr 0 > 439.1kbps;
-3 -skew 18 -snr 0 > 446.1kbps;
-3 -skew 24 -snr 0 > 458.7kbps;
-3 -skew 30 -snr 0 > 479.8kbps;
-3 -skew 36 -snr 0 > 511.3kbps.
Is it possible that *none* of your samples have a minimum result below 3.45kHz?
This post has been edited by Nick.C: Oct 29 2007, 23:13



Oct 30 2007, 08:44
Post
#404


Group: Members Posts: 40 Joined: 2April 06 Member No.: 29099 
@Mitch 1 2 - Excellent find! Should extend the userbase of David's method...
Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like the best frame size for that codec is 2048. If somebody else can confirm that this is also the case with newer versions, we may want to add a dedicated switch, to avoid people using it with frame size 512 or 1024.
By the way, @2048 WMALSL performs halfway between TAK and FLAC.
Set F, WMALSLWMP9, 0.3.18 11236FFFFF1246DFFFFF
CODE
frame size    -1    -2    -3
       512   434   427   425
      1024   432   427   425
      2048   430   424   422
      4096   460   453   451


Oct 30 2007, 08:55
Post
#405


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
QUOTE:
> @Mitch 1 2 - Excellent find! Should extend the userbase of David's method...
> Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like the best frame size for that codec is 2048. If somebody else can confirm that this is also the case with newer versions, we may want to add a dedicated switch, to avoid people using it with frame size 512 or 1024.
> By the way, @2048 WMALSL performs halfway between TAK and FLAC.
> Set F, WMALSLWMP9, 0.3.18 11236FFFFF1246DFFFFF
> CODE
> frame size    -1    -2    -3
>        512   434   427   425
>       1024   432   427   425
>       2048   430   424   422
>       4096   460   453   451



Oct 30 2007, 09:02
Post
#406


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
QUOTE (Nick.C):
> QUOTE (halb27):
> > Hi Nick, I just started examining the behavior of -3 with respect to -skew and -snr. ...
> I've run some skew tests on my 52 sample set:
> -3 -skew 0 -snr 0 > 433.0kbps;
> -3 -skew 6 -snr 0 > 435.3kbps;
> -3 -skew 12 -snr 0 > 439.1kbps;
> -3 -skew 18 -snr 0 > 446.1kbps;
> -3 -skew 24 -snr 0 > 458.7kbps;
> -3 -skew 30 -snr 0 > 479.8kbps;
> -3 -skew 36 -snr 0 > 511.3kbps.
> Is it possible that *none* of your samples have a minimum result below 3.45kHz?

My problem sample set should respond at least as strongly as yours to skew variation (and it did in my v0.3.18 test). Thanks for your test. I must have done something wrong and will look into it.



Oct 30 2007, 09:20
Post
#407


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
QUOTE (Nick.C):
> So, a -wm parameter to set codec_block_size to 2048 for all quality levels for WMALSL?

Nice, we have another candidate for a codec-specific option.
As for the internal codec block size: I think if we're working internally with a blocksize of 1024, there is no problem in using a blocksize of 2048 with a lossless encoder if this is most effective overall. The lossless encoder blocksize should just be a multiple of the internal lossyWav blocksize.
But it brings up the question: what is the meaning of our lossyWav internal blocksize at all? Taking the big view and not looking at internal details, we have a two-stage process:
Stage 1: Transform the input wav file into an output wav file, setting as many LSBs of each sample to zero as we can expect to have no audible impact.
Stage 2: Use a lossless codec on the output of stage 1.
In principle there is no use talking about blocks within stage 1. We can think of the stage 1 process as a process concerning each sample individually.
We should give advice on blocksize use with the various encoders. Encoders benefit from short blocks, as this adapts best to what's done in stage 1. But as some encoders are not efficient with short blocks (wavPack, WMAlossless), a best general compromise has to be found for each codec. This does not seem to be difficult.
I guess the notion of a codec blocksize within stage 1 is mixed up with what it is really about: FFT windowing. Once this is clearer we may be able to improve things, maybe with respect to both quality and practical usage.
This post has been edited by halb27: Oct 30 2007, 09:21
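A minimal sketch of the stage 1 idea described above, assuming 16-bit integer samples and plain rounding to a multiple of 2^bits. The name `zero_lsbs` and the externally supplied `bits_to_remove` are hypothetical; how lossyWAV actually chooses the number of bits (the FFT analysis) and how it handles clipping are not modelled here.

```python
def zero_lsbs(samples, bits_to_remove):
    """Round each 16-bit sample to the nearest multiple of 2**bits_to_remove.

    Illustration only: the number of bits is simply passed in, whereas the
    real tool derives it from its FFT analysis.
    """
    if bits_to_remove <= 0:
        return list(samples)
    half = (1 << bits_to_remove) >> 1
    # Largest multiple of the step size that still fits in a signed 16-bit sample.
    max_val = (32767 >> bits_to_remove) << bits_to_remove
    out = []
    for s in samples:
        # Python's >> floors for negative numbers, so this is round-half-up.
        rounded = ((s + half) >> bits_to_remove) << bits_to_remove
        out.append(min(rounded, max_val))
    return out


if __name__ == "__main__":
    print(zero_lsbs([1000, 1003, -5, 32767], 3))
```

Stage 2 would then be any lossless encoder run on the result; the zeroed low bits are what such an encoder can exploit.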



Oct 30 2007, 11:26
Post
#408


Group: Members Posts: 31 Joined: 3October 06 From: Australia Member No.: 35904 
QUOTE:
> Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like the best frame size for that codec is 2048. If somebody else can confirm that this is also the case with newer versions, we may want to add a dedicated switch, to avoid people using it with frame size 512 or 1024.

I came to the same conclusion, but I used a hex editor.
lossyFLAC (lossyWAV q 0; FLAC b 512 e)



Oct 30 2007, 11:35
Post
#409


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
Just for knowledge about the FFT windowing / codec block details:
Is my understanding of the current way of doing it correct? For definiteness let's talk about -3 (codec block size: 512, FFT lengths of 64 and 1024), and for a moment let's ignore the effects of spreading, skewing and using -snr.
We're looking at a specific 512-sample codec block CB. An FFT analysis of length 1024 is done starting with the first sample of CB. The analysis result is applied to all 512 samples of CB. FFT analysis of length 64 is applied to the 8 consecutive 64-sample sub-blocks SB1, ..., SB8 of CB.
In principle we can look at each of SB1, ..., SB8 separately and apply the length-1024 FFT analysis to whichever sub-block is currently under investigation: look for the lowest bin in both FFTs and decide on the number of bits to remove based on this minimum bin. In principle this result should apply only to that 64-sample sub-block, but we use it as a temporary result, look at all the sub-blocks of CB, and then, based on the sub-results of each sub-block, decide on the bits to remove for the entire CB.
In principle we can decide on the bits to remove on a 64-sample block basis, which corresponds to the short 64-sample FFT. Sure, we mix information that belongs to 1024 samples with information that belongs to 64 samples, which formally is not correct. But if we want to be that correct we also may not use a codec blocksize of 512 with an FFT length of 1024 (or, as with -1, a codec blocksize of 1024 with an FFT length of 2048 [which resulted from a probably bad idea of mine - so we should either return to a blocksize of 2048 or, maybe better, skip the 2048 FFT]).
Other than that, we can improve by thinking of more adequate FFT windows - for instance, build several 1024-sample FFT windows (8 in the extreme case) in such a way that the 64-sample window under investigation is more or less in the center of a 1024-sample FFT window. Or something more intelligent. This brings back, in a specific form, the idea of overlapping FFT analysis that you already offered, Nick.
Anyway, with considerations like these we separate FFT analysis considerations from codec block size considerations, which should belong to the lossless encoder of stage 2 alone.
Edited: nonsense removed.
This post has been edited by halb27: Oct 30 2007, 13:43



Oct 30 2007, 13:29
Post
#410


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
Just one more idea:
Though I love the idea of deciding (at least in principle) on the number of bits to remove for each individual sample, we can see it a bit more practically:
Overall view: Our stage 1 process provides blocks of 512 samples, and all samples within a block have the same number of bits removed. We do it under all circumstances, that is, in particular for -1, -2 and -3. This way we are free to use any multiple of 512 as the blocksize with the stage 2 encoder, and to the best of our knowledge so far it's easy to find an appropriate blocksize (for instance 512 for FLAC and TAK, 1024 for wavPack, 2048 for WMAlossless). In particular, the bitrate for -2 would still go down a bit with FLAC and TAK and a blocksize of 512.
Detail view for stage 1: A 512-sample block can easily consist of several consecutive length-64-FFT and (for -1, -2) length-256-FFT windows. For each 512-sample block we can build an individual length-1024-FFT in such a way that our 512-sample block lies in the middle of the 1024-sample FFT window. (Looking only at the length-1024-FFT windows: these cover the entire track in an overlapping fashion.) Maybe it's good to apply a complex FFT window function for the length-1024-FFT, but I guess the simple approach is good enough.
The length-1024-FFT window contains information from 256 samples in front of and 256 samples after the block, which makes for an inaccuracy. These excess window parts correspond to ~5.8 msec each - a pretty short period IMO. Moreover, in case the shorter FFTs have an independent influence on the number of bits to remove, I don't think this is a dangerous inaccuracy. What I mean is: if one of the shorter FFTs yields a very low value bin, and if there's no lower one in the length-1024-FFT, this low value bin from a shorter FFT decides on the number of bits to remove.
But this is the place IMO where we should say goodbye to length-2048-FFTs.
This post has been edited by halb27: Oct 30 2007, 14:01
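The ~5.8 msec figure can be checked with a few lines of arithmetic. This sketch assumes a 44.1 kHz sample rate (not stated in this post) and a window centred on the block; both helper names are made up for the illustration.

```python
def samples_to_ms(n_samples, sample_rate_hz=44100):
    """Duration of n_samples in milliseconds (44.1 kHz assumed by default)."""
    return 1000.0 * n_samples / sample_rate_hz


def excess_samples_per_side(fft_length, block_size):
    """Samples an FFT window centred on the block reaches into each neighbour."""
    return (fft_length - block_size) // 2


if __name__ == "__main__":
    side = excess_samples_per_side(1024, 512)
    print(side, "samples per side =", round(samples_to_ms(side), 2), "ms")
```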



Oct 30 2007, 14:04
Post
#411


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
QUOTE (halb27):
> Just one more idea:

I will happily remove 2048 sample fft's from the analysis. Looking at fft analysis, currently there are separate fft analyses carried out on the data in the current codec block, some of the previous block and some of the next block (assuming we are not analysing the ends of the file). The overlap is fft_length/2 and the spacing of analyses is fft_length/2, so for a 1024 sample codec_block_size, 3 fft analyses are performed: -512 to 511; 0 to 1023 and 512 to 1535. For a 512 sample codec_block_size 2 analyses are performed: -512 to 511; 0 to 1023 (-ve sample counts are in the previous block, +ve sample counts in excess of codec_block_size-1 are in the next block).

QUOTE (halb27):
> Though I love the idea of deciding (at least in principle) on the number of bits to remove for each individual sample, we can see it a bit more practically:
> Overall view: Our stage 1 process provides blocks of 512 samples, and all samples within a block have the same number of bits removed. We do it under all circumstances, that is, in particular for -1, -2 and -3. This way we are free to use any multiple of 512 as the blocksize with the stage 2 encoder, and to the best of our knowledge so far it's easy to find an appropriate blocksize (for instance 512 for FLAC and TAK, 1024 for wavPack, 2048 for WMAlossless).
> Detail view for stage 1: A 512-sample block can easily consist of several consecutive length-64-FFT and length-256-FFT windows. For each 512-sample block we can build an individual length-1024-FFT in such a way that our 512-sample block lies in the middle of the 1024-sample FFT window. (Looking only at the length-1024-FFT windows: these cover the entire track in an overlapping fashion.) Maybe it's good to apply a complex FFT window function for the length-1024-FFT, but I guess the simple approach is good enough.
> The length-1024-FFT window contains information from 256 samples in front of and 256 samples after the block, which makes for an inaccuracy. These excess window parts correspond to ~5.8 msec each - a pretty short period IMO. Moreover, in case the shorter FFTs have an independent influence on the number of bits to remove, I don't think this is a dangerous inaccuracy. What I mean is: if one of the shorter FFTs yields a very low value bin, and if there's no lower one in the length-1024-FFT, this low value bin from a shorter FFT decides on the number of bits to remove.
> But this is the place IMO where we should say goodbye to length-2048-FFTs.

So, for a 1024 sample codec_block_size there are 3 1024 sample fft analyses carried out; 9 256 sample fft analyses carried out and 33 64 sample fft analyses carried out. Spreading, minimum searching and averaging are carried out on all of them and the smallest derived value is used to determine bits_to_remove.
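The window layout described in this post (spacing and end overlap both equal to fft_length/2) can be reproduced with a short sketch. The function name `analysis_windows` is hypothetical and only the window positions are modelled, not the spreading, minimum search or averaging.

```python
def analysis_windows(codec_block_size, fft_length):
    """Return (first, last) sample indices of each FFT analysis window.

    Index 0 is the first sample of the current codec block; negative indices
    lie in the previous block, indices >= codec_block_size in the next one.
    Windows are spaced fft_length/2 apart, starting fft_length/2 before the block.
    """
    hop = fft_length // 2
    return [(start, start + fft_length - 1)
            for start in range(-hop, codec_block_size, hop)]


if __name__ == "__main__":
    print(analysis_windows(1024, 1024))  # the three long windows
    print(analysis_windows(512, 1024))   # the two long windows
    for fft_length in (1024, 256, 64):
        print(fft_length, len(analysis_windows(1024, fft_length)))
```

For a 1024 sample block this gives 3, 9 and 33 windows for FFT lengths 1024, 256 and 64, matching the counts quoted in the post.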



Oct 30 2007, 15:41
Post
#412


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
Thanks for the clarification. So you do a lot of overlapping analyses.
Looking at the way you currently do it, I don't see a reason not to use a lossyWav blocksize of 512 throughout. (I'd like to call it lossyWav blocksize because it's not necessarily the blocksize of the encoding codec.) If there is something inappropriate about this way of doing the analyses, the same holds with a lossyWav blocksize of 1024. A lossyWav blocksize of 512 allows any appropriate blocksize that is a multiple of 512 in the stage 2 encoding process.
What might be wrong with doing the analyses this way? Hopefully nothing of course, but I'm a bit afraid of the energy that originates from outside the codec block influencing the analysis for the codec block. The way it's done, energy from ~11.5 msec before and after the block makes it into the decision making for the block. So a potential min bin may lose its min status due to energy from outside the block. If that's fine: alright. If it can be problematic, statistics are in favor of 1024-sample lossyWav blocks: for each block, the 1024-sample FFTs use 100% excess samples (relative to the block size), whereas with 512-sample lossyWav blocks this rises to 200%. Anyway, it should be problem-free (or at least problem-poor) in any case.
That's all about the way it is. But what would be the disadvantage of the approach of my last post: overlapping only in the case of length-1024-FFTs (with the 512-sample lossyWav block right in the middle), and consecutive non-overlapping FFT windows for the other FFT lengths? It would reduce the foreign energy problem (in case there is one) and would reduce the number of FFTs.
This post has been edited by halb27: Oct 30 2007, 15:41



Oct 30 2007, 16:20
Post
#413


ReplayGain developer Group: Developer Posts: 4945 Joined: 5November 01 From: Yorkshire, UK Member No.: 409 
It's inefficient to remove more bits than a given lossless encoder can take advantage of.
So say, for example, you run lossyWAV with 512 and FLAC with 1024. That means, within any FLAC block, half the samples might have more zeros than FLAC can take advantage of (because the other half have fewer zeros, defining and limiting the number of "wasted_bits" within that FLAC block).
"So what?" you might think. Well, removing more bits equates to adding more noise. And the more noise you add, the less efficient a lossless codec will be (excepting the special case where the "noise" is a string of zeros which it can take advantage of).
So it's possible (and in my very early tests, true) that lossyWAV 512 with FLAC 1024 will give a higher bitrate than lossyWAV 1024 with FLAC 1024 (and, of course, a theoretically lower quality, though hopefully both are transparent).
Cheers, David.

QUOTE (halb27):
> The way it's done, energy from ~11.5 msec before and after the block makes it into the decision making for the block. So a potential min bin may lose its min status due to energy from outside the block.

That's intentional. One of the FFT analyses is usually concentrated on the block boundary, which for a 1024-point FFT at 44.1kHz is, as you say, about +/-11.6ms - though the windowing means the effect at the edges is pretty small. The reason for doing this is to catch low energy moments near the block boundary, which could otherwise be completely missed. If you miss them, you add too much noise; worse still, you can put a hefty transition in there as you switch to more bits removed.
More generally, if you don't overlap analysis blocks, then there are moments of the audio that you never check, so you won't know if the noise you're adding is above or below the noise floor during those moments.
Cheers, David.
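The "wasted_bits" argument can be illustrated with a small sketch. It assumes the usual meaning of wasted bits, i.e. the number of low-order zero bits shared by every sample in an encoder block; the helper names and the toy sample values are made up for the example and no real encoder is involved.

```python
def trailing_zero_bits(sample, max_bits=16):
    """Number of zero low-order bits in one integer sample."""
    if sample == 0:
        return max_bits  # an all-zero sample does not limit the block
    count = 0
    while sample & 1 == 0:
        sample >>= 1
        count += 1
    return count


def wasted_bits(block):
    """Zero low-order bits common to every sample of an encoder block."""
    return min(trailing_zero_bits(s) for s in block)


if __name__ == "__main__":
    # Two 512-sample lossyWAV blocks: 5 bits removed, then only 2 bits removed.
    first_half = [32 * k for k in range(1, 513)]
    second_half = [4 * k for k in range(1, 513)]
    print(wasted_bits(first_half), wasted_bits(second_half))  # per 512 block
    print(wasted_bits(first_half + second_half))              # one 1024 block
```

With a 1024 sample encoder block the whole block is limited to the 2 bits of the second half, so the extra 3 bits of noise added to the first half buy nothing.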


Oct 30 2007, 17:02
Post
#414


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
QUOTE (David):
> It's inefficient to remove more bits than a given lossless encoder can take advantage of. So say, for example, you run lossyWAV with 512 and FLAC with 1024. ... That means, within any FLAC block, half the samples might have more zeros than FLAC can take advantage of (because the other half have fewer zeros, defining and limiting the number of "wasted_bits" within that FLAC block). ...

Sure, if we provide 512-sample blocks with lossyWav we lose efficiency when using an encoder with a blocksize of 1024, provided the encoder works efficiently with a blocksize of 512. The lossyWav512/FLAC1024 combination isn't attractive and should be replaced by lossyWav512/FLAC512. But encoders like wavPack or WMAlossless prefer larger blocksizes for efficiency, so it's about finding the sweet spot combination. So maybe lossyWav512/WMAlossless2048 is a better combination than lossyWav512/WMAlossless512 (not sure at all). But I can't see a mechanism that makes the lossyWav512/WMAlossless2048 combination inferior to lossyWav2048/WMAlossless2048. Sure, the encoder blocksize should always be an integer multiple of the lossyWav blocksize.

QUOTE (David):
> QUOTE (halb27):
> > The way it's done, energy from ~11.5 msec before and after the block makes it into the decision making for the block. So a potential min bin may lose its min status due to energy from outside the block.
> That's intentional. One of the FFT analyses is usually concentrated on the block boundary, which for a 1024-point FFT at 44.1kHz is, as you say, about +/-11.6ms - though the windowing means the effect at the edges is pretty small. The reason for doing this is to catch low energy moments near the block boundary, which could otherwise be completely missed. If you miss them, you add too much noise; worse still, you can put a hefty transition in there as you switch to more bits removed.
> More generally, if you don't overlap analysis blocks, then there are moments of the audio that you never check, so you won't know if the noise you're adding is above or below the noise floor during those moments.
> Cheers, David.

You certainly know more about these things than I do. But with a lossyWav blocksize of 512, the overlapping length-1024-FFT covers your concerns. So at least there is no need to do this extensive overlapping with the 1024 FFTs. The shorter FFTs, done the way they are done now, don't conflict with my proposal. Moreover, there is the problem of unwanted energy from outside the block under investigation influencing the bits to remove for the current block. With my proposal this influence is lower. Encoding speed improves (though IMO this is a minor aspect).
So in the end: why not use only 512-sample blocks in lossyWav and just one 1024-FFT for each of these 512-sample lossyWav blocks, with the lossyWav block centered in the FFT window?
BTW is there a windowing function like Hanning used? With the overlapping it would be most welcome I think, and it would reduce potential negative side effects of the 'foreign' samples. It would also reduce errors resulting from a rectangular window.
This post has been edited by halb27: Oct 30 2007, 17:39



Oct 30 2007, 19:35
Post
#415


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
QUOTE (halb27):
> BTW is there a windowing function like Hanning used? With the overlapping it would be most welcome I think, and it would reduce potential negative side effects of the 'foreign' samples. It would also reduce errors resulting from a rectangular window.

The Hanning window is used. I did toy with the idea of the centred analysis previously, but at that time I was more concerned with being able to duplicate exactly the results from David's Matlab script.
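To get a feel for how strongly a Hanning (Hann) window de-emphasises the outer parts of an analysis window, the following sketch computes the share of the squared window weight that falls in the central half. It assumes the periodic form w[n] = 0.5 - 0.5*cos(2*pi*n/N); whether lossyWAV uses this exact variant is not stated in the thread, and the function names are illustrative only.

```python
import math


def hann(n_points):
    """Periodic Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_points)
            for n in range(n_points)]


def central_energy_fraction(n_points):
    """Share of the squared window weight lying in the central 50% of samples."""
    window = hann(n_points)
    quarter = n_points // 4
    total = sum(w * w for w in window)
    central = sum(w * w for w in window[quarter:n_points - quarter])
    return central / total


if __name__ == "__main__":
    print(round(central_energy_fraction(1024), 3))
```

Under this assumption roughly nine tenths of the squared weight sits in the central half of the window, which is consistent with the remark that the effect at the edges is pretty small.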



Oct 30 2007, 21:38
Post
#416


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
QUOTE (Nick.C):
> The Hanning window is used. I did toy with the idea of the centred analysis previously, but at that time I was more concerned with being able to duplicate exactly the results from David's Matlab script.

Yes, IMO that was the right thing to do then. But now we're ahead of that, and it's wonderful that we have the same idea.
Edited: Removed the idea of having a smaller overlap area for the 64 and 256 sample FFT. Not a good idea.
This post has been edited by halb27: Oct 30 2007, 23:22



Oct 30 2007, 23:19
Post
#417


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
-overlap parameter added to reduce the end_overlap of FFT analyses to 25% FFT_length rather than 50%.
CODE
lossyWAV alpha v0.3.20 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality level (-cbs 1024 -nts -3.0 -skew 30 -snr 24)
-2            default quality level (-cbs 1024 -nts -1.5 -skew 24 -snr 18)
-3            compact quality level (-cbs  512 -nts -0.5 -skew 18 -snr 12)

-o <folder>   destination folder for the output file
-force        forcibly overwrite output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=0dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 16=0)
-overlap      enable aggressive fft overlap method; default=off

-spf <3x5chr> manually input the 3 spreading functions as 3 x 5 characters;
              e.g. 444444444444444; Characters must be one of 1 to 9 and
              A to Z (zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.

This post has been edited by Nick.C: Nov 7 2007, 22:40
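The 6.0206dB figure in the nts help text is 20*log10(2), i.e. the level change that corresponds to one bit. A small sketch of the conversion, assuming the shift is given as a non-positive dB value; `nts_to_bits` is a made-up helper for the arithmetic and not part of lossyWAV.

```python
import math

# One bit corresponds to a factor of 2 in amplitude: 20*log10(2) dB.
DB_PER_BIT = 20.0 * math.log10(2.0)


def nts_to_bits(nts_db):
    """Change in bits to remove implied by a noise threshold shift in dB."""
    return nts_db / DB_PER_BIT


if __name__ == "__main__":
    print(round(DB_PER_BIT, 4))
    for shift in (-3.0, -1.5, -0.5, 0.0):
        print(shift, "dB ->", round(nts_to_bits(shift), 3), "bits")
```

A shift of one full DB_PER_BIT in the negative direction therefore keeps one more bit overall, while the preset values listed above amount to fractions of a bit.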



Oct 30 2007, 23:28
Post
#418


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
OOPs, you're so fast!!
I've read a lot about fft overlapping windows, and I haven't seen anybody doing less than 50% overlapping. I've just removed this part from my post, and a second later seen you having realized it. Thanks for your version and sorry for the confusion! But now that you've done it: let's see what 2Bdecided and other people have to say about it. Anyway for the 1024 sample FFT I think we should do the 1 FFT center approach  at least as long as we're happy with a 50% overlapping of the other FFTs as this has pretty much the same feasibility background.  lame3100m bCVBR 300



Oct 30 2007, 23:32
Post
#419


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
QUOTE (halb27):
> OOPs, you're so fast!!
> I've read a lot about fft overlapping windows, and I haven't seen anybody doing less than 50% overlapping. I've just removed this part from my post, and a second later seen you having realized it. Thanks for your version and sorry for the confusion! But now that you've done it: let's see what 2Bdecided and other people have to say about it. Anyway for the 1024 sample FFT I think we should do the 1 FFT center approach - at least as long as we're happy with a 50% overlapping of the other FFTs as this has pretty much the same feasibility background.

Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?



Oct 30 2007, 23:53
Post
#420


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
QUOTE (Nick.C):
> Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?

I guess your concern is the same as mine: for the starting and ending 'overlap', half the FFT_length outside the lossyWav block is a bit much and brings in wrong information to a major extent. Your approach of 25% seems appropriate to me and corresponds to the 50% overlap between adjacent FFT windows (meaning the central 50% of the samples of each FFT window are considered to be well taken care of by the Hanning-windowed FFT analysis). But why do you want to relate it to the longest FFT? IMO it should be 25% of the current FFT length. This more general procedure matches perfectly with the 1 FFT center approach for a lossyWav blocksize of 512 and a 1024 sample FFT.
This post has been edited by halb27: Oct 30 2007, 23:59



Oct 31 2007, 00:01
Post
#421


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
QUOTE (halb27):
> QUOTE (Nick.C):
> > Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?
> I guess your concern is the same as mine: for the starting and ending 'overlap', half the FFT_length outside the lossyWav block is a bit much and brings in wrong information to a major extent. Your approach of 25% seems appropriate to me and corresponds to the 50% overlap between adjacent FFT windows (meaning the central 50% of the samples of each FFT window are well taken care of by the Hanning-windowed FFT analysis). But why do you want to relate it to the longest FFT? IMO it should be 25% of the current FFT length. This more general procedure matches perfectly with the 1 FFT center approach for a lossyWav blocksize of 512 and a 1024 sample FFT.

Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?
This post has been edited by Nick.C: Oct 31 2007, 00:02



Oct 31 2007, 00:08
Post
#422


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
I repeated my v0.3.19 skew and snr analysis (using -3), which I did wrong yesterday (first result: average bitrate of my full length regular music set; second result: average bitrate of my problem sample set):
CODE
          skew 0     skew 12    skew 24    skew 36
snr 0     382 / 480  383 / 490  390 / 510  421 / 547
snr 12    382 / 480  383 / 490  390 / 510  421 / 547
snr 24    387 / 486  393 / 501  402 / 524  429 / 560
Pretty much the same result as with v0.3.18 (keep in mind that the v0.3.18 test was done with a FLAC blocksize of 1024).
This post has been edited by halb27: Oct 31 2007, 00:30



Oct 31 2007, 00:20
Post
#423


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
QUOTE (Nick.C):
> Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?

If you overlap 50% inside the lossyWav block, this means you have confidence that the region within 25% of the FFT length to either side of the FFT window center carries the necessary information. Let's take this as a valid assumption (otherwise we would have to increase the overlapping). With a 50% overlap these '25% away from the center' regions cover the lossyWav block consecutively and without overlapping. At the start this means you need to start the first window just 25% before the current lossyWav block. The lossyWav block then starts at the very beginning of the trusted region of the first FFT window. At the end it's the same thing, as only the last 25% of the FFT window makes up the trailing untrusted region.
This matters most for the long FFT, as a lot of foreign energy makes it into the current lossyWav block analysis the way we do it now.
This post has been edited by halb27: Oct 31 2007, 00:33
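The tiling argument in this post can be checked numerically. The sketch below assumes the proposal as described here: windows spaced fft_length/2 apart, the first one starting fft_length/4 before the block, and the central half of each window treated as its trusted region. The function names are made up for the illustration.

```python
def centred_windows(block_size, fft_length):
    """(first, last) indices of FFT windows starting 25% of fft_length early."""
    hop = fft_length // 2
    quarter = fft_length // 4
    return [(start, start + fft_length - 1)
            for start in range(-quarter, block_size - quarter, hop)]


def trusted_regions(block_size, fft_length):
    """Central 50% of each window, as (first, last) sample indices."""
    quarter = fft_length // 4
    return [(first + quarter, last - quarter)
            for first, last in centred_windows(block_size, fft_length)]


if __name__ == "__main__":
    print(centred_windows(512, 1024))      # one window centred on the block
    print(trusted_regions(512, 1024))
    print(trusted_regions(512, 64)[:3])    # consecutive 32-sample regions
```

For a 512 sample block and a 1024 sample FFT this collapses to a single window whose trusted region is exactly the block, i.e. the "1 FFT center approach"; for the shorter FFTs the trusted regions tile the block without gaps or overlap.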



Oct 31 2007, 00:30
Post
#424


lossyWAV Developer Group: Developer Posts: 1772 Joined: 11April 07 From: Wherever here is Member No.: 42400 
QUOTE (halb27):
> QUOTE (Nick.C):
> > Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?
> If you overlap 50% inside the lossyWav block, this means you have confidence that the region within 25% of the FFT length to either side of the FFT window center carries the necessary information. Let's take this as a valid assumption (otherwise we would have to increase the overlapping). With a 50% overlap these '25% away from the center' regions cover the lossyWav block consecutively and without overlapping. At the start (analogously for the end) this means you need to start the first window just 25% before the current lossyWav block. The lossyWav block then starts at the very beginning of the trusted region of the first FFT window.
> This matters most for the long FFT, as a lot of foreign energy would make it into the current lossyWav block analysis.



Oct 31 2007, 08:09
Post
#425


Group: Members Posts: 2414 Joined: 9October 05 From: Dormagen, Germany Member No.: 25015 
QUOTE (Nick.C):
> .... lossyWAV alpha v0.3.20 attached: -overlap parameter added to reduce the end_overlap of FFT analyses to 25% FFT_length rather than 50%....

Hi Nick,
Did you change this already, or did it escape my notice: does that mean the overlap within a lossyWav block is 50% as before, but the overlap at the beginning and end of a lossyWav block stretches just 25% into the neighboring lossyWav blocks?
That would be great, as I'm really worried about the behavior with 1024 sample FFTs, where we have 2 FFT windows which get exactly the same amount of information from the neighboring lossyWav blocks as from the block under consideration, and no other FFT window in the case of lossyWav block size = 512, or just 1 more FFT window (so 1 out of 3) in the case of lossyWav block size = 1024 (this one at least gets the right information). Min finding makes the situation worse.
I hope I interpret your -overlap option correctly, because a 25% overlap in the interior wasn't a good idea. Sorry again for going wild.
This post has been edited by halb27: Oct 31 2007, 08:10


