Full Version: lossyWAV Development
halb27
QUOTE(Nick.C @ Nov 6 2007, 10:05) *

...I will continue my quest to further optimise and speed-up the code. FP assembly language is not as painful as I first thought. I did download the Intel IA-32 Software Developers Manual and it's got lots of nice instructions in it.... However I would be worried about using instructions only available on later processors as I don't wish to alienate any users (and am not in the position [yet] to maintain separate builds).

Nice that you're optimizing. IMO you're absolutely right not to go too far in specializing. Speed is welcome, but even more important is being able to use your exe without getting into trouble (including your personal trouble, as extreme optimizing can be troublesome).
TBeck
QUOTE(Nick.C @ Nov 6 2007, 09:05) *

I will continue my quest to further optimise and speed-up the code. FP assembly language is not as painful as I first thought. I did download the Intel IA-32 Software Developers Manual and it's got lots of nice instructions in it.... However I would be worried about using instructions only available on later processors as I don't wish to alienate any users (and am not in the position [yet] to maintain separate builds).

Do you know the bible of IA-32 optimization? If not: Optimizing assembly code (Agner Fog)

Thomas
Nick.C
QUOTE(TBeck @ Nov 6 2007, 08:35) *
QUOTE(Nick.C @ Nov 6 2007, 09:05) *
I will continue my quest to further optimise and speed-up the code. FP assembly language is not as painful as I first thought. I did download the Intel IA-32 Software Developers Manual and it's got lots of nice instructions in it.... However I would be worried about using instructions only available on later processors as I don't wish to alienate any users (and am not in the position [yet] to maintain separate builds).
Do you know the bible of IA-32 optimization? If not: Optimizing assembly code (Agner Fog)

Thomas
Ooooh! Thanks for that Thomas, I will certainly have a read before I get too heavily down the "Delphi wrapper around an assembly language program" route.
Nick.C
Well, big thanks to Thomas for the pointer to Agner Fog's excellent guide to optimising assembly language. The FFT routine is now completely in IA-32 assembler using only 32bit registers and the FPU. Even so, it is considerably faster than alpha v0.4.0.

lossyWAV alpha v0.4.1 attached: Superseded.

Code optimisation of FFT routine;

Slight change to the -overlap calculations regarding number of fft analyses to carry out for a given block and size of fft_overlap. This means that the "central" fft analysis may not be exactly in the centre of the codec block, but the end_overlap value is exactly half of the fft_length.
halb27
Hallo Nick,

Thank you very much for your new version.
Looks like quality has improved: With 'Under The Boardwalk' using plain -3 I'm far away now from being able to abx it. No chance at all.

I'm about to encode part of my collection using -3.
Doing so I wanted to try the 128 sample and 512 sample FFT using a full -spf string but with no effect. Are these FFT lengths reserved to -1?
Nick.C
QUOTE(halb27 @ Nov 8 2007, 21:31) *

Hallo Nick,

Thank you very much for your new version.
Looks like quality has improved: With 'Under The Boardwalk' using plain -3 I'm far away now from being able to abx it. No chance at all.

I'm about to encode part of my collection using -3.
Doing so I wanted to try the 128 sample and 512 sample FFT using a full -spf string but with no effect. Are these FFT lengths reserved to -1?
YGPM!
Nick.C
QUOTE(halb27 @ Nov 8 2007, 21:31) *
Doing so I wanted to try the 128 sample and 512 sample FFT using a full -spf string but with no effect. Are these FFT lengths reserved to -1?
Not any more. I have implemented a "-fft" parameter which takes a 5 character binary numeric input, each character of which corresponds to a specific fft_length, i.e. character 1 > 64 samples, character 2 > 128 samples, etc, character 5 > 1024 samples. So, the default for -2 would be -fft 10101, for -1 would be 10111.
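
As a rough illustration of the "-fft" mapping described above, here is a minimal Python sketch (not the Delphi source). The function name parse_fft_mask is hypothetical; only the five FFT lengths and the 1=on / 0=off convention come from the post.

```python
# FFT lengths named in the post, shortest first (character 1 -> 64 samples).
FFT_LENGTHS = (64, 128, 256, 512, 1024)


def parse_fft_mask(mask: str) -> list[int]:
    """Return the FFT lengths switched on by a 5-character 0/1 string.

    Hypothetical helper: the real lossyWAV parser may validate differently.
    """
    if len(mask) != len(FFT_LENGTHS) or set(mask) - {"0", "1"}:
        raise ValueError(f"expected 5 characters of 0 or 1, got {mask!r}")
    return [length for length, flag in zip(FFT_LENGTHS, mask) if flag == "1"]


print(parse_fft_mask("10101"))  # the -2 default: 64, 256 and 1024 samples
print(parse_fft_mask("10111"))  # the -1 default: 64, 256, 512 and 1024 samples
```

Reading the string left to right against the list of lengths keeps the mask and the lengths in one place, so a sixth length would only need a longer tuple.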

I have also converted the spread and remove_bits procedures to IA-32 / FP assembly, so there's been a bit of a speed up as well. Incidentally, I found out that the size of the data segment for each unit will adversely affect the program speed if not carefully aligned to 8 or 16 byte boundaries (not exactly sure which).

lossyWAV alpha v0.4.2 attached: Superseded.
CODE
lossyWAV alpha v0.4.2 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality [4xFFT] (-cbs 1024 -nts -3.0 -skew 30 -snr 24
              -spf 11124-ZZZZZ-11225-11225-11236 -fft 10111)
-2            default quality [3xFFT] (-cbs 1024 -nts -1.5 -skew 24 -snr 18
              -spf 11235-ZZZZZ-11336-ZZZZZ-1234D -fft 10101)
-3            compact quality [3xFFT] (-cbs  512 -nts -0.5 -skew 18 -snr 12
              -spf 11235-ZZZZZ-11336-ZZZZZ-1234D -fft 10101)

-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=0dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xchr>  select fft lengths to use in analysis (1=on, 0=off)
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5chr> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 44444-44444-44444-44444-44444 (Characters must be one of
              1 to 9 and A to Z (zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
As a quick test, I ran my 52 sample set through at "-3 -fft 00100 -skew 36 -cbs 1024", which gives about a 3x speed increase (over -2):

WAV: 121.53MB; FLAC: 68.08MB, 790.6kbps; lossyWAV -2: 44.16MB, 512.8kbps, 46 secs.; lossyWAV -3 -fft 00100 -skew 36 -cbs 1024: 41.30MB, 479.6kbps, 16secs.

Surprisingly(?), this produces output which is satisfactory for my preferred DAP, in a third of the time and 94% of the diskspace.
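
The figures above can be cross-checked with a little arithmetic. This Python sketch assumes the sample set is 16-bit, 44.1 kHz stereo PCM (not stated in the post) and ignores WAV header overhead, so the results only agree with the quoted bitrates to within rounding.

```python
# Assumption: 16-bit, 44.1 kHz, 2-channel PCM, i.e. 1411.2 kbps uncompressed.
PCM_KBPS = 44100 * 16 * 2 / 1000

WAV_MB = 121.53  # size of the uncompressed sample set, from the post


def kbps_from_size(size_mb: float) -> float:
    """Estimate average bitrate from compressed size relative to the WAV size."""
    return size_mb / WAV_MB * PCM_KBPS


for label, size_mb in [
    ("FLAC", 68.08),
    ("lossyWAV -2", 44.16),
    ("lossyWAV -3 -fft 00100 -skew 36 -cbs 1024", 41.30),
]:
    print(f"{label}: about {kbps_from_size(size_mb):.1f} kbps")

size_ratio = 41.30 / 44.16  # disk space relative to -2
time_ratio = 16 / 46        # processing time relative to -2
print(f"size: {size_ratio:.0%} of -2, time: {time_ratio:.0%} of -2")
```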
halb27
Good news, thank you.

As I'm about to go productive I welcome very much the possibility to have further FFTs, especially at the short edge as the 64 sample FFT has some shortcomings in the low/mid frequency range.
Going productive I try to play it safe while staying within most of the current -2 framework (but -cbs 512 and -nts -1.0).
halb27
Hallo Nick,

There seems to be a problem. I tried v0.4.2 this way:

lossyWAV.exe utb.wav -2 -cbs 512 -nts -0.5 -skew 24 -snr 18 -spf 11235-11236-11336-12348-1234D -fft 11111

and lossyWav starts to output a .lossy.wav file but immediately after that stops working. No crash, it just hangs, doesn't come back to the command line, and produces no output. Sorry.
Nick.C
Ah.... It may be down to my inexperience with FP assembly - probably too few "FWAIT" instructions. I will amend and re-attach.

Nick.

[edit]lossyWAV alpha v0.4.3 attached: added a few more "FWAIT" instructions and reduced the permissible range of "-spf" input values back to hexadecimal characters (could cause problems at shorter FFT lengths).

Having fun with "-fft" and I was astounded to get casual listening compatible results on my 52 sample set (as above) with "-3 -fft 00100 -spf fffff-fffff-44579-fffff-fffff -skew 36" : 33.25MB, 386.1kbps

Regarding the crashing - could anyone else with this problem please speak up and also, if you could indicate CPU type that would be very welcome. The only machines I have to test on are Intel C2D.....

On a more serious note, and with regard to IA32 / FPU assembly language: when *should* I insert an FWAIT instruction into the code?[/edit]
halb27
Sorry same issue with new version and

lossyWAV.exe utb.wav -2 -cbs 512 -nts -0.5 -spf 11235-11236-11336-12348-1234D -fft 11111

lossyWav.exe echoes the options, starts producing the lossy.wav file, then hangs and produces no output. It does not crash. I can finish lossyWav by pressing Ctrl-C.

My cpu is a (32 bit) AMD mobile Athlon (= low power Barton), and I'm running Windows XP.
Nick.C
QUOTE(halb27 @ Nov 9 2007, 22:11) *
Sorry same issue with new version and

lossyWAV.exe utb.wav -2 -cbs 512 -nts -0.5 -spf 11235-11236-11336-12348-1234D -fft 11111

lossyWav.exe echoes the options, starts producing the lossy.wav file, then hangs and produces no output. It does not crash. I can finish lossyWav by pressing Ctrl-C.

My cpu is a (32 bit) AMD mobile Athlon (= low power Barton), and I'm running Windows XP.
Thanks - now for a bit more debugging.
halb27
I should add I had no problem with v0.4.1 where you had already a lot of assembler code in it.
verbajim
It crashes immediately here when I run lossyWAV.exe file.wav. My processor is an AMD Athlon 64. I also had no problem with 0.4.1.

Edit: on second thought it doesn't terminate, I just get the crash report by windows, but it hangs like halb27 says.
Nick.C
I may have found a possible culprit.....

lossyWAV alpha v0.4.3b attached.
halb27
Sorry, same effect.
Nick.C
QUOTE(halb27 @ Nov 9 2007, 22:44) *
Sorry, same effect.
Is it doing this with no input parameters, or only when input parameters (other than name of file to process) are used?
halb27
Same effect with plain lossyWav.exe utb.wav.
Nick.C
Seems to be a problem with AMD processors at the moment......

lossyWAV alpha v0.4.3c attached: Maybe?

lossyWAV alpha v0.4.3d attached: FWAIT instructions removed. Just to see if that is it.
halb27
Nothing changes with v0.4.3c and with v0.4.3d.
robert
It seems to work on my Athlon64X2
CODE
E:\dev-privat\lossy-wav>lossyWAV "Q:\CD\Anastacia\2000-Not That Kind\01 Not That Kind.wav" -o .\
lossyWAV alpha v0.4.3 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
Processing : 01 Not That Kind.wav
Format     : 44.10kHz; 2 ch.; 16 bit; 8858220 samples; 200.87 sec.
Average    : 6.4714 bits; [55984/8651]; 11.72x; CBS=1024]
%lossyWAV Warning% : 47 bits not removed due to clipping.
shadowking
QUOTE(halb27 @ Nov 10 2007, 09:33) *

Nothing changes with v0.4.3c and with v0.4.3d.



Crashing here too. PIII 550
robert
Does it happen often, that there are no bits removed at all?
CODE
E:\dev-privat\lossy-wav>lossyWAV "Q:\CD\Various\1990-Classic Hits der 20er Jahre
- CD 1\01 Am Sonntag will mein Süsser mit mir segeln gehn - Edith d'Amara.wav"
lossyWAV alpha v0.4.3 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
Processing : 01 Am Sonntag will mein Süsser mit mir segeln gehn - Edith d'Amara.wav
Format     : 44.10kHz; 2 ch.; 16 bit; 7926828 samples; 179.75 sec.
Average    : 0.0000 bits; [0/7742]; 11.95x; CBS=1024]
robert
It doesn't work on my Notebook, CPU is a Pentium-M.
Mitch 1 2
lossyWAV 0.4.2 has no issues with my laptop's AMD Mobile Sempron 3000+ (32-bit) CPU, except it crashes when the specified output folder doesn't exist.
[JAZ]
QUOTE(shadowking @ Nov 10 2007, 02:00) *

QUOTE(halb27 @ Nov 10 2007, 09:33) *

Nothing changes with v0.4.3c and with v0.4.3d.



Crashing here too. PIII 550



Nick.C : Have you added "SSE2" instructions ( operations with doubles )???? PIII, and Athlon XP don't have such, although an Athlon 64 does.
Nick.C
Thanks for the responses guys..... It seems to fail on some AMD and older Intel CPUs.

Sometimes no bits will be removed - that's the beauty of David's method - nothing is removed if it is not safe to do so.

No SSE / SSE2 instructions used, only 80x87 FPU instructions. I will try to revert to v0.4.1 with the functionality of v0.4.3 and attach.
Nick.C
QUOTE(Nick.C @ Nov 10 2007, 09:40) *
I will try to revert to v0.4.1 with the functionality of v0.4.3 and attach.
lossyWAV alpha v0.4.3e attached: Superseded.

Spread and Remove_Bits procedures have been rolled back to v0.4.1;

"-fft " parameter functionality remains.

Where's the smiley for "fingers-crossed" when you want it....?
halb27
Yeah, it works.
Thank you.

BTW: From my personal experience of performance optimization, the most important thing is to have a good and adequate software architecture. Low-level optimization is often important only in isolated spots.
Sure this needn't necessarily apply to lossyWav.
Nick.C
QUOTE(halb27 @ Nov 10 2007, 10:47) *
Yeah, it works.
Thank you.

BTW: From my personal experience of performance optimization, the most important thing is to have a good and adequate software architecture. Low-level optimization is often important only in isolated spots.
Sure this needn't necessarily apply to lossyWav.
I *was* only optimising the most frequently called procedures / functions. FFT, Spread and Remove_Bits are the functional core of the whole method. With all three converted to assembler I got an extra 10% speed compared to just FFT.

However, thankfully it is now working, and close to optimal speed. Have fun with your testing / transcoding!
Mitch 1 2
QUOTE(Mitch 1 2 @ Nov 10 2007, 14:43) *

lossyWAV 0.4.2 has no issues with my laptop's AMD Mobile Sempron 3000+ (32-bit) CPU, except it crashes when the specified output folder doesn't exist.


This problem still hasn't been fixed.
halb27
QUOTE(Nick.C @ Nov 10 2007, 12:50) *

... and close to optimal speed. ...

Yes, I'm very pleased by the speed.

ADDED:

As a result of the new possibilities of 5 fft lengths:

lossyWav -2 -cbs 512 -nts -1.0 -fft 11111 -spf 11235-11236-11336-12348-1234D (my favorite for going productive)

followed by FLAC --best -e -f -b 512

yields 438 kbps for my regular set and 546 kbps for my problem set. I'm very pleased with this ratio.
shadowking
It works now.
Nick.C
QUOTE(Mitch 1 2 @ Nov 10 2007, 11:08) *
QUOTE(Mitch 1 2 @ Nov 10 2007, 14:43) *
lossyWAV 0.4.2 has no issues with my laptop's AMD Mobile Sempron 3000+ (32-bit) CPU, except it crashes when the specified output folder doesn't exist.
This problem still hasn't been fixed.
Sorry Mitch, I will endeavour to fix it for the next revision. Thanks for the feedback people!

[edit]Thinking about the crashing - maybe it was an infinite loop....... Much investigation to come.[/edit]

[edit2] Some moron was using the FISTTP to store a truncated real to a mem32 integer.... which is apparently an SSE3 instruction. I will rework the routines to avoid using this instruction and re-attach as alpha v0.4.3f (hopefully with the output directory crashing bug rectified). [/edit2]
Nick.C
lossyWAV alpha v0.4.4 attached: Superseded.

Use of FISTTP instruction (SSE3!) eradicated; Thanks for the pointer [JAZ] - I found it very quickly when I googled "80x87 instruction set" and FISTTP isn't on the list........

Spread and Remove_Bits procedures now assembler (again....);

Now checks for access to output directory if specified.
halb27
I encoded part of my collection using v0.4.4 without any problem, and according to my listening experience so far everything is very fine.
I used a variant of -2 which made me think more deeply afterwards about what's really important.

I'd like to suggest a discussion on two points concerning default behavior:

1)
I would welcome - as I said before - a general default cbs of 512 samples. This will make most lossless codecs behave more efficiently on one hand, and on the other hand I can't see a logical reason why not to use it. If it's about holding average bitrate up for defensive reason we should use a more direct approach targeting directly at overcoming potential weaknesses.

2)
With -2 I suggest to use an additional 128 sample FFT, to be precise I'd like to see a default behavior according to -fft 11101 -spf 11235-11236-11336-FFFFF-1234D.
The 64 sample FFT yields only a few bins in the low and lower mid frequency range, so it is welcome IMO to have another rather short FFT which improves significantly upon the situation in the important lower mid frequency range.
So I think it's a meaningful addition to use a 128 sample FFT.
Moreover it doesn't really hurt as lossyWav is very fast now, and the increase in average bitrate is very low.
With -1 btw (not much in my focus) I suggest to use the full 5 analyses.
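
To put rough numbers on the "few bins" point, here is a small Python sketch. It assumes 44.1 kHz audio and borrows the 20 Hz to 3.45 kHz range from the -skew help text as a stand-in for "low and lower mid"; both are assumptions for illustration, not something the analysis code is known to use this way.

```python
SAMPLE_RATE = 44100  # Hz, assumed (CD audio)


def bins_in_range(fft_length: int, lo_hz: float = 20.0, hi_hz: float = 3450.0) -> int:
    """Count FFT bins whose centre frequency lies in [lo_hz, hi_hz]."""
    spacing = SAMPLE_RATE / fft_length  # distance between adjacent bins in Hz
    return sum(
        1
        for k in range(fft_length // 2 + 1)  # bins from DC up to Nyquist
        if lo_hz <= k * spacing <= hi_hz
    )


for n in (64, 128, 256, 512, 1024):
    print(f"{n:5d} samples: {SAMPLE_RATE / n:7.2f} Hz per bin, "
          f"{bins_in_range(n)} bins in range")
```

Under these assumptions, doubling the FFT length halves the bin spacing, so the 128 sample FFT has twice as many bins in that range as the 64 sample one.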

What do you think?
Nick.C
QUOTE(halb27 @ Nov 12 2007, 11:47) *
I encoded part of my collection using v0.4.4 without any problem, and according to my listening experience so far everything is very fine.
I used a variant of -2 which made me think more deeply afterwards about what's really important.

I'd like to suggest a discussion on two points concerning default behavior:

1)
I would welcome - as I said before - a general default cbs of 512 samples. This will make most lossless codecs behave more efficiently on one hand, and on the other hand I can't see a logical reason why not to use it. If it's about holding average bitrate up for defensive reason we should use a more direct approach targeting directly at overcoming potential weaknesses.

2)
With -2 I suggest to use an additional 128 sample FFT, to be precise I'd like to see a default behavior according to -fft 11101 -spf 11235-11236-11336-FFFFF-1234D.
The 64 sample FFT yields only a few bins in the low and lower mid frequency range, so it is welcome IMO to have another rather short FFT which improves significantly upon the situation in the important lower mid frequency range.
So I think it's a meaningful addition to use a 128 sample FFT.
Moreover it doesn't really hurt as lossyWav is very fast now, and the increase in average bitrate is very low.
With -1 btw (not much in my focus) I suggest to use the full 5 analyses.

What do you think?
Sounds entirely reasonable. I have no problem with a 512 sample codec_block_size. I will implement the changes to the -2 and -1 quality levels.

On another topic, do we *really* need a -dither option - I have no problems with the quality of the output? Similarly, the -clipping option to switch off the iterative clipping reduction method also seems redundant. This would increase throughput a bit which would in turn offset the increased processing time due to the extra analyses.
halb27
I personally don't see a real reason for the -dither option.
But as it's not defaulted I don't care much about it. You created a good separation between standard options and advanced options, and -dither is well situated in the advanced options IMO.
Good reasons for eventually saying good bye to the -dither option are IMO
- if you should run into trouble with your software architecture keeping up the -dither option (guess you won't) when at the same time nobody seems to use -dither.
- if it comes to cleaning up all the advanced options - but as they're separated well into 'advanced options' there's no real need for such a cleaning procedure IMO. Sure the time may come where these things may be thought of being obsolete.

As we're talking about default behavior: what about -3?
I see two targets for -3:

a) -3 as a minor variant of -2, expected to be excellent under all circumstances as we expect it from -2, but with a detail behavior which is not as defensive as is -2. Your choice of using the same -spf values as that of -2 points in this direction. If we want to have it like this I suggest we increase the -skew value a bit.

b) as a seriously less defensive alternative to -2 targeting at a larger average bitrate gap than with what we have at the moment. To be more precise: if -2 yields say ~440 kbps on average, -3 should yield ~400 kbps. I guess it's achievable while still getting excellent quality. Maybe an even larger gap makes sense when being aware that quality may be sacrificed on hopefully rare occasions.
For b) the default setting should change quite a lot IMO.
Having extremely good encoding speed (like with your doing just 1 FFT) as a target fits rather well into this framework.

I personally don't have a favorite for a) or b).
halb27
I totally forgot about the -clipping option.
If there wasn't David Bryant's remark about wavPack being able to make use of the MSBs being 1 I would easily say -clipping makes no sense. It looks like the 'iterative' anti-clipping strategy does not only preserve quality but also doesn't impact efficiency in a global sense.
David Bryant brought this wavPack feature back to mind recently so I think it's not so simple to drop the -clipping option (keeping in mind it was David Bryant who brought us the idea of taking care of the critical bands, and I think this idea was one of the major improvements in the progress of lossyWav).
My personal feeling, however, is that since the 'iterative' anti-clipping strategy doesn't have a negative impact on efficiency in a global sense, wavPack won't benefit significantly from letting clipping happen. Moreover, even if it did, it would do so only because clipping is allowed to occur. But I'd like to see David Bryant comment on this. Maybe I understand this wavPack feature totally wrong.
Nick.C
QUOTE(halb27 @ Nov 12 2007, 13:03) *
As we're talking about default behavior: what about -3?
I see two targets for -3:

a) -3 as a minor variant of -2, expected to be excellent under all circumstances as we expect it from -2, but with a detail behavior which is not as defensive as is -2. Your choice of using the same -spf values as that of -2 points in this direction. If we want to have it like this I suggest we increase the -skew value a bit.

b) as a seriously less defensive alternative to -2 targeting at a larger average bitrate gap than with what we have at the moment. To be more precise: if -2 yields say ~440 kbps on average, -3 should yield ~400 kbps. I guess it's achievable while still getting excellent quality. Maybe an even larger gap makes sense when being aware that quality may be sacrificed on hopefully rare occasions.
For b) the default setting should change quite a lot IMO.
Having extremely good encoding speed (like with your doing just 1 FFT) as a target fits rather well into this framework.

I personally don't have a favorite for a) or b).
My preference would be for b). Thinking about it, if at the end of the day the only options were -1, -2, -3, -nts and -fft; with -skew, -snr & -spf fixed according to the quality settings, then the user could decide how aggressive the processing was by using -fft and -nts alongside the -1, -2 or -3 quality setting.

On the other hand, maybe all of the analyses should use the same -skew, -snr and -spf values?

However, taking David's preference for only 4 command line options (-1, -2, -3 & -nts) then *maybe* other parameters should only be available when using the -3 quality option. The thinking being: "I've already accepted that I want reduced quality by selecting quality level -3, so the program will now let me foul it up myself rather than using presets....."

On -dither and -clipping, from listening to undithered output and the process never reducing amplitude then -dither seems to be expendable. Similarly, the iterative approach used in the current clipping prevention method has little impact on bitrate so the -clipping parameter also seems to be expendable.
halb27
Target b) for -3: OK, so we should think about the details.

Identical -spf and -skew values for all of the three quality levels? I don't like the idea.

From my test when finding useful values for -spf I know some values really hurt bitrate efficiency wise (most of all the bold 1 in '11124' for the 64 sample FFT of -1) but may be vital for being really defensive with respect to the critical band at the lower edge of the corresponding frequency range. So I think it's necessary for -1 (and would be most welcome for -2 too, but it's expensive and the more economic way of treating this within -2 may be by doing the additional 128 sample FFT).

With -skew it's similar. -skew is important for differentiating resulting bitrate between regular and problematic spots, but with a value >24 the improved defensiveness is getting more and more expensive. So I think a value of 24 is very appropriate for -2, but it should be significantly higher only for -1. For -3 it should be <24.

Using very high values for -snr helps differentiating between regular and problematic spots too but with these values there's a rather high price to pay bitrate wise. So again high values of -snr should be used with -1 only IMO.

So I strongly think -1, -2, and -3 should consist of different -fft, -spf, -skew, -snr, and -nts settings in such a way that the overkill defensiveness, standard defensiveness, reduced defensiveness are represented best.

If you want to keep -1 and -2 clean of user options I suggest you do it for -3 as well, and instead create an experimental quality option -x which enables all the advanced options. advanced options = any option except for -1, -2, -3, -nts x (and -flac etc. in case these are ever needed - guess they won't).
Nick.C
QUOTE(halb27 @ Nov 12 2007, 14:13) *
Target b) for -3: OK, so we should think about the details.

Identical -spf and -skew values for all of the three quality levels? I don't like the idea.

From my test when finding useful values for -spf I know some values really hurt bitrate efficiency wise (most of all the bold 1 in '11124' for the 64 sample FFT of -1) but may be vital for being really defensive with respect to the critical band at the lower edge of the corresponding frequency range. So I think it's necessary for -1 (and would be most welcome for -2 too, but it's expensive and the more economic way of treating this within -2 may be by doing the additional 128 sample FFT).

With -skew it's similar. -skew is important for differentiating resulting bitrate between regular and problematic spots, but with a value >24 the improved defensiveness is getting more and more expensive. So I think a value of 24 is very appropriate for -2, but it should be significantly higher only for -1. For -3 it should be <24.

Using very high values for -snr helps differentiating between regular and problematic spots too but with these values there's a rather high price to pay bitrate wise. So again high values of -snr should be used with -1 only IMO.

So I strongly think -1, -2, and -3 should consist of different -fft, -spf, -skew, -snr, and -nts settings in such a way that the overkill defensiveness, standard defensiveness, reduced defensiveness are represented best.

If you want to keep -1 and -2 clean of user options I suggest you do it for -3 as well, and instead create an experimental quality option -x which enables all the advanced options. advanced options = any option except for -1, -2, -3, -nts x (and -flac etc. in case these are ever needed - guess they won't).
I like the idea of the -x quality parameter (-0?) enabling the advanced options and also keeping -1, -2 & -3 "clean". This would be a copy of -2 and only those settings that the user input would be over-written, the rest being taken as per -2 for the processing.
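
The "copy of -2 with only the user's inputs over-written" idea could be sketched like this in Python. The -2 values are those printed in the v0.4.2 usage text; the dictionary layout and the function name experimental_settings are hypothetical and not taken from the Delphi code.

```python
# -2 defaults as shown in the v0.4.2 usage text.
PRESET_2 = {
    "cbs": 1024,
    "nts": -1.5,
    "skew": 24,
    "snr": 18,
    "spf": "11235-ZZZZZ-11336-ZZZZZ-1234D",
    "fft": "10101",
}


def experimental_settings(user_options: dict) -> dict:
    """Start from a copy of -2 and overwrite only what the user supplied."""
    unknown = set(user_options) - set(PRESET_2)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    # Later keys win, so user values replace the preset values; PRESET_2
    # itself is left untouched.
    return {**PRESET_2, **user_options}


settings = experimental_settings({"fft": "00100", "skew": 36})
print(settings)
```

Because the merge builds a new dictionary, -1, -2 and -3 would stay "clean" no matter what is tried under the experimental option.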

On the -skew, -spf and -snr settings I am inclined to agree with you. The only difficult bit being agreeing what those settings will be.....
halb27
QUOTE(Nick.C @ Nov 12 2007, 16:17) *

I like the idea of the -x quality parameter (-0?) enabling the advanced options and also keeping -1, -2 & -3 "clean".

On the -skew, -spf and -snr settings I am inclined to agree with you. The only difficult bit being agreeing what those settings will be.....

When first thinking of the experimental option I also thought of -0 cause it matches the current naming scheme. But with the current schematics it makes the experimental quality level look superior to the standard quality levels. Though hopefully somebody might find a great setting this way I think -x (or an explicit -experimental) is more appropriate.

'The only difficult bit being agreeing what those settings will be.....'. Maybe, let's see, but with -2 I think we're pretty much done already (better ideas always welcome):

-2 = -fft 11101 -spf 11235-11236-11336-FFFFF-1234D -cbs 512 -nts -1.5 -skew 24 -snr 18

With -1 I suggest to use

-1 = -fft 11111 -spf 11124-11125-11225-11225-11236 -cbs 512 -nts -3.0 -skew 30 -snr 24.

Most disputable may be -3.
Due to the 'significantly reduced defensiveness' target I suggest we use those -spf values I found in my -spf value testing. I think it's necessary for a significantly reduced average bitrate, and it still provided excellent quality. So the mixture of this and the current setting is

-3 = -fft 10001 -spf 11236-FFFFF-FFFFF-FFFFF-1246E -cbs 512 -nts -0.5 -skew 18 -snr 12.

All these settings are pretty much what they are right now, and IMO they're just working out a little bit more what the various accents of the different quality levels stand for.
I don't care much about such details like whether -skew value for -3 should be rather 20 and -snr value 0 (my very personal preference but worth nothing).
Nick.C
QUOTE(halb27 @ Nov 12 2007, 14:58) *

-1 = -fft 11111 -spf 11124-11125-11225-11225-11236 -cbs 512 -nts -3.0 -skew 30 -snr 24.
-2 = -fft 11101 -spf 11235-11236-11336-FFFFF-1234D -cbs 512 -nts -1.5 -skew 24 -snr 18
-3 = -fft 10001 -spf 11236-FFFFF-FFFFF-FFFFF-1246E -cbs 512 -nts -0.5 -skew 18 -snr 12.

I don't care much about such details like whether -skew value for -3 should be rather 20 and -snr value 0 (my very personal preference but worth nothing).
The quality settings in the next revision will reflect those above (unless anyone else indicates a strong preference for something different).

I've been playing with the -fft parameter again and -3 -fft 00100 -spf ....-23346-..... -skew 24 yields 403kbps on my problematic sample set with no immediately apparent artifacts. I say immediately apparent because I don't believe that ABX'ing -3 is useful - to me -3 is the equivalent of listening in a car or on a train or plane - there is background noise already, so some minor changes to the original may / will be obscured by the noise floor of the listening environment. My "acceptability" testing takes place in an open-plan office environment with earbuds & DAP.

I am wondering about the clipping reduction method. At the moment, if it finds one or more samples which clip after rounding, it reduces bits_to_remove by one and tries again; once bits_to_remove reaches 0 it just stores the original values. Is allowing 0 clipping samples a bit too harsh? At the time that the iterative clipping was introduced, I put in an "allowable" variable, implying that a number of clipping (but rounded) samples may be permitted. I think that I should implement a "-allowable" parameter (1<=n<=64 (maximum permissible codec_block_size)) to set the allowable value as a clipping detection "threshold".
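
The iteration described above might look roughly like this as a Python sketch. It assumes 16-bit samples, rounding to the nearest multiple of 2**bits, and clamping of any permitted clipped samples; none of these details are confirmed by the post, and the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit sample range


def round_to_bits(sample: int, bits: int) -> int:
    """Round to the nearest multiple of 2**bits (assumed rounding rule)."""
    step = 1 << bits
    return ((sample + step // 2) // step) * step


def remove_bits_without_clipping(block, bits_to_remove, allowable=0):
    """Lower bits_to_remove until at most `allowable` samples clip.

    Returns (bits actually removed, processed samples). When no bits can be
    removed, the original samples are returned unchanged.
    """
    while bits_to_remove > 0:
        rounded = [round_to_bits(s, bits_to_remove) for s in block]
        clipped = sum(1 for v in rounded if not INT16_MIN <= v <= INT16_MAX)
        if clipped <= allowable:
            # Assumption: permitted clipped samples are clamped to the range.
            return bits_to_remove, [min(max(v, INT16_MIN), INT16_MAX) for v in rounded]
        bits_to_remove -= 1  # too many clipped samples: try one bit fewer
    return 0, list(block)


block = [32767, 100, -200]
print(remove_bits_without_clipping(block, 4))               # allowable = 0
print(remove_bits_without_clipping(block, 4, allowable=1))  # one clip tolerated
```

In this toy block the single full-scale sample rounds up past the 16-bit limit at every bit depth, so with allowable=0 nothing is removed, while tolerating one clipped sample keeps all 4 bits.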


halb27
QUOTE(Nick.C @ Nov 13 2007, 00:25) *

The quality settings in the next revision will reflect those above (unless anyone else indicates a strong preference for something different).

I've been playing with the -fft parameter again and -3 -fft 00100 -spf ....-23346-..... -skew 24 yields 403kbps on my problematic sample set with no immediately apparent artifacts. ....

Thanks a lot.

As for your -3 approach (just 1 FFT, targeting a significantly lower bitrate than ~400 kbps for regular music), I can try to help and do listening tests, especially with your setting. However, I wouldn't lower the quality demand too far, because after all we will stay at a pretty high bitrate, and at that bitrate I think we should keep a distinction from what we can get with mp3 at moderate bitrate (though this is always a matter of taste).
Sorry, I won't be able to do it this week as I'm leaving for my father-in-law's 90th birthday (I've got some trouble at the moment producing a photo-based DVD movie, and neither my old nor the new DVD player (a present for my father-in-law) plays it properly).
Nick.C
QUOTE(halb27 @ Nov 12 2007, 22:50) *
As for your -3 approach (just 1 FFT, targeting a significantly lower bitrate than ~400 kbps for regular music), I can try to help and do listening tests, especially with your setting. However, I wouldn't lower the quality demand too far, because after all we will stay at a pretty high bitrate, and at that bitrate I think we should keep a distinction from what we can get with mp3 at moderate bitrate (though this is always a matter of taste).
Sorry, I won't be able to do it this week as I'm leaving for my father-in-law's 90th birthday (I've got some trouble at the moment producing a photo-based DVD movie, and neither my old nor the new DVD player (a present for my father-in-law) plays it properly).
Don't worry about the timescale, I will keep on trying to optimise the code..... I hope you have a great time at the party! Have you checked whether the DVD is written as UDF or not? This may make a difference.

I also tried -3 -fft 01100 -spf ffff-22335-22346-fffff-fffff -skew 24 which yielded 420kbps - not too bad at all. Second opinion definitely required. [edit] I will test some "real" music tomorrow and see what the bitrate comes out at. Maybe 400kbps for "real music" should be the target rather than approaching that for my problem set. [/edit]
halb27
QUOTE(Nick.C @ Nov 13 2007, 01:22) *

... Have you checked whether the DVD is written as UDF or not? This may make a difference. ...

The DVD plays well on my PC, so I think the DVD is fine. My own DVD player is simply broken and doesn't play any DVD any more. The new player plays the 'movie', but from time to time it skips a bit at the current spot, which sounds especially ugly because the music skips too. I guess it's a VBR problem, and that's what I've been playing with all evening, but with limited success. I guess we'll exchange the player tomorrow.

As for your new -3 setting I like the new one better as it's more demanding. Let's hear how it sounds.
Nick.C
QUOTE(halb27 @ Nov 12 2007, 23:39) *
As for your new -3 setting I like the new one better as it's more demanding. Let's hear how it sounds.
Another variation:

At the moment the method uses the Hanning window function on the input to the FFT analysis. Looking for "window function" in my favourite resource (Wikipedia) gives quite a long list. I have added a "-window" parameter to select which one to use. This allows the selection of 7 window functions (for evaluation / elimination at this stage): Hanning, Bartlett-Hann, Blackman, Nuttall, Blackman-Harris, Blackman-Nuttall and Flat-Top.
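For reference, several of the windows listed are members of the same generalised cosine-sum family and differ only in their coefficients. The sketch below is not taken from lossyWAV: the function names and dictionary keys are hypothetical, it uses the symmetric (N-1) form, and it covers only three of the seven windows, using the commonly published coefficients.

```python
import math

# Commonly published cosine-sum coefficients (a0, a1, a2, ...).
# The dictionary keys are illustrative, not actual "-window" parameter values.
WINDOW_COEFFS = {
    "hann": (0.5, 0.5),
    "blackman": (0.42, 0.5, 0.08),
    "blackman-harris": (0.35875, 0.48829, 0.14128, 0.01168),
}


def cosine_sum_window(n, coeffs):
    """w[k] = a0 - a1*cos(2*pi*k/(n-1)) + a2*cos(4*pi*k/(n-1)) - ... for k = 0..n-1."""
    return [
        sum((-1) ** i * a * math.cos(2.0 * math.pi * i * k / (n - 1))
            for i, a in enumerate(coeffs))
        for k in range(n)
    ]


def apply_window(block, name):
    """Multiply a block of samples by the named window before FFT analysis."""
    window = cosine_sum_window(len(block), WINDOW_COEFFS[name])
    return [s * w for s, w in zip(block, window)]


for name in WINDOW_COEFFS:
    w = cosine_sum_window(9, WINDOW_COEFFS[name])
    print(name, [round(v, 4) for v in w])
```

All three windows peak at 1.0 in the centre of the block; they differ in how quickly they fall towards the edges, which trades main-lobe width against side-lobe leakage in the FFT.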

Will post revision tonight.
2Bdecided
What do you get for your test set resampled to 32kHz, processed with -2?

Does 32k resampling followed by ReplayGain (only negative values applied) help even more?

It makes sense to have a -3 along the lines you're proposing, but I suspect the above will be dramatically more efficient, and still artefact-free (though with a 16k LPF and, with RG, loud tracks becoming quieter).

Cheers,
David.
GeSomeone
QUOTE(halb27 @ Nov 12 2007, 15:13) *

Target b) for -3: OK, so we should think about the details.

Just following your dialog here.. :)
This seems the right basic choice; there has to be a benefit in return for giving up a (little) bit of quality. IMO that means a significantly lower bit rate for -3 (compared with -2).

(Would -skew of -12 -18 -24 (for -3 -2 -1) be too aggressive?)
QUOTE(Nick.C @ Nov 12 2007, 23:25) *

I am wondering about the clipping reduction method - at the moment, if it finds 1 or more samples which clip after rounding then it reduces bits_to_remove by one and tries again, until bits_to_remove=0, at which point it just stores the original values. Is 0 permissible clipping samples a bit too harsh? At the time that the iterative clipping was introduced, I put in an "allowable" variable, implying that a number of clipping (but rounded) samples may be permitted.

I suppose you mean consecutive samples of the maximum (or minimum) value? To me in this case 0, 1 or 2 would make sense, only already badly clipping music would be affected by other values.
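If the threshold were interpreted as a count of consecutive full-scale samples, as suggested here, the check could look something like this. It is a hypothetical sketch assuming 16-bit samples; the function name is invented, and samples at either rail are counted in the same run for simplicity.

```python
def longest_full_scale_run(samples, lo=-32768, hi=32767):
    """Length of the longest run of consecutive samples sitting at full scale."""
    longest = current = 0
    for s in samples:
        if s <= lo or s >= hi:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


block = [0, 32767, 32767, 5, -32768, 1]
print(longest_full_scale_run(block))  # longest run is the two samples at +32767
```

A block would then be accepted when this run length is at most the chosen limit (0, 1 or 2 in the suggestion above).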

And yes, the dither function is obsolete as you no longer opt to lower the amplitude.
QUOTE(Nick.C @ Nov 13 2007, 00:22) *

I also tried -3 [..] which yielded 420kbps [..] [edit]Maybe 400kbps for "real music" should be the target rather than approaching that for my problem set. [/edit]

The problem with this is that from the outset this method aims for constant quality (I like that BTW), so the bit rate will vary. I found for example that music that already compresses well losslessly (to something in the 600s kbps) will not get down to half that bit rate with the help of lossyWAV, but rather still ends up around 420.