Full Version: lossyWAV Development
Nick.C
lossyWAV 1.0.0b release thread.

Link to the wiki article

Change log 1.0.0b: 13/05/08
WAV chunk handling improved to allow unknown chunks before the 'data' chunk to be copied verbatim;
Error in --merge parameter associated with 24-bit files corrected.

Change log 1.0.0: 12/05/08
Code tidied up and GNU GPL references included;
Minor change to determination of RMS value of codec_block: minimum value of all channels now taken rather than average of all channels;
A SourceForge project will be created and the code posted in due course.

Change log beta v0.9.8d: 06/05/08
-spf preset values changed to: '22222-22223-22224-12234-12245-12356' in line with discussion on page 48;
Code tidied up a bit and work done on the noise shaping code for v1.1.0, including the implementation of a Fibonacci shift register PRNG for triangular dither (Thanks to DualIP for making me aware of this method of fast pseudo random number generation!).
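For readers unfamiliar with the technique, here is a minimal Python sketch of a Fibonacci shift-register (LFSR) generator feeding triangular dither. It is an illustration only, not lossyWAV's actual generator: the 16-bit register width, the tap positions (16, 14, 13, 11, a commonly quoted maximal-length set), the seed and all function names are assumptions.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR by one bit (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed=0xACE1):
    """Count the steps needed for the register to return to its seed."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


def tpdf_dither(state):
    """Return (dither, new_state), with dither strictly inside (-1, 1).

    Two 16-bit words are drawn by clocking the register 16 times each;
    their difference has an approximately triangular distribution.
    """
    a = state
    for _ in range(16):
        a = lfsr16_step(a)
    b = a
    for _ in range(16):
        b = lfsr16_step(b)
    return (a - b) / 65536.0, b
```

The attraction of this kind of generator is that each step needs only shifts and XORs, which is cheap compared with a general-purpose random number routine.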

Change log beta v0.9.8c: 04/05/08
-snr preset parameters revised to (18,22,23.5,23.5,23.5,25,28,31,34,37,40);
-impulse parameter renamed to -fft32 to more clearly indicate its function.

Change log beta v0.9.8b: 01/05/08
-snr preset parameters revised to (18,22,22,22,22,25,28,31,34,37,40);
-nts preset parameters revised to (20,16,9,6,3,0,-2.4,-4.8,-7.2,-9.6,-12);
-impulse is automatic from -q 3 (this will manifest itself as a step change in bitrate from -q 2.9999 to -q 3.0).

Change log beta v0.9.8: 01/05/08
-snr preset parameters revised to (18,19,20,21,22,25,28,31,34,37,40);
-snr and -nts parameters temporarily re-enabled to allow further testing.
-spf for 32 sample FFT set to 22222.

Change log beta v0.9.7: 29/04/08
-impulse parameter implemented in an attempt to trap impulse based artefacts in the processed output by calculating additional overlapping 32 sample FFTs on the sample data. This additional processing unfortunately adds about 40% to the processing time.
Revised -snr values from v0.9.6 variant #1 (not released, but discussed - page 46) retained.
[edit] First 9 downloads did not recognise the -analyses parameter correctly. [/edit]
[edit2] -spf parameter re-enabled for short-term testing: -spf <6 x 5 hexchar separated by '-' characters> (35 characters long in total). [/edit2]

Change log beta v0.9.6: 24/04/08
-<n> presets removed in favour of -q <n> (0<=n<=10) quality preset selection: -q 0 = old -8; -q 5 = old -3; -q 10 = old -0.
-snr and -nts parameters removed;
-minbits <n> (0<=n<=8; resolution = 0.01; default=3;) introduced as an advanced option to allow the user to select the minimum number of bits to keep (relating to the log2 of the rms value of all the samples in the codec block);
-help and -longhelp parameters introduced and basic no parameter help reduced. System options moved to -help; Advanced options moved to -longhelp. This still needs some fleshing out.

Change log beta v0.9.5: 22/04/08
a,b or c suffix to quality preset removed in favour of the new -analyses <n> parameter (2<=n<=5);
-8 quality preset introduced, -nts=20, -snr=16;

Change log beta v0.9.4: 18/04/08
Changed the default number of FFT analyses to 2 lengths for all quality presets;
Tightened up the spreading function (same for all quality presets);
Implemented floating point quality presets (-0.0 to -7.0, resolution 0.0001);
Made highest quality preset (-0) settings more conservative.

Change log beta v0.9.3: 17/04/08
Error in skewing function preparation found and rectified - knock-on effect that bitrate reduced by around 20kbps for all quality presets and variations in bitrate between spreading functions reduced;
All quality presets now use the spreading function for -1.

Change log v0.9.2 RC3: 13/04/08
Code tidied up and slight increase in processing throughput achieved;
-shaping and -autoshape parameters removed in accordance with roadmap (should return in v1.1).

Change log beta v0.9.1: 02/04/08
-autoshape now non-linear with respect to bits-to-remove, i.e. 1-((bits-per-sample-3-bits-to-remove)/(bits-per-sample-3))^2
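The expression above can be written out as a small Python sketch. Only the formula comes from the change log; the function name and the 16-bit default are assumptions for illustration.

```python
def autoshape_amount(bits_to_remove, bits_per_sample=16):
    """Non-linear shaping amount: 0 when no bits are removed, 1 at the maximum."""
    headroom = bits_per_sample - 3
    return 1.0 - ((headroom - bits_to_remove) / headroom) ** 2
```

For a 16-bit file, removing half of the 13 available bits (6.5) already gives 0.75 of full shaping, so the curve rises quickly at first and flattens towards 1.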

Change log beta v0.9.0: 01/04/08
Minor correction to noise shaping code;
Further IA-32/x87 speedups found, processing rate increased by a further 10%.

Change log beta v0.8.9: 29/03/08
-autoshape parameter implemented (incompatible with -shaping <n>). This applies shaping variably depending on bits-to-remove and the bitdepth of the sample, i.e. shaping-to-apply = min(1, bits-to-remove / (bitdepth-of-sample - minimum-bits-to-keep)).
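As a minimal Python sketch of the linear rule quoted above (the function name is hypothetical, and the value of minimum-bits-to-keep in force at this version is not stated here, so it is left as a required argument):

```python
def shaping_to_apply(bits_to_remove, bitdepth_of_sample, minimum_bits_to_keep):
    """Linear shaping amount, capped at 1 (fully on)."""
    return min(1.0, bits_to_remove / (bitdepth_of_sample - minimum_bits_to_keep))
```

With a 16-bit sample and 3 bits kept, removing 6.5 bits gives 0.5, and anything at or beyond 13 bits gives 1.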

Change log beta v0.8.8: 27/03/08
Error in the -merge parameter tracked and amended;
FFT now makes use of the ability to calculate a real FFT of length 2N using a complex FFT of length N (20% to 25% speedup);
Reads and writes to disk are now larger to reduce file fragmentation.
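The real-FFT trick mentioned in this entry is a standard one, and can be sketched in Python as follows. This is not lossyWAV's IA-32/x87 code: a naive DFT stands in for the length-N complex FFT so that the example stays self-contained, and the function names are made up for illustration.

```python
import cmath


def dft(z):
    """Naive complex DFT, standing in for a real FFT routine."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_fft_via_half_length(x):
    """Transform 2N real samples using one length-N complex transform."""
    n = len(x) // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    packed = [complex(x[2 * t], x[2 * t + 1]) for t in range(n)]
    z = dft(packed)
    out = [0j] * (2 * n)
    for k in range(n):
        mirrored = z[(n - k) % n].conjugate()
        even = (z[k] + mirrored) / 2      # spectrum of the even samples
        odd = (z[k] - mirrored) / 2j      # spectrum of the odd samples
        twiddle = cmath.exp(-1j * cmath.pi * k / n)
        out[k] = even + twiddle * odd
        out[k + n] = even - twiddle * odd
    return out
```

The saving comes from running the expensive transform on N complex points instead of 2N, at the cost of one cheap untangling pass afterwards.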

Change log beta v0.8.7: 21/03/08
Error in the -merge parameter tracked and amended to adopt David's method of storing the difference when scaled;

Change log beta v0.8.6: 18/03/08
Error in the -merge parameter tracked and amended;
-scale <n> parameter implemented to allow WAV data to be scaled (in the range 0 to 1, resolution 0.000001) prior to processing. -scale is compatible with the -correction and -merge parameters (although combined filesize may be large);
Complete FFT unit now in IA-32/x87.

Change log beta v0.8.5: 17/03/08
-shaping parameter now takes a supplementary value between 0 and 1 (0.001 resolution) which specifies the "proportion" of noise shaping to apply (0=fully off [default], 1=fully on);
-newspread parameter removed as results are identical to the existing spreading function that I thought that I had doubts about. The revised method will probably be faster when fully optimised in IA-32/x87 and will replace the existing method in the near future.

Change log beta v0.8.4: 14/03/08
Total rewrite of the -shaping parameter, in line with gratefully received guidance from SebastianG. No dither has been included (yet). The program will automatically select either the 44.1kHz or the 48kHz functions as required by the input WAV file. At present these are the only two sample rates for which noise shaping functions have been incorporated;
A rewrite of the spreading function has been included and is enabled using the -newspread parameter. This fixes a problem where some samples would be used too many times in the calculation of the average value of the FFT output;
Limits for -snr and -nts modified to 0 to 48 and -48 to 36 respectively to allow testing of the effectiveness of the noise shaping function.

Change log beta v0.8.3:
Implementation of -shaping parameter to make fixed noise shaping optional (default=off);
minor amendment to shaping code;

Change log beta v0.8.2:
First real attempt at implementing noise shaping, thanks to David for the pointers. It is currently not an optional parameter and will be applied to all quality presets.
-merge parameter "repaired" (wasn't looking in the right places for files).
-1 quality preset reduced from 4 to 3 FFT analyses; -2 quality preset reduced from 3 to 2 FFT analyses; (use a,b,c to increase if so wished).

Change log beta v0.8.1:
Revision to -snr and -nts limits to allow extremely low bitrate testing (see page 37).

Change log beta v0.8.0:
Revision of all presets in line with discussion on -7 preset (page 36).

Change log beta v0.7.9:
Implementation of -6 & -7 quality presets: -4 = -3.5; -5 = -4.0; -6 = -4.5; -7 = -5. For bitrates and detailed settings, see end of page 35.

Change log beta v0.7.8:
Implementation of -5 quality preset, as -4 except -snr=15(-4=21); -nts=12(-4=6).

Change log beta v0.7.7:
Correction made to maximum_bits_to_remove;
-merge parameter implemented.

Change log beta v0.7.6:
Addition of -4 quality preset, analogous to -3 at v0.6.4 RC1, but with 5 allowable clips per channel per codec_block;
Some work done on maximum_bits_to_remove: log2 of RMS value of all samples in a codec_block is taken and minimum_bits_to_keep is subtracted rather than bits_per_sample-minimum_bits_to_keep;
-overlap parameter removed;
-centre parameter removed.
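A rough Python sketch of the revised maximum_bits_to_remove calculation described in this entry. The default of 5 for minimum_bits_to_keep is the figure quoted for v0.6.8 and is assumed here; the clamp at zero and the absence of rounding are also assumptions, not taken from the lossyWAV source.

```python
import math


def maximum_bits_to_remove(block, minimum_bits_to_keep=5):
    """log2 of the block's RMS value minus the bits that must be kept."""
    mean_square = sum(s * s for s in block) / len(block)
    if mean_square < 1.0:
        return 0.0  # silent or near-silent block: nothing to remove
    rms = math.sqrt(mean_square)
    return max(0.0, math.log2(rms) - minimum_bits_to_keep)
```

Under these assumptions a block with an RMS of 1024 (10 bits) allows up to 5 bits to be removed, whereas the older rule based on bits_per_sample would have allowed 11 for a 16-bit file regardless of level.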

Change log beta v0.7.5:
Handling of 24-bit samples corrected.

Change log beta v0.7.4:
-extrafft parameter removed as superseded;
-1, -2 & -3 parameters augmented by -1a, -2a, -2b, -3a, -3b, -3c. The suffix character denotes how many additional FFT analysis lengths will be used in the processing of the file, a=1, b=2, c=3, i.e. 1a = 4+1 = 5; 3b = 2+2 = 4.

Change log beta v0.7.3:
-overlap parameter revised to take a value (0..16). 1024 Sample FFT end_overlap = 512-16*(overlap_value);
-centre parameter revised to add a central 1024 sample FFT to the analysis (unless overlap=16).

Change log beta v0.7.2:
-overlap parameter implemented to modify end_overlap to 448 samples (from 512 samples) for 1024 sample FFT;
-centre parameter implemented to centralise 1024 sample FFT on centre of codec_block, i.e. end_overlap = 256 samples;
Codec_blocks full of zeros are now not processed.

Change log beta v0.7.1:
Window function slightly modified and bit reduction noise constants re-calculated;
Allowable clips per channel per codec_block set to -1=0; -2=1; -3=2.
-noclips parameter implemented to allow user to set allowable clips=0 for -2 & -3;
Code optimised further in IA-32/x87;
Now checks for existence of correction file and requires -force parameter to over-write.

Change log beta v0.7.0:
Implementation of "-clips" parameter to set number of allowable clips per channel per codec_block (0<=n<=512).

Change log beta v0.6.9:
Code speedup;

Change log beta v0.6.8:
Implementation of dynamic minimum_bits_to_keep=5. Dynamic in the sense that the maximum bit is determined for each codec_block (taking sign into account) rather than just assuming bits_per_sample;
Implementation of allowable_clips per channel per codec block. -1 = 0; -2 = 1; -3 = 5. Based on the 512 sample codec_block_size this will allow at most 0.1134 milliseconds of clipping per channel per codec_block.
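The 0.1134 millisecond figure follows from 5 clipped samples at a 44.1kHz sample rate (the sample rate is assumed here, it is not stated in the entry). A quick Python check, with a hypothetical function name:

```python
def clip_duration_ms(clips, sample_rate_hz=44100):
    """Duration of the given number of clipped samples, in milliseconds."""
    return 1000.0 * clips / sample_rate_hz


print(round(clip_duration_ms(5), 4))  # 0.1134
```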

Change log v0.6.7 RC2:
-nts values for -1, -2 & -3 changed to -4, -2 and 0 respectively;
Processing speedup identified during problem sample investigation incorporated (thanks Alex B!);
Spreading function string for -3 changed back to: 22224-22236-22347-22358-2246C;
53 sample test set processed at -3 now produces 462.2kbps; 41.0MB.

Change log beta v0.6.6:
Positive change in bits to remove limited to an increase of +2 bit per codec_block, no -ve limit;
Additional 1024 sample FFT analysis removed (reverted to -512:511; 0:1023 on a 512 sample codec_block);
Spreading Function string for -3 changed to: 22224-22236-22347-22358-22469;
53 sample test set processed at -3 now produces 440.8kbps; 39.1MB.
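The rise limit in the first item of this entry can be sketched in Python as below. It assumes that each codec_block is compared against the already limited value of the previous block; the entry does not say which value is used, and the function name is made up.

```python
def limit_rise(bits_per_block, max_rise=2):
    """Cap block-to-block increases in bits-to-remove; decreases are unrestricted."""
    limited = []
    previous = None
    for bits in bits_per_block:
        if previous is not None and bits > previous + max_rise:
            bits = previous + max_rise
        limited.append(bits)
        previous = bits
    return limited


print(limit_rise([0, 5, 5, 1, 6]))  # [0, 2, 4, 1, 3]
```

The asymmetry is deliberate: the number of bits removed can fall immediately when the signal demands it, but can only climb gradually.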

Change log beta v0.6.5:
Additional 1024 sample FFT analysis introduced per codec_block;
Fairly massive speedup "accidentally" found and implemented - compromised by the additional analysis;
positive change in bits to remove limited to an increase of +1 bit per codec_block, no -ve limit;
Now able to process between 4 and 32 bit sample WAV files (I think - limited testing so far.....).

Change log v0.6.4 RC1:
Parameters kept:
-1, -2, -3; -o <folder>; -nts <n>; -snr <n>; -force; -check; -correction; -quiet; -nowarn; -below; -low.
Parameters removed:
-skew <n>; -spf <5x5hex>; -fft <5xbin>; -cbs <n>; -detail; -wmalsl.
Silence detection routine removed - very small gain for dubious benefit.
Code tidied and slight assembly optimisations implemented.

Change log beta v0.6.3:
[Implementation of experimental silence detection method using -detection parameter]. Removed - not satisfied with results.

Change log beta v0.6.2:
Fixed sample limit checking bug introduced in v0.6.1

Change log beta v0.6.1:
-correction parameter implemented which will create a .lwcdf.WAV file which, when added to the lossy.WAV file using a not yet implemented parameter of lossyWAV, will reconstitute the lossless original file.
Error finally found in remove_bits routine (which is why it's taken so long for me to implement the -correction parameter) - very slight increase in bitrate (about 0.54kbps for my 53 problem sample set).
-shaping parameter removed.
When the corresponding .lossy.wav and .lwcdf.wav files, processed using lossyWAV -3, are encoded using FLAC -3 -m -e -r 2 -b 512, the total size for my 53 sample set (69.4MB FLAC) is 76.3MB : 39.0MB .lossy.FLAC, 37.3MB .lwcdf.FLAC.
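The idea behind the correction file can be shown in a few lines of Python. This is a sketch only: the rounding below is a plain round-to-nearest with no dither or clipping checks, not lossyWAV's remove_bits routine, and the function names are hypothetical.

```python
def remove_bits(samples, bits):
    """Round each sample to the nearest multiple of 2**bits."""
    step = 1 << bits
    return [((s + step // 2) // step) * step for s in samples]


def make_correction(original, lossy):
    """Per-sample difference, as would be stored in the correction file."""
    return [o - l for o, l in zip(original, lossy)]


def reconstruct(lossy, correction):
    """Adding the correction back to the lossy samples restores the original."""
    return [l + c for l, c in zip(lossy, correction)]


original = [5, -5, 7, 0]
lossy = remove_bits(original, 2)
correction = make_correction(original, lossy)
assert reconstruct(lossy, correction) == original
```

Because the correction is just the rounding error, it is noise-like and compresses poorly, which is consistent with the two FLAC files together being larger than the lossless original.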
shadowking
Thanks, I'll check these when I get a chance.
shadowking
It's close to Dualstream quality 3 and better than WavPack at 320k: similar to WavPack's 350k high modes but with better bitrate efficiency. I will need to ABX these when it's dead quiet, but so far I can only ABX atemlied (slight noise) and metamorphose (abrupt noise). Average bitrate = 339k, ranging from 315k to 395k.

Metamorphose shows a savage burst of noise not heard in WavPack or Dualstream when using the flat noise approach. Usually there is a rise in hiss, but this is something that I've heard in Shorten lossy and could be an issue.

Overall it looks good. I am more interested in overall performance at 340k than at 480k.
Nick.C
I re-processed Atem_Lied & Metamorphose using: 1.5ms & 20ms analyses, force_dither_LSB, use min(min(bits_to_remove_table))+1 bits to remove *not mean(mean...)*, experimental triangular gaussian dither and 30/32 fix_clipped reduction, minimum_bits_to_keep=6.

<files removed - obsolete>
Wombat
QUOTE (Nick.C @ Jul 15 2007, 21:47) *
I re-processed Atem_Lied & Metamorphose using: 1.5ms & 20ms analyses, force_dither_LSB, use min(min(bits_to_remove_table))+1 bits to remove *not mean(mean...)*, experimental triangular gaussian dither and 30/32 fix_clipped reduction, minimum_bits_to_keep=6.

Well, I never tested these lossy "lossless" approaches but was a bit curious.

This Atemlied problem sounds like the problems LAME MP3 has on several tonal samples and only was shortly improved. It is as if you hear a quiet gust of wind somewhere nearby.

I only listened to Atemlied and wonder at how clearly audible this problem is. The second approach you offer here, Nick.C, is only marginally better than the one above.
Nick.C
Thanks for the input Shadowking & Wombat - it seems more and more likely that removing any more bits than 2Bdecided's method calculates is going to noticeably impair quality.
Nick.C
Updated script containing revised fix_clipped method.

Updated (again) - code (and my thought processes) tidied up a fair bit. (20070719)

<files removed - obsolete>
Nick.C
Source modified again - realised that rectangular dither = triangular dither /2 and the gaussian dither I was using equated to triangular / (4 to 6 or more....). Changed the dither routine a bit - introduced a dither_amplitude parameter - rectangular = 0.5; triangular = 1.0.

Had another go at the conditional clipping reduction factor - I think that it's closer to "right" now.

<files removed - obsolete>
2Bdecided
Wow - very neat - you put me to shame!

(and you should see the state of the MATLAB scripts I write which I _don't_ release!)

Great work spotting the better codec block size. You could do a check on each file, trying various options (automatically I mean, but it would be painful). If it goes into Wavpack, I hope David does this. When I looked (though I didn't go down below 1024) the optimal lossyFLAC block size is often related (not perfectly) to performance of standard FLAC - on a lot of these samples, 1024 is better than 4096 without lossy pre-processing.


I'm a bit uncomfortable with having a different amount of scaling in each block (to prevent clipping). It's like a very weird DRC. Still, it's just an option, and quite useable for your application. If you crossfaded, it would be better still.

I think you've broken the rectangular dither. Half amplitude triangular dither ~= rectangular dither. Plot the PDFs to see why, but the clue is in the names ;).
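One way to check the dither point without plotting is to compare variances. A minimal Python sketch, reading "~=" as MATLAB's not-equal operator (an assumption), and taking rectangular dither as uniform over 1 LSB and triangular dither as the sum of two such independent uniform values (the usual definitions, not confirmed for this script):

```python
from fractions import Fraction


def uniform_variance(width_lsb):
    """Variance of a uniform (rectangular) PDF spanning width_lsb."""
    return Fraction(width_lsb) ** 2 / 12


rectangular = uniform_variance(1)        # 1/12 LSB^2
triangular = 2 * uniform_variance(1)     # sum of two independent uniforms: 1/6
half_triangular = triangular / 4         # halving the amplitude quarters the variance

print(rectangular, triangular, half_triangular)  # 1/12 1/6 1/24
```

Half-amplitude triangular dither spans the same 1 LSB range as rectangular dither, but it is still peaked at zero and carries half the noise power, so the two are not interchangeable.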


I like the structure, but I see that just after I combined two loops (analysis then apply) into one, you split it back into two. (Unless I did that? It's late, I forget). Anyway, that will make it a bit harder for someone to come along (as they eventually must) and make this work on files on disc, rather than loading the whole file into memory. It does make it a little easier to test and develop though, which is why I started with two loops.

When I get back to it, my main planned task is noise shaping. That's either going to revolutionise it, or not work!

Cheers,
David.
Nick.C
Ah - sorry about the rectangular dither - easily mended....

The scaling is applied to the whole file, not just one block. It's calculated to find the minimum block value then that minimum is applied to the whole file when the bit-reduction is done.

Still having fun...... :)
Nick.C
Rev.23: Dither "fixed" (i.e. returned back to previous working version.... :) )
There was about 0.9MiB difference between proper rectangular and 0.5 x triangular when compressed (rectangular bigger, 33.9MiB vs 33.0MiB).

Rev.24: "more likely to be nearer the mark" implementation of amplitude_response modification. Fileset now: WAV: 98.6MiB; FLAC 56.8MiB; ss.FLAC 28.4MiB over the 41 samples.

<files removed - obsolete>
Nick.C
Rev:25 Revised implementation of equal_loudness_filter. Files now lose more bits under the equal_loudness_filter if they are louder - as might be expected. Fileset: WAV: 98.6MiB; FLAC: 56.9MiB; ss.FLAC(no elf): 35.8MiB; ss.FLAC(elf): 29.6MiB. Fileset using equal loudness filter, no dither, no clip-fixing comes in at 25.2MiB :o



<files removed - obsolete>
Wombat
QUOTE (Nick.C @ Jul 25 2007, 16:14) *
Rev:25 Revised implementation of equal_loudness_filter. Files now lose more bits under the equal_loudness_filter if they are louder - as might be expected. Fileset: WAV: 98.6MiB; FLAC: 56.9MiB; ss.FLAC(no elf): 35.8MiB; ss.FLAC(elf): 29.6MiB. Fileset using equal loudness filter, no dither, no clip-fixing comes in at 25.2MiB :o

If it is of any help: Atemlied is still easily ABXable and sounds nearly the same as the second try you provided.
I called it ss2 in the ABX test.

foo_abx 1.3.1 report
foobar2000 v0.9.4.3
2007/07/25 23:03:12

File A: C:\Temp\nforce\temp\Atem-lied.wav
File B: C:\Temp\nforce\temp\Atem_lied.ss2.flac

23:03:12 : Test started.
23:04:36 : 01/01 50.0%
23:04:52 : 02/02 25.0%
23:05:06 : 03/03 12.5%
23:05:29 : 04/04 6.3%
23:05:49 : 05/05 3.1%
23:06:04 : 06/06 1.6%
23:06:19 : 07/07 0.8%
23:06:35 : 08/08 0.4%
23:06:50 : 09/09 0.2%
23:07:02 : 10/10 0.1%
23:08:48 : Test finished.

----------
Total: 10/10 (0.1%)
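For reference, the percentages in a foo_abx log like this one are consistent with the probability of scoring at least that well by pure guessing. A minimal Python sketch, assuming a fair-coin binomial model (the function name is made up for illustration):

```python
from math import comb


def abx_guess_probability(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


print(abx_guess_probability(10, 10))  # 0.0009765625, i.e. about 0.1%
```

A perfect run halves the probability with every trial, which is why the log steps through 50%, 25%, 12.5% and so on.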
Nick.C
<file removed - obsolete>
Wombat
QUOTE (Nick.C @ Jul 26 2007, 09:00) *
Hmmmmm....... Try this one - triangular dither, no elf, clip_reduction.

Just downloaded and tested. I have to admit this is on the edge of what I can clearly ABX, but it is still possible in 2 places I picked at the beginning. I don't think the filesize is that promising either.

foo_abx 1.3.1 report
foobar2000 v0.9.4.3
2007/07/26 21:11:08

File A: C:\Temp\nforce\temp\Atem_lied.ss3.flac
File B: C:\Temp\nforce\temp\Atem-lied.wav

21:11:08 : Test started.
21:11:30 : 01/01 50.0%
21:11:51 : 02/02 25.0%
21:12:13 : 03/03 12.5%
21:13:09 : 04/04 6.3%
21:13:49 : 05/05 3.1%
21:14:10 : 06/06 1.6%
21:14:25 : 07/07 0.8%
21:14:51 : 08/08 0.4%
21:15:13 : 09/09 0.2%
21:17:18 : 10/10 0.1%
21:17:35 : Test finished.

----------
Total: 10/10 (0.1%)
Nick.C
Last attempt (for tonight anyway....) - elf on (algorithm changed), triangular dither, more clip reduction.

ps. Thanks for the testing :)


<file removed - obsolete>
Wombat
QUOTE (Nick.C @ Jul 26 2007, 22:01) *
Last attempt (for tonight anyway....) - elf on (algorithm changed), triangular dither, more clip reduction.

ps. Thanks for the testing :)

Sorry, no need to ABX. At second 3-4 there is clearly more noise than in your last try.

Edit: to me it sounds even worse than the second try you provided lately, because of this more pronounced hiccup.
Nick.C
<file removed - obsolete>
Wombat
QUOTE (Nick.C @ Jul 26 2007, 22:27) *
Here's another......

Well, I can't ABX this!
I have to add that I am already tired as hell from a hard day.
Nick.C
Thanks again - the filesize is going up, but compared to the FLAC file it's still quite small. I'm going to try a few permutations on block_size.....
Nick.C
I've been looking at the FFT_Lengths used in the analysis process and the number of analyses. For triangular dithered, fix_clipped=1, force_dither_LSB=1, no elf I get the following:

FFT_Lengths: 1024, 64: size=34.0MiB; Rate: 3.01x - 2Bdecided's original process;
FFT_Lengths: 1024, 256, 64: size=34.7MiB; Rate: 2.34x - 2Bdecided's overkill process;
FFT_Lengths: 1024, 512, 256, 128, 64: size=35.4MiB; 1.54x - Total overkill, although it covers the full set of analyses between original limits.

<file removed - obsolete>

I've now got the script storing individual bits_to_remove_table values for each block in an array for analysis.
Wombat
QUOTE (Nick.C @ Jul 27 2007, 14:34) *
I've been looking at the FFT_Lengths used in the analysis process and the number of analyses. For triangular dithered, fix_clipped=1, force_dither_LSB=1, no elf I get the following:

FFT_Lengths: 1024, 64: size=34.0MiB; Rate: 3.01x - 2Bdecided's original process;
FFT_Lengths: 1024, 256, 64: size=34.7MiB; Rate: 2.34x - 2Bdecided's overkill process;
FFT_Lengths: 1024, 512, 256, 128, 64: size=35.4MiB; 1.54x - Total overkill, although it covers the full set of analyses between original limits.

Atem_lied appended from the 1024, 256, 64 process.

I've now got the script storing individual bits_to_remove_table values for each block in an array for analysis.

No, again no ABX result. We may be in a region here where my PC noise comes through more than anything wrong with the file.
Nick.C
Considering further complicating this with some downsampling. We'll see how the code goes before I produce some results.

Right, I've implemented a not quite crude downsampler (n samples > n-1 samples, freq > old freq * (n-1)/n). For the sample attached, I went 3 > 2, 44.1kHz > 29.4kHz with triangular dither then through the bit reduction process separately.

On the other hand - I can let Foobar do a transcode from wav to wav with the resampling DSP enabled - very clean! See attached.

<file removed - obsolete>
2Bdecided
There's a resampler built into MATLAB and by default it's not very good. SSRC (or fb2k, CEP/Audition etc) are much better options. Stick with 32kHz as a target rate.

Cheers,
David.
Nick.C
Target rate - 32kHz (used the foobar2000 PPHS resampler, ultra mode), high frequency limit 15.5kHz (16kHz gave v.large files.....)

.sl31 = equal loudness filter on; 3 analyses, btr_type=1 (min(min....));
.ss31 = equal loudness filter off; 3 analyses, btr_type=1 (min(min....));

Source will follow when tidied up.

<files removed - obsolete>
2Bdecided
I can't ABX, but don't have a quiet environment so please don't rely on me!

I haven't had chance to try your code, but the bitrates are comparable to the original code with ns=6. Look back in the original thread to see what halb27 could ABX at ns=6 - I think it was "furious". It might be worth trying.

I think resampling to 32k is the way to go for lower bitrates, if your DAP supports it and your ears can't hear it (I'm OK on both counts!).

Sorry I haven't had time to add anything constructive.

Cheers,
David.
Nick.C
Right - revised source (and 1 external function) - uses wavreadraw and wavwriteraw - not attached, but basically don't convert raw audio data into +/- 1.0 range.
BGonz808
I really like this idea of a preprocessor, and of course the near-lossless small FLAC files! But how can I use the MATLAB script? I don't have MATLAB and it seems impossible to get a trial. Could this preprocessor be turned into a foobar2000 DSP plugin by any chance, or a command-line program? 8)
And why aren't wavreadraw and wavwriteraw attached?

Thanks
Bobby
BGonz808
Please attach wavreadraw and wavwriteraw so I can give this prog a spin. I don't know how to code MATLAB to use raw WAV.

:P I'm a noob!
Nick.C
Wavread and wavwrite are copyrighted Matlab code and I will not post them - however they are easily modifiable - look for a section which multiplies (wavwrite) or divides (wavread) the audio data by 32767 or 32768 and insert a "%" before that line to "REM" it out - that will sort it for 16 bit audio. Oh, and save the functions to a different name or you will have broken the originals.
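For anyone without MATLAB, the same "raw" behaviour (integer sample values, no scaling to the +/- 1.0 range) can be sketched with Python's standard library. This is an analogue of the idea only, not a port of wavreadraw / wavwriteraw; it handles 16-bit PCM only and the function names are made up.

```python
import array
import sys
import wave


def wav_read_raw16(path):
    """Return (sample_rate, channels, samples) with samples as unscaled integers."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        rate, channels = w.getframerate(), w.getnchannels()
        frames = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, channels, samples.tolist()


def wav_write_raw16(path, rate, channels, samples):
    """Write interleaved integer samples as 16-bit PCM without rescaling."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data.tobytes())
```

Working on the integer values directly keeps the bit-removal arithmetic exact, which is presumably the reason for avoiding the division by 32768 in the first place.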
BGonz808
Thanks. That was a bit of an oversight on my part :)
Nick.C
Realising that there are only so many parameters to be played with without destroying the audio quality of the output.......

I've been playing around with the spreading function - previously length=4 (i.e. [0.25,0.25,0.25,0.25]) - I've tried even numbers of length from 6 to 16 and am pleasantly surprised by the results. Atem_Lied attached for spreading function lengths of 8, 12 and 16 for your listening pleasure(?!).
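To make the spreading function concrete: with length=4 and equal weights of 0.25 it amounts to a 4-bin moving average over the FFT output. A minimal Python sketch, in which the function name, the bin alignment and the handling of the spectrum edges are assumptions rather than details of the script:

```python
def spread(magnitudes, length=4):
    """Equal-weight moving average over `length` adjacent FFT bins."""
    weight = 1.0 / length
    return [sum(magnitudes[i:i + length]) * weight
            for i in range(len(magnitudes) - length + 1)]


print(spread([4, 4, 4, 4, 8]))  # [4.0, 5.0]
```

A longer spreading function averages over more bins, so isolated low bins pull the result down less; that would tend to allow more bits to be removed, which fits the smaller files reported for the longer lengths.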

[edit]
Following the processing of these samples (constant spreading_function_length with variable fft_length per analysis), I've started "playing about" with variable spreading_function_length with variable fft_length per analysis. There should be some processed results later tonight.
[/edit]

[edit2]
Right, samples attached - .ssx1.flac is 3 analyses (1024,256,64 fft_lengths) and corresponding spreading_function_lengths: 16,8,4;.ssx2.flac is 3 analyses (1024,256,64 fft_lengths) and corresponding spreading_function_lengths: 64,16,4;
[/edit2]



<files removed - obsolete>
halb27
The ss12, ss16, ssx1 and ssx2 versions are easily abxable.
Not quite so with ss8 - it took me a lot of concentration. Guess with 'normal' though concentrated listening it will go unnoticed.

But: what are the advantages over 2Bdecided's original approach? Are you attaining a significantly lower bitrate?
Nick.C
Thanks for the listening time!

The bitrate is coming down a fair amount. For the 41 samples in the set, all using triangular dither:

WAV=98.6MiB;
FLAC=56.9MiB;
2Bdecided's (fft_length=1024,64; codec_block_length=1024; spreading_function_length=4,4)=35.4MiB;
NIC .ss20 (fft_length=1024,64; codec_block_length=576; spreading_function_length=4,4)=34.0MiB;
NIC .ss30 (fft_length=1024,256,64; codec_block_length=576; spreading_function_length=4,4,4)=34.7MiB;

Revised script appended.

<files removed - obsolete>
Wombat
QUOTE (Nick.C @ Aug 6 2007, 17:36) *
Realising that there are only so many parameters to be played with without destroying the audio quality of the output.......

I've been playing around with the spreading function - previously length=4 (i.e. [0.25,0.25,0.25,0.25]) - I've tried even numbers of length from 6 to 16 and am pleasantly surprised by the results. Atem_Lied attached for spreading function lengths of 8, 12 and 16 for your listening pleasure(?!).

[edit]
Following the processing of these samples (constant spreading_function_length with variable fft_length per analysis), I've started "playing about" with variable spreading_function_length with variable fft_length per analysis. There should be some processed results later tonight.
[/edit]

[edit2]
Right, samples attached - .ssx1.flac is 3 analyses (1024,256,64 fft_lengths) and corresponding spreading_function_lengths: 16,8,4;.ssx2.flac is 3 analyses (1024,256,64 fft_lengths) and corresponding spreading_function_lengths: 64,16,4;
[/edit2]



<files removed - obsolete>

Ok. Today I was able to ABX all 3 versions: ss12, ssx1 and ss8.

What do you want now with all these attached files above?
halb27
@ Nick.C:

I appreciate 2BDecided's and your work very much.
But if you keep producing a proliferation of variants, I guess we're heading for a problem.
On the one hand, I'm afraid not many members will love doing such listening tests on the 121st of your variants. But what's worse: you may find a variant that produces a good Atem-lied encoding and saves 15% against 2BDecided's version, but what about general quality outside of Atem-lied?

IMO it would be best if you and 2BDecided worked together even more closely, in the sense that you pursue the one specific approach which you both think is most promising, and provide various listening samples for it so that we can give you quality feedback.
Though a saving in bitrate is very welcome, the more important target at the moment IMO is robust, excellent quality. Don't worry, but so far it seems to me that an approach closer to 2BDecided's original one produces the more reliable results. But I think if you bring both your ideas together something great will come out. Maybe it's not so appropriate to produce something that makes lossy FLAC encoding competitive with, say, WavPack lossy regarding bitrate. After all, we have WavPack lossy for that. But as FLAC is widely supported on music players, there is sense in having lossy FLAC files of extremely high quality and of significantly smaller size than the lossless ones.
Nick.C
Apologies for "going through the permutations" on the various options available in the script. Simplistically, it comes down to:

2 or 3 analyses? (processing time implication, slight size increase on 3);
fixed or variable spreading_function_length? (smaller size on variable);
ELF on or off - still unproven.

So, The only ones that are likely to be "better" than .ss8.flac are .ss20; .ss21; .ss30 and .ss31, i.e. .ss(2 or 3 analyses)(0=fixed;1=variable spreading_function_length).

From Halb27's and Wombat's comments earlier I would guess .ss20 or .ss31 are realistic candidates.

Having it narrowed down to two (or possibly one - .ss20, as it's the closest to the original concept), is it worth producing a set of selected samples for ABX? If so, which samples would you recommend of those previously mentioned in the main thread (or others...)?
halb27
I would welcome it most if you could narrow it down to one, all the more so if it is the one closest to 2BDecided's original version, provided .ss31 doesn't offer hope of significantly improving things over .ss20.

I propose
  • atem-lied
  • furious
  • keys (pointed to by shadowking)
  • triangle (pointed to by shadowking)
  • badvilbel
These are specific problem samples where problems with this kind of codec should be most obvious.
We should also have samples where 'normal' hiss is most prominent.
I know just
  • bruhns (given by guruboolez)
but we should have more samples.
shadowking
I agree with Halb27. I don't have time to test all these modes and I don't know what happens at lower bitrates. WavPack usually sounds good from 230k, but others I tested don't: Shorten, RKAU (violent bursts of noise etc). On the metamorphose sample I heard a similar phenomenon with the preprocessor.

I am happy with the original 2Bdecided method. People will be very suspicious of the idea of lossy FLAC etc. If we can from the start produce a near-lossless reduction that is virtually not *abxable* under any condition and as good as lossless from a practical point of view, then that will be more acceptable than another threshold that won't always hold. Once someone with lots of time and effort finds some fault, people will start spreading bad rumours that we are destroying lossless compression etc etc.

On the other hand 512k is not small, but still much more so than lossless. If one desires an extremely high quality that holds up to anything, then that will be a new 'lossless' to the masses @ 512k. Size won't be the issue but imperfection will.

So WavPack, OptimFROG, FLAC @ 512k end-to-all quality is better than 350k - 99% perfect quality when you package the lossy mode under the FLAC name.
halb27
QUOTE (shadowking @ Aug 8 2007, 12:20) *
.. So WavPack, OptimFROG, FLAC @ 512k end-to-all quality is better than 350k - 99% perfect quality when you package the lossy mode under the FLAC name. ..

Perfectly said.
Never thought about quality demands being higher for lossy .flac files but I think this is absolutely true.
A lossy FLAC file should be indistinguishable from the original with a probability of 1 (within the limits of making sure of that in practice).
Nick.C
Which prompts me to consider introducing a *negative* noise_threshold_shift value (say -1 or -2) to the parameter setting (i.e. slightly reduce the number of bits to remove).

Using 2 analyses, fixed length spreading function, NTS=-1, the sample set increases from 34.0MiB to 34.9MiB lossy flac (56.9MiB flac / 98.6MiB wav).
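
To put those sizes in perspective, here is a rough back-of-envelope calculation. The bitrate figures assume the sample set is 16-bit / 44.1 kHz stereo (1411.2 kbps as WAV), which is not stated above, so treat them as approximate.

```python
# Sizes quoted above, in MiB.
WAV = 98.6
FLAC = 56.9
LOSSY_NTS_0 = 34.0        # lossy flac before the negative shift
LOSSY_NTS_MINUS_1 = 34.9  # lossy flac with NTS=-1

# Assumption (not stated in the post): 16-bit / 44.1 kHz stereo source.
WAV_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbps


def ratio(size_mib):
    """Size as a fraction of the uncompressed WAV size."""
    return size_mib / WAV


def approx_kbps(size_mib):
    """Average bitrate implied by the size ratio, under the assumption above."""
    return ratio(size_mib) * WAV_KBPS


# Relative cost of moving from the unshifted setting to NTS=-1.
growth_percent = (LOSSY_NTS_MINUS_1 - LOSSY_NTS_0) / LOSSY_NTS_0 * 100

for name, size in [("flac", FLAC),
                   ("lossy flac", LOSSY_NTS_0),
                   ("lossy flac NTS=-1", LOSSY_NTS_MINUS_1)]:
    print(f"{name:18s} {ratio(size) * 100:5.1f}% of WAV, ~{approx_kbps(size):.0f} kbps")
print(f"NTS=-1 costs about {growth_percent:.1f}% more space")
```

Under that assumption the NTS=-1 set lands at roughly 500 kbps, i.e. close to the 512k figure discussed earlier.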

<files all now found - thanks to Halb27!>

Atem_Lied, Badvilbel, Bruhns, Furious, Keys & Triangle_2 attached - 2 analyses; fixed length spreading_function; NTS=-1.
2Bdecided
I think there will be room for two or three settings only...

1. Transcode and multi-gen proof (or overkill option for cautious people). Re-encode it 20 times at this setting and it'll still be alright. Transcode it to anything and it'll still sound (about) as good as encoding straight from the original.

2. Normal. Chances of ABXing original from lossyFLAC normal should tend to zero, but it probably won't stand up to 20 generations of re-encoding.

3. Compact. Allows you to introduce known compromises to get the filesize down if you want to, e.g. resampling to 32kHz.


I have tried to deliver number 2 on that list. If it fails ABX with anything, then some of the parameters will need to be tightened up. So far it hasn't, but let's see. I never dreamed that someone would be as inventive as Nick in using these parameters to reduce the bitrate - I intended to use them to tweak the code to improve quality (if necessary).

I think it's obvious how to deliver number 1 - shift the noise threshold (already implemented) down and put in some extra checks (e.g. extra FFT size - already implemented, M/S checking - not yet implemented). Some of the extra checks might end up in number 2 anyway if it's ABXed - we'll see.

I believe Nick is trying to deliver number 3 on that list. To be honest, with a flat noise floor, I don't think there's much that can be done to deliver this. The noise floor is already pretty much where I think it should be - at the same level (or, if it's shifted, related to the level) of the minimum noise floor in the recording. If the existing calculation is wrong, and it puts noise above or below the existing noise level, then this should be fixed and integrated into number 2. The only extra steps you can take are to ignore stuff above a fixed frequency (already implemented by myself), or to take account of the MAF (already implemented by Nick). Anything else, however clever, must by definition be pushing the noise above the noise floor of the original recording. It may be audible, it may not - you'd need a psychoacoustic model to decide. However, I've already seen people ABX tracks with the noise threshold 6dB up (i.e. 1 more bit removed) so it doesn't seem that there's much room for improvement. There could be some - it depends on the signal, how much you want to lower the bitrate by, and how hard you're willing to work to do it.
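
The "6dB up, 1 more bit removed" relationship can be sketched as follows. This is only an illustration of the arithmetic, not the actual script: the function names and the single `noise_floor_db` input are hypothetical, and the real method derives its threshold from FFT analyses of each block.

```python
import math


def quantisation_noise_db(bits_removed, bit_depth=16):
    """RMS level (dB re full scale) of the rounding error when the lowest
    `bits_removed` bits are rounded away, assuming a uniform error."""
    step = 2 ** bits_removed             # quantisation step in LSBs
    rms = step / math.sqrt(12)           # RMS of a uniform error of that width
    full_scale = 2 ** (bit_depth - 1)
    return 20 * math.log10(rms / full_scale)


def bits_to_remove(noise_floor_db, nts_db=0.0, bit_depth=16):
    """Largest bit count whose rounding noise stays at or below the
    (hypothetical) measured noise floor plus the noise threshold shift."""
    target = noise_floor_db + nts_db
    bits = 0
    while (bits + 1 < bit_depth
           and quantisation_noise_db(bits + 1, bit_depth) <= target):
        bits += 1
    return bits


# Each extra bit removed raises the added noise by about 6.02 dB.
print(quantisation_noise_db(1) - quantisation_noise_db(0))
# A positive shift removes more bits, a negative shift removes fewer.
print(bits_to_remove(-70.0, +6.1), bits_to_remove(-70.0), bits_to_remove(-70.0, -6.1))
```

A shift smaller than 6 dB only changes the result when the unshifted threshold was already close to a bit boundary, which is one way to see why a small negative shift costs relatively little bitrate on average.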

What can deliver number 3 (at least for most signals) is to use a shaped noise floor, as suggested by SebG on page one of the original thread...

http://www.hydrogenaudio.org/forums/index....st&p=498376

This is basically what's described here...

http://telecom.vub.ac.be/Research/DSSP/Pub.../AES-2002-B.pdf
(there are other similar papers by the same authors)


I tried a cheat's version by designing the minimum phase noise feedback filter directly from the desired magnitude response (quite easy, and already built into MATLAB sig proc toolbox, though I'd coded it myself before I found this!), but that doesn't take account of the constraints of gain (which should average to unity on a log scale, if I understand it correctly), and needing the first filter coefficient to be 1. If scaling the coefficients to make the first coefficient be 1 also happens to result in a reasonable gain, it works well. Normally this won't happen, and you'll add tens of dB of extra noise!
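
A minimal sketch of why those two constraints are really one, using the standard cepstral (homomorphic) construction of a minimum-phase filter. This is not the MATLAB code referred to above: it is plain Python with a naive DFT so it runs without dependencies (an FFT library would be the practical choice), and all function names are made up for the illustration. For a minimum-phase filter the first coefficient equals exp(mean of log magnitude), so a first coefficient of 1 is the same thing as unity average gain on a log scale.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, adequate for a small demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def min_phase_from_magnitude(mag):
    """Minimum-phase impulse response for a desired magnitude response given
    on a full, even-length DFT grid (positive, symmetric about N/2)."""
    n = len(mag)
    cep = [c.real for c in dft([math.log(m) for m in mag], inverse=True)]
    # Fold the real cepstrum onto non-negative quefrencies: a causal
    # cepstrum corresponds to a minimum-phase sequence.
    folded = [0.0] * n
    folded[0] = cep[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]
    folded[n // 2] = cep[n // 2]
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spectrum, inverse=True)]


def normalise_log_gain(mag):
    """Rescale so the mean log-magnitude is 0 (geometric-mean gain of 1)."""
    mean_log = sum(math.log(m) for m in mag) / len(mag)
    return [m / math.exp(mean_log) for m in mag]


# Desired shape: 3 * |1 - 0.5 z^-1| sampled on a 128-point grid.
N = 128
mag = [3 * abs(1 - 0.5 * cmath.exp(-2j * math.pi * k / N)) for k in range(N)]

h = min_phase_from_magnitude(mag)
h_norm = min_phase_from_magnitude(normalise_log_gain(mag))
print(h[:3])       # first coefficient follows the average log gain
print(h_norm[:3])  # first coefficient is 1 once that gain is normalised
```

Note that normalising this way only fixes the average: the peaks of the rescaled response can still sit well above the desired curve, which is the extra noise described above.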

So to make it work, I (or someone!) will have to implement what's described in that paper. I haven't worked on LPCs before, but they seem to describe a short cut, and I'll give it a go when I get chance.

Cheers,
David.
halb27
IMO the efficiency option 3 can be considered separately.
As you mentioned anyone who is out for smaller file size can achieve it right now by resampling to 32 kHz in advance (that's what I do with wavPack lossy).
This kind of noise shaping sounds interesting, but it's a new building block and can be done later.
At the moment it makes things more complicated and thus keeps us further away from what is needed most: that a nice guy comes along and creates an exe program from your idea.
Maybe it would help if you could provide a more detailed description of it that can be understood by a programmer without very detailed DSP knowledge.
halb27
QUOTE (Nick.C @ Aug 8 2007, 12:42) *
... Currently hunting for those furious & bruhns - can't find them - they seem to have been removed. ...

Here they are:
Click to view attachment Click to view attachment
Nick.C
Many thanks!
halb27
QUOTE (Nick.C @ Aug 8 2007, 12:42) *
...Atem_Lied, Badvilbel, Keys & Triangle attached - 2 analyses; fixed length spreading_function; NTS=-1. ...

Atem_lied: 9/10 (pretty hard for me to abx)
badvilbel: could not abx
keys: 8/10 (easier to abx than shown by the score - didn't catch the problem with my first two guesses)

triangle: guess I wasn't specific enough with the triangle sample I was thinking of. Thought of this one:
Click to view attachment
I don't have the original of your triangle version.
2Bdecided
QUOTE (halb27 @ Aug 8 2007, 15:43) *
IMO the efficiency option 3 can be considered separately.
As you mentioned anyone who is out for smaller file size can achieve it right now by resampling to 32 kHz in advance (that's what I do with wavPack lossy).
This kind of noise shaping sounds interesting, but it's a new building block and can be done later.
At the moment it makes things more complicated and thus keeps us further away from what is needed most: that a nice guy comes along and creates an exe program from your idea.
Maybe it would help if you could provide a more detailed description of it that can be understood by a programmer without very detailed DSP knowledge.


Last point first: I'd have thought that "a programmer without very detailed DSP knowledge" could work from the MATLAB code (and an FFT library) more easily than from a description. If there's anything confusing about the code, I'd be more than happy to help. I would stress that it's not optimised. It's there for people to find problem samples, and update it. However, I guess this will be much easier if it's an exe, so to solve the chicken and egg situation, an exe would be great!

The noise shaping will have to wait until someone has the time to do it anyway. It might end up in option 2 if it works well enough, or be switchable separately.

So yes, certainly, if anyone can take on the task of coding it properly, please go for it. Nick's code is clearer than mine, but I don't think the experimental quality reducing options should be included, unless they work.

Cheers,
David.
Nick.C
QUOTE (2Bdecided @ Aug 8 2007, 19:58) *
............but I don't think the experimental quality reducing options should be included, unless they work.


Neither do I - I'm only now realising the importance of maintaining excellent quality in any processing to be implemented and subsequently encoded in the flac format - the last thing I would want to do is adversely skew "public" opinion against flac due to a poor lossy implementation.

I will post a clean version of the script without any extraneous experimental gubbins - in the hope that someone can turn it into a usable binary.
halb27
QUOTE (Nick.C @ Aug 8 2007, 12:42) *
... Bruhns, Furious, ....Triangle_2 ...

Bruhns: Did two sessions on two different spots that were suspicious to me and got 7/10 in each session.
Very hard for me.
Triangle: Could not abx a difference.
Furious: Could not abx a difference.

So as far as my results on these samples go, the quality of your variant is very good to me, keeping in mind that these are hard problems for wavPack lossy and were suspected to be not easy for this preprocessor either.
A good candidate for 2BDecided's option 3 when it's up to that.
Nick.C
If you're up to some more listening, 2Bdecided originally added a third analysis as an "overkill" option. The other way to increase bitrate is to introduce a negative noise_threshold_shift. The attached samples were processed with 3 analyses, noise_threshold_shift=-2; triangular_dither; force_dither_lsb=1; fix_clipped automatically if necessary after bit reduction and rounding.
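
For anyone unfamiliar with the terms, here is a rough sketch of bit reduction with triangular dither followed by a clipping fix. It is not taken from the script: the function name is made up, clamping to the largest representable multiple of the step is just one possible way of fixing clipped samples, and force_dither_lsb is not modelled at all.

```python
import random


def reduce_bits(samples, bits_to_remove, rng, bit_depth=16):
    """Round integer samples to a multiple of 2**bits_to_remove using
    triangular (TPDF) dither, then clamp anything pushed out of range."""
    step = 1 << bits_to_remove
    lo = -(1 << (bit_depth - 1))
    # Largest multiple of `step` that still fits in the signed range.
    hi = ((1 << (bit_depth - 1)) - 1) // step * step
    out = []
    for x in samples:
        # Difference of two uniforms gives a triangular PDF over +/- one step.
        dither = (rng.random() - rng.random()) * step
        q = round((x + dither) / step) * step
        # Rounding near full scale can overshoot, so clamp the result.
        out.append(max(lo, min(hi, q)))
    return out


rng = random.Random(0)  # fixed seed so the example is repeatable
original = [32767, -32768, 1000, -3, 0]
reduced = reduce_bits(original, 4, rng)
print(reduced)
```

Every output value is a multiple of 16 here, so the four least significant bits are zero and the lossless encoder can drop them.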


Revised script attached - no longer requires external amplitude function but still requires modified wavread/write functions.