Near-lossless / lossy FLAC, An idea & MATLAB implementation
Jun 28 2007, 14:27
#151
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
ShadowKing, I take it that those samples are lossless FLAC?
-------------------- lossyWAV -q X | FLAC -8 ~= 308kbps
SGS III (Rooted) + 64GB
Jun 28 2007, 14:31
#152
Group: Members Posts: 1495 Joined: 31-January 04 Member No.: 11664
Jun 28 2007, 16:23
#153
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
ShadowKing's samples. No artifacts noticeable.
CODE
Sample (kbps)                                 FLAC   PP10
=========================================================
10 - Dungeon - The Birth- The Trauma Begins    919    453
A02_metamorphose                               846    507
aps_Killer_sample                              929    484
Moon_short                                     834    550
velvet                                         957    516
=========================================================
Average (WAV = 1411)                           897    502
  % of WAV                                   63.6%  35.6%
  % of FLAC                                   100%  56.0%
=========================================================
This post has been edited by Nick.C: Jun 28 2007, 16:24
Jul 3 2007, 13:09
#154
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Playing about with the code, I've added a "choose_bits_to_remove" parameter, which is used as follows:
CODE
% choose_bits_to_remove = 0: most conservative, the minimum over all analyses and channels
% choose_bits_to_remove = n > 0: floor of the mean, plus (n-1) extra bits
if (choose_bits_to_remove==0),
  bits_to_remove(block_number)=min(min(bits_to_remove_table));
else
  bits_to_remove(block_number)=floor(mean(mean(bits_to_remove_table)))+(choose_bits_to_remove-1);
end;
% never leave fewer than minimum_bits_to_keep bits (bs = bits per sample of the source)
bits_to_remove(block_number)=min(bits_to_remove(block_number),bs-minimum_bits_to_keep);
To my ears (with minimum_bits_to_keep=5), the transparency threshold is at about BTR=3 or 4. Setting minimum_bits_to_keep (MBTK) to 6 improves BTR=4. The bitrate reduction is fairly significant (sizes in bytes):
CODE
Samples: 10 - Dungeon - The Birth- The Trauma Begins, 41_30sec, A02_metamorphose, annoyingloudsong, aps_Killer_sample, Atem_lied, ATrain, birds, E50_PERIOD_ORCHESTRAL_E_trombone_strings, eig, glass_short, jump_long, Moon_short, rach_original, rawhide, S13_KEYBOARD_Harpsichord_C, S30_OTHERS_Accordion_A, S34_OTHERS_GlassHarmonica_A, S35_OTHERS_Maracas_A, S53_WIND_Saxophone_A, thewayitis, VELVET

|=====|=========================|
| WAV | 53,763,880 (1411.2kbps) |
|FLAC | 29,767,971 ( 781.2kbps) |
|=====|=========================|========================|========================|
|     |         MBTK=5          |         MBTK=6         |         MBTK=7         |
|=====|=========================|========================|========================|
|BTR0 | 17,209,767 ( 451.7kbps) | 17,209,767 ( 451.7kbps)| 17,256,277 ( 452.9kbps)|
|BTR1 | 16,052,243 ( 421.3kbps) | 16,052,243 ( 421.3kbps)| 16,110,776 ( 422.9kbps)|
|BTR2 | 13,259,455 ( 348.0kbps) | 13,313,411 ( 349.4kbps)| 13,530,611 ( 355.2kbps)|
|BTR3 | 10,814,615 ( 283.9kbps) | 11,025,396 ( 289.4kbps)| 11,369,979 ( 298.4kbps)|
|BTR4 |  8,959,432 ( 235.1kbps) |  9,288,634 ( 243.9kbps)|  9,732,593 ( 255.5kbps)|
|=====|=========================|========================|========================|
This post has been edited by Nick.C: Jul 3 2007, 14:30
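For readers without MATLAB/Octave to hand, here is a rough Python port of the choose_bits_to_remove logic above. It is a sketch, not Nick's code: the function name choose_btr and the table layout (a list of equal-length rows of per-analysis values) are assumptions, and bs is taken to be the source bit depth (16).

```python
import math

def choose_btr(bits_to_remove_table, choose_bits_to_remove, bs=16, minimum_bits_to_keep=5):
    # Flatten the table: for a rectangular table, MATLAB's min(min(T)) and
    # mean(mean(T)) are the minimum and mean over every entry
    values = [v for row in bits_to_remove_table for v in row]
    if choose_bits_to_remove == 0:
        # BTR0: most conservative, the smallest value over all analyses and channels
        btr = min(values)
    else:
        # BTR n: floor of the overall mean, plus (n - 1) extra bits
        btr = math.floor(sum(values) / len(values)) + (choose_bits_to_remove - 1)
    # Cap so that at least minimum_bits_to_keep of the bs bits survive
    return min(btr, bs - minimum_bits_to_keep)

table = [[2, 3, 4, 5, 6], [3, 4, 5, 6, 6]]   # example values only
print([choose_btr(table, n) for n in range(5)])
```

With this table the mean is 4.4, so BTR0 to BTR4 give 2, 4, 5, 6 and 7 bits: every BTR step above 1 removes one more bit than the floored mean.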
Jul 3 2007, 14:07
#155
Group: Members Posts: 1495 Joined: 31-January 04 Member No.: 11664
You should start to pick up some hiss below 300k. Sometimes turning up the volume reveals it; otherwise these encodes are artifact-free.
Dungeon - added hiss on the baby crying
Velvet - noise moving around the beats (doom-chik-doom-chik)
Atemlied - hissing in the phone-ringing part
41 secs - cymbals 'dusty'
metamorphose - hiss on the HF bits
Moon short - slight hiss
This post has been edited by shadowking: Jul 3 2007, 14:09
Jul 3 2007, 14:11
#156
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Which BTR were you using? MBTK=7 (or maybe 8?) may help. My "testing" is on earbuds at moderate volume - suitable for an office environment at lunch. It also replicates my most likely playback environment.
Jul 3 2007, 14:15
#157
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Nick,
My gut feeling (and I haven't tried it yet) is that this will introduce audible problems. Near the start of this thread, halb27 ABXed some samples with 6dB and 12dB more noise than default. From the bitrates, it looks like you're pushing it even further than that.
I've been working to solve the problem sample I managed to manufacture. It's fixed now with rectangular or triangular dither, which I've finally implemented properly. I still think it's a waste of time for most content, but it's nice to have the option. I'll upload when I get the chance.
Cheers,
David.
Jul 3 2007, 14:22
#158
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Good afternoon David,
In a way, I'm looking for an "acceptable bitrate / quality" balance: my DAP of choice plays FLAC, and this method of bitrate reduction feels "cleaner" than moving to a full-blown lossy codec. Your original concept has proven itself; how far it can be pushed whilst maintaining "acceptable" quality is another matter. I see this as analogous to LAME's -V0 to -V9 options.
Looking forward to the revised source to chew on...
Jul 4 2007, 10:16
#159
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
If you want to force the bitrate lower, you can do any or all of the following (with predictable results)...
* Resample to 32kHz (removes frequencies above 16kHz)
* Reduce the bitdepth (e.g. 14 bits, 12 bits) within the 16-bit file (introduces fixed noise); either pre-process, or force "bits_to_remove" to always be above a certain number
* ReplayGain (or just reduce the volume) before encoding (makes it quieter!)
* Use a positive noise_threshold_shift (introduces variable noise)
Part of what you've done is similar to just reducing the bitdepth, but might be less predictable. I'll post some numbers in a moment...
This post has been edited by 2Bdecided: Jul 4 2007, 10:39
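A minimal Python sketch of the "pre-process" route in the second bullet, i.e. requantizing 16-bit samples so that only the top keep_bits bits carry information. The function name is hypothetical; it rounds to the nearest multiple of 2^(16-keep_bits), clips at full scale, and deliberately applies no dither.

```python
def reduce_bitdepth(samples, keep_bits, bs=16):
    # Step between representable values once the low (bs - keep_bits) bits are zeroed,
    # e.g. 4 when keeping 14 of 16 bits
    step = 1 << (bs - keep_bits)
    lo, hi = -(1 << (bs - 1)), (1 << (bs - 1)) - 1
    # Largest multiple of step that still fits in a bs-bit signed integer
    top = hi - (step - 1)
    out = []
    for s in samples:
        q = ((s + step // 2) // step) * step   # round (half up) to a multiple of step
        out.append(max(lo, min(q, top)))       # clip to the representable range
    return out

print(reduce_bitdepth([0, 1, 2, 3, -1, -2, -3, 32767, -32768], keep_bits=14))
```

Every output is a multiple of 4, so the two lowest bits are always zero, which is the kind of fixed-noise, lower-bitrate result the bullet describes.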
Jul 4 2007, 12:00
#160
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
I grabbed all the files from the Atem_lied to thewayitis test set.
Regular flac: 728kbps
Lossy flac: 524kbps
Lossy flac nts+6dB: 457kbps
Regular flac RG: 756kbps (! didn't help, because most of these files are quiet!)
Regular flac RG 32k: 592kbps
Lossy flac RG 32k: 441kbps
Lossy flac RG 32k nts+6dB: 386kbps
Lossy flac RG 32k nts+12dB: 328kbps
Lossy flac RG 32k nts+24dB: 230kbps
I also tried annoyingloudsong:
Regular flac: 1252kbps
Lossy flac: 411kbps
Regular flac RG 32kHz: 828kbps
Lossy flac RG 32kHz nts+6dB: 266kbps
Lossy flac RG 32kHz nts+12dB: 211kbps
Lossy flac RG 32kHz nts+24dB: 133kbps
I ran all these tests with triangular dither. With the caveat that the block switching might not be debugged, I've attached my latest script.
Resampling to 32kHz is normally transparent for me, but won't be for people who can hear above 16kHz.
nts+24dB sounds awful, like an FM radio with a very weak signal.
nts+12dB sounds OK. The hiss is audible if you listen carefully. It's probably OK for you, Nick.
nts+6dB sounds good. It's probably ABXable, but I didn't try.
Cheers,
David.
Jul 4 2007, 13:35
#161
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Examples for Nick.
Not transparent.
Attached File(s)
annoyingloudsong_32k_nts6.lossy.flac (520.44K)
annoyingloudsong_32k_nts12.lossy.flac (411.19K)
annoyingloudsong_32k_nts24.lossy.flac (260.42K)
Jul 6 2007, 11:10
#162
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Looking at the analysis times (1.5ms and 20ms) and the corresponding FFT_Length for each, I was wondering why the times are not chosen so that no rounding of the power of two is required when determining FFT_Length.
Using time=10^(log10(2)*bits-log10(fs)), i.e. time=2^bits/fs, at fs=44.1kHz yields:
time (bits=6, fft_length=64) = approx. 1.451ms;
time (bits=10, fft_length=1024) = approx. 23.219ms;
and for the extra analysis: time (bits=8, fft_length=256) = approx. 5.805ms.
Could there be a benefit in tuning the analysis time exactly to the fft_length?
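A quick Python check of the arithmetic above, assuming fs = 44.1kHz (the rate that reproduces the quoted times). It also shows that the log expression in the post is just 2^bits/fs written another way.

```python
import math

def analysis_time_ms_log(bits, fs=44100):
    # The expression as written in the post (seconds), converted to ms
    return 1000.0 * 10 ** (math.log10(2) * bits - math.log10(fs))

def analysis_time_ms(bits, fs=44100):
    # Same quantity without logs: an FFT of 2**bits samples spans 2**bits / fs seconds
    return 1000.0 * 2 ** bits / fs

for bits in (6, 8, 10):
    print(bits, 2 ** bits, round(analysis_time_ms(bits), 3))
```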
Jul 6 2007, 12:14
#163
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Hi Nick,
I kind of picked the times off the top of my head; they seemed like good times. As you've seen, they're converted into numbers of samples in such a way that you get something close to those times that's a power of 2, irrespective of sampling frequency. It could be neater (it's "closest" on a log scale, which may or may not be ideal), but I can't see any advantage to picking exact times. No single time can convert to an exact power of 2 at all of 32kHz, 44.1kHz and 48kHz sampling.
If you want to avoid the log calculation, use a look-up table, either to approximate the calculation or to specify sample values directly for common sample rates. However, I think there are other log calculations later in the code that you can't avoid.
Cheers,
David.
btw, do the 32kHz sampled files play OK on your portable?
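A Python sketch of the time-to-FFT-length conversion described above, assuming it takes log2(time * fs) and rounds to the nearest integer ("closest" on a log scale); the script's exact expression is not quoted in the thread, and fft_length is a hypothetical name.

```python
import math

def fft_length(time_s, fs):
    # Nearest power of two to time_s * fs samples, measured on a log2 scale
    return 2 ** round(math.log2(time_s * fs))

for fs in (32000, 44100, 48000):
    print(fs, fft_length(0.0015, fs), fft_length(0.020, fs))
```

Under this assumption, 1.5ms gives 64 samples at all three rates, while 20ms gives 1024 samples at 44.1kHz and 48kHz but 512 at 32kHz, so the actual analysis duration depends on the sample rate.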
Jul 6 2007, 12:45
#164
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Oops, I didn't reply about the samples: NTS6 and NTS12 play fine; NTS24 is full of hiss, probably to be expected given the noise added.
I've been playing with the number of analyses and FFT lengths: 5 analyses (4, 6, 8, 10 and 12 bits) and the following BTR variant (btr_type=4):
CODE
btr_sum = sum(sum(bits_to_remove_table));
btr_min = min(min(bits_to_remove_table));
btr_max = max(max(bits_to_remove_table));
btr_size = number_of_analyses * channels;   % total number of values in the table
if (btr_type==0),
  % most conservative: minimum over all analyses and channels
  bits_to_remove(codec_block_number)=btr_min;
else
  % mean excluding one highest and one lowest value, plus (btr_type-1)/2 bits, rounded down
  bits_to_remove(codec_block_number)=max(0,floor((btr_sum-btr_min-btr_max)/(btr_size-2)+(btr_type-1)/2));
end;
% equivalent to min(bits_to_remove, bs-minimum_bits_to_keep)
bits_to_remove(codec_block_number)=bs-max((bs-bits_to_remove(codec_block_number)),minimum_bits_to_keep);
This gave me *really* nice-sounding results (I got a pair of Sennheiser canal phones for my iPAQ) at 272kbps for the sample set used previously.
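The btr_type variant above, sketched in Python as a rough port (the function name btr_trimmed is hypothetical and the table is assumed to be a list of rows of per-analysis values, with at least three values in total). Only one lowest and one highest entry are dropped, even when several entries tie.

```python
import math

def btr_trimmed(bits_to_remove_table, btr_type, bs=16, minimum_bits_to_keep=5):
    values = [v for row in bits_to_remove_table for v in row]
    btr_sum, btr_min, btr_max = sum(values), min(values), max(values)
    btr_size = len(values)                  # number_of_analyses * channels
    if btr_type == 0:
        btr = btr_min                       # most conservative choice
    else:
        # Drop ONE lowest and ONE highest value, average the rest,
        # add (btr_type - 1) / 2 bits, then round down
        trimmed_mean = (btr_sum - btr_min - btr_max) / (btr_size - 2)
        btr = max(0, math.floor(trimmed_mean + (btr_type - 1) / 2))
    # Same as min(btr, bs - minimum_bits_to_keep)
    return bs - max(bs - btr, minimum_bits_to_keep)

table = [[2, 3, 4, 5, 6], [3, 4, 5, 6, 6]]
print([btr_trimmed(table, t) for t in range(5)])
```

With this table (the worked example Nick gives later in the thread), the trimmed mean is 36/8 = 4.5, so btr_type=4 gives floor(4.5 + 1.5) = 6 bits.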
Jul 6 2007, 14:42
#165
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
So let me see if I've got this right...
You're doing FFTs of sizes 2 to the power 4, 6, 8, 10 and 12. You were taking the mean bits-to-remove across the block, but now you're adding them together, subtracting the highest and lowest values, dividing by something which isn't quite the number of values, and also dropping an extra 1-2 bits.
I'll have to give it a listen. It can't be magic (or, I would think, universally transparent!), but maybe it hides the worst noise where it's least obvious.
For a laugh, tell me how long it takes to run your five-analysis version in Octave.
Cheers,
David.
Jul 6 2007, 14:58
#166
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Basically I'm calculating the mean of all the values (disregarding the highest & lowest) then adding 1.5 bits and finally rounding down.
e.g. bits_to_remove_table=[2,3,4,5,6],[3,4,5,6,6] gives (44-2-6)/(10-2) = 36/8 = 4.5; add 1.5 = 6!
Oh, analysis takes a very long time... but I tried 5, 7, 9 & 11 with btr_type=4 (i.e. add 1.5 bits) and got 292kbps, with less analysis time.
This post has been edited by Nick.C: Jul 6 2007, 15:01
Jul 6 2007, 15:39
#167
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
I see. I'm unsure as to why it's not (btr_size-2*channels).
I suspect you'll get more noise (possibly audible) for highly tonal and highly transient signals. All else being equal, forcing an extra bit to be removed is the same as using a +6dB noise threshold shift (except when bits to remove would have been zero with the former).
It should be fine for what you want it for.
Cheers,
David.
This post has been edited by 2Bdecided: Jul 6 2007, 15:41
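The "one extra bit = +6dB" equivalence follows from the usual uniform-quantization noise model, where rounding to a step of size s adds noise power of about s^2/12. A short Python sketch (function name hypothetical), ignoring dither and the zero-bits special case:

```python
import math

def requant_noise_db(bits_removed):
    # Rounding to a step of 2**bits_removed LSBs adds roughly uniform error with
    # power step**2 / 12 (in LSB^2); bits_removed = 0 is the source's own 16-bit noise
    step = 2 ** bits_removed
    return 10 * math.log10(step ** 2 / 12)

for b in range(1, 5):
    # Each extra bit doubles the step, i.e. quadruples the noise power
    print(b, round(requant_noise_db(b) - requant_noise_db(b - 1), 2))
```

Each line prints a difference of about 6.02dB (20*log10(2)).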
Jul 6 2007, 20:59
#168
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
2*channels would remove 4 values. I only want to remove the highest and the lowest analysis value (i.e. 2), and take the mean of the rest.
To be perfectly frank, I'm trying lots of permutations and seeing how the results pan out. I have two loops set up so that it loops through number_of_analyses=2:5 and btr_type=0:5, and it already loops through the 21 samples, writing outputs in the format .AxBy.wav, where x=number of analyses and y=btr_type. Leave it simmering for quite a while and you get some results to listen to.
Oh, I had to modify wavread and wavwrite to read/write integer values, and modify your script to do the same, as 3 copies of the audio data were causing my machine to run out of memory...
Love the concept; I like the fact that I can get good quality at 300-350kbps on the sample set.
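A back-of-envelope Python estimate of why three in-memory copies hurt, assuming the audio is held as 8-byte doubles (MATLAB's default numeric type) versus 2-byte integers. The 5-minute 44.1kHz stereo track and the function name are hypothetical.

```python
def audio_copies_mib(seconds, fs=44100, channels=2, copies=3, bytes_per_sample=8):
    # Total samples across all channels, times bytes per sample, times number of copies
    n = int(seconds * fs) * channels
    return copies * n * bytes_per_sample / 2 ** 20

five_min_double = audio_copies_mib(300)                      # 8-byte doubles
five_min_int16 = audio_copies_mib(300, bytes_per_sample=2)   # 16-bit integers
print(round(five_min_double, 1), round(five_min_int16, 1))
```

That is roughly 606MiB versus 151MiB: keeping integer samples cuts the footprint by a factor of four.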
Jul 6 2007, 23:30
#169
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Glad you're having fun with it, Nick. For myself, I'd feel more comfortable with mp3 at those bitrates, but I could be convinced.
You mentioned modifying wavread and wavwrite. It sounds like a good idea. I don't have to be so careful with 4GB of RAM, but hopefully eventually I (or someone) will implement disk buffering so it doesn't matter.
It's great that you're playing with it and finding useful ways to get good quality at lower bitrates, but there is a hard ceiling with this approach. I don't want to sound negative, but you're adding flat noise, and experience suggests this becomes audible for problem samples at ~300-400kbps, and audible for many things much below this.
For the future, I'm wondering how well psychoacoustically based noise shaping would work with this: not instead of what's there already, but as an optional alternative. You could obviously throw away more bits, but the peak level would increase (dramatically in some cases), and you must hit a point where FLAC (or whatever) finds it harder to compress. Bryant has mentioned this before, as has SebG... http://www.hydrogenaudio.org/forums/index....showtopic=11623
It's more complicated than what's in there at present. I might try it just for the fun(!) of it, but I'm off on holiday so it won't be for a while.
Cheers,
David.
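To make the noise-shaping idea a little more concrete, here is a minimal Python sketch of first-order error-feedback requantization. This is plain noise shaping (noise transfer function 1 - z^-1), not the psychoacoustic kind discussed above; the function is hypothetical and ignores clipping and dither.

```python
def noise_shaped_requantize(samples, bits_to_remove):
    # Round each sample to a multiple of 2**bits_to_remove after subtracting the
    # previous rounding error, so the total error is e[n] - e[n-1] (high-pass shaped)
    step = 1 << bits_to_remove
    out, prev_err = [], 0
    for x in samples:
        v = x - prev_err                          # feed back the previous error
        y = ((v + step // 2) // step) * step      # round to a multiple of step
        prev_err = y - v                          # rounding error at this sample
        out.append(y)
    return out

signal = [(i * 7919) % 4001 - 2000 for i in range(500)]   # arbitrary integer test signal
shaped = noise_shaped_requantize(signal, 3)
error = [y - x for x, y in zip(signal, shaped)]
print(max(abs(e) for e in error), sum(error))
```

The error can approach a full step (plain rounding stays within half a step), one concrete sense in which the peak noise level rises, while its running sum never exceeds half a step, meaning very little low-frequency noise.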
Jul 7 2007, 11:41
#170
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
If it was easy, anyone could do it......
I'm a totally unskilled amateur in audio processing, but having immense fun. Have a good holiday!
[edit] Had a rethink about forcing extra bits to be removed and reverted to the simpler mean(mean(bits_to_remove_table)) alternative, but still using 4 analyses, fft_length = 2^5, 2^7, 2^9 and 2^11.
Had a look at the triangular dither and found a Gaussian variant on Wikipedia and at this link: http://www.musicdsp.org/showone.php?id=121
Currently using (sum of 8 separate rand(block_size,channels) - 4)/8.
Planning to do a lot of conversion for DAP use; now to come up with a method of preserving tags... [/edit]
This post has been edited by Nick.C: Jul 9 2007, 16:57
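A Python sketch comparing the variance of rectangular, triangular and the sum-of-8-uniforms dither described above. It assumes all three are expressed in units of the quantization step (LSB); the post doesn't say how the values are scaled before being added, so treat the comparison as illustrative. Function names are hypothetical.

```python
import random

def rpdf(rng):
    # Rectangular dither: uniform between -0.5 and 0.5 LSB, variance 1/12
    return rng.random() - 0.5

def tpdf(rng):
    # Triangular dither: sum of two uniforms minus 1, between -1 and 1 LSB, variance 1/6
    return rng.random() + rng.random() - 1.0

def gauss8(rng):
    # Approximation from the post: (sum of 8 uniform(0,1) draws - 4) / 8, variance 1/96
    return (sum(rng.random() for _ in range(8)) - 4.0) / 8.0

def sample_variance(gen, n=100_000, seed=1):
    rng = random.Random(seed)             # seeded so results are repeatable
    xs = [gen(rng) for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

variances = {f.__name__: sample_variance(f) for f in (rpdf, tpdf, gauss8)}
print(variances)
```

In the same units, the sum-of-8 version has 1/16 of the TPDF variance (about 12dB less dither noise), so it would need rescaling before the two could be compared as equals.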
Jul 11 2007, 14:20
#171
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Back to using 2 analyses (6- and 10-bit fft_length), using Gaussian dither with 32 repeats and your fix_clipped=2, and forcing bits to be removed again; the dither *seems* to mask the extra bit loss.
Anyway, I can't seem to upload the results (no webspace of my own), so I can't submit them for constructive criticism. Removing up to 2 extra bits (1/3 bit at a time) over the mean, I can reduce 63.1MiB of WAV to between 20.3MiB and 11.7MiB of .ss.flac (lossless FLAC = 36.1MiB) for the 25 files in my sample set.
This post has been edited by Nick.C: Jul 11 2007, 14:20
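One possible reading of "1/3 bit at a time", sketched in Python: add k/3 of a bit (k = 0..6, i.e. up to 2 extra bits) to the block's mean bits-to-remove before rounding down. This is an assumption modelled on the (btr_type-1)/2 term in the earlier btr_type code, not Nick's actual implementation; the function name is hypothetical, and Fractions keep the rounding thresholds exact.

```python
import math
from fractions import Fraction

def btr_with_offset(mean_btr, extra_thirds):
    # Floor of (mean + extra_thirds/3), computed exactly with rational arithmetic
    return math.floor(Fraction(mean_btr) + Fraction(extra_thirds, 3))

mean_btr = Fraction(9, 2)   # e.g. a block whose mean bits-to-remove is 4.5
print([btr_with_offset(mean_btr, k) for k in range(7)])
```

For any single block the result moves in whole bits, but if the fractional parts of the block means are spread evenly, the average number of bits removed grows by about 1/3 bit per step.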
Jul 11 2007, 22:00
#172
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
You can upload in the uploads forum here.
("Developers" can upload in normal threads; don't ask me, I found it by accident.)
On its own, "forcing some bits to be removed always" just raises the noise floor a little. Most people are more than happy with 14 bits (~FM BBC Radio 3 on very good equipment kind of quality). It's a good strategy to have as an option; I'll certainly merge it into my code when I get back.
btw, even just rectangular dither solved the problem sample I created, but if you're forcing just-audible noise, your dither choice would be subjective.
EDIT: followed the dither link. Dubious information. IIRC Gaussian isn't proven to remove all harmonic distortion or noise modulation, whereas triangular is perfect in both regards. Rectangular is only perfect in the former; it can leave noise modulation (though we're adding some of that anyway!).
(Nice holiday so far, but the weather will probably be poor tomorrow.)
Cheers,
David.
This post has been edited by 2Bdecided: Jul 11 2007, 22:05
Jul 12 2007, 08:12
#173
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Glad the holiday's going well!
I managed to upload some files, see: http://www.hydrogenaudio.org/forums/index....showtopic=56129
If there are any other samples anyone would like processed, let me know. The samples are uploaded for information, basically; the bitrate is dramatically reduced in most cases.
Basically, I'm playing with dither now; the Gaussian was easy to implement, so it's worth a try at least.
I've processed a few albums now, and after processing they typically come to about 1/3 of the lossless FLAC size. So, from a magpie's perspective, I can fit 3 times as many on my DAP!
Jul 16 2007, 13:11
#174
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
I'm not certain that lossyFLAC will never work at the bitrates you seem to want, but in the meantime, have you heard of mp3?
Cheers, David.
Jul 16 2007, 13:52
#175
lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400
Mp3 - fuzzat den?
On reflection, I'll probably fall back to your original script, having learned for myself why I wouldn't want to remove any more bits. Still playing with dither and another possible variant on conditional fix_clipped.