Topic: lossyWAV Development

lossyWAV Development

Reply #50
In case it's of any interest, I ABXed the bruhns sample you offered in post 41.

foo_abx 1.3.1 report
foobar2000 v0.9.4.3
2007/08/08 22:57:53

File A: C:\Temp\nforce\temp\bruhns.ss.flac
File B: C:\Temp\nforce\temp\bruhns.wv

22:57:53 : Test started.
22:59:55 : 01/01  50.0%
23:00:37 : 02/02  25.0%
23:01:08 : 03/03  12.5%
23:02:02 : 04/04  6.3%
23:03:03 : 05/05  3.1%
23:03:52 : 06/06  1.6%
23:05:06 : 06/07  6.3%
23:05:37 : 06/08  14.5%
23:06:27 : 07/09  9.0%
23:07:12 : 07/10  17.2%
23:07:49 : Test finished.

----------
Total: 7/10 (17.2%)
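The percentages in a foo_abx log are the chance of scoring at least that well by pure guessing. Here is a minimal Python sketch of that one-sided binomial tail. It assumes each trial is an independent 50/50 guess under the null hypothesis, and the function name is just for illustration:

```python
from math import comb


def abx_guess_probability(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials`
    when every trial is a fair 50/50 guess (one-sided binomial tail)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Score progression taken from the log above.
    for correct, trials in [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5),
                            (6, 6), (6, 7), (6, 8), (7, 9), (7, 10)]:
        p = abx_guess_probability(correct, trials)
        print(f"{correct:02d}/{trials:02d}  {100 * p:.2f}%")
```

For the final 7/10 this gives 176/1024, about 17.2%, which agrees with the total above.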

Not that good a result, but I wasn't able to tell anything wrong with the one offered in post 50.

After realizing that the offered WavPack file is nearly the same size as the lossy FLAC versions, I have my doubts about this approach.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Reply #51
... the bruhns sample ... After realizing that the offered WavPack file is nearly the same size as the lossy FLAC versions, I have my doubts about this approach. ...

Classical music, as well as other music with a considerable amount of quiet passages, compresses relatively well losslessly, so with this kind of music we can't expect a big file size saving (which of course doesn't make it very attractive to lovers of these genres).
Popular music, however, compresses pretty badly losslessly, so there will be a big saving in file size. So far something like 500 kbps is realistic, which means roughly half the file size of lossless encodings.

So I think this approach is not only intelligent but also of real practical importance to many music lovers.
lame3995o -Q1.7 --lowpass 17

Reply #52
Thanks Wombat - it's nice to know that with improved settings the problem samples seem to become less of a problem.



Are we approaching 2Bdecided's option 2 with these settings?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #53
A short test on a dozen or so classical samples:

Wavpack lossless -x: 16.25 MB - 722k vbr
Wavpack 550k -x :    11.45 MB - 509k abr

This is a significant saving IMO. Even on very quiet CDs there will be a saving of some 15%.

Reply #54
If you're up to some more listening...

Atemlied: 7/10, extremely hard
badvilbel: could not abx the difference
bruhns: could not abx the difference
furious: could not abx the difference
keys: could not abx the difference
triangle: could not abx the difference

Very good quality.

Reply #55
Thanks very much for the additional ABX'ing Halb27! Hopefully we're nearer the mark with the following settings:

Okay, 3 analyses, noise_threshold_shift=-3, triangular_dither, smart clipping reduction as before.

49 files: WAV=111MiB; FLAC=63.4MiB; LossyFLAC=42.0MiB.

So, a one-third reduction relative to the original FLAC file size - that can't be bad? This equates to approx. 536 kbps on average for this (problem sample) file set.

Just processed "The Travelling Wilburys Collection" Disc 1: WAV: 431MiB; FLAC: 307MiB (1005kbps); LossyFLAC: 143MiB (468kbps), using the same settings as above.

[edit] I've noticed that the reference_threshold values calculated just prior to the calculation of the threshold_index values are *extremely* close to linear in two senses, in the bits sense and in the fft_length sense, so the whole set of results can be calculated (closely) as in the attached code:

This gives the same average bits to remove figures (to 0.001 accuracy) for a file for dither_choice=1 or 2 and within 0.006 bits average for dither_choice=0.

The variables_filename is no longer dependent on noise_threshold_shift - that's applied later, so there's less calculating of constants.
[/edit]
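The near-linearity described in the edit suggests fitting a plane to a table of reference_threshold values. Below is a minimal Python sketch of an ordinary least-squares fit. It assumes the plane is linear in bits and in log2(fft_length), though the script may parameterise it differently. The demo table is synthetic, generated from known coefficients, not output from the script:

```python
from math import log2


def fit_plane(samples):
    """Least-squares fit of rt ~ c0 + c1*bits + c2*log2(fft_length).

    `samples` is a list of (bits, fft_length, rt) tuples; returns (c0, c1, c2).
    Solves the 3x3 normal equations by Gaussian elimination with pivoting.
    """
    rows = [(1.0, float(b), log2(n)) for b, n, _ in samples]
    ys = [rt for _, _, rt in samples]
    # Normal equations: (X^T X) c = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        v[col], v[pivot] = v[pivot], v[col]
        for k in range(col + 1, 3):
            f = a[k][col] / a[col][col]
            for j in range(col, 3):
                a[k][j] -= f * a[col][j]
            v[k] -= f * v[col]
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        c[i] = (v[i] - sum(a[i][j] * c[j] for j in range(i + 1, 3))) / a[i][i]
    return tuple(c)


if __name__ == "__main__":
    # Hypothetical, exactly planar table: the fit should recover (-10, 6, 3).
    table = [(b, n, -10.0 + 6.0 * b + 3.0 * log2(n))
             for b in range(1, 9) for n in (64, 256, 1024)]
    print(fit_plane(table))
```

With real simulated values in place of the synthetic table, the residuals of the fit would show how close to planar they actually are.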

Reply #56
I tried to ABX the Atemlied version from your last post, but had no chance.
Very good.
The bitrate achieved with your sample album is also very promising.

Reply #57
After all, this is just what us noobs can abx with these samples...

Reply #58
Thanks guys - now, as has been said before, we need an executable version to distribute for further testing....

Most importantly, thanks to 2Bdecided for instigating this and providing the original application of the method in script form - the only thing I've added is the conditional fix_clipped method - all the other possible settings were there.....

However, a Foobar2000 DSP plugin has to be at the top of my wishlist - it would make it all *so* much easier, and would more easily preserve tagging information.

Reply #59
... Most importantly, thanks to 2Bdecided for instigating this and providing the original application of the method in script form - the only thing I've added is the conditional fix_clipped method - all the other possible settings were there .....

Wonderful, because I think the more variations we have, the bigger the risk of not getting extremely good quality, especially at this early stage.
And as Wombat pointed out, quality verification at present isn't extremely thorough, though it probably reflects what can be expected at the moment.
I personally don't worry too much about it, because, unlike with the highly efficient codecs, IMO not much can go wrong with this approach as far as I understand it, especially as we are in the 450+(+) kbps range.

However, a Foobar2000 DSP plugin has to be at the top of my wishlist - it would make it all *so* much easier, and would more easily preserve tagging information.

At the moment I think it's more important to have a standalone exe. For integrating into foobar we can have a simple .bat file that combines the preprocessing with the FLAC (or whatever) encoding. I use a bat file that resamples to 32 kHz using ssrc_hp and encodes the result with wavPack, and it works painlessly.

Well, I've looked into the script a bit to find out whether I should try to produce an exe (at the moment I'm too busy, but that may be different in a few weeks).
At first glance it's not unrealistic, because it's not a very large script and I think a big part of it is not too hard to write in other languages. However, there seems to be a lot of pretty MATLAB-specific stuff (the non-scalar operations) that would not be easy to understand.
Moreover, issues like rounding may be vital in the context of MATLAB's internal representation, i.e. the properties of numerical data in MATLAB.
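As an example of the rounding issue: MATLAB's round() sends halves away from zero, whereas some other languages round ties to even by default (Python's built-in round() does, for example). A transcoded version may therefore disagree on exact .5 values. Here is a minimal Python sketch of a MATLAB-compatible round, assuming that tie behaviour is what the script relies on:

```python
import math


def matlab_round(x: float) -> int:
    """Round to the nearest integer, with ties going away from zero (MATLAB's round)."""
    magnitude = math.floor(abs(x))
    if abs(x) - magnitude >= 0.5:
        magnitude += 1
    return int(math.copysign(magnitude, x))


if __name__ == "__main__":
    for value in (2.5, -2.5, 3.5, 0.4):
        # MATLAB-style result next to Python's round-half-to-even result
        print(value, matlab_round(value), round(value))
```

Here matlab_round(2.5) is 3 while Python's round(2.5) is 2, so the choice matters whenever a value lands exactly on a half.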
Also, I personally dislike doing things purely formally and blindly, without knowing what I'm really doing. It's not necessary (though welcome) to know the exact DSP background, but a more logical and less technical procedure for bringing this code to another language would be welcome.
Your latest script seems well documented, though not easy to understand. For instance, I can't see directly which operations are done on the entire audio data of the wav file and which are done block by block. Maybe that's because everything is done in one long sequence of statements per wav file. It would be easier to understand if, instead of this long sequence of rather atomic (though well-documented) statements, we had a short sequence of high-level statements (i.e. procedure calls) of the kind
            a) do (a procedural call) logical operation aaa on the entire audio track
            b) do logical operation bbb on the entire audio track
            ...................................
            c) loop through the blocks of n samples:
                c1) do logical operation aaa on the block
                c2) do logical operation bbb on the block
            ....................................
and keep any initialization operations (configuration settings as well) as much as possible inside the corresponding operation itself.
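The structure proposed above might look roughly like this Python skeleton. All function names are hypothetical, and the whole-track step is a no-op placeholder. The per-block step simply truncates low bits as a stand-in for the real analysis, and block_size and bits_to_remove are assumed parameters:

```python
def normalise_track(samples):
    """a) Whole-track operation (placeholder): here it just copies the samples."""
    return list(samples)


def split_into_blocks(samples, block_size):
    """Yield consecutive blocks of `block_size` samples (the last one may be short)."""
    for start in range(0, len(samples), block_size):
        yield samples[start:start + block_size]


def process_block(block, bits_to_remove):
    """c1/c2) Per-block operation (placeholder): zero the lowest bits of each sample."""
    mask = ~((1 << bits_to_remove) - 1)
    return [s & mask for s in block]


def process_track(samples, block_size=1024, bits_to_remove=2):
    track = normalise_track(samples)                      # whole-track step(s)
    out = []
    for block in split_into_blocks(track, block_size):    # c) loop over blocks
        out.extend(process_block(block, bits_to_remove))  # block-level step(s)
    return out
```

Each logical step keeps its own setup inside its function, which is the separation being asked for. The real operations would replace the placeholders one by one.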

By a logical operation I mean (in contrast to an internal technical operation) an operation that addresses a logical detail of the encoding preprocessing method as such, not an operation that is only computationally necessary in a technical sense.

Sure, these things are all a matter of taste. I'm just writing what I'd want if I were to transcode it into another language.

Reply #60
Which language would you re-code the script into (assuming you got the time to do it)? I have used Turbo Pascal  / x86 Assembler very successfully in the past and more recently have hacked about with Visual Basic (inside Excel, for work related engineering calculations). If it is a language to which I can get access then I will more than happily contribute to the coding exercise.

I will try to "compartment"  and / or sub-function the code and add comments which make it more clear which element does what. At this time, it may be useful to remove the portion which loops through and processes a number of files - each file would be the subject of a single call to the executable.

Also, it may be sensible to reduce the possible settings to something like -1, -2 or -3 (per 2Bdecided's quality level statement previously), with the settings corresponding to the most recent processed sample set which has ABX'd *very* well being those for "-2". With that in mind, I would suggest that only triangular dithering be used and also that force_dither_LSB=1, i.e. always dither, even if no bits removed.
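Triangular (TPDF) dither is the sum of two independent uniform random values. Below is a minimal Python sketch of removing bits with it. It assumes integer 16-bit samples, dither spanning plus or minus one quantisation step at the new resolution, and simple clipping to the 16-bit range; the script's own clipping handling differs. The function name is hypothetical:

```python
import random

INT16_MIN, INT16_MAX = -32768, 32767


def remove_bits_tpdf(samples, bits, rng=None, force_dither=True):
    """Requantise integer samples to multiples of 2**bits using TPDF dither.

    bits must be between 0 and 15. With bits == 0 and force_dither=True the
    LSB is still dithered, echoing the 'always dither' suggestion above.
    """
    if not 0 <= bits <= 15:
        raise ValueError("bits must be in 0..15")
    if bits == 0 and not force_dither:
        return list(samples)
    rng = rng or random.Random()
    step = 1 << bits
    top = (INT16_MAX // step) * step  # largest multiple of step that fits in int16
    out = []
    for s in samples:
        # Two independent uniforms in [-0.5, 0.5) summed: triangular PDF on [-1, 1) steps.
        dither = (rng.random() - 0.5 + rng.random() - 0.5) * step
        q = round((s + dither) / step) * step
        out.append(max(INT16_MIN, min(top, q)))
    return out
```

Every output is a multiple of 2**bits, i.e. its lowest `bits` bits are zero, which is what the lossless encoder later benefits from.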

I will also try some multi-generational processing - to try and determine which settings might be appropriate for the "-1" setting.

The settings for "-3" might be more difficult - we know that there *will* be noticeable artifacts in *some* samples, but without a side-by-side ABX, will they be particularly noticeable? These settings will be the subject of quite a lot of discussion, I think.

Reply #61
Which language ...

Apart from time (at the moment), that's the problem that keeps me from enthusiastically saying 'I'll do it'.
I'm skilled in VBA and VB programming, but I definitely won't do it that way. They're good for small to medium-sized applications within my company, but not for this purpose; in particular, they wouldn't result in a standalone exe file.
The language I've used most in recent years is Euphoria, but as it is so niche and the code should be shared, this is also not the way to go.
Next comes Pascal, aka Delphi, so this is the language I'd most probably use.
C would be best for shared code, and because my Delphi experience has aged a bit and I did code in C a long time ago, I will consider it too. But I know I have to follow my own preferences (not entirely, but partly), and I definitely prefer Pascal coding over C coding.
So I guess I'd do it in Delphi. Delphi performance is good, so this shouldn't be a problem, especially as Assembler can be used, though of course that should be restricted to minor parts of the code, if used at all.

... I have used Turbo Pascal  / x86 Assembler very successfully in the past and more recently have hacked about with Visual Basic (inside Excel, for work related engineering calculations). If it is a language to which I can get access then I will more than happily contribute to the coding exercise. ...

Wonderful. So let's go Pascal/Delphi.
I wonder a bit why you transcoded the code to this MATLAB clone instead of going directly to Pascal. You obviously have a deep understanding of the code involved.
If it's just about understanding how to read/write a wav file, which is a black box in the MATLAB script, I can help you out. I've done it in my wavPack quality checker, and it's pretty simple anyway, at least when restricted to the basic wav structure used on Windows-based PCs (going more general can be done later).
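For the basic wav structure mentioned above, here is a minimal Python sketch using the standard-library wave module. It handles 16-bit PCM only, and the helper names are hypothetical:

```python
import array
import io
import sys
import wave


def read_pcm16(fileobj):
    """Return (sample_rate, channels), where channels is a list of per-channel lists."""
    with wave.open(fileobj, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        n_ch, rate = w.getnchannels(), w.getframerate()
        data = array.array("h")
        data.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved L, R, L, R, ...; de-interleave by striding.
    return rate, [list(data[c::n_ch]) for c in range(n_ch)]


def write_pcm16(fileobj, rate, channels):
    """Interleave per-channel lists of ints and write them as 16-bit PCM."""
    interleaved = array.array("h", [s for frame in zip(*channels) for s in frame])
    if sys.byteorder == "big":
        interleaved.byteswap()
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(len(channels))
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(interleaved.tobytes())


if __name__ == "__main__":
    buf = io.BytesIO()
    write_pcm16(buf, 44100, [[1, -2, 3], [4, 5, -6]])
    buf.seek(0)
    print(read_pcm16(buf))
```

Extensible-format headers, 24-bit data and other variants would need more work, which fits the "going more general can be done later" point.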

I will try to "compartment"  and / or sub-function the code and add comments which make it more clear which element does what. At this time, it may be useful to remove the portion which loops through and processes a number of files - each file would be the subject of a single call to the executable.

Wonderful. That should make the logics clearer and invite other programmers to take part in coding.

Also, it may be sensible to reduce the possible settings to something like -1, -2 or -3 (per 2Bdecided's quality level statement previously), with the settings corresponding to the most recent processed sample set which has ABX'd *very* well being those for "-2". With that in mind, I would suggest that only triangular dithering be used and also that force_dither_LSB=1, i.e. always dither, even if no bits removed.

Great. This will clear things up even more and make things easier to understand.
Most consistent would be to restrict it to exactly what's used right now (while keeping track elsewhere of what can be changed to arrive at other options/settings).

Reply #62
Okay, latest version of the script, more heavily commented.

I will be installing Turbo Delphi tonight and expect to have absolutely *nothing* useful for a few days as I work out simply how to set about creating a win32 command line executable.......

I have implemented the "single-command-line-option" principle and have further developed the use of pre-calculated constants for calculating reference_threshold values.

Using the 4 settings contained in the script (-1=VHQ (estimate), -2=ABX'ed good quality settings, -3=estimate at "reduced quality" settings and -0 = 2Bdecided's original settings):

WAV=111.9MiB (1411kbps); FLAC=63.4MiB (800kbps); -0=39.9MiB (503kbps); -1=48.3MiB (609kbps); -2=42.0MiB (530kbps); -3=38.9MiB (491kbps).
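The kbps figures follow from scaling the CD-audio bitrate (44100 Hz × 16 bits × 2 channels = 1411.2 kbps) by each file size relative to the WAV size. Here is a small Python check. It assumes all sizes refer to the same set of files and ignores header overhead:

```python
CD_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbps for 16-bit stereo at 44.1 kHz


def equivalent_kbps(size_mib: float, wav_size_mib: float) -> float:
    """Average bitrate implied by a file size, relative to the uncompressed WAV size."""
    return CD_KBPS * size_mib / wav_size_mib


sizes = {"WAV": 111.9, "FLAC": 63.4, "-0": 39.9, "-1": 48.3, "-2": 42.0, "-3": 38.9}

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name:>4}: {size:6.1f} MiB -> {equivalent_kbps(size, sizes['WAV']):4.0f} kbps")
```

After rounding, this reproduces the kbps figures quoted (1411, 800, 503, 609, 530 and 491).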

Reply #63
Thanks for your work.
It's a good idea to have the more technical parameters bundled as details of quality options.
Makes things a lot clearer.

I've taken a first, more detailed look at the script.

If I see it correctly, the script is not self-contained for transcoding to Delphi with respect to the conv, fft and hanning functions (apart from wavread/wavwrite), which have to be coded from other sources and/or our own understanding. The hanning function should be easy to implement, if I've understood it correctly from a short Google search.
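As a starting point for replacing the missing functions, here is a Python sketch of two of them. The first is MATLAB's hanning in its symmetric form, which omits the zero end-points: w(k) = 0.5·(1 − cos(2πk/(N+1))), k = 1…N. The second is a full-length direct convolution like conv(a, b). The fft is left out; a library FFT would normally be used for that. Note that MATLAB's separate hann function does include the zero end-points, so it matters which one the script calls.

```python
import math


def hanning(n: int) -> list:
    """Symmetric Hann window without the zero end-points, as in MATLAB's hanning(n)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n + 1))) for k in range(1, n + 1)]


def conv(a: list, b: list) -> list:
    """Full linear convolution; output length len(a) + len(b) - 1 (like MATLAB's conv)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out
```

The direct convolution is O(len(a)·len(b)), which is fine for short spreading functions. Longer kernels would usually go through the FFT instead.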

The script could be simplified by restricting it to the case use_calculated_reference_threshold = 1, used with any compression_option except option 4.
Though I'd like to know how to arrive at the reference_threshold via simulated noise, it looks to me like this could be done in a separate tool (MATLAB welcome) to arrive at the rt_b_b constants used with use_calculated_reference_threshold = 1.

Many MATLAB peculiarities become clear when asking Google, but what do the curly braces mean in, for instance,
spreading_function{analysis_number}=ones(spreading_function_length,1)/spreading_function_length; ?
The right hand side is clear, it's just a vector of the spreading weights.
So spreading_function must be this vector. But this vector does not depend on analysis_number, and even if it did: what's the meaning of the curly braces?

Moreover: What's
peaks_over=length(find(inaudio==peak_max));
At first sight it sounds like the number of samples with a peak_max value. But as inaudio is composed of the sample vectors for the left and right channels, is peaks_over an array giving the number of peak samples for the left and right channels separately, or is it a scalar counting the peak samples of both channels together? From its usage it looks like a scalar.

Sorry for asking such stupid questions but I'm totally new to MATLAB code.

lossyWAV Development

Reply #64
1: the script is not self-contained for transcoding to Delphi with respect to the conv, fft, and hanning function (apart from wavread/write), which have to be coded from other sources and/or own understanding. The hanning function should be easy to implement if I have taken that correctly from a short google search.

Yes;

The script can be made easier if it would restrict to the case use_calculated_reference_threshold = 1 used with any compression_option except for option 4.

Absolutely - if those with clearer knowledge of the topic are happy with this shortcut;

Though I'd like to know how to arrive at the reference_threshold by simulated noise it looks to me like this can be done in a special tool (MATLAB welcome) to arrive at the rt_b_b constants used with use_calculated_reference_threshold = 1.

My only concern at the moment is that the calculated constants relate to specific low and high frequency limits, and therefore to specific high_frequency_bin / low_frequency_bin values. Scratch that, I have just started looking at 20Hz to Nyquist frequency and the constant *seems* to be very close to that calculated for 20Hz to 15848Hz (23/64*44100) on only 128 iterations........

Many MATLAB specials are getting clear when asking Google, but what do the curly braces mean in for instance: spreading_function{analysis_number}=ones(spreading_function_length,1)/spreading_function_length; ?


The curly brackets index a cell array: they allow you to refer to an array (which need not be of constant dimensions) held in an element of another array (or at least that's the way that I have rationalised it), more like a pointer.
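To make the cell-array idea concrete: in MATLAB, spreading_function{k} holds a whole vector per analysis, and those vectors may have different lengths. A rough Python analogue is a list of lists. The lengths below are arbitrary, chosen only for illustration:

```python
def make_spreading_functions(lengths):
    """One flat averaging kernel per analysis; kernels may differ in length.

    Mirrors spreading_function{analysis_number} = ones(len,1)/len, where the
    curly braces select one cell (i.e. one whole vector) of a cell array.
    """
    return [[1.0 / n] * n for n in lengths]


spreading_function = make_spreading_functions([3, 5, 4])  # hypothetical lengths

if __name__ == "__main__":
    for analysis_number, kernel in enumerate(spreading_function, start=1):
        print(analysis_number, len(kernel), sum(kernel))
```

One difference to keep in mind when transcoding: MATLAB cells are numbered from 1, Python (and by default Pascal open arrays) from 0.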

Moreover: What's peaks_over=length(find(inaudio==peak_max));

find(inaudio==peak_max) produces a list of the indices of values that are equal to peak_max, looking at both channels (in the case of stereo). length gives the total number of instances, i.e. the length of that list, so peaks_over is a single scalar for both channels together.
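In Python terms, assuming inaudio is held as one list per channel, the expression counts matches over both channels and returns one number:

```python
def count_peaks(inaudio, peak_max):
    """Equivalent of length(find(inaudio == peak_max)): one total over all channels."""
    return sum(1 for channel in inaudio for sample in channel if sample == peak_max)
```

A per-channel count would need an explicit loop over channels (or find on each column separately in MATLAB).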

Reply #65
... My only concern at the moment is that the calculated constants relate to specific low and high frequency limits, therefore high_frequency_bin / low_frequency_bin values. Scratch that, I have just started looking at 20Hz to Nyquist frequency and the constant *seems* to be very close to that calculated for 20Hz to 15848Hz (23/64*44100) on only 128 iterations........
Thanks for your answer.
What about different sampling frequencies like 32 kHz?
Does the script take full care of that (for instance concerning the constants which make up the reference_threshold), or are there some holes to be filled?
(Of course I ask because I'm a 32 kHz lover.)

Reply #66
Thanks for your answer.
What about different sampling frequencies like 32 kHz?
Does the script take full care of that (for instance concerning the constants which make up the reference_threshold), or are there some holes to be filled?
(Of course I ask because I'm a 32 kHz lover.)


The high_frequency_limit will influence the high_frequency_bin, i.e. at 32kHz a 16kHz high_frequency_limit gives high_frequency_bin=32 (16000/32000*64) for an fft_length of 64. So, the calculated reference_threshold *should* work for all input sampling frequencies - I think.
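The bin arithmetic is frequency divided by sample rate, times fft_length. Here is a Python sketch of it. It assumes bins are counted from 0 at DC and the result is rounded to the nearest bin; the script may round differently, and the function name is hypothetical:

```python
def frequency_to_bin(freq_hz: float, sample_rate_hz: float, fft_length: int) -> int:
    """Nearest FFT bin for a frequency: bin 0 is DC, bin fft_length/2 is Nyquist."""
    if not 0 <= freq_hz <= sample_rate_hz / 2:
        raise ValueError("frequency must lie between 0 and Nyquist")
    return round(freq_hz / sample_rate_hz * fft_length)
```

At 32 kHz a 16 kHz limit maps to bin 32 of a 64-point FFT, i.e. the Nyquist bin. At 44.1 kHz, 15848 Hz maps to bin 23, matching 23/64 × 44100 ≈ 15848 Hz.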

I tried badvilbel at 32kHz using PPHS and it was nasty even before I processed it. However, PPHS worked well at 29.4kHz (i.e. 44.1kHz * 2/3). Not sure if my iPAQ plays 29.4kHz accurately.

Reply #67
...I tried badvilbel at 32kHz using PPHS and it was nasty even before I processed it. However PPHS worked well at 29.4kHz (i.e.44.1kHz * 2/3). Not sure if my iPAQ plays 29.4kHz accurately.

29.4 kHz is a bit too low for really good quality (32 kHz is on the edge for me).

But your bad 32 kHz quality seems to be a PPHS problem. I use ssrc_hp and I'm very happy with it (after finding out that the --twopass option is needed to avoid clipping).
You can get it from http://shibatch.sourceforge.net/ if you like to try it.

Reply #68
You can get it from http://shibatch.sourceforge.net/ if you like to try it.


Thanks for the pointer - I'll install it and try it out.....

Back to something that you said earlier - you use ssrc to resample to 32kHz, using a batch file, if I remember correctly? Could you please post a copy of the relevant batch file as I'm interested in how it achieves the resampling / FLAC & tag operations.

Reply #69
There are lots of places where the code is written to allow lots of tweaking. If such tweaking is not going to happen, it could be simplified.

The reference thresholds are one example. If fixed, with flat dither (as now) they can be calculated without all that simulation and are independent of low and high frequency limits. They depend on noise amplitude and fft size only.

Please don't ask for a formula - it's too late. (I have young kids and a job - 9:30pm is late!). I think it's already in the unfinished unworking noise shaping version - I'll have a look some time this week, if it helps.

Cheers,
David.
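For flat dither, the dependence on noise amplitude and fft size can be written in closed form. Here is a Python sketch under three assumptions. First, removing b bits with TPDF dither leaves white noise of variance Δ²/4 with Δ = 2^b (Δ²/12 from rounding plus Δ²/6 from the dither). Second, a periodic Hann window is used, whose squared weights sum to 3N/8. Third, the "threshold" is taken to be the expected per-bin noise power in dB, which may not be exactly how the script defines reference_threshold:

```python
import math


def hann_periodic(n: int) -> list:
    """Periodic Hann window of length n (zero at index 0)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / n)) for k in range(n)]


def expected_noise_db(bits_removed: int, fft_length: int) -> float:
    """Expected per-bin noise power (dB) of TPDF-dithered requantisation noise.

    For white noise of variance s2, E|X_k|^2 = s2 * sum(w**2) for every bin k.
    """
    step = 2.0 ** bits_removed
    variance = step * step / 4.0  # rounding (step^2/12) + TPDF dither (step^2/6)
    window_energy = sum(w * w for w in hann_periodic(fft_length))  # = 3N/8
    return 10.0 * math.log10(variance * window_energy)


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, [round(expected_noise_db(b, n), 2) for b in range(1, 5)])
```

Each extra bit adds 20·log10(2) ≈ 6.02 dB, and each doubling of fft_length adds 10·log10(2) ≈ 3.01 dB. The result is therefore planar in bits and log2(fft_length) and involves no frequency limits.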

Reply #70
Thanks David - the apparently planar nature of the reference_threshold values for different fft_length values seems to be too good an opportunity to miss. I'm trying to determine constants for different dither amplitudes too.

I should be up and running with Delphi tomorrow - tonight was scratched because I received a 2nd hand RAID card (eBay ftw!) today, so I *had* to reconfigure my home server.

Ditto with the kids and job - addicted to playing with the script I guess...... Thanks again for the script to play with - it's been great fun trying out all the various dead-end methods of reducing even further - then discarding them in favour of what you already had in there.

Reply #71
...Back to something that you said earlier - you use ssrc to resample to 32kHz, using a batch file, if I remember correctly? Could you please post a copy of the relevant batch file as I'm interested in how it achieves the resampling / FLAC & tag operations.

No problem, but it probably won't help you much, as you want to take care of tagging. My bat file just joins ssrc_hp and wavPack:

C:\Programme\wavPack\ssrc_hp.exe --rate 32000 --twopass --dither 0 --bits 16 %2 tmp.wav
C:\Programme\wavPack\wavPack.exe %1 tmp.wav %3
del tmp.wav

%1: wavPack options
%2: input wav File (foobar's %s)
%3: output wavPack file (foobar's %d)
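For anyone who prefers a script to a .bat file, the same three steps can be sketched in Python with subprocess. The executable paths and the ssrc_hp options are copied from the batch file above. The function names, passing the wavPack options as a list, and the temporary-file handling are illustrative assumptions:

```python
import os
import subprocess
import tempfile

SSRC = r"C:\Programme\wavPack\ssrc_hp.exe"
WAVPACK = r"C:\Programme\wavPack\wavPack.exe"


def build_commands(wavpack_options, input_wav, output_wv, tmp_wav):
    """Return the two command lines the batch file runs, as argument lists."""
    resample = [SSRC, "--rate", "32000", "--twopass", "--dither", "0",
                "--bits", "16", input_wav, tmp_wav]
    encode = [WAVPACK, *wavpack_options, tmp_wav, output_wv]
    return resample, encode


def resample_and_encode(wavpack_options, input_wav, output_wv):
    """Resample to 32 kHz with ssrc_hp, encode with wavPack, then delete the temp file."""
    fd, tmp_wav = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        for cmd in build_commands(wavpack_options, input_wav, output_wv, tmp_wav):
            subprocess.run(cmd, check=True)  # stop at the first failing step
    finally:
        os.remove(tmp_wav)
```

Using a unique temporary file instead of a fixed tmp.wav avoids clashes if several encodes run in parallel.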

My personal tagging strategy is simple: I only use the title, artist and album tags, and they make up the filename of my lossless ape files.
This filename tagging makes it easy to carry the tags through the encoding procedure with my bat file.
As a final step I use mp3tag to convert the filename 'tags' into real wavPack tags.

lossyWAV Development

Reply #72
Thanks for the information - I'll try to get my head around applying it to <optional SSRC>, lossyFLAC.exe, FLAC.exe (with tags) later.......

I've installed Turbo Delphi (36214 days of licence left!) and started with the basics - set parameters from the command line. I will post code when it is a little more advanced and also hunt for code to read / write .WAV files.

To make the process quicker and less memory hungry, I think that the variable fix_clipped method *may* have to bite the dust - we would have to read the (potentially *enormous*) .WAV file twice and we almost certainly couldn't read it all into RAM - again assuming unlimited filesize. So, the next step is to use 2Bdecided's 30/32 multiplier (for triangular_dither) to reduce the amplitude of the audio data block by block.
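The single-pass idea can be illustrated as follows. Instead of first scanning the whole file for its peak, each block is multiplied by the fixed 30/32 factor as it streams through, so memory use is bounded by the block size. This is a minimal Python sketch assuming integer 16-bit samples arriving as an iterable; the generator name and block size are hypothetical:

```python
from itertools import islice

SCALE = 30 / 32  # fixed headroom factor: no need to know the file's peak in advance


def scale_stream(samples, block_size=4096):
    """Yield scaled blocks one at a time, reading the input only once."""
    it = iter(samples)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        # 30/32 of any int16 value stays within int16, so no clipping check is needed.
        yield [round(s * SCALE) for s in block]
```

The trade-off is that quiet files lose a little level they didn't need to. In exchange, the file is read once and never has to fit in RAM.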

Trying to write in Delphi / Pascal after Matlab is painful - I must stop writing in Matlab.........

Reply #73
.. fix_clipped method ...

On that point, David Bryant's remark comes to mind: when preprocessing for wavPack, clipping should not be avoided, as wavPack benefits not only from a run of trailing zero bits but also from a run of leading 1 bits.
So I think it's a good idea to have a corresponding option on the command line.

Maybe it's good to think of these things in a purely logical way. That means having an optimize option for potentially various target formats, something like '-optimize <format-extension>', i.e. '-optimize wv' when targeting wavPack. The optimization potential for the various formats is limited (it may be restricted to just not doing clipping prevention for wavPack), but it keeps open the possibility for anything that comes up.
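The trailing-zero-bits effect is the same one FLAC exploits. When every sample in a block shares k zero low-order bits, the encoder can shift them out; FLAC calls these "wasted bits". Here is a small Python sketch that counts them for one block. The function name is hypothetical, and the leading-1-bits behaviour attributed to wavPack is not modelled:

```python
def shared_trailing_zero_bits(block):
    """Number of low-order zero bits common to every sample in the block.

    OR-ing all samples keeps any bit set in at least one of them, so the lowest
    set bit of the result bounds what can be shifted out. Python's unbounded
    integers handle two's-complement negatives naturally.
    Returns None for an all-zero block (any shift would do).
    """
    combined = 0
    for s in block:
        combined |= s
    if combined == 0:
        return None
    return (combined & -combined).bit_length() - 1
```

A single sample with a low bit set drops the whole block back to 0, which is why the bit removal has to be applied consistently across a block.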

As you are about to start coding right now, which I can't (and you're the expert anyway):
How can I help you with things that don't take me too much time at the moment? Shall I look for Delphi fft and conv implementations, or corresponding Pascal code?

Reply #74
On that point, David Bryant's remark comes to mind: when preprocessing for wavPack, clipping should not be avoided, as wavPack benefits not only from a run of trailing zero bits but also from a run of leading 1 bits.
So I think it's a good idea to have a corresponding option on the command line.

Maybe it's good to think of these things in a purely logical way. That means having an optimize option for potentially various target formats, something like '-optimize <format-extension>', i.e. '-optimize wv' when targeting wavPack. The optimization potential for the various formats is limited (it may be restricted to just not doing clipping prevention for wavPack), but it keeps open the possibility for anything that comes up.

As you are about to start coding right now, which I can't (and you're the expert anyway):
How can I help you with things that don't take me too much time at the moment? Shall I look for Delphi fft and conv implementations, or corresponding Pascal code?

No problems with trying to make this WAV processor work with more than the initially targeted FLAC format - the more the merrier! Maybe "-f" for FLAC and "-w" for WavPack? I am a fan of simple command lines with single-character switches (if possible - and this is not going to be *too* complex......).
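Here is a sketch of the kind of single-character command line being discussed, using Python's argparse. The switches (-1/-2/-3 for the quality presets, -f/-w for the target format) are only the proposals from this thread, not an existing interface. The program name, the defaults and the input/output arguments are assumptions:

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="lossywav-sketch",
                                description="Hypothetical option layout under discussion")
    quality = p.add_mutually_exclusive_group()
    for level in ("1", "2", "3"):
        # Options that look like negative numbers are allowed; argparse then
        # treats "-2" etc. as switches rather than as positional values.
        quality.add_argument(f"-{level}", dest="quality", action="store_const",
                             const=int(level), help=f"quality preset {level}")
    target = p.add_mutually_exclusive_group()
    target.add_argument("-f", dest="target", action="store_const", const="flac",
                        help="optimise output for FLAC")
    target.add_argument("-w", dest="target", action="store_const", const="wavpack",
                        help="optimise output for WavPack (e.g. skip clipping prevention)")
    p.set_defaults(quality=2, target="flac")
    p.add_argument("input", help="input .wav file")
    p.add_argument("output", help="output .wav file")
    return p
```

The mutually exclusive groups reject contradictory combinations such as "-1 -3" or "-f -w" with an error message, which keeps the interface as simple as intended.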

I am just starting to code - if you could find fft and conv implementations that would be excellent - I'll get going on the functional elements and introduce procedures / functions in great number to reduce the complexity of the main code.