Full Version: lossyWAV Development
halb27
QUOTE(Axon @ Nov 27 2007, 08:53) *

I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

a) We have a system of lossyWAV + a lossless codec, which together make up a lossy codec.
So you do lossy encoding when using lossyWAV, and the good and bad of the procedure must be measured against that of other lossy codecs, which is of course a very subjective thing when comparing lossy codecs of very good quality.
b) AFAIK nobody has ever experienced an artifact, even with our lowest quality mode -3. Quality was extremely good from the very start. You're welcome to do some listening tests and report on them.
QUOTE(Axon @ Nov 27 2007, 08:53) *

This doesn't seem like the sort of algorithm that lends itself to tuning.

??? For a very long period we had great quality but at a bitrate of ~500 kbps on average. But we've investigated and optimized David Bryant's idea of doing the averaging of the FFT outcome according to the length of the critical bands, and we differentiate on doing this depending on FFT length. We've optimized the -skew parameter where a rather high -skew value does an extremely good job at differentiating between spots in the music which have to be handled defensively or not. We've introduced the -snr parameter which adds benefits for the differentiation work of -skew. We've found a solution to the theoretical clipping issue. We've improved the way the FFT analyses cover the lossyWAV blocks for security reasons.
So we ended up with an average bitrate of ~350 kbps for -3 with not the least quality issue known. -2 and -1 IMO vary the internals enough to make them promising for the cautious-minded of various kinds.
As a consequence IMO the only really useful option apart from the quality parameter is -nts. I personally however wouldn't mind if the advanced options are kept even in the final release if they are clearly marked as such (maybe hidden in the commandline help, but documented in the external documentation).
Nick.C
QUOTE(Axon @ Nov 27 2007, 06:53) *

Forgive me for asking a fundamental (and admittedly critical) question; I'm very late to this particular party. Before I start, I must say this idea (and all the work that has gone into it) is incredible, and I would not hesitate to use it once the kinks are ironed out. From the original post by 2BDecided:
QUOTE
This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

This doesn't seem like the sort of algorithm that lends itself to tuning. If the technique is independent of psychoacoustics, then the only advanced setting that ought to exist is -skew.

Is that too harsh? Perhaps I'm being overly critical on beta code?
The beta nature only really reflects the status of the code with respect to bug reports which will (probably) come in. This method / pre-processor was initially intended to allow the benefits of a lossy codec to be "wrapped" in a lossless codec. The method is David's, Halb27 and I have only implemented it in Delphi and added a few tweaks along the way.

At various points along the way, people have assisted with setting determination through personal ABX'ing of particularly problematic samples (Big thanks to Halb27, Shadowking, Wombat & Gurubooleez). Valued input has been made by 2Bdecided, Bryant, TBeck, Mitch 1 2, Josef Pohm, SebastianG, user, collector, Dynamic, GeSomeone, Robert, verbajim, [JAZ], BGonz808, M & Jesseq.

At the present time I don't think that the method is "known" to produce any artifacts with default settings (however if anyone can tell me differently, I would be very appreciative of the particular sample to try and iron it out).

Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.

From a purely personal perspective, I have found the drive to develop it through feedback from those who have made comments along the way and from a desire to use lossyFLAC on my iPAQ (GSPlayer v2.25 & GSPFlac.DLL)

In keeping with David's wishes, the only command line options in the final revision will be quality levels -1, -2 & -3 and the -nts parameter (unless, as Halb27 has indicated, we leave the advanced options in the code but don't "advertise" them outside of the accompanying PDF / TXT file).

Why don't you give it a try? It's certainly robust enough to handle a Foobar2000 transcode of about 1500 files without falling over (the largest of which was circa 60 minutes).
Synthetic Soul
QUOTE(Nick.C @ Nov 27 2007, 10:00) *
Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.
Personally, I have been following this thread avidly from the start, but I lack the ears to be testing very high quality lossy audio, or the expertise to offer technical advice or cause debate.

This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.

I think Axon's question was well worth the ask: much of the discussion in this thread is - to complete laymen like myself - of a complex technical nature. Given that lossyWAV sits somewhere between high quality 'psychoacoustic' lossy and lossless quality, it is necessary to explain to the general masses what users can expect from this process.

Personally I have been considering a Wavpack lossy backup of my music for a while. It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'. I am highly unlikely to find an issue. What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'. :)


Nick.C
Thanks are always appreciated.

I totally agree that the question is valid and requires an answer. Technically, I am not really the person to answer it, just the programmer.

Also, I will be using my lossyFLAC collection in tandem with my FLAC collection rather than replacing the latter with the former, essentially, lossyFLAC is my lossy transcode.

Until more ears have validated the current quality level settings, we're not going to be in the position to reassure new users of the quality of the output.
halb27
QUOTE(Synthetic Soul @ Nov 27 2007, 13:02) *

Personally I have been considering a Wavpack lossy backup of my music for a while. It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'. I am highly unlikely to find an issue. What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'. :)

The more I'm into audio compression the more I think it's up to personal decisions (and personal a priori preferences) what codec and setting to use. Objective findings always have a limited scope.
My personal key event was the 128 kbps listening test of Lame 3.97 where Lame came out more or less on par with codecs like Vorbis. I have no doubt this test was done with great care, but I personally would never use 3.97 at a bitrate of 128 kbps (due to the 'sandpaper' noise and similar problems). Luckily 3.98 has overcome these problems, and is still about to improve.

So it's true that more listening experience by especially well-respected ears is most welcome, but IMO it's not a sine qua non thing. Technical knowledge can't assure transparency anyway.

So in the end what IMO counts is that any experience tells that everything is fine so far (finally we do have public experience though we like to get more). And of course any potential user must like the idea of being close to lossless (from the technical view of the overall procedure which is not necessarily related to quality), and must not care about a bitrate of 350 kbps or higher. Otherwise he wouldn't use it.

As you have considered using wavPack lossy you don't care about extremely high bitrate, and you like the idea of being with a clean signalpath associated with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used wavPack lossy. We can expect wavPack lossy high mode at 400 kbps using dynamic noise shaping giving transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without a real quality control so far. With lossyWAV the situation is the same (hopefully even better due to the existing quality control which can be said to have proved effective).

The main problem with very high quality codecs is: while it's easy to prove the codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end the most adequate attitude IMO is once very high quality is assured at least in a basic sense: don't care as long as no counterexamples are given.
Synthetic Soul
Thank you both for your responses.

QUOTE(halb27 @ Nov 27 2007, 12:21) *
Technical knowledge can't assure transparency anyway.
If it's technical knowledge of a lossless operation then it can.

The techniques that are being used in lossyWAV are complete gibberish to me. In my limited understanding though, what was originally proposed was the removal of near-useless bits from the WAVE, to make more efficient use of basic compression routines within the encoders (e.g.: FLAC's wasted_bits). You speak below of "a clean signalpath": this is really what I am discussing. If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing then that to me would suggest that there was less room for a bug in the algorithm, or that the decision making process was more simple and therefore less prone to erratic behaviour. I don't think I'm making myself clear. :)

QUOTE(halb27 @ Nov 27 2007, 12:21) *
As you have considered using wavPack lossy you don't care about extremely high bitrate, and you like the idea of being with a clean signalpath associated with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used wavPack lossy. We can expect wavPack lossy high mode at 400 kbps using dynamic noise shaping giving transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without a real quality control so far. With lossyWAV the situation is the same (hopefully even better due to the existing quality control which can be said to have proved effective).
Exactly.

QUOTE(halb27 @ Nov 27 2007, 12:21) *
The main problem with very high quality codecs is: while it's easy to prove the codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end the most adequate attitude IMO is once very high quality is assured at least in a basic sense: don't care as long as no counterexamples are given.
Agreed. And, of course, such claims will be taken with a pinch of salt until a lot of testing has been undertaken. And, of course, testing high quality encodes is not easy.
Nick.C
QUOTE(Synthetic Soul @ Nov 27 2007, 13:10) *
If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing then that to me would suggest that there was less room for a bug in the algorithm, or that the decision making process was more simple and therefore less prone to erratic behaviour. I don't think I'm making myself clear. :)
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.
Synthetic Soul
QUOTE(Nick.C @ Nov 27 2007, 13:18) *
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.
I would be very interested to read a non-technical explanation of the processes involved; however I feel awful for increasing your workload.

Please only do so if you believe that it will be necessary for other users to make the decision also.

Thanks again.
halb27
QUOTE(Synthetic Soul @ Nov 27 2007, 15:10) *

... If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing then that to me would suggest that there was less room for a bug in the algorithm, or that the decision making process was more simple and therefore less prone to erratic behaviour. ...

Yes, that's what makes the procedure attractive to me too though I'm afraid we won't get a kind of security from the mere process itself.
I can try to describe the procedure from my understanding which isn't perfect at all:

As you write, the basic idea is to form (now) 512 sample blocks and decide for each block how many of the least significant bits not to use (set to 0). Lossless codecs like FLAC can make use of the reduced number of bits per sample in these blocks, and in order to be effective the block size of the lossless codec should be identical to the lossyWAV block size (or an integer multiple of it in case the lossless codec works more efficiently in an overall sense with longer blocks). FLAC works fine with a blocksize of 512.
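
To make the block idea concrete, here is a minimal Python sketch (not lossyWAV's actual code). The helper names `split_blocks`, `zero_lsbs` and `wasted_bits` are made up for illustration, and plain bit masking is used here only to keep the example short. It shows how a lossless encoder could notice that every sample in a block shares the same number of trailing zero bits.

```python
def split_blocks(samples, block_size=512):
    """Cut a list of integer samples into consecutive blocks (the last may be shorter)."""
    return [samples[i:i + block_size] for i in range(0, len(samples), block_size)]


def zero_lsbs(block, bits):
    """Set the lowest `bits` bits of every sample to 0 (simple masking, no rounding)."""
    mask = ~((1 << bits) - 1)
    return [s & mask for s in block]


def wasted_bits(block):
    """Count the trailing zero bits shared by all samples in the block."""
    combined = 0
    for s in block:
        combined |= s          # a bit is set here if it is set in any sample
    if combined == 0:
        return 0               # all-zero block: nothing meaningful to report
    return (combined & -combined).bit_length() - 1


block = zero_lsbs([1001, -517, 33, 4095], 3)
print(block, wasted_bits(block))
```

A codec that detects such shared zero bits only has to store the remaining bits per sample for that block, which is where the bitrate saving comes from.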

The usual 16 bit accuracy of wave samples is necessary mainly to give a good accuracy to low volume spots in the music and allow for a good dynamic range. At moderate to low volume spots far less than 16 bits are used for signal representation (that's why lossless codecs yield a good compression ratio in these cases). At high volume spots not the entire 16 bits are needed usually. Roughly speaking a certain number of rather high value bits are needed for loud spots (while the lower value bits can be zero), and a certain number of low value bits are needed for quieter music (and the high value bits are zero). That's the main background of the method. We care about the louder spots and reduce accuracy of representation here.
Dropping a certain number of least significant bits means adding noise to the original. This added noise is not necessarily perceived as the kind of analog noise/hiss known from for instance tape recordings.
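
A rough way to see how much noise is added: the sketch below rounds random 16 bit samples to multiples of 2^bits and measures the RMS error relative to full scale. The function name `added_noise_db` and the random test signal are assumptions for illustration only; real music is not uniformly random, so treat the figures as indicative.

```python
import math
import random


def added_noise_db(samples, bits):
    """RMS of the rounding error, in dB relative to 16 bit full scale (32768)."""
    step = 1 << bits
    err_sq = [(round(s / step) * step - s) ** 2 for s in samples]
    rms = math.sqrt(sum(err_sq) / len(err_sq))
    return 20 * math.log10(rms / 32768.0)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.randint(-32768, 32767) for _ in range(20000)]
levels = {b: added_noise_db(samples, b) for b in range(1, 9)}
for b, db in levels.items():
    print(f"{b} bits removed: {db:.1f} dB")
```

Each additional removed bit raises the added noise by roughly 6 dB, which is why the number of bits has to be chosen per block rather than once for the whole file.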

So the main thing is to decide on how many least significant bits to drop. From a bird's-eye view, the frequency spectrum of the 512 sample block is calculated and the frequency region with the lowest energy is searched for. The idea is to preserve this energy and not let it get drowned in the added noise. This is done by keeping sample accuracy high enough: the minimum energy level is looked up in a table that tells how many bits can be removed depending on energy level and frequency. The table was found a priori by examining white noise behavior with respect to our purposes.
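
The decision step can be sketched as follows. This is only an illustration under stated assumptions: a naive DFT with a Hann window stands in for the real FFT analysis, and `HYPOTHETICAL_TABLE` contains made-up numbers, not lossyWAV's measured thresholds (which, as described above, also depend on frequency).

```python
import cmath
import math


def dft_magnitudes_db(block):
    """Magnitudes in dB for bins 1 .. N/2-1, using a Hann window and a naive DFT."""
    n = len(block)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(block)]
    mags = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(20 * math.log10(max(abs(acc), 1e-12)))  # avoid log(0)
    return mags


# HYPOTHETICAL table: (minimum spectrum level in dB, bits that may be removed).
HYPOTHETICAL_TABLE = [(30.0, 1), (36.0, 2), (42.0, 3), (48.0, 4)]


def bits_to_remove(min_db, table=HYPOTHETICAL_TABLE):
    """Largest bit count whose level requirement is met by the quietest bin."""
    bits = 0
    for level, b in table:
        if min_db >= level:
            bits = b
    return bits


block = [1000 * math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
mags = dft_magnitudes_db(block)
print(bits_to_remove(min(mags)))
```

A pure tone leaves most of the spectrum nearly empty, so the quietest bin is very low and the sketch removes no bits; a noisy, dense block would have a higher minimum and allow more bits to go.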

The real process is a bit more complicated letting several FFTs do the frequency spectrum analysis according to what they're best at: short FFTs responding to quickly changing signals but with a very restricted resolution at low to medium frequencies, and long FFTs giving good frequency resolution but not responding very quickly. Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.
Moreover, in order not to have to keep up high accuracy due to pure chance, a certain averaging is done over the outcome of the FFT analyses. A lot of tuning has been done on this in order to achieve good quality and relatively low bitrate.
A huge sensitivity bias is given to the low to medium frequency range by using the -skew and -snr options. This is done in analogy to the fact that the usual transform codecs give priority to the accurate representation of low to medium frequencies. The improvement in quality control by using -skew is so strong that we have decided that a noise threshold of +6 is sufficient for -3 (in the a priori theory -nts should be 0).
For -2 we also default to the slightly positive -nts 2, and only with -1 we use a defensive -2. Other than that, the different quality levels differ for the main part in how they do the FFT analyses. With -3 we use 2 different FFT lengths for each block, -2 uses 3 different FFT lengths, and it's a total of 4 FFT lengths for -1. Moreover the averaging of the FFT results is done in an increasingly defensive way when going from -3 to -1.

After having decided how many least significant bits to remove (set to 0), the samples of the lossyWAV block are rounded to the corresponding values. This rounding can lead to clipping, but we have found a solution to avoid it (by simply dropping fewer bits in the block until no clipping occurs).
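
The rounding and the clipping guard can be sketched like this, assuming 16 bit samples. The function name `remove_bits` and its return shape are illustrative choices, and Python's `round()` rounds exact halves to the nearest even integer, which may or may not match what the tool does.

```python
def remove_bits(block, bits, lo=-32768, hi=32767):
    """Round every sample to a multiple of 2**bits; back off if anything clips.

    Returns the processed block and the number of bits actually removed.
    """
    while bits > 0:
        step = 1 << bits
        rounded = [round(s / step) * step for s in block]
        if all(lo <= r <= hi for r in rounded):
            return rounded, bits
        bits -= 1              # a sample left the 16 bit range: try one bit fewer
    return list(block), 0      # nothing can be removed safely


print(remove_bits([1000, -1000, 37], 3))
print(remove_bits([32767, 100], 4))
```

In the second call the sample 32767 would round up to 32768 at every bit count, so the sketch ends up removing nothing from that block.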

Hope that helps.
Mitch 1 2
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.
halb27
QUOTE(Mitch 1 2 @ Nov 27 2007, 16:28) *

I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

Wonderful idea, good job.
Nick.C
QUOTE(halb27 @ Nov 27 2007, 14:15) *
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.
Synthetic Soul
QUOTE(halb27 @ Nov 27 2007, 14:15) *
I can try to describe the procedure from my understanding which isn't perfect at all:
...
Hope that helps.
Yes. Thank you for your time. I'm slowly getting there. :)

I'm not sure if you can answer this, and it may be better left for the documentation, but I am left wondering between the differences of -1, -2 and -3. Is -3 thought to be transparent in all known situations now? The obvious next question being: so why bother with -2 and -3?

I guess the same could be said with LAME -V0 and 320kbps CBR, but I'm expecting lossyWAV to have less of a grey area.

Personally, I'd like to hope that -2 (as default) is 'considered transparent until a problem sample can be found', -3 is overkill for the more paranoid amongst us, and -1 introduces a slight amount of risk. Apologies if the description of these presets has been discussed recently elsewhere.
Nick.C
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = (may, although not yet proven) introduce a slight amount of risk.

The reason for -1 is that you may want to do other things with the output of lossyWAV; -2 is considered to be a very robust intermediate between -1 and -3; -3 is the "I want a lower bitrate and I want "acceptable" (rather than transparent) output" setting, which at the moment is better than its target.

My view of the process:

Read WAV header from input file;
Write WAV header to output file;

Create reference_threshold tables for each fft_length for each bits_to_remove (1 to 32) - not required as precalculated data is used to re-create the surface for each window / dither combination (yes, it changes with both..... :( ) - This calculates the mean fft output from the analysis of the difference between the random noise signal and its bit_removed compatriot;

Create threshold_indices from selected reference_threshold table (window / dither combo) - basically, determine how many bits_to_remove for a given minimum dB value;

Read WAV data in a codec_block_size chunk (all channels at once) and for each channel:

Carry out FFT analyses (3 for 1024 sample fft on 512 codec_block_size up to 33 for a 64 sample fft on 512 codec_block_size) on each channel of the codec_block, for each fft_analysis:

Calculate magnitudes of FFT output (from complex number);

Skew magnitudes (currently -36dB at 20Hz to 0dB at 3545Hz, following a 1-sin(angle) curve where angle is the proportion of 1 given by (log(this_bin_frequency)-log(min_bin_frequency))/(log(max_bin_frequency)-log(min_bin_frequency))) by the relevant amount;

Spread skewed magnitudes using the relevant spreading function (e.g. 23358-...... means average 2 bins in the first zone, 3 in the second and third zones, 5 in the fourth zone and 8 in the fifth zone), retaining the minimum value and the average value of the skewed results;

minimum_threshold=floor(min(minimum_skewed_result+nts,average_skewed_result-snr));

Look up Threshold_Index table for the relevant fft_length to determine bits to remove for that particular fft_analysis;

When all fft_analyses for a particular codec_block are complete, determine the minimum bits_to_remove value and use that to:

Remove_bits: For each sample in each channel of the codec_block bit_removed_sample:=round(sample/(2^bits_to_remove))*(2^bits_to_remove). If in the remove_bits process a sample falls outwith the upper or lower bound then decrease bits_to_remove and start the remove_bits process again.

Write processed codec_block and repeat;

Close files and exit.
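
Two of the steps above (the skew curve and the minimum_threshold formula) can be sketched in Python. This is a reading of the description, not the actual Delphi code: it assumes the "angle" is the log-frequency proportion scaled to 0..pi/2, it clamps frequencies outside 20 Hz to 3545 Hz, and the function names are hypothetical.

```python
import math


def skew_db(freq_hz, depth_db=36.0, f_lo=20.0, f_hi=3545.0):
    """Skew in dB: -depth_db at f_lo rising to 0 dB at f_hi on a 1-sin(angle) curve."""
    if freq_hz <= f_lo:
        return -depth_db       # ASSUMPTION: clamp below the lower frequency
    if freq_hz >= f_hi:
        return 0.0             # no skew at or above the upper frequency
    proportion = ((math.log(freq_hz) - math.log(f_lo))
                  / (math.log(f_hi) - math.log(f_lo)))
    angle = proportion * math.pi / 2   # ASSUMPTION: proportion mapped to 0..pi/2
    return -depth_db * (1.0 - math.sin(angle))


def minimum_threshold(skewed_db, nts, snr):
    """floor(min(minimum + nts, average - snr)) over the spread, skewed results."""
    lowest = min(skewed_db)
    average = sum(skewed_db) / len(skewed_db)
    return math.floor(min(lowest + nts, average - snr))


def block_bits_to_remove(per_analysis_bits):
    """The block uses the most cautious (smallest) result of all its FFT analyses."""
    return min(per_analysis_bits)


print(skew_db(20.0), skew_db(500.0), skew_db(3545.0))
print(minimum_threshold([10.0, 20.0, 30.0], nts=2, snr=12))
print(block_bits_to_remove([3, 5, 4]))
```

A larger -snr pulls the threshold down whenever the quietest bin is not far below the average, which is how it supports the work done by -skew.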
Synthetic Soul
QUOTE(Nick.C @ Nov 27 2007, 15:42) *
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = (may, although not yet proven) introduce a slight amount of risk.
Excellent news. :) I will have to spend some time reading your explanation as it, on a quick skim, still seems quite technical to me. Perhaps, as I try to comprehend myself, I can suggest a n00b translation to your technical explanation, that may help to produce the final documentation?

Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus, and am astounded by the savings, using the default settings.

CODE
File FLAC lossyWAV+FLAC
===========================
00 1054 376
01 728 366
02 765 390
03 1013 413
04 883 425
05 860 469
06 1084 455
07 981 419
08 1052 399
09 873 393
10 1026 511
11 853 367
12 834 422
13 1016 435
14 954 403
15 867 390
16 1068 397
17 861 376
18 787 442
19 909 394
20 1142 400
21 760 384
22 1022 410
23 1030 394
24 917 433
25 914 384
26 810 401
27 878 354
28 1040 449
29 912 442
30 895 419
31 913 411
32 1010 402
33 1018 397
34 831 429
35 939 410
36 1038 402
37 1084 439
38 825 381
39 999 413
40 1007 408
41 1037 505
42 1054 408
43 897 418
44 839 364
45 924 425
46 898 431
47 890 398
48 1014 414
49 999 412

Bloody good work gentlemen!

I am under the impression that I can also use TAK and WavPack already. I need to do some more reading to see, if anything, what I need to do to test these also.
halb27
QUOTE(Nick.C @ Nov 27 2007, 17:22) *

QUOTE(halb27 @ Nov 27 2007, 14:15) *
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.

Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before and it takes the -overlap option to do the new overlapping (we discussed something like 8 pages back)?
Nick.C
QUOTE(halb27 @ Nov 27 2007, 16:37) *
QUOTE(Nick.C @ Nov 27 2007, 17:22) *
QUOTE(halb27 @ Nov 27 2007, 14:15) *
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.
Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before and it takes the -overlap option to do the new overlapping (we discussed something like 8 pages back)?
Yes, exactly - the new 5/8th fft_length overlapping system doesn't have me totally "sold" to make it the default, but it is still a selectable option.

@Synthetic Soul - :) Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?

@Axon - Thanks for stimulating a very interesting series of posts!
Mitch 1 2
QUOTE(Synthetic Soul @ Nov 28 2007, 02:20) *
Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus, and am astounded by the savings, using the default settings.
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.
Synthetic Soul
QUOTE(Nick.C @ Nov 27 2007, 16:39) *
@Synthetic Soul - :) Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?
I've been casually listening to the files while testing, and of course can hear no discernable difference. Default settings for both lossyWAV (-2) and FLAC (-5).

I will soon be posting results for WavPack and TAK defaults also.

QUOTE(Nick.C @ Nov 27 2007, 16:39) *
@Axon - Thanks for stimulating a very interesting series of posts!
Indeed. I've not felt it was the time to get involved before now, but I think it's now time for us more casual testers to show our interest. :)
Synthetic Soul
OK, here's my results for FLAC, WavPack and TAK on default settings

CODE
Encoder          |  Command
===================================================================
FLAC 1.2.1       |  flac -b 512 <source>
WavPack 4.42a2   |  wavpack --merge-blocks --blocksize=512 <source>
TAK 1.0.2 Final  |  takc -e -fsl512 <source>

==============================================================
File | FLAC Lossy | WavPack Lossy | TAK Lossy
==============================================================
00 | 1054 376 | 1048 367 | 1034 360
01 | 728 366 | 728 374 | 708 359
02 | 765 390 | 766 395 | 742 378
03 | 1013 413 | 1013 421 | 997 406
04 | 883 425 | 880 421 | 867 413
05 | 860 469 | 858 491 | 798 445
06 | 1084 455 | 1077 458 | 1071 447
07 | 981 419 | 976 418 | 955 410
08 | 1052 399 | 1046 395 | 1040 391
09 | 873 393 | 871 401 | 823 372
10 | 1026 511 | 1029 524 | 1011 504
11 | 853 367 | 853 374 | 827 355
12 | 834 422 | 832 429 | 811 414
13 | 1016 435 | 1010 435 | 1000 425
14 | 954 403 | 948 402 | 927 396
15 | 867 390 | 864 397 | 841 380
16 | 1068 397 | 1066 400 | 1059 393
17 | 861 376 | 860 382 | 829 365
18 | 787 442 | 783 440 | 774 431
19 | 909 394 | 907 393 | 879 382
20 | 1142 400 | 1140 396 | 1130 394
21 | 760 384 | 767 390 | 740 370
22 | 1022 410 | 1014 408 | 1004 400
23 | 1030 394 | 1025 391 | 1022 385
24 | 917 433 | 913 444 | 888 423
25 | 914 384 | 910 381 | 884 371
26 | 810 401 | 811 404 | 784 383
27 | 878 354 | 871 366 | 855 346
28 | 1040 449 | 1033 459 | 1019 443
29 | 912 442 | 911 444 | 877 421
30 | 895 419 | 889 431 | 843 403
31 | 913 411 | 914 415 | 874 389
32 | 1010 402 | 1003 401 | 992 393
33 | 1018 397 | 1009 398 | 994 387
34 | 831 429 | 859 457 | 793 411
35 | 939 410 | 940 417 | 908 395
36 | 1038 402 | 1032 399 | 1027 393
37 | 1084 439 | 1088 453 | 1071 430
38 | 825 381 | 829 392 | 796 367
39 | 999 413 | 993 408 | 986 399
40 | 1007 408 | 999 405 | 990 398
41 | 1037 505 | 1029 516 | 1012 497
42 | 1054 408 | 1046 403 | 1035 395
43 | 897 418 | 901 426 | 882 408
44 | 839 364 | 830 377 | 798 354
45 | 924 425 | 920 425 | 909 414
46 | 898 431 | 899 435 | 881 426
47 | 890 398 | 882 393 | 875 384
48 | 1014 414 | 1006 412 | 997 401
49 | 999 412 | 992 409 | 984 400
==============================================================
Avg | 940 412 | 937 415 | 917 400
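
To put the "Avg" row in perspective, here is a small sketch that expresses each lossy bitrate as a percentage of its lossless counterpart. The figures are copied from the table above; the names `averages` and `lossy_ratio` are just illustrative choices, not part of any tool.

```python
# Average bitrates (kbps) copied from the "Avg" row of the table above:
# (lossless, lossyWAV-processed) for each encoder.
averages = {
    "FLAC": (940, 412),
    "WavPack": (937, 415),
    "TAK": (917, 400),
}


def lossy_ratio(lossless_kbps, lossy_kbps):
    """Return the lossy bitrate as a percentage of the lossless bitrate."""
    return 100.0 * lossy_kbps / lossless_kbps


ratios = {name: lossy_ratio(*pair) for name, pair in averages.items()}

for name, pct in ratios.items():
    print(f"{name:8s} {pct:5.1f}% of the lossless bitrate")
```

All three encoders land at roughly 44% of the lossless bitrate on this corpus, so the choice of lossless codec matters much less than the preprocessing itself.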
Josef Pohm
QUOTE(Mitch 1 2 @ Nov 27 2007, 15:28) *

I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

As your documentation reports which codecs support LossyWAV and which don't, the following is my experience about the missing ones.

MP4ALS and LPAC support LossyWAV very very well.

SHN should, but I didn't bother to actually check.

On the other hand, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't appear to support wasted bits detection at all! OFR supports wasted bits, but I can't see a way for it to use a 512-sample frame size (nor, in my OPINION, was OFR designed to work with such a small frame size).
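
For readers unsure what "wasted bits detection" refers to, here is a minimal sketch of the idea, assuming integer PCM samples: a codec that supports it checks how many low-order bits are zero in every sample of a frame and stores the samples without them. The function name `wasted_bits` and the handling of an all-zero block are assumptions for illustration, not taken from any particular codec.

```python
def wasted_bits(samples):
    """Count the low-order bits that are zero in every sample of a block."""
    combined = 0
    for s in samples:
        # Python ints behave like infinite two's complement, so OR-ing
        # negative samples keeps their low-order bits intact.
        combined |= s
    if combined == 0:
        return 0  # all-zero block: nothing to strip in this sketch
    # combined & -combined isolates the lowest set bit.
    return (combined & -combined).bit_length() - 1


print(wasted_bits([512, -1024, 1536]))  # every sample is a multiple of 512
print(wasted_bits([32295, 32256]))      # one odd sample spoils the block
```

A codec without this check has to encode the trailing zeros like any other data, which is why lossyWAV output gains little with it.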
Synthetic Soul
QUOTE(Mitch 1 2 @ Nov 27 2007, 16:55) *
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.
Using -8 does little for my corpus by the looks of it. I've only tested the first 25 files so far, but it only takes the average bitrate from 933 to 930 for those files.

In fact, using lossyFLAC and encoding using -5 yields, on average, a file 43.90% the size of the standard FLAC, but with -8 it is merely 43.93% the size. ;)

Edit: Sorry, in my haste to test I have forgotten that I'm still using lossyWAV files processed using -2. Perhaps with -3 there is a more drastic improvement.
Nick.C
QUOTE(Josef Pohm @ Nov 27 2007, 17:47) *
QUOTE(Mitch 1 2 @ Nov 27 2007, 15:28) *
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.
As your documentation reports which codecs support LossyWAV and which don't, the following is my experience about the missing ones.

MP4ALS and LPAC support LossyWAV very very well.

SHN should, but I didn't bother to actually check.

On the other side, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't even show to be able to support wasted bits detection at all! OFR supports wasted bits but I can't see a way for it to use a 512 samples frame size (nor my OPINION is that OFR was designed to work with such a small frame size).
That should work as long as the target codec can work on a multiple of the lossyWAV codec_block_size; otherwise, use -cbs xxx to set the lossyWAV codec_block_size to the same as the target codec, or wait until I get off my behind and implement a -ofr parameter to specify codec-specific settings (as for WMALSL).

We may be in early beta, but if anyone has any ideas for improvements / additions / changes they would like to see, then let me know. You can PM me or e-mail me from here if you don't want to post publicly.

I am gratified to see that the code is quite robust as the error reports have dwindled.... <avalanche!>

Mitch 1 2 is doing a great job with the wiki article, I should get round to my bit of it.
Axon
Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be solely proved based on one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined a priori to listening tests, based on the literature.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it only relies upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be proven transparent a priori of any listening tests, based entirely on signal processing principles, and only very little psychoacoustic principles (based only on masking the quantization noise with the background noise). But instead, the settings seem like they are based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being able to be formally verified? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm is shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and to show that the algorithm is immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?

Nick.C
QUOTE(Axon @ Nov 27 2007, 20:33) *
Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be solely proved based on one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined a priori to listening tests, based on the literature.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it only relies upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be proven transparent a priori of any listening tests, based entirely on signal processing principles, and only very little psychoacoustic principles (based only on masking the quantization noise with the background noise). But instead, the settings seem like they are based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being able to be formally verified? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm is shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and to show that the algorithm is immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?
Certainly not talking out of line, but it's beyond my limited knowledge; as I said, I'm just the programmer. The -skew and -spread (and -snr I suppose) functions and settings have certainly been arrived at heuristically. I've worked up beta v0.5.2 (attached; since superseded) to allow the original concept settings to be implemented using a -0 parameter (as closely as possible, given slight changes in the combined conv / spread function). Use -0 -clipping to emulate the original method settings, and -0 -fft 10101 -clipping to emulate the three-analysis version. -nts is the only other parameter available to you under the original method.

As to number of tunings, -fft, -nts, -snr, -skew and -spread are the only tunings used in the 3 default quality settings, others such as -clipping, -dither, -overlap, -window, -allowable are all defaulted to off.

I must stress that looking at the file sizes of the output of vanilla -0, I am fairly certain that artifacts will show in Atem_lied at the very least.

***** -0 is not a permanent quality setting, merely a response to a request. *****
halb27
It's true some heuristics were introduced, especially spreading and skewing - spreading from the very start. Without these heuristics the method may have a better justification, but it comes at the price of a seriously increased bitrate.
With the advanced options everybody who wants to can get rid of the heuristics: -skew 0 -snr 0 -fft 10101 -spf 11111-11111-11111-11111-11111 -nts 0 for instance when using a 64, 256, and 1024 sample FFT.
I personally love the reduced bitrate given by spreading and skewing, and I feel secure enough with it according to experience.

I agree however that this gives rise to the question whether we should readjust the quality levels. Maybe -1 should go to Axon's pure method, and maybe -2 should be a mixture of current -2 and -1, for instance the FFT usage like that of -1 (maybe dropping the 128 sample FFT), but with an -nts value of 2.
I personally would agree with such a solution.

ADDED:
I just saw your new beta, Nick. So I see -snr should be negative to the limit for avoiding the skewing/snr heuristics. Spreading length should be 1 however IMO to avoid the spreading heuristics. The constant spreading of 4 was just 2Bdecided's spreading heuristics at his start up as far as I can see it. There's no reason IMO to use a blocksize of 1024. 2Bdecided just used a 1024 sample block size when he started things.
Of course not averaging FFT outcome at all is fine in a pure sense but is suspected to be a huge overkill especially in the high frequency range bringing bitrate up.
Nick.C
QUOTE(halb27 @ Nov 27 2007, 21:17) *
It's true some heuristics were introduced, especially spreading and skewing - spreading from the very start. Without these heuristics the method may have a better justification, but it comes at the price of a seriously increased bitrate.
With the advanced options everybody who wants to can get rid of the heuristics: -skew 0 -snr 0 -fft 10101 -spf 11111-11111-11111-11111-11111 -nts 0 for instance when using a 64, 256, and 1024 sample FFT.
I personally love the reduced bitrate given by spreading and skewing, and I feel secure enough with it according to experience.

I agree however that this gives rise to the question whether we should readjust the quality levels. Maybe -1 should go to Axon's pure method, and maybe -2 should be a mixture of current -2 and -1, for instance the FFT usage like that of -1 (maybe dropping the 128 sample FFT), but with an -nts value of 2.
I personally would agree with such a solution.

ADDED:
I just saw your new beta, Nick. So I see -snr should be negative to the limit for avoiding the skewing/snr heuristics. Spreading length should be 1 however IMO to avoid the spreading heuristics. The constant spreading of 4 was just 2Bdecided's spreading heuristics at his start up as far as I can see it. There's no reason IMO to use a blocksize of 1024. 2Bdecided just used a 1024 sample block size when he started things.
Of course not averaging FFT outcome at all is fine in a pure sense but is suspected to be a huge overkill especially in the high frequency range bringing bitrate up.
At present you can't use a negative -snr value, it's safely forced in the code.

As an aside, using -0 -spf 11111-11111-11111-11111-11111 -cbs 512 -fft 10001 yields: 56.47MB / 637.0kbps; changing to -fft 10101 yields: 57.60MB / 649.7kbps on my 53 sample set.

Bearing in mind that the source FLAC files amount to 69.36MB / 781kbps, that's not really a great saving.

[edit] And the 4 bin spreading function was there from the very beginning in David's original script. [/edit]
halb27
QUOTE(Nick.C @ Nov 27 2007, 23:52) *

As an aside, using -0 -spf 11111-11111-11111-11111-11111 -cbs 512 -fft 10001 yields: 56.47MB / 637.0kbps; changing to -fft 10101 yields: 57.60MB / 649.7kbps on my 53 sample set.

Bearing in mind that the source FLAC files amount to 69.36MB / 781kbps, that's not really a great saving.

The pure method isn't attractive to you, and it isn't attractive to me. But it's intrinsically safe as Axon said.
QUOTE(Nick.C @ Nov 27 2007, 23:52) *

[edit] And the 4 bin spreading function was there from the very beginning in David's original script. [/edit]
Yes, 2Bdecided used this spreading heuristic from the very start, and we've improved upon it, both with respect to quality and bitrate saving.

ADDED:
I just re-read Axon's post. I'm not sure any more if he dislikes spreading as he seems to accept the critical band heuristics being the most important basis for our current spreading parameters. Sure this means already to accept some heuristics.
Anyway the question remains: should we have the -1 configuration in such a way that configuration details have a very high degree of theoretical justification?
jensend
The primary advantage of lossless formats, it seems to me, is the future-proof factor (being able to benefit from it when a new and better encoder or a different format comes around rather than having that option made unattractive by the huge quality per bitrate losses involved in transcoding). So has anybody done listening tests to see how files processed by lossyWAV do when encoded into MP3/AAC/Vorbis/whatever?

Also, where is the preferred place to discuss lossyWAV? It seems like it would belong in the "other lossy formats" forum, but all the discussion of it seems to be restricted to this thread and the original thread in the FLAC forum.
BGonz808
I'm just wanting to see if my understanding of the preprocessing method is somewhat accurate:
Let's say that an amplitude of part of a 16-bit wave is +32295 (0111111000100111). LossyWAV will simplify it (not "clip", oops :P maybe I meant snip?) so that the binary value contains many trailing zeros, which FLAC will then compress away as wasted_bits. The processed value of that amplitude will then become something like +32256 (0111111000000000) and save 9 bits. Is this the basic principle? Just wanting a little bit of clarification, thanks

808
Axon
QUOTE(halb27 @ Nov 27 2007, 16:01) *
I just re-read Axon's post. I'm not sure any more if he dislikes spreading as he seems to accept the critical band heuristics being the most important basis for our current spreading parameters. Sure this means already to accept some heuristics.
Anyway the question remains: should we have the -1 configuration in such a way that configuration details have a very high degree of theoretical justification?
Well, insofar as nothing in psychoacoustics is set in stone and there are going to be heuristics to evaluate very complicated phenomena, you can't escape them. I mean, the Bark scale seems like a hack in the first place, as every closed-form EBW equation probably is.

But clearly, spreading exists in any halfway-complete masking model. To leave such a tempting bone out there without chewing on it is madness. I'd just like to know how the predicted -spf numbers line up against what the tunings are, and have an option to use the theoretical numbers.

I would use a different option than -1 for a setting that matched theoretical predictions, because there's still a need for -1 to -3 in their current incarnations. Moreover, whatever setting exists must still be absolutely transparent. It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.
halb27
QUOTE(BGonz808 @ Nov 28 2007, 05:10) *

I'm just wanting to see if my understanding of the preprocessing method is somewhat accurate:
Let's say that an amplitude of part of a 16-bit wave is +32295 (0111111000100111), LossyWAV will clip it so that the binary value contains many trailing zeros so that FLAC will compress those away as wasted_bits. The processed value of that amplitude will then become something like +32256 (0111111000000000) and save 9 bits. Is this the basic principle? Just wanting a little bit of clarification, thanks

808

Yes, that is essentially it. It's only a bit the other way around, and clipping isn't the correct description. LossyWAV decides, on a per-block analysis, how many least significant bits are considered non-essential for the 512 samples in the block. If it decides, for instance, that 9 (that's unusually many, so let's also consider 3) least significant bits can be ignored, then a sample of 0111111000100111 in the block is rounded to 0111111000000000 (or 0111111000101000, respectively).
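
The rounding step described here can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `reduce_sample` is made up, round-half-up is an assumed rounding rule, and nothing is done about samples that would round past the 16-bit range (the clipping issue mentioned earlier in the thread).

```python
def reduce_sample(sample, bits_to_remove):
    """Round a sample to the nearest multiple of 2**bits_to_remove."""
    if bits_to_remove == 0:
        return sample
    half_step = 1 << (bits_to_remove - 1)
    # Add half a step, then drop and restore the low bits (round half up).
    return ((sample + half_step) >> bits_to_remove) << bits_to_remove


sample = 32295
for bits in (9, 3):
    reduced = reduce_sample(sample, bits)
    print(bits, reduced, format(reduced, "016b"))
```

Every sample in the 512-sample block is treated with the same number of removed bits, so the whole block ends up with that many trailing zeros for the lossless codec to exploit.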
Nick.C
QUOTE(jensend @ Nov 28 2007, 00:02) *
The primary advantage of lossless formats, it seems to me, is the future-proof factor (being able to benefit from it when a new and better encoder or a different format comes around rather than having that option made unattractive by the huge quality per bitrate losses involved in transcoding). So has anybody done listening tests to see how files processed by lossyWAV do when encoded into MP3/AAC/Vorbis/whatever?

Also, where is the preferred place to discuss lossyWAV? It seems like it would belong in the "other lossy formats" forum, but all the discussion of it seems to be restricted to this thread and the original thread in the FLAC forum.
In its purest sense, it's lossy, so lossy it is.

All the discussion and uploading lives in here as I am not a member of the developers group and cannot upload in any other forum.

@Halb27: Maybe I'm being a little over protective of the settings we have arrived at after quite a bit of work. Let's rename them as -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

Attached again (to bring it closer to the conversation) my spreading excel sheet.
Josef Pohm
QUOTE(Nick.C @ Nov 27 2007, 20:44) *
QUOTE(Josef Pohm @ Nov 27 2007, 17:47) *
...OFR supports wasted bits but I can't see a way for it to use a 512 samples frame size (nor my OPINION is that OFR was designed to work with such a small frame size).
As long as the target codec can work on a multiple of the lossyWAV codec_block_size, or use -cbs xxx to set the lossyWAV codec_block_size to the same as the target codec, or I get off my behind and implement a -ofr parameter to specify codec specific settings (as for WMALSL).

I think OFR support is a story of its own. From a certain point of view, the facts that it supports wasted bits detection and that it shares with LA the crown for the best compression ratios around were very promising. On the other hand, I couldn't find any information about the frame sizes OFR uses or a possible undocumented switch to make it work with a frame size fixed by the user.

As a last resort, I took an OFR file (encoded at the default setting), damaged a single sample with a hex editor and checked what happened.
As a result, I got exactly five seconds of silence in the middle of the music.

So I couldn't do any better than assume that OFR is working with a frame size of 220,500 samples (at least on 44.1 kHz material at the default setting), which means practically no chance of using it with lossyWAV.

That's a risky assumption, but it is the little I could do. Obviously, I can't be sure about such a conclusion at all, so if somebody knows better, that would be welcome.
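
The arithmetic behind this guess, and why such a long frame would defeat lossyWAV, can be sketched as follows. The five seconds of silence and the 44.1 kHz rate come from the test above; the per-block bit counts are made-up illustrative values, and `frame_wasted_bits` is a hypothetical helper.

```python
SAMPLE_RATE = 44_100       # Hz, as in the test material
LOSSYWAV_BLOCK = 512       # samples per lossyWAV block
SILENCE_SECONDS = 5        # length of the dropout after damaging the file

inferred_frame = SILENCE_SECONDS * SAMPLE_RATE
blocks_per_frame = inferred_frame / LOSSYWAV_BLOCK


def frame_wasted_bits(per_block_bits):
    """A frame can only strip bits that are zero in every block it spans."""
    return min(per_block_bits)


print(inferred_frame, round(blocks_per_frame, 2))
# Hypothetical bits removed by lossyWAV in five consecutive blocks:
print(frame_wasted_bits([5, 4, 6, 0, 3]))
```

With a frame spanning several hundred lossyWAV blocks, a single block that keeps all of its bits leaves the whole frame with no wasted bits to remove.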
Synthetic Soul
The only information I could find on the board:

QUOTE(pest @ Sep 24 2006, 18:47) *
The reason why Monkey uses large frames (up to 4s at 44.1 kHz) lies in its architecture.
OptimFROG suffers from the same problem. The adaptive predictors have to catch up some data...

halb27
QUOTE(Axon @ Nov 28 2007, 09:42) *

Well, insofar as nothing in psychoacoustics is set in stone and there are going to be heuristics to evaluate very complicated phenomena, you can't escape them. I mean, the Bark scale seems like a hack in the first place, as every closed-form EBW equation probably is.

But clearly, spreading exists in any halfway-complete masking model. To leave such a tempting bone out there without chewing on it is madness. I'd just like to know how the predicted -spf numbers line up against what the tunings are, and have an option to use the theoretical numbers.

I would use a different option than -1 for a setting that matched theoretical predictions, because there's still a need for -1 to -3 in their current incarnations. Moreover, whatever setting exists must still be absolutely transparent. It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.

I gladly see we're all pretty close to each other.
And especially I have done a rather bad job explaining the ingredients from the sausage factory. I'll try to do better:

a) the skew and snr options

These options I think have the worst theoretical justification.
But: the only thing they can do is to decrease the number of bits removed, to increase the sample accuracy, that is to potentially increase quality compared to not using them.
And it was found that they do a very good job in differentiating between 'good' spots where many bits can be ignored and 'bad' spots where we have to keep nearly all the bits.
As far as I was busy with that I did not find good skew/snr values by listening tests. Instead I have a set of regular music where many bits on average are expected to be removable, and a set of problem samples where it is known that only few bits can be safely removed. I've looked at the resulting bitrate of these sample classes for deciding on skew and snr. I've done only few listening tests for the skew/snr value finding due to the exclusively defensive nature of using these parameters.

A certain danger drops in with our decision to use a positive -nts value for -2 and -3 which is done because we have an excellent good/bad spot indicator by using skew/snr and because the skew value is something like nts applied to the low to medium frequency range so that we can safely lower the nts demand with respect to this. However this adds a certain risk for the higher frequencies.
We do not do this with -1 which is the option best suited to perfectionists.
A -nts value of 2 for quality level -2 is so close to 0 that I think the practical advantages of skewing with respect to good/bad spot differentiation outperform the small danger introduced. Sure we can discuss forever whether the default -nts value should be +2 or +2.5 or +1.5 or maybe 0. In practice it's not very important. Moreover -nts is our main option apart from the quality parameter and everybody can set it easily to 0 with -2 or -3.
In the end the -nts values for -2 and -1 match very much IMO what we have in mind for these quality levels.
BTW at least I don't have this very strong demand for 'secure' transparency with -2 and -3. I do with -1, but with -2 (more so with -3) I accept a very slight risk that the result is not transparent on rare occasion in case I can expect to get only a negligible problem. So in the end it's the typical lossy approach with -2 and -3, but with extremely high demands for -2, and very high demands for -3.

b) spreading

I'm glad you have a positive attitude towards spreading. When allowing for spreading, I think David Bryant's idea of taking care of the width of the critical bands is a good starting point for deciding on the spreading details. As far as I was busy with the spreading details, my target was to have several FFT bins in every critical band. With this in mind, what at first glance looks a bit dangerous in our -spf values, the rather long spreading length of the highest frequency zone with the 1024 sample FFT, is in fact only a small danger. The problems come rather from the other end, as frequency resolution is pretty low there. But as our spreading length is short there with the long FFTs, I think this is adequate. Moreover, we do several FFTs, and especially with -1 this should give a very secure result. Last but not least, we have skewing to bring a big additional safety margin to low frequencies.
As far as I was busy with the critical bands, my primary consideration was the number of FFT bins in the critical bands, and I backed these things up again by checking the resulting bitrate of my regular and problematic sample sets. Bitrate should be high with the difficult tracks, and rather low with the regular tracks. The final result was that we got a significantly improved security margin for the difficult tracks (compared to what we had before), and a bitrate decrease with the regular tracks. I also did listening tests, but to a minor degree.
Of course we can discuss endlessly the details of spreading as well as other details of how to do the FFT analysis and simplify the result. For instance, I personally would prefer a different FFT covering of the blocks, and I would prefer a 512 sample FFT instead of the 256 sample FFT with -2 in favor of giving additional security to the low end. But after all it's not vital to me (beyond myself it's an open question whether that's useful at all), and IMO we have adequate considerations for the various aspects with our current settings.
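
To make the "several FFT bins in every critical band" target concrete, here is a rough sketch that estimates how many bins fall into one critical band at a few frequencies. It assumes 44.1 kHz material and uses the Zwicker and Terhardt closed-form approximation of critical bandwidth, which is a swapped-in textbook formula and not necessarily what the -spf values were derived from; the function names are illustrative.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker and Terhardt's approximation of the critical bandwidth in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def bins_per_critical_band(f_hz, fft_length, sample_rate=44_100):
    """Critical bandwidth at f_hz divided by the FFT bin spacing."""
    bin_width_hz = sample_rate / fft_length
    return critical_bandwidth_hz(f_hz) / bin_width_hz


for fft_length in (64, 256, 1024):
    row = [bins_per_critical_band(f, fft_length) for f in (100, 1000, 10000)]
    print(fft_length, [round(x, 2) for x in row])
```

Under these assumptions the 1024 sample FFT has dozens of bins per critical band near 10 kHz but only a couple near 100 Hz, and the short FFTs cannot resolve a low-frequency critical band at all, which fits the remark that the problems come from the low end.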

So I think your aspects which originate from the theoretical basis (ensuring quality a priori without listening tests) are covered well by using -1. This is your quality level, as what we have in mind with -2 and -3 isn't in full congruence with your targets.
Sure any practical suggestion for improving things is welcome.
halb27
QUOTE(Nick.C @ Nov 28 2007, 10:39) *

... @Halb27: Maybe I'm being a little over protective of the settings we have arrived at after quite a bit of work. Let's rename them as -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

Attached again (to bring it closer to the conversation) my spreading excel sheet.

Sorry it was me who brought in some confusion wanting to have -1 going the extremely pure way.
I've thought it over at night (see my last post) - and come to the conclusion that with our current -1 we're going the pure way. Stuff from the sausage factory like skewing doesn't hurt quality a bit - the contrary is true. We do have to make some practical considerations for the way we do the FFT analyses, but here too I think this is in agreement with the pure way though details are always disputable.

So I think we can leave -1 as is. Sure suggestions for improvements are always welcome.

-3 is typically used with DAPs as you said, and -2 is a compromise for -3 and -1, kind of a -1 for the more practically minded.

BTW your spreading excel sheet was of high value for me on deciding about the spreading details - as far as it was me who worked out the details.

A suggestion:
It looks like it will be hard to disqualify -3 qualitywise (which is a good thing of course). Maybe for testing we can do it the other way around, start with an even less demanding quality setting in such a way that we do get into trouble, and increase the quality demands until quality is fine with the problems found. This way we can get a feeling of how big the security margin of -3 is. It is expected to be small, but who knows?
Essentially this means that we should be able to set -nts to a value higher than +6.
Nick.C
QUOTE(halb27 @ Nov 28 2007, 11:42) *

QUOTE(Nick.C @ Nov 28 2007, 10:39) *

... @Halb27: Maybe I'm being a little over protective of the settings we have arrived at after quite a bit of work. Let's rename them as -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

Attached again (to bring it closer to the conversation) my spreading excel sheet.

Sorry it was me who brought in some confusion wanting to have -1 going the extremely pure way.
I've thought it over at night (see my last post) - and come to the conclusion that with our current -1 we're going the pure way. Stuff from the sausage factory like skewing doesn't hurt quality a bit - the contrary is true. We do have to make some practical considerations for the way we do the FFT analyses, but here too I think this is in agreement with the pure way though details are always disputable.

So I think we can leave -1 as is. Sure suggestions for improvements are always welcome.

-3 is typically used with DAPs as you said, and -2 is a compromise for -3 and -1, kind of a -1 for the more practically minded.

BTW your spreading excel sheet was of high value for me on deciding about the spreading details - as far as it was me who worked out the details.

A suggestion:
It looks like it will be hard to disqualify -3 qualitywise (which is a good thing of course). Maybe for testing we can do it the other way around, start with an even less demanding quality setting in such a way that we do get into trouble, and increase the quality demands until quality is fine with the problems found. This way we can get a feeling of how big the security margin of -3 is. It is expected to be small, but who knows?
Essentially this means that we should be able to set -nts to a value higher than +6.
It's easier than that: use -snr <large negative number> with v0.5.3.....

Using -3 -snr -215 on my 53 sample set yields: 32.16MB; 362.8kbps.......

lossyWAV beta v0.5.3 attached: Superseded.

-snr parameter now valid in range -215<=n<=48.
-window parameter fully removed.

I intend to fully remove the following parameters unless there is objection:

-dither;
-clipping;
-overlap.
Mitch 1 2
I don't object, and I also don't see the use in keeping -allowable.
halb27
QUOTE(Nick.C @ Nov 28 2007, 16:52) *

... It's easier than that: use -snr <large negative number> with v0.5.3.....

I have no idea what a negative -snr value does. I had thought that bringing in snr means giving the relevant min the chance to go lower than when not using snr. On this understanding, any snr value can only make things more defensive compared to not using snr. Sure, as we use an snr value of 21, we will get a lower bitrate when turning the -snr value down. However, I wonder what makes your problem sample set go so low in bitrate. I guess there's a specific meaning to a negative snr value.

Anyway I'd prefer to use a higher -nts value of up to say 40 instead. It would give us the chance to keep the usual skew/snr combination and go extreme with noise threshold for learning about lossyWAV behavior.
Nick.C
QUOTE(halb27 @ Nov 28 2007, 15:15) *
QUOTE(Nick.C @ Nov 28 2007, 16:52) *
... It's easier than that: use -snr <large negative number> with v0.5.3.....
I have no idea what a negative -snr value does. I had thought that bringing in snr means giving the relevant min the chance to go lower than when not using snr. On this understanding, any snr value can only make things more defensive compared to not using snr. Sure, as we use an snr value of 21, we will get a lower bitrate when turning the -snr value down. However, I wonder what makes your problem sample set go so low in bitrate. I guess there's a specific meaning to a negative snr value.

Anyway I'd prefer to use a higher -nts value of up to say 40 instead. It would give us the chance to keep the usual skew/snr combination and go extreme with noise threshold for learning about lossyWAV behavior.
I am beginning to feel that -snr is a bit of packing in the sausage. When I tried -3 -snr -215 (modified average = average - snr_value, i.e. average +215 in this case, effectively removing it from consideration) I got palatable results.
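
A minimal sketch of the modified average described here, in Python rather than the actual Delphi code. It assumes, following halb27's reading, that the level which limits the added noise is the lower of the FFT minimum and (average - snr_value). The function name `reference_level` and the bin values are made up for illustration, and the real implementation may differ.

```python
def reference_level(fft_bins_db, snr_db):
    """Return the level (dB) that limits the added noise for one FFT.

    Assumption: the lower of the bin minimum and (average - snr) is used.
    This mirrors the description in the thread, not the real source code.
    """
    minimum = min(fft_bins_db)
    average = sum(fft_bins_db) / len(fft_bins_db)
    modified_average = average - snr_db
    return min(minimum, modified_average)


bins = [62.0, 55.0, 48.0, 71.0]   # hypothetical FFT results in dB

# -snr 21: the average is 59, so 59 - 21 = 38 undercuts the minimum (48).
print(reference_level(bins, 21))    # 38.0
# -snr -215: 59 + 215 = 274 cannot undercut the minimum, so it drops out.
print(reference_level(bins, -215))  # 48.0
```

In this simplified model a positive -snr can only lower the level (fewer bits removed), which matches the help text's "Increasing value reduces bits to remove".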

[edit] I would go further than saying palatable: 32.17MB / 362.8kbps on my 53 sample set. I've started a speculative 1496 track transcode - so far: 256 tracks, 2.20GB / 302kbps vs 6.43GB / 881kbps..... [/edit]

-nts amended as requested.

Now you can really cause awful results.......

Try: -3 -nts 48 -skew 0 -snr -215

This gave 9.504MB / 107.2kbps.

lossyWAV beta v0.5.4 attached. Superseded.
CODE
lossyWAV beta v0.5.4 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-0            emulate script  [2xFFT] (-cbs 1024 -nts  0.0 -skew  0 -snr -215
              -spf 44444-44444-44444-44444-44444 -fft 10001)
-1            extreme quality [4xFFT] (-cbs  512 -nts -2.0 -skew 36 -snr   21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default quality [3xFFT] (-cbs  512 -nts +1.5 -skew 36 -snr   21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact quality [2xFFT] (-cbs  512 -nts +6.0 -skew 36 -snr   21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).

-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
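
For anyone scripting around these options, here is a rough Python sketch of how the -fft, -spf and -cbs arguments could be decoded and checked, going only by the help text above. The helper names (`decode_fft`, `decode_spf`, `valid_cbs`) are hypothetical and are not part of lossyWAV. What the five characters inside each -spf group stand for is not stated in the help text, so the sketch only splits them into numbers.

```python
FFT_LENGTHS = (64, 128, 256, 512, 1024)


def decode_fft(switches):
    """Map a 5-character binary string such as '01001' to FFT lengths."""
    if len(switches) != 5 or set(switches) - {"0", "1"}:
        raise ValueError("expected five characters, each 0 or 1")
    return [n for n, bit in zip(FFT_LENGTHS, switches) if bit == "1"]


def decode_spf(spf):
    """Split a -spf string into one list of five integers per FFT length."""
    groups = spf.split("-")
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError("expected 5 groups of 5 characters")
    table = {}
    for length, group in zip(FFT_LENGTHS, groups):
        values = [int(ch, 16) for ch in group]   # hex digits 1-9 and A-F
        if 0 in values:
            raise ValueError("zero is not allowed")
        table[length] = values
    return table


def valid_cbs(n):
    """Check the documented -cbs range: 512..4608 and a multiple of 32."""
    return 512 <= n <= 4608 and n % 32 == 0


print(decode_fft("01001"))                                # [128, 1024]
print(decode_spf("22235-22236-22347-22358-2246C")[1024])  # [2, 2, 4, 6, 12]
print(valid_cbs(512), valid_cbs(1000))                    # True False
```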
jesseg
QUOTE
lFLCDrop Change Log:
v1.2.0.2
-added support for "-0 (emulate script)" option

lFLC.bat Change Log:
v1.0.0.2
- improved temp file handling
- fixed quality preset bug
Fixed a pretty massive FUBAR on my part: the variable name for passing in the quality preset wasn't right, so it was always defaulting to -2. That's been fixed. That's what I get for initially working on it 9 hours straight without breaks.

[edit] removed, newer version on later post [/edit]
halb27
I just tried insane -nts settings on my problem set to get a feeling about the security margin we have when using -3:

a) -3 -nts 30 => 319/390 kbps for my regular/problem sample set

I was astonished by the quality of Atem-lied, which I tried first. badvilbel was next and also has remarkable quality. bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A, however, have big errors (no ABXing required), and the errors of furious and triangle are also easy to perceive, though quality isn't really bad.
The big errors of bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are pretty much of the kind I know from WavPack lossy.
Anybody who would like to hear the potential problems lossyWAV has when the accuracy demand is too low is invited to do a listening test with this setting. The problems of the bad samples mentioned are easy to hear.

b) -3 -nts 20 => 320/405 kbps for my regular/problem sample set

Results were a lot better. Only bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are not transparent, with bibilolo and S37_OTHERS_MartenotWaves_A already being roughly acceptable. Only keys_1644ds is still seriously lacking in quality, though it too has improved remarkably.

c) -3 -nts 16 => 321/419 kbps for my regular/problem sample set

Only keys_1644ds and S37_OTHERS_MartenotWaves_A are not transparent to me. S37_OTHERS_MartenotWaves_A is already very hard for me to ABX, and even for keys_1644ds it's not easy.

d) -3 -nts 12 => 326/438 kbps for my regular/problem sample set

Only keys is not totally transparent to me - and I was able to abx keys only with a pretty bad 7/10 result.

e) -3 -nts 9 => 333/455 kbps for my regular/problem sample set

Now also keys_1644ds is transparent to me.


Looking at these results, even -3 (which defaults to -nts 6) seems to me to have a remarkable security margin.
The default -3 setting yields 345/474 kbps for my regular/problem sample set.
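
To put the figures above side by side, a small Python snippet that tabulates the bitrate saved against default -3 for each -nts value tried. The numbers are copied from this post; the names `DEFAULT`, `SWEEP` and `savings` are just for illustration.

```python
# (regular, problem) bitrates in kbps as reported in this post
DEFAULT = (345, 474)            # plain -3, i.e. -nts 6
SWEEP = {
    9: (333, 455),
    12: (326, 438),
    16: (321, 419),
    20: (320, 405),
    30: (319, 390),
}


def savings(nts):
    """kbps saved against default -3 for the regular and problem sets."""
    regular, problem = SWEEP[nts]
    return DEFAULT[0] - regular, DEFAULT[1] - problem


for nts in sorted(SWEEP):
    reg, prob = savings(nts)
    print(f"-nts {nts:2d}: regular -{reg} kbps, problem -{prob} kbps")
```

On these figures the regular set barely moves beyond -nts 16 (321, 320, 319 kbps), while the problem set keeps dropping.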
Nick.C
QUOTE(halb27 @ Nov 28 2007, 21:57) *
I just tried insane -nts settings on my problem set to get a feeling about the security margin we have when using -3: ...
That's a lot of listening! It's reassuring that the previously determined -3 settings have been confirmed by your test.

I went down a slightly different path with -snr <large negative number> to effectively remove it from the calculation of the minimum value for each FFT result. I think that some of your large -nts values would sound *very* different without the -snr safety net. That's not to say that -snr is necessarily bad, but I think it bloats the bitrate a bit.

TBeck
QUOTE(Synthetic Soul @ Nov 27 2007, 12:02) *

This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.

I second this!

Thank you very much!

If lossyWAV gets enough users, I will evaluate whether some modifications of TAK can significantly improve the compression of its output. In this context "significantly" means by at least about 20 kbps. I have some ideas, but you cannot be sure until you have tried it.

Thank you again!

Thomas
Nick.C
QUOTE(TBeck @ Nov 28 2007, 22:23) *
QUOTE(Synthetic Soul @ Nov 27 2007, 12:02) *
This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.
I second this!

Thank you very much!

If lossyWAV gets enough users, I will evaluate whether some modifications of TAK can significantly improve the compression of its output. In this context "significantly" means by at least about 20 kbps. I have some ideas, but you cannot be sure until you have tried it.

Thank you again!

Thomas
*Another* 20kbps saving! On top of everything else, that would probably push the average output of -3 down to circa 320kbps using TAK.......

Congratulations on the piping by the way, I may have to beseech aid in implementing it in lossyWAV - though how you pipe in and pipe out of lossyWAV then ensure that the output pipe goes to the lossless encoder I haven't the faintest clue........
GeSomeone
QUOTE(Nick.C @ Nov 28 2007, 17:21) *

-nts amended as requested.

Now you can really cause awful results...

Attached File lossyWAV_beta_v0.5.4.zip

Just a side note again .. when you're going to experiment further (in the code) with settings it would be best to call those (in between) versions Alpha again. When you arrive at something you're confident about you could release another beta. (I'm not saying something isn't right, but maybe another alpha round is needed?)
Nick.C
QUOTE(GeSomeone @ Nov 28 2007, 23:18) *
QUOTE(Nick.C @ Nov 28 2007, 17:21) *
-nts amended as requested.

Now you can really cause awful results...

Attached File lossyWAV_beta_v0.5.4.zip
Just a side note again .. when you're going to experiment further (in the code) with settings it would be best to call those (in between) versions Alpha again. When you arrive at something you're confident about you could release another beta. (I'm not saying something isn't right, but maybe another alpha round is needed?)
Well, all I did was change an input range to a particular parameter, I did not substantially change the code. I see what you mean though.

[edit] On reflection, no settings per se have been changed (other than the inclusion of the ability to revert to a close approximation of David's original script), only the ability to change settings has been augmented.

The more I listen to -3 -snr -215, the more I like it. I still think that there is a place for -snr, however I feel that it needs better explanation. I'll work up a spreadsheet which will graphically demonstrate the effects of the -skew, -nts and -snr parameters on a suitably small fft_length.

The bottom line though is that there is only one process which actually modifies the audio data, namely the bits_to_remove procedure - no heuristics in that process at all. The number of bits_to_remove may depend on a heuristically generated minimum_value, but the added noise caused by the subsequent bit reduction has already been calculated - therefore the link between minimum_value and bits_to_remove. [/edit]
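
A rough Python sketch of that link between minimum_value and bits_to_remove, under stated assumptions. The noise table here uses the textbook step^2 / 12 approximation for rounding noise, which is swapped in for illustration and is not necessarily how lossyWAV calculates its added-noise figures. `MAX_BITS`, the dB reference (1 LSB^2) and the point at which -nts is applied are likewise assumptions.

```python
import math

MAX_BITS = 16   # hypothetical upper bound for 16-bit audio

# Calculated once up front: approximate noise power (dB re 1 LSB^2) added
# by rounding away b bits, using the textbook step^2 / 12 model.
NOISE_DB = {b: 10 * math.log10((2 ** b) ** 2 / 12)
            for b in range(1, MAX_BITS + 1)}


def bits_to_remove(minimum_value_db, nts_db=0.0):
    """Largest b whose precalculated noise stays at or below the threshold.

    Assumption: -nts simply shifts the threshold, so positive values
    allow more bits to be removed, as the help text describes.
    """
    threshold = minimum_value_db + nts_db
    bits = 0
    for b in range(1, MAX_BITS + 1):
        if NOISE_DB[b] <= threshold:
            bits = b
    return bits


print(bits_to_remove(10.0))        # 3
print(bits_to_remove(10.0, 6.0))   # 4
```

The selection step itself is a plain table lookup with no heuristics; any heuristics (skew, snr, spreading) only act on the minimum_value passed in.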
halb27
QUOTE(Nick.C @ Nov 29 2007, 10:09) *


... The more I listen to -3 -snr -215, the more I like it. ...

From the bitrate you gave for your sample set which consists of problem samples to a high degree it's hard to imagine that keys_1644ds, bibilolo, or Martenotwaves are fine. I will try it this weekend. Anyway I'd like to know what a negative -snr value is doing.
Nick.C
QUOTE(halb27 @ Nov 30 2007, 09:03) *
QUOTE(Nick.C @ Nov 29 2007, 10:09) *
... The more I listen to -3 -snr -215, the more I like it. ...
From the bitrate you gave for your sample set which consists of problem samples to a high degree it's hard to imagine that keys_1644ds, bibilolo, or Martenotwaves are fine. I will try it this weekend. Anyway I'd like to know what a negative -snr value is doing.
Attached spreadsheet shows how -skew, -snr and -nts interact on a 64 sample FFT (random numbers used for FFT output, F9 to recalculate for another iteration).

As an aside:

Bibilolo -3: 1487438 bytes; -3 -snr -215: 1470329 bytes;
Keys_1644ds -3: 105088 bytes; -3 -snr -215 : 105088 bytes;
S37_OTHERS_MartenotWaves_A -3: 711469 bytes; -3 -snr -215: 711469 bytes.