Full Version: lossyWAV Development
Hydrogenaudio Forums > Hydrogenaudio Forum > Uploads
2Bdecided
QUOTE(Nick.C @ Apr 1 2008, 20:09) *
Re: Transcodability - lossyWAV does not allow re-processing of an already processed file. I would prefer to keep it that way
That's a good thing (except for testing) - I meant feeding the output of lossyWAV to an mp3 encoder. I think that's what Dynamic was talking about.

Cheers,
David.
Nick.C
QUOTE(2Bdecided @ Apr 2 2008, 09:44) *
QUOTE(Nick.C @ Apr 1 2008, 20:09) *
Re: Transcodability - lossyWAV does not allow re-processing of an already processed file. I would prefer to keep it that way
That's a good thing (except for testing) - I meant feeding the output of lossyWAV to an mp3 encoder. I think that's what Dynamic was talking about.

Cheers,
David.
Yes, I see what he meant now. However, it reminds me of a quote on anythingbutipod where someone transcoded from lossyWAV >> OGG and the filesize increased by about 1MB compared to lossless >> OGG (as a %age I have no idea....)

As an aside, -7 -autoshape -snr 9 -nts 36 is nearly palatable on my iPAQ and comes in at 321.1kbps for my 53 problem sample set.

[edit] Iterating, I found that -7 -autoshape -snr 14.35 -nts 19.95 is very close in bitrate to vanilla -7. [/edit]
user
Once there was a good rule,
Thou should not transcode ;)

lossyWAV is already made for portable use and for compatibility with portable devices that support FLAC, while shrinking the size of true lossless music. There is no reason to go lossy a second time, i.e. to transcode from lossy to lossy. If somebody wants mp3/mpc/ogg/aac as a small file for portable use, then go directly from the lossless source to the small lossy format (mp3/mpc/ogg/aac).
There are already enough programs and scripts to encode in 1 single step to various formats/sizes/bitrates, like mareo.exe.
2Bdecided
I fundamentally disagree with you user.

Of course you should not aim to transcode, but sometimes it is inevitable, and sometimes it is not worth worrying about.

Of course we should all keep our lossless files and use them everywhere, but sometimes we can't, and sometimes it is not worth worrying about (for some people).


For example...

Modern loud CDs regularly hit 1000kbps+ with lossless codecs.

What's happening is that the mathematical 96dB range is being perfectly preserved, even though the actual dynamic range is about 6dB. ;)

I believe it is pointless keeping the lossless version. It's a "perfect" copy of a mediocre original.

Thankfully, lossyWAV, used less aggressively, allows you to make a near-lossless version.

If I can have something which is half the bitrate (or less), sounds identical, and transcodes identically, then I have no need to keep the lossless version.

To me, this is an argument for dumping the lossless original. As it says on the Monkey's Audio website: lossless is for anal retentives. I'm not one, so if all rational reasons for lossless are removed, I won't use it. YMMV.


Why not create mp3s or whatever from the lossless original?

1. "Why not?" Well, Why? Really, if there's no difference, why? It's OCD-like behaviour.

2. I might not know the lossy format I will need in the future. Shall I create mp3, ogg, AAC, HE-AAC etc?

3. If I'm a radio station, it's the broadcast (FM, mp2, mp3, WMA, whatever) that's the "transcode" - I can hardly avoid that or make it at the same time as I rip the CD.


So, for me, the "transcodability" of the less aggressive lossyWAV modes is very important.

I could give you more examples... "sensible" preservation of 24/96 files; "sensible" preservation of GBs of "working files" from audio sessions which will probably never be used again, but won't be any use at all if converted to mp3; etc etc etc.

All of this applies if lossyWAV, in its more gentle modes, is "safe".

Cheers,
David.
halb27
I just finished my abx test.

First I followed your suggestion and used -7 -autoshape -nts 20 -snr 14.
This setting yields 309 kbps for my regular set (quite a bit higher than plain -7) and 355 kbps for my problem set (a bit low for problem samples).
Hiss is pretty audible with bruhns (for instance sec. 9.3-10.2), but it's audible also with bibilolo (sec. 4.3-5.5) and badvilbel (sec. 5.9-7.2). There's also a slight inaccuracy with Atemlied (sec. 9.3-10.1) which is best audible at moderately high listening volume.
I didn't test a lot more samples than those mentioned because to me this is not adequate quality for an average of 309 kbps. The hiss (and the inaccuracy) isn't really annoying though, and I listened to some regular music (carefully but without abxing), and was content with it. Anyway, looking at codecs like vorbis (I just tested the new Aoyumi version, and quality is great even at -q4 [~130 kbps]), I personally don't like my abx result at a bitrate of ~310 kbps.

I redid the test using plain -7 autoshape which yields 325/384 kbps for my regular/problem test set.
bruhns 9.3-10.2 is better now to me, though still quite audible. Same goes for bibilolo.
I didn't test more samples as I personally am not content with this as well.

-6 -autoshape yields 337/404 kbps.
bibilolo is ok now, but when abxing bruhns I found added hiss already at sec. 2.3-4.4.

I skipped -5 and went directly to -4 -autoshape as from my last test I know plain -4 is transparent for me with the samples tested. -4 -autoshape yields 369/450 kbps. With this average bitrate for the problem set chances are good that everything is alright now.
bruhns at sec. 9.3-10.2 however still isn't perfect though quite acceptable.

Summing it up to me this isn't a good result for using -autoshape though it's a matter of taste whether or not one is willing to accept added hiss which seems to be the major issue when using -autoshape with low bitrate settings.

Maybe a variant of autoshape is more successful: make the frequency up shift of noise also depend on the degree to which there's energy in the input signal's 2 highest frequency zones (that is the range from ~8.2 kHz up). As bruhns is a pretty low volume sample maybe being more conservative at low volume is helpful too. What do you think, Nick?
Nick.C
QUOTE(halb27 @ Apr 2 2008, 20:42) *
I just finished my abx test.

First I followed your suggestion and used -7 -autoshape -nts 20 -snr 14.
This setting yields 309 kbps for my regular set (quite a bit higher than plain -7) and 355 kbps for my problem set (a bit low for problem samples).
Hiss is pretty audible with bruhns (for instance sec. 9.3-10.2), but it's audible also with bibilolo (sec. 4.3-5.5) and badvilbel (sec. 5.9-7.2). There's also a slight inaccuracy with Atemlied (sec. 9.3-10.1) which is best audible at moderately high listening volume.
I didn't test a lot more samples than those mentioned because to me this is not adequate quality for an average of 309 kbps. The hiss (and the inaccuracy) isn't really annoying though, and I listened to some regular music (carefully but without abxing), and was content with it. Anyway, looking at codecs like vorbis (I just tested the new Aoyumi version, and quality is great even at -q4 [~130 kbps]), I personally don't like my abx result at a bitrate of ~310 kbps.

I redid the test using plain -7 autoshape which yields 325/384 kbps for my regular/problem test set.
bruhns 9.3-10.2 is better now to me, though still quite audible. Same goes for bibilolo.
I didn't test more samples as I personally am not content with this as well.

-6 -autoshape yields 337/404 kbps.
bibilolo is ok now, but when abxing bruhns I found added hiss already at sec. 2.3-4.4.

I skipped -5 and went directly to -4 -autoshape as from my last test I know plain -4 is transparent for me with the samples tested. -4 -autoshape yields 369/450 kbps. With this average bitrate for the problem set chances are good that everything is alright now.
bruhns at sec. 9.3-10.2 however still isn't perfect though quite acceptable.

Summing it up to me this isn't a good result for using -autoshape though it's a matter of taste whether or not one is willing to accept added hiss which seems to be the major issue when using -autoshape with low bitrate settings.

Maybe a variant of autoshape is more successful: make the frequency up shift of noise also depend on the degree to which there's energy in the input signal's 2 highest frequency zones (that is the range from ~8.2 kHz up). Would you like to try that, Nick?
Ok, I will try again to implement the RMS variability approach that I was trying (but didn't release).

Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change.

Again, the noise shaping function itself is totally fixed; all the autoshape function does is vary how much to apply. It doesn't change the frequency to which the noise is shifted. Think of it as -shaping 0 = pure white noise; -shaping 1 = fully shaped noise; -shaping <n> = something in between.

I'll get back to the "drawing board" with the autoshape function and post v0.9.1 soon.

As an aside, I have found that TCPMP for my iPAQ plays FLAC *much* better (better = less cpu usage and more accurate output) than GSPlayer / gspflac.dll. In particular dithernoisetest would exhibit some harmonics using GSPlayer which don't exist in TCPMP. TCPMP v0.72 RC1 is still available, google is your friend....
halb27
QUOTE(Nick.C @ Apr 2 2008, 22:57) *

... Again, the noise shaping function itself is totally fixed; all the autoshape function does is vary how much to apply. It doesn't change the frequency to which the noise is shifted. Think of it as -shaping 0 = pure white noise; -shaping 1 = fully shaped noise; -shaping <n> = something in between. ...

That's clear. I was trying to bring another thing into focus: masking effects. If there's a lot of HF energy in the input signal, your shaping factor can be close to 1, and if there's no or little HF energy there the shaping factor is better close to 0. To be considered as well as the amplitude considerations.
halb27
QUOTE(Nick.C @ Apr 2 2008, 22:57) *

... Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change. ....

With bits-to-remove=1 and n=2 this yields a shaping factor of ~0.148 > 1/13~0.077. So this would make the noise shaping more aggressive (in case I understand this correctly).
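For anyone who wants to tabulate the two mappings being compared here, a minimal Python sketch follows. It only reproduces the arithmetic quoted above (linear b/13 versus 1-((13-b)/13)^n); the function names and the clamping of bits-to-remove to the 0..13 range are assumptions made for illustration, not lossyWAV's actual code.

```python
def linear_shaping(bits_to_remove, max_bits=13):
    """Linear mapping described in the thread: 0 -> 0, 13 -> 1."""
    # Clamping to [0, max_bits] is an assumption, not taken from lossyWAV.
    b = min(max(bits_to_remove, 0), max_bits)
    return b / max_bits


def power_shaping(bits_to_remove, n, max_bits=13):
    """Proposed non-linear mapping: 1 - ((13 - b) / 13) ** n."""
    b = min(max(bits_to_remove, 0), max_bits)
    return 1.0 - ((max_bits - b) / max_bits) ** n


if __name__ == "__main__":
    # Print both curves side by side for n = 2.
    for b in range(14):
        print(b, round(linear_shaping(b), 3), round(power_shaping(b, 2), 3))
```

For n > 1 the power curve lies above the linear one everywhere between the endpoints, which is why it shapes more strongly at low bits-to-remove.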
Nick.C
QUOTE(halb27 @ Apr 2 2008, 21:15) *
QUOTE(Nick.C @ Apr 2 2008, 22:57) *
... Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change. ....
With bits-to-remove=1 and n=2 this yields a shaping factor of ~0.148 > 1/13~0.077. So this would make the noise shaping more aggressive (in case I understand this correctly).
Yes, it will make it more aggressive. Using the revised -autoshape -7, my 53 problem sample set yields 378.77kbps. With the v0.9.0 -autoshape -7, 366.21kbps.

lossyWAV beta v0.9.1 attached to post #1 in this thread.
halb27
QUOTE(Nick.C @ Apr 2 2008, 23:27) *

QUOTE(halb27 @ Apr 2 2008, 21:15) *
QUOTE(Nick.C @ Apr 2 2008, 22:57) *
... Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change. ....

With bits-to-remove=1 and n=2 this yields a shaping factor of ~0.148 > 1/13~0.077. So this would make the noise shaping more aggressive (in case I understand this correctly).
Yes, it will make it more aggressive. Using the revised autoshape, my 53 problem sample set yields 378.77kbps. With the v0.9.0 autoshape, 366.21kbps.

I think in the opposite direction: using for instance something like

bits-to-remove    shaping factor
      0               0
      1               0
      2               0.1
      3               0.15
      4               0.25
      5               0.4
      6               0.55
      7               0.7
      8               0.8
      9               0.85
     10               0.9
     11               0.95
   >=12               1.0

so that as a tendency with low-volume spots shaping is small.
Should be positive for the added hiss of low-volume spots. Reduces the bitrate bloat as well.
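The table above could be applied as a plain lookup. The sketch below is only an illustration: the values are copied from the table, while the names SHAPING_TABLE and table_shaping are hypothetical and not part of lossyWAV.

```python
# Values copied from the table in the post; the list index is bits-to-remove.
SHAPING_TABLE = [0.0, 0.0, 0.1, 0.15, 0.25, 0.4, 0.55,
                 0.7, 0.8, 0.85, 0.9, 0.95, 1.0]


def table_shaping(bits_to_remove):
    """Look up the shaping factor; 12 or more bits all map to 1.0."""
    if bits_to_remove < 0:
        raise ValueError("bits_to_remove must be non-negative")
    return SHAPING_TABLE[min(bits_to_remove, len(SHAPING_TABLE) - 1)]


if __name__ == "__main__":
    for b in range(15):
        print(b, table_shaping(b))
```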
Nick.C
I thought that the problem of hiss you were encountering would be white noise, i.e. shaping too low?

Is full -shaping 1.0 better than -autoshape for the problem samples you identified (bruhns, bibilolo, badvilbel)?

And finally (as if you had nothing else better to do ;) ) is the v0.9.1 -autoshape any better (if full shaping is better than autoshape v0.9.0)?
singaiya
David, I totally agree with your post. Well said, as usual. I'm excited for lossyWAV precisely because of its potential as a transcodable source. I'll do some listening tests in this area when I can find the time. This sums it up for me:

QUOTE(2Bdecided @ Apr 2 2008, 05:45) *

If I can have something which is half the bitrate (or less), sounds identical, and transcodes identically, then I have no need to keep the lossless version.


Not to mention that there isn't a problem sample found yet (at more defensive settings).
Dynamic
QUOTE(2Bdecided @ Apr 2 2008, 13:45) *

1. "Why not?" Well, Why? Really, if there's no difference, why? It's OCD-like behaviour.

2. I might not know the lossy format I will need in the future. Shall I create mp3, ogg, AAC, HE-AAC etc?

3. If I'm a radio station, it's the broadcast (FM, mp2, mp3, WMA, whatever) that's the "transcode" - I can hardly avoid that or make it at the same time as I rip the CD.


So, for me, the "transcodability" of the less aggressive lossyWAV modes is very important.


I'm in agreement. I'd like to use a safe & robust setting in lossyWAV (-1 or -2 perhaps) just as I'd happily pre-process my rips with Album Gain and simple dither (using wavgain or foobar2000) before losslessly compressing them. I'd treat either as an excellent quality source to keep on my hard drive, which I could tag properly and then robustly encode to conventional lossy formats as I need it. Such an archive occupies far less space than straight lossless in the case of those many modern dynamically ultra-compressed albums.

I acquire new playback devices from time to time, and may desire different formats to suit the storage capacity / battery life / format compatibility / gapless support available with each. (Pragmatism frequently leads me to stick to LAME VBR MP3s, however). Also, on occasions, I actually need to have a degree of dynamic compression (foo_vlevel) for soft background music from highly dynamic sources that would get lost entirely in places if I didn't use some volume levelling. This requires processing before encoding (unless we get frame-by-frame volume levelling in mp3gain style).

I wouldn't normally want to transcode lossyWAV -2 to lossyWAV -7, for example, but perhaps if I wanted low battery-drain and fairly good quality on the right device, I'd be willing to do so, pragmatically, (and I'd be tempted to name the file as .transcoded.lossy.flac or with a .lossy7t.flac extension or some such, just in case it should ever find its way back onto my PC).

It seems that I'm part of a minority in being willing to use 'safe' lossyFLAC or lossyWV in place of true lossless as my main PC storage and for generating lossy files pretty-much on the fly, for whatever external device I wish.
The Sheep of DEATH
Is there currently any plan/work on dynamic noise shaping? That is, despite some obscure and ancient "patent" (which may/may not apply depending on license/age/obscurity/location/algo differences--and on that topic, imho, any software patent shouldn't last more than 5 years, much less 15)...

Currently, (yes I use tcpmp 0.72rc1 or the 0.8x builds floating around ;) ), 320kbps is my max, so I'm looking forward to just the right combination of settings (i.e. -7 with shaping, snr, nts) to produce such a file at that bitrate. 320kbps with noise shaping, whew. This is really heating up! :D
halb27
QUOTE(Dynamic @ Apr 3 2008, 02:12) *

... It seems that I'm part of a minority in being willing to use 'safe' lossyFLAC or lossyWV in place of true lossless as my main PC storage and for generating lossy files pretty-much on the fly, for whatever external device I wish.

I also think like that. I'd love to have just 1 collection (not a lossless and a lossy one), and -1 or -2 and a good additional noise shaping is a very promising way to go.
halb27
QUOTE(Nick.C @ Apr 2 2008, 23:55) *

I thought that the problem of hiss you were encountering would be white noise, i.e. shaping too low?

Is full -shaping 1.0 better than -autoshape for the problem samples you identified (bruhns, bibilolo, badvilbel)?

And finally (as if you had nothing else better to do ;) ) is the v0.9.1 -autoshape any better (if full shaping is better than autoshape v0.9.0)?

I'll try your proposals this weekend.
My considerations arise from my WavPack lossy experience. Before David Bryant introduced dynamic noise shaping I preferred to shift noise upwards. This eliminated ugly distortions with samples like keys, but it introduced the risk of audible hiss. This risk was very real when using settings in the 300 to 350 kbps range, especially with high values for the shift.
I think our situation is similar, and I think a strong shifting up should only be done when there's a high chance that the added hiss is masked. I think this is especially so as with the very aggressive settings quality control of our machinery has a weak basis in general and a very weak basis above ~3 kHz (though it works fine to an astonishing extent).
As for controlling the masking of HF hiss I can imagine a crude approach is sufficient.
The very first approach can be: use a shaping factor of 1 for very loud music, and a shaping factor of 0 for quiet music, but do it defensively, meaning: with music of medium loudness use a moderate shaping factor closer to 0 than to 1.
Bits to remove is a rough measure for the loudness of the music. So the noise shaping factor can be computed by something like:

noise shaping factor = 0                              for bits-to-remove <= 5
                     = (bits-to-remove - 5)^2 / 49    for bits-to-remove between 5 and 12
                     = 1                              for bits-to-remove >= 12

With this very crude approach of controlling the masking I think it's best to be very conservative.
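The piecewise rule above is easy to try out. The following Python sketch reproduces it as written; the function name loudness_shaping is hypothetical, and nothing here is taken from the lossyWAV source.

```python
def loudness_shaping(bits_to_remove):
    """Piecewise quadratic shaping factor driven by bits-to-remove.

    0 up to 5 bits, 1 from 12 bits, (b - 5)^2 / 49 in between.
    """
    if bits_to_remove <= 5:
        return 0.0
    if bits_to_remove >= 12:
        return 1.0
    return (bits_to_remove - 5) ** 2 / 49.0


if __name__ == "__main__":
    for b in range(14):
        print(b, round(loudness_shaping(b), 3))
```

Halfway between the two thresholds (8.5 bits) the factor is only 0.25, which matches the stated intent of staying closer to 0 than to 1 for music of medium loudness.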

A better hiss masking control (which needn't be that defensive) could be not to take into account the loudness of the music (or the number of bits to remove), but the HF energy of the input signal, something like the sum of all the bins in the 2 highest frequency zones of the FFT analyses (~8.2+ kHz) for all the 64 sample FFTs which make up for an entire 512 sample block.
Nick.C
QUOTE(halb27 @ Apr 3 2008, 06:08) *
I'll try your proposals this weekend.
My considerations arise from my WavPack lossy experience. Before David Bryant introduced dynamic noise shaping I preferred to shift noise upwards. This eliminated ugly distortions with samples like keys, but it introduced the risk of audible hiss. This risk was very real when using settings in the 300 to 350 kbps range, especially with high values for the shift.
I think our situation is similar, and I think a strong shifting up should only be done when there's a high chance that the added hiss is masked. I think this is especially so as with the very aggressive settings quality control of our machinery has a weak basis in general and a very weak basis above ~3 kHz (though it works fine to an astonishing extent).
As for controlling the masking of HF hiss I can imagine a crude approach is sufficient.
The very first approach can be: use a shaping factor of 1 for very loud music, and a shaping factor of 0 for quiet music, but do it defensively, meaning: with music of medium loudness use a moderate shaping factor closer to 0 than to 1.
Bits to remove is a rough measure for the loudness of the music. So the noise shaping factor can be computed by something like:

noise shaping factor = 0                              for bits-to-remove <= 5
                     = (bits-to-remove - 5)^2 / 49    for bits-to-remove between 5 and 12
                     = 1                              for bits-to-remove >= 12

With this very crude approach of controlling the masking I think it's best to be very conservative.

A better hiss masking control (which needn't be that defensive) could be not to take into account the loudness of the music (or the number of bits to remove), but the HF energy of the input signal, something like the sum of all the bins in the 2 highest frequency zones of the FFT analyses (~8.2+ kHz) for all the 64 sample FFTs which make up for an entire 512 sample block.
So, instead of just calculating the minimum / average of each FFT output for the whole range, 20Hz > 16kHz, I could calculate a minimum / average for each different portion of the spreading frequency list. In this way, the relative outputs in each sub-range could be compared and if the high frequency range was low then apply less shaping as you have already suggested.

[edit] As an aside, I thought that we were getting close to the "end" with respect to v1.0.0, so the release numbers have been climbing rapidly. As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0.... [/edit]

[edit2] If I was going to really push the processing-time-per-codec-block requirement, I could also carry out a 512 sample FFT on the correction data, i.e. quantization noise, and see where the quantization noise has actually gone.... [/edit2]
halb27
QUOTE(Nick.C @ Apr 3 2008, 08:54) *

So, instead of just calculating the minimum / average of each FFT output for the whole range, 20Hz > 16kHz, I could calculate a minimum / average for each different portion of the spreading frequency list. In this way, the relative outputs in each sub-range could be compared and if the high frequency range was low then apply less shaping as you have already suggested. ...

I do not understand the minimum / average approach.

I think what I have in mind is something else: compute the input signal's HF energy of a block as the sum of all the bins in the 2 highest frequency zones (~8.2+ kHz) of all the 64 sample FFTs which cover the block.
Compare this HF energy to predefined energy levels which tell about the noise shaping factor.
For the predefined energy levels:
Look at the HF energy (computed the same way) of the bibilolo start (at the seconds I mentioned in my last test report) and use a noise shaping factor of 0 for this energy level.
On the other end take loud music with a high amount of HF (for instance 'Living in the future'), and use the computed energy level as a measure for using a noise shaping of 1.
In production when energy level is between these two extreme forms, use a quadratic function of the form (HF-a)^2/b for interpolation to get the noise shaping value.
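A rough Python sketch of this HF-energy idea follows, for experimentation only. Several things are assumptions rather than anything stated in the thread: the 44.1 kHz sample rate, the use of squared magnitudes as "energy", unwindowed non-overlapping 64-sample frames, a slow direct DFT in place of a real FFT, and the choice b = (e_high - a)^2 so that the quadratic reaches exactly 1 at the upper reference level. The names hf_energy and hf_shaping are hypothetical.

```python
import cmath
import math


def hf_energy(block, fs=44100, fft_len=64, cutoff_hz=8200.0):
    """Sum of squared DFT magnitudes above cutoff_hz over all frames.

    Frames are consecutive fft_len-sample slices of the block
    (no window, no overlap: both are simplifying assumptions).
    """
    first_bin = math.ceil(cutoff_hz * fft_len / fs)
    total = 0.0
    for start in range(0, len(block) - fft_len + 1, fft_len):
        frame = block[start:start + fft_len]
        for k in range(first_bin, fft_len // 2 + 1):
            # Direct DFT of bin k; slow but dependency-free.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / fft_len)
                      for n, x in enumerate(frame))
            total += abs(acc) ** 2
    return total


def hf_shaping(energy, e_low, e_high):
    """Quadratic interpolation (HF - a)^2 / b between two reference levels.

    a = e_low (shaping 0), b = (e_high - e_low)^2 (shaping 1 at e_high).
    """
    if energy <= e_low:
        return 0.0
    if energy >= e_high:
        return 1.0
    return (energy - e_low) ** 2 / (e_high - e_low) ** 2
```

In use, e_low and e_high would be measured once from reference passages (the quiet bibilolo start and a loud, HF-rich track, as suggested above) and then hf_shaping(hf_energy(block), e_low, e_high) evaluated per 512-sample block.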
SebastianG
QUOTE(Nick.C @ Apr 3 2008, 07:54) *

[edit2] If I was going to really push the processing-time-per-codec-block requirement, I could also carry out a 512 sample FFT on the correction data, i.e. quantization noise, and see where the quantization noise has actually gone.... [/edit2]

Why?
Assuming the unfiltered quantization noise acts like a memoryless source of random numbers with rectangular probability density, the noise power you'll get after shaping can be computed directly with the help of the noise transfer function N. For a frequency f in radians, f = Hz*(2pi/fs), where fs = sampling_frequency_in_Hz, set z = cos(f) + i*sin(f) and compute |N(z)*2^{bits2remove}|, which is proportional to the amplitude spectral density of the filtered noise.

Usually a psychoacoustic codec determines the amount of tolerable noise in specific time/frequency regions. Seeing "spreading function" popping up here I assume you're actually doing that computation. For a "codec block" the result of this computation would be a curve describing the spectral power density of the tolerable noise. Then you could try to find the parameters 's' and 'b' for the curve |N(z*s)*2^{b}| so it's still under the tolerable noise curve but maximizes b -- the number of bits to remove. 's' here is the shaping strength parameter.

my 2 cents on optimizing the number of bits to remove and the shaping strength,
SG
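The evaluation described in this post can be sketched in a few lines of Python. This is an illustration under stated assumptions: the noise transfer function is taken to be an FIR filter N(z) = 1 + c1*z^-1 + c2*z^-2 + ..., the coefficients in the example are hypothetical and are not lossyWAV's shaping filter, and the parameter s is applied literally as N(z*s) as written above. The function names are made up for this sketch.

```python
import cmath
import math


def ntf(z, coeffs):
    """N(z) = 1 + c1*z^-1 + c2*z^-2 + ... for hypothetical FIR coefficients."""
    return 1.0 + sum(c * z ** -(k + 1) for k, c in enumerate(coeffs))


def shaped_noise_level(freq_hz, fs, coeffs, bits_to_remove, s=1.0):
    """|N(z*s) * 2^bits|, proportional to the shaped noise amplitude density."""
    f = freq_hz * (2.0 * math.pi / fs)          # frequency in radians
    z = complex(math.cos(f), math.sin(f))       # point on the unit circle
    return abs(ntf(z * s, coeffs) * 2.0 ** bits_to_remove)


if __name__ == "__main__":
    # Hypothetical first-order highpass shaper N(z) = 1 - z^-1.
    for hz in (100, 1000, 4000, 8000, 16000, 22050):
        print(hz, round(shaped_noise_level(hz, 44100, [-1.0], 3), 3))
```

With this literal reading, s = 1 gives the full filter and larger s flattens the curve towards white noise, since each coefficient c_k is scaled by s^-k. Searching for the largest b that keeps the curve under a tolerable-noise curve would then be a loop over candidate (s, b) pairs.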
2Bdecided
QUOTE(Dynamic @ Apr 3 2008, 00:12) *
Also, on occasions, I actually need to have a degree of dynamic compression (foo_vlevel) for soft background music from highly dynamic sources that would get lost entirely in places if I didn't use some volume levelling. This requires processing before encoding (unless we get frame-by-frame volume levelling in mp3gain style).
OT: That's possible, but no one has implemented it. It wouldn't be as good or flexible as a separate DRC, but it would often be better than transcoding an mp3 to another mp3.


QUOTE(Nick.C @ Apr 3 2008, 06:54) *
[edit] As an aside, I thought that we were getting close to the "end" with respect to v1.0.0, so the release numbers have been climbing rapidly. As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0.... [/edit]
IMO (though others may disagree strongly) your first "stable" release should be without noise shaping.

Also IMO (and again, others may disagree) the more you base your noise shaping on the input signal, the closer you get to that Sony patent.

If fixed noise shaping doesn't buy you much, you should definitely get a stable and (as far as I know) patent-free release out there before playing with noise shaping any more.

If nothing else, having a "stable" release is going to get you a lot more testers! (I would hope!).

Cheers,
David.
halb27
QUOTE(2Bdecided @ Apr 3 2008, 11:43) *

IMO (though others may disagree strongly) your first "stable" release should be without noise shaping.
...
If nothing else, having a "stable" release is going to get you a lot more testers! (I would hope!). ...

As for the current state I also don't see a big advantage of noise shaping.
But we're already talking about things which are very promising, because with loud and HF rich music (aka most of pop/rock music) noise shifting can really lead to very high quality at a rather moderate bitrate. Guess this is an attractive feature for many users. Sure, we're moving here in the world of psychoacoustics, but to a much lesser degree than transform codecs do.

And look at the initial purpose we're struggling at with -2 and -1 where we don't rely on any kind of psy model to assure quality. With a good noise shaping we get a near noiseless frequency range of the fundamentals and a controlled quality in the HF region. To me this is very attractive and we should rather wait a bit yet until final release.

Of course there shouldn't be any patent problems. But is there really an issue with the proposals done so far?
GeSomeone
QUOTE(Nick.C @ Apr 3 2008, 07:54) *
As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0....

How about 0.10.1, 0.11.1 etc.? <_<
Although lossyWav can be considered beta, the noiseshaping functions might be considered alpha (debatable).
May I ask what the result will be when the "optimum" noise shaping is found: better quality at the same bit rate or lower bit rate at the same quality? :unsure: Anything else seems not useful.
Nick.C
QUOTE(GeSomeone @ Apr 3 2008, 12:05) *
QUOTE(Nick.C @ Apr 3 2008, 07:54) *
As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0....
How about 0.10.1, 0.11.1 etc.? <_<
Although lossyWav can be considered beta, the noiseshaping functions might be considered alpha (debatable).
May I ask what the result will be when the "optimum" noise shaping is found: better quality at the same bit rate or lower bit rate at the same quality? :unsure: Anything else seems not useful.
Yes, I can use 0.10.0, etc. - so I will.

Noise shaping makes the processed data less predictable for the lossless codec, thus increasing bitrate. However, its use can allow more aggressive settings to be used before the results are noise shaped.

David: I'm inclined to agree with you - v1.0.0 should be issued with noise shaping code removed.

Horst: from beta v1.0.1, I would expect to improve noise shaping and its application in lossyWAV.

Sebastian: Your understanding of applied mathematics far exceeds mine - I'm not sure what you're getting at.
halb27
QUOTE(Nick.C @ Apr 3 2008, 14:19) *

...
David: I'm inclined to agree with you - v1.0.0 should be issued with noise shaping code removed.

Horst: from beta v1.0.1, I would expect to improve noise shaping and its application in lossyWAV.
...

Sounds like a promising road map.
2Bdecided
QUOTE(Nick.C @ Apr 3 2008, 12:19) *
Sebastian: Your understanding of applied mathematics far exceeds mine - I'm not sure what you're getting at.
Simplistically, that the quantisation noise has gone exactly where you've put it in a predictable way - you don't need to check - unless something is broken.

Of course, it's easy to break something, and useful to have that checking code in there for debugging. It also saves having to think the theory through! ;)

Cheers,
David.
halb27
QUOTE(Nick.C @ Apr 2 2008, 23:55) *

I thought that the problem of hiss you were encountering would be white noise, i.e. shaping too low?

Is full -shaping 1.0 better than -autoshape for the problem samples you identified (bruhns, bibilolo, badvilbel)?

And finally (as if you had nothing else better to do ;) ) is the v0.9.1 -autoshape any better (if full shaping is better than autoshape v0.9.0)?

I tried v0.9.0 -autoshape, v0.9.0 -shaping 1, v0.9.1 -autoshape on bruhns.
To me the v0.9.0 -autoshape and v0.9.1 -autoshape quality is identical (both from subjective impression as well as the abx results: sec. 9.3-10.2: 7/10 with both versions, sec. 2.3-4.4: 10/10 with v0.9.0 and 9/10 with v0.9.1).
Judgement about -shaping 1 isn't so easy. After listening to -autoshape I had problems identifying the problem with -shaping 1 and scored badly. Once more used to it however I could recognize it and with some trials the problem was even more pronounced to me than with the autoshape versions. Guess my hearing abilities are very much at their limits with these very high frequency problems, but from time to time they become apparent even to me.

I also wanted to try bibilolo sec. 4.3-5.5 but didn't succeed at all in abxing it today (fatigue? bad constitution?)
Kwevej
Why lossyWAV ?

I think that Musepack can do it better...
Mitch 1 2
Kwevej, read the lossyWAV FAQ.
halb27
QUOTE(Kwevej @ Apr 6 2008, 12:31) *

Why lossyWAV ?

I think that Musepack can do it better...

Musepack, as well as Vorbis, AAC, mp3, does it better at a bitrate in the 200 kbps area (very roughly speaking). Most people don't have the need for anything else (apart from maybe lossless archiving).

The special thing about lossyWAV is that when it is used for its original purpose (quality level -2 or -1) we get extremely high quality.
There is no guarantee that things can't go wrong (after all, it's lossy), but experience so far suggests there's nothing to be afraid of. Codecs like mp3 and the others mentioned above have a complicated signal path which changes the original technical description of the music enormously, and there is a lot of heuristic decision making. As a result there's always a risk of artefacts and inaccuracies in the changed description, though we all know codecs like Vorbis do a great job at pretty low bitrates, and it's very rare that music isn't transparent at ~200 kbps (usually that's overkill already).
lossyWAV, in contrast, doesn't change the structure of the technical description of the music at all. It keeps the usual 16 bit PCM description of the wave samples and only reduces the accuracy of the samples, by rounding and thus zeroing those least significant bits which it thinks it can safely remove. The 16 bits of the PCM format are needed to accurately describe the full dynamics of loud as well as quiet music. For quiet music we can't save many bits, because quiet music is already described with rather few bits (many of the most significant bits are zero) and we usually need those for an accurate description. For loud music we can, because the full 16 bit accuracy usually isn't needed there.
The downside is that with this approach we can't get the efficiency of Musepack etc. Using lossyWAV the safe way yields a bitrate of ~420 kbps for the -2 setting, or ~470 kbps for the -1 setting, which is considered overkill: it is just for the very cautious or for those who like to use extreme quality lossyWAV as a replacement for lossless archiving.

There is a wish to go lower in bitrate, and thanks to additional internal quality-assurance mechanisms we can do so without taking much risk. This, however, is no longer backed by the initial idea. To me the -4 quality setting is still transparent and yields a bitrate of ~350 kbps. Even when going lower, as with -6 (~310 kbps), the rare and subtle inaccuracies are easily acceptable to me.
Noise shaping is the current focus of further development, and maybe it will make it possible to improve transparency in the 300-350 kbps range and/or achieve extremely high, though not transparent, quality a bit below 300 kbps.
Mitch 1 2
Nick, is there any chance of seeing piping support (both in and out), before lossyWAV 1.0?
Nick.C
QUOTE(Mitch 1 2 @ Apr 10 2008, 10:31) *
Nick, is there any chance of seeing piping support (both in and out), before lossyWAV 1.0?
Not unless I knew where to start, i.e. how *does* foobar2000 pipe WAV data into a program, and then how do I pipe that out to FLAC / WavPack / TAK / etc. to produce the encoded processed output?

I think it may be better to park piping beside noise shaping and attempt to include it in v1.1.0 rather than to delay the work up to release of v1.0.0 any further.

I have been working on tidying up the code and have shaved another second off the processing time for my 53 problem sample set at preset -7:
CODE
|======|==================|==================|
|  QS  | Time/Rate v0.9.2 | Time/Rate v0.9.0 |
|======|==================|==================|
|  -7  | 13.14s / 56.60x  | 14.34s / 51.86x  |
|  -7a | 18.26s / 40.73x  | 19.30s / 38.54x  |
|  -7b | 24.25s / 30.66x  | 24.56s / 30.27x  |
|  -7c | 28.62s / 25.98x  | 29.47s / 25.23x  |
|======|==================|==================|
All tests were carried out with the input files cached in memory to ignore read latency.
Mitch 1 2
QUOTE(Nick.C @ Apr 11 2008, 22:07) *
Not unless I knew where to start, i.e. how *does* foobar2000 pipe WAV data into a program, and then how do I pipe that out to FLAC / WavPack / TAK / etc. to produce the encoded processed output?
Now you're getting ahead of yourself. ;) I was simply asking about stdin/stdout support.
To support foobar2000, however, I suppose you could launch (with parameters) an external encoder, e.g. "lossyWAV.exe - -o -enc flac.exe -f -b 512 -e -o %d -".
carpman
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAV's intended use), but I was surprised that when I ran it through lossyWAV (-2) the resultant lossy.flac file was larger (747 kbps) than the lossless FLAC file (737 kbps). Both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
jesseg
Not really odd. The noise floor above your filter was very, very low, so there are very few bits, if any, for lossyWAV to remove. Combine that with the difference in block size used in FLAC (assuming that you didn't force both to 512) and it makes sense.
Nick.C
QUOTE(carpman @ Apr 12 2008, 01:06) *
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAV's intended use), but I was surprised that when I ran it through lossyWAV (-2) the resultant lossy.flac file was larger (747 kbps) than the lossless FLAC file (737 kbps). Both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
The upper limit for lossyWAV is 16kHz - so you had a 2kHz zone which would not allow any bits to be removed.
carpman
Nick, jesseg

Thanks for your replies and patience.

C.
Kwevej
QUOTE(Mitch 1 2 @ Apr 6 2008, 14:01) *

Kwevej, read the lossyWAV FAQ.



Someone will send me a FLAC. How would I recognize that it is really lossless?
I don't like the "Lossy Lossless" idea.
Nick.C
QUOTE(Kwevej @ Apr 12 2008, 15:51) *
QUOTE(Mitch 1 2 @ Apr 6 2008, 14:01) *
Kwevej, read the lossyWAV FAQ.


Someone will send me a FLAC. How would I recognize that it is really lossless?
I don't like the "Lossy Lossless" idea.
If you rip your own FLAC files, you will always know whether they are lossless or not.

Also, how do you know that the FLAC file you have has not been created from a decoded MP3 file?
jesseg
Good point Nick.

Re: knowing if a FLAC is a lossyFLAC... if --keep-foreign-metadata was used in the FLAC command-line, then you will be able to tell whether the source file was a lossyWAV or not (though that says nothing about the pre-lossyWAV source). It's a slightly roundabout way because you would (as far as I know) have to decode it to WAV, and then use the lossyWAV -check method. But... if there's a way to view metadata in a FLAC file directly, then that would be easier.

And as Nick said, if a FLAC is encoded from an mp3, you have no way at all of knowing without a doubt through a purely technological means, unless the decoder saves the information as meta-data in the WAV or passes it through to a FLAC tag via a transcoder. And again, FLAC would have to be set to save the meta-data. Otherwise, it's all judgmental (and inaccurate) at that point, when you're trying to spot coding/decoding artifacts and decide what is what.
botface
QUOTE(Nick.C @ Apr 12 2008, 08:47) *

QUOTE(carpman @ Apr 12 2008, 01:06) *
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAV's intended use), but I was surprised that when I ran it through lossyWAV (-2) the resultant lossy.flac file was larger (747 kbps) than the lossless FLAC file (737 kbps). Both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
The upper limit for lossyWAV is 16kHz - so you had a 2kHz zone which would not allow any bits to be removed.

Nick would you mind expanding on "The upper limit for lossyWAV is 16kHz"?
I assumed you meant that it was the HF cut-off point so that anything above that frequency would be "ignored" and hence missing from the output. However, having done a brief test on a piece recorded from FM radio, the frequency plots from the original WAV file and the lossyWAV file look the same, with no reduction in >16kHz levels. Especially noticeable is that the 19kHz pilot tone is still there and not reduced in level.

Sorry if this is a dumb question

Nick.C
QUOTE(botface @ Apr 13 2008, 20:42) *
QUOTE(Nick.C @ Apr 12 2008, 08:47) *
QUOTE(carpman @ Apr 12 2008, 01:06) *
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAV's intended use), but I was surprised that when I ran it through lossyWAV (-2) the resultant lossy.flac file was larger (747 kbps) than the lossless FLAC file (737 kbps). Both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
The upper limit for lossyWAV is 16kHz - so you had a 2kHz zone which would not allow any bits to be removed.
Nick would you mind expanding on "The upper limit for lossyWAV is 16kHz"?
I assumed you meant that it was the HF cut-off point so that anything above that frequency would be "ignored" and hence missing from the output. However, having done a brief test on a piece recorded from FM radio, the frequency plots from the original WAV file and the lossyWAV file look the same, with no reduction in >16kHz levels. Especially noticeable is that the 19kHz pilot tone is still there and not reduced in level.

Sorry if this is a dumb question
Not a dumb question at all - I am guilty of giving a truncated explanation of what I should have elaborated on.....

When the FFT analyses are carried out on each codec_block in lossyWAV, only the results between 20Hz and 16kHz are taken into account when determining bits_to_remove for that FFT (and ultimately that codec_block). The only process applied to the actual audio data is the remove_bits routine, i.e. revised_sample := round(original_sample / (2^bits_to_remove)) * (2^bits_to_remove), which sets the lowest bits_to_remove LSBs to zero. No frequencies are intentionally removed from the output samples.
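
For readers who want to try the rounding step in isolation, here is a minimal Python sketch of the formula quoted above. It is not lossyWAV's actual code: the tie-breaking rule (halves round up) and the clamp to the 16 bit range are assumptions made for this illustration only, and the thread does not say how lossyWAV itself handles either case.

```python
def remove_bits(samples, bits_to_remove):
    """Round each integer PCM sample to the nearest multiple of 2**bits_to_remove."""
    if bits_to_remove <= 0:
        return list(samples)
    step = 1 << bits_to_remove
    half = step >> 1
    # Assumption: keep results inside signed 16 bit range while staying on a
    # multiple of step. -32768 is already a multiple of every step up to 2**15.
    max_val = (32767 // step) * step
    min_val = -32768
    out = []
    for s in samples:
        # Integer rounding: add half a step, then floor-divide (ties round up).
        r = ((s + half) // step) * step
        out.append(max(min_val, min(max_val, r)))
    return out


print(remove_bits([1000, -1000, 7, 8, 32767], 4))
```

Every value returned has its lowest bits_to_remove bits equal to zero, which is what lets a lossless codec with "wasted bits" detection store the block more cheaply.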

Anyway, due to the lack of problematic feedback for beta v0.9.1, lossyWAV v0.9.2 RC3 is attached to post #1 in this thread.
2Bdecided
QUOTE(Kwevej @ Apr 12 2008, 15:51) *
Someone will send me a FLAC. How would I recognize that it is really lossless?
I don't like the "Lossy Lossless" idea.
If it came from an mp3 file, you can often spot this in the spectrogram.

If it came from a lossyWAV file, you can count the number of "wasted bits" in the FLAC file, or the number of (512-sample) blocks of LSBs set to zero (where some MSBs are non-zero) in the decoded .wav. If you find several of either, it's probably from lossyWAV (or some even rarer perversion of the audio).
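
The block-counting idea can be sketched in a few lines of Python. This is only an illustration of the check described above, working on a plain list of integer samples from one channel. The function names are hypothetical, decoding the .wav is left out, and what counts as "several" blocks is left to the reader.

```python
def wasted_bits(block):
    """Number of low bits that are zero in every sample of the block (0 for pure silence)."""
    combined = 0
    for s in block:
        combined |= s  # a low bit stays 0 only if it is 0 in every sample
    if combined == 0:
        return 0  # digital silence: no MSBs set, so it tells us nothing
    n = 0
    while (combined & 1) == 0:
        combined >>= 1
        n += 1
    return n


def count_suspect_blocks(samples, block_size=512):
    """Count full blocks whose samples all share at least one zeroed LSB."""
    hits = 0
    for start in range(0, len(samples) - block_size + 1, block_size):
        if wasted_bits(samples[start:start + block_size]) > 0:
            hits += 1
    return hits
```

As noted, a file with noise or dither added afterwards would defeat this check, and the block boundaries must line up with those used during processing.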


If you add noise to either of the above, these methods won't work.

Cheers,
David.
Nick.C
Thanks to unfortunateson for the 96khzsample FLAC file - when I tried to process the contained WAV file lossyWAV crashed. It turned out to be a divide by zero error in the preparation of the skewing factors.

This led me to re-assess the skewing factor preparation and I quickly found a simple fix (which also improved the methodology) - however, the fix reduces the bitrate of all the quality presets by around 20kbps.

I had already made an unrelated change to the spreading function which increased the bitrate for -3 to -7 by between 2kbps and 4kbps.

However, the amendment to the skewing function preparation has reduced the difference in bitrate between the 3 existing spreading functions.

So, I have amended the skewing function preparation and there is now only one spreading function (that for quality preset -1).

lossyWAV beta v0.9.3 attached to post #1 in this thread.

As this beta has changed some longstanding "constants" of the method, I will be extremely grateful if some of our more acutely eared members could ABX some of the more problematic samples and post feedback.

I am fairly sure that quality *should* not have suffered, but my ears are not good enough to perform the critical evaluation required.

Thanks,

Nick.

I have processed my 53 problem sample set using beta v0.9.3 and the change in spreading function has changed the variation in bitrate somewhat:

CODE
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|   Version   |lossyWAV -1|lossyWAV -2|lossyWAV -3|lossyWAV -4|lossyWAV -5|lossyWAV -6|lossyWAV -7|
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| v0.9.2 RC3  |  543kbps  |  494kbps  |  433kbps  |  408kbps  |  385kbps  |  365kbps  |  348kbps  |  
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| beta v0.9.3 |  505kbps  |  467kbps  |  435kbps  |  406kbps  |  381kbps  |  357kbps  |  337kbps  |
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
From this, I feel that maybe -1 should become (even) more conservative in its -snr and -nts settings (currently 24 and -3 respectively). However, the change in bitrate between quality presets is now significantly more linear than it has ever been.
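
The linearity remark can be checked directly from the table. The short Python sketch below takes the bitrates as posted and prints the drop between adjacent presets; the "spread" figure (largest step minus smallest step) is just one rough, illustrative way to summarise how even the steps are.

```python
# Average bitrates in kbps for presets -1 to -7, copied from the table above.
bitrates = {
    "v0.9.2 RC3": [543, 494, 433, 408, 385, 365, 348],
    "beta v0.9.3": [505, 467, 435, 406, 381, 357, 337],
}


def preset_steps(values):
    """Bitrate drop between each pair of adjacent quality presets."""
    return [a - b for a, b in zip(values, values[1:])]


for version, values in bitrates.items():
    steps = preset_steps(values)
    print(version, steps, "spread:", max(steps) - min(steps))
```

For v0.9.2 RC3 the steps are 49, 61, 25, 23, 20 and 17 kbps, whereas for beta v0.9.3 they are 38, 32, 29, 25, 24 and 20 kbps, so the newer version does step down more evenly.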

Also, as all quality presets now use the same spreading function I could (nearly) implement a fractional quality preset (like oggenc) between -1.000 and -7.000.
halb27
QUOTE(Nick.C @ Apr 17 2008, 13:43) *

... I had already made an unrelated change to the spreading function which increased the bitrate for -3 to -7 by between 2kbps and 4kbps. ...

Very welcome IMO as this important frequency range didn't get much of the skewing effect so far. But why don't you do it for -2 and -1 as well especially as bitrate seems to have dropped by your changes?
As far as I can see 2Bdecided's basic principle is still totally taken care of when using -2 and -1, and that's the important thing.
I wouldn't care much about the bitrate drop. Maybe the lowest quality settings aren't acceptable any more, but IMO a quality scale from -1 to, say, -5 is sufficient.

I will do my usual tests as soon as possible, but my father has died so at the moment I will only seldom look up HA.
GeSomeone
QUOTE(Nick.C @ Apr 17 2008, 13:43) *
there is now only one spreading function (that for quality preset -1).

Does that mean a performance hit? (I remember that -1 uses the most CPU but I'm not sure if that was partly because of the spreading).
Nick.C
QUOTE(GeSomeone @ Apr 18 2008, 10:39) *
QUOTE(Nick.C @ Apr 17 2008, 13:43) *
there is now only one spreading function (that for quality preset -1).
Does that mean a performance hit? (I remember that -1 uses the most CPU but I'm not sure if that was partly because of the spreading).
Not at all, the spreading is actually quicker for -1 as fewer bins are averaged across the frequency ranges. The performance hit for -1 is related to the extra 256 sample FFTs which are calculated.

However, I have implemented the floating point quality presets from -1.0000 to -7.0000 and am considering removing the 256 sample FFT from the -1 quality preset (it can still be added using -1a to -1.9999a) which would make all quality presets default to 2 FFT analysis lengths (64 sample and 1024 sample) with 128, 256 and 512 sample FFTs remaining optional.

I will post beta v0.9.4 later today with the FP quality presets enabled and -1 defaulting to 2 FFT analysis lengths.
SokilOff
QUOTE(halb27 @ Apr 6 2008, 06:08) *


There is a wish to go lower in bitrate, and thanks to additional internal quality-assurance mechanisms we can do so without taking much risk. This, however, is no longer backed by the initial idea. To me the -4 quality setting is still transparent and yields a bitrate of ~350 kbps. Even when going lower, as with -6 (~310 kbps), the rare and subtle inaccuracies are easily acceptable to me.


Thanks for the detailed explanations. But is there any difference between lossyWAV and e.g. WavPack lossy? At high bitrates (> 320-350 kbps) WavPack lossy seems to sound transparent too. Are there any advantages for lossyWAV over WavPack in lossy mode?
Nick.C
QUOTE(SokilOff @ Apr 18 2008, 10:57) *
QUOTE(halb27 @ Apr 6 2008, 06:08) *
There is a wish to go lower in bitrate, and thanks to additional internal quality-assurance mechanisms we can do so without taking much risk. This, however, is no longer backed by the initial idea. To me the -4 quality setting is still transparent and yields a bitrate of ~350 kbps. Even when going lower, as with -6 (~310 kbps), the rare and subtle inaccuracies are easily acceptable to me.
Thanks for the detailed explanations. But is there any difference between lossyWAV and e.g. WavPack lossy? At high bitrates (> 320-350 kbps) WavPack lossy seems to sound transparent too. Are there any advantages for lossyWAV over WavPack in lossy mode?
Only really that lossyWAV is compatible with a number of lossless codecs which make use of the wasted-bits approach so is in one sense codec independent. I don't know if anyone has carried out any comparisons between WavPack lossy and lossyWAV output - it would be interesting though....
2Bdecided
IIRC the lossy version of Wavpack on stable release supports CBR only. I can't remember if there's VBR in beta test.

lossyWAV is pure VBR, and does not support CBR.

Cheers,
David.
halb27
QUOTE(SokilOff @ Apr 18 2008, 11:57) *

QUOTE(halb27 @ Apr 6 2008, 06:08) *


There is a wish to go lower in bitrate, and thanks to additional internal quality-assurance mechanisms we can do so without taking much risk. This, however, is no longer backed by the initial idea. To me the -4 quality setting is still transparent and yields a bitrate of ~350 kbps. Even when going lower, as with -6 (~310 kbps), the rare and subtle inaccuracies are easily acceptable to me.


Thanks for the detailed explanations. But is there any difference between lossyWAV and e.g. WavPack lossy? At high bitrates (> 320-350 kbps) WavPack lossy seems to sound transparent too. Are there any advantages for lossyWAV over WavPack in lossy mode?

lossyWAV has quality control, whereas WavPack lossy doesn't. With WavPack lossy you give a target bitrate which is internally converted to an accuracy demand for the predictor error. This is not directly related to overall accuracy (because if the predictor is seriously wrong, the predictor error must be coded very accurately to achieve good overall accuracy). There's more to it which takes special problems into account, but roughly speaking it's like that.
The disadvantage of lossyWAV compared to WavPack lossy is that the lossless part needs a small blocksize of 512 samples to make good use of the varying bits-to-remove. This, however, makes the lossless codec less efficient. Moreover, David Bryant has implemented effective noise shaping in WavPack lossy. Noise shaping in lossyWAV is work in progress and is problematic at the moment, as the added high frequency hiss of rather high volume blows up the bitrate of the lossless codec.

So at the moment I think that at an average bitrate of >~ 400 kbps lossyWAV is to be preferred (using -2 or -1) because of the better accuracy control.
At a bitrate of roughly 350 kbps I think both codecs' quality is comparable. Both are expected to be transparent with few exceptions. Maybe lossyWAV has slightly fewer exceptions, but that's speculation, and I think we can be very content with both codecs.
At a bitrate below ~300 kbps I think WavPack lossy is preferable because of its more efficient coding, which becomes more and more important the lower we go in bitrate.