
lossyWAV 1.3.0 Development Thread

Reply #100
I've made another sample, similar to "wheeee" but distinctly different, and this one exhibits two clicks similar to the one that happens in wheeee. Here it is, aptly named "zweee" (what can I say - I'm talented at giving them appropriate names):

- Zweee

This time the clicks are on the left channel, one at 1.13 seconds and the other at 2.9 seconds.

Indeed, this time the clicks are much more difficult to hear in the lossy sample, but looking at the added noise (zweee.lwcdf.wav) it's obvious that something is wrong:

They're even clearly visible when you compare the waveforms of zweee.wav and zweee.lossy.wav:

Nick, maybe you could see what these three spots have in common - the right channel of "wheeee" at 5.43 seconds and the left channel of "zweee" at 1.13 and 2.9 seconds. Maybe there's something common in the spectra, or something else at these three spots, that makes lossyWAV bug out and produce clicks like these.

lossyWAV 1.3.0 Development Thread

Reply #101
Thanks for this sample - I will, as you suggest, use it to fault find.

The error looks like filter instability - I need to work out what conditions that occurs under.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV 1.3.0 Development Thread

Reply #102
Glad I can help!

Also, there's another odd thing about this new sample, specifically about the noise lossyWAV adds to it (at least with -Z -A): there appears to be no noise added at all (or extremely little) in places where, judging by the original signal, noise definitely should have been added. For example, check the noise at 0.62, 1.6, 3.42 and 4.38 seconds - almost no noise is added in either channel, yet the original signal doesn't seem to be any quieter at those places and the noise floor appears to be just as high as elsewhere.

lossyWAV 1.3.0 Development Thread

Reply #103
The error looks like filter instability

I agree.

I need to work out what conditions that occurs under.

One aspect is numerical accuracy. You should probably use doubles (if you're not doing this already) for the filter design code. I'm not sure if it's worth using double floats for the actual filter, though. But the most important aspect is the shape you want the filter to match. It should be rather smooth and not too steep.

Cheers!
SG

lossyWAV 1.3.0 Development Thread

Reply #104
Thanks for the confirmation, SG. I am using doubles throughout - however, I am not checking the desired shape of the filter at all, although it is modified (-3dB <1kHz; -1dB >6kHz; interpolated between 1kHz and 6kHz). I have worked out a temporary fix by only feeding part (>80%) of the "quantization error" into the WAPL_Update function - this seems to work for the two samples provided by doccolinni.
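
For reference, a rough sketch of the two tweaks described above (Python/numpy; the names are hypothetical and this is not the actual WAPL code):

Code:
    import numpy as np

    def shape_offset_db(freq_hz):
        # -3 dB below 1 kHz, -1 dB above 6 kHz, linearly interpolated in between
        # (sketch of the modification applied to the desired shape).
        return np.interp(freq_hz, [1000.0, 6000.0], [-3.0, -1.0])

    FEEDBACK_FRACTION = 0.8  # temporary fix: feed only part of the quantization error back

    def feedback_error(quant_error):
        # Only this fraction of the quantization error reaches the filter update.
        return FEEDBACK_FRACTION * quant_error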

lossyWAV beta 1.2.2j attached to post #1 in this thread.

[edit] Spectrograms and sample plots removed. [/edit]
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)


lossyWAV 1.3.0 Development Thread

Reply #106
I tried furious again using 1.2.2.j -Z --adaptive.
Hiss is still there, and there are click-like sounds added.

I will try different --adaptive parameters, hopefully on Sunday.
lame3995o -Q1.7 --lowpass 17

 

lossyWAV 1.3.0 Development Thread

Reply #107
I have worked out a temporary fix by only feeding part (>80%) of the "quantization error" into the WAPL_Update function - this seems to work for the two samples provided by doccolinni.

I actually have no idea what this quick fix does. It may work in this case and be worse in others. It alters the shape in a way I can't easily predict. IMHO, you should generate better curves in the first place instead of feeding "bad" curves to the filter design routines and messing with the noise shaping loop. The noise shaper is not the problem. The input to the filter design routines is.

You should partition the spectrum into non-uniform subbands (for example, starting with bandwidths of 100 Hz and increasing the bandwidths up to 1000 Hz for higher frequencies) and compute tolerable noise levels in dB for each of these subbands based on the signal's power and "tonality" in those areas. As a first approximation you can use fixed SNRs depending on the frequency, ranging from 40 dB (lower frequencies) to 10 dB (above 6 kHz). Make sure that the levels between neighbouring bands don't differ too much (i.e. differences restricted to +/- 15 dB). In order to do this you might have to decrease some noise levels. Keep in mind that 0 dB corresponds to the noise level of rectangular quantization noise with +/- 1/2 LSB. So, going below, say, -20 dB hardly makes sense. Interpolate those resulting data points smoothly (for example, with a 2nd order B-Spline as "interpolator").
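
A minimal sketch of that recipe (Python/numpy; the band edges, helper names and numbers are illustrative only, not lossyWAV code):

Code:
    import numpy as np

    # Illustrative non-uniform band edges: ~100 Hz wide at the bottom,
    # widening towards ~1000 Hz at the top (44.1 kHz material).
    BAND_EDGES_HZ = [0, 100, 200, 300, 400, 550, 700, 900, 1150, 1500, 2000,
                     2600, 3300, 4100, 5000, 6000, 7000, 8500, 10000, 12000,
                     14500, 17500, 22050]

    def tolerable_noise_levels(band_power_db, band_centre_hz):
        # First approximation: fixed, frequency-dependent SNR from 40 dB at low
        # frequencies down to 10 dB above 6 kHz.
        snr_db = np.interp(band_centre_hz, [0.0, 6000.0], [40.0, 10.0])
        levels = np.asarray(band_power_db, dtype=float) - snr_db
        # 0 dB corresponds to rectangular quantization noise (+/- 1/2 LSB),
        # so don't aim much below roughly -20 dB.
        levels = np.maximum(levels, -20.0)
        # Keep neighbouring bands within +/- 15 dB of each other by *lowering*
        # levels where necessary (never raising them).
        for i in range(1, len(levels)):
            levels[i] = min(levels[i], levels[i - 1] + 15.0)
        for i in range(len(levels) - 2, -1, -1):
            levels[i] = min(levels[i], levels[i + 1] + 15.0)
        # These points would then be interpolated smoothly onto the FFT bin grid,
        # e.g. with a 2nd-order B-spline, before being handed to the filter design.
        return levels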

How do you currently determine the "bits_to_remove" value?

Cheers!
SG

lossyWAV 1.3.0 Development Thread

Reply #108
I agree that the "desired shape" used at present is to blame - probably too steep, as you said earlier.

The signal is analysed using an FFT the same length as the codec block to produce the "desired shape" at present. This is 512 samples for 44.1/48kHz signal, giving a bin resolution of approx. 86Hz. I have an idea how to compute power (from one of your e-mails) but for tonality I do not know.

I will try to start on calculating power for varying bandwidths (for every bin?) and see where that gets to.
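
Something along these lines, perhaps (a Python/numpy sketch, assuming a one-sided FFT of the 512-sample block and non-uniform band edges like those SG suggested; not lossyWAV's actual analysis code):

Code:
    import numpy as np

    def band_power_db(block, sample_rate, band_edges_hz):
        # One-sided power spectrum of the (windowed) codec block.
        spectrum = np.fft.rfft(block * np.hanning(len(block)))
        power = np.abs(spectrum) ** 2
        freqs = np.fft.rfftfreq(len(block), d=1.0 / sample_rate)
        out = []
        for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
            in_band = (freqs >= lo) & (freqs < hi)
            # With 512 samples at 44.1 kHz (~86 Hz bins) the narrowest bands
            # only contain one or two bins.
            out.append(10.0 * np.log10(power[in_band].sum() + 1e-12))
        return np.array(out)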

The determination of bits-to-remove has not changed - I remember you saying that the make_filter routine determines this (gain?) but it was much lower than the value determined using the existing method and was not used. Maybe this could be used totally separately (i.e. no other FFT analyses of the signal) to determine bits-to-remove, but then it wouldn't be lossyWAV as we know it.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV 1.3.0 Development Thread

Reply #109
The determination of bits-to-remove has not changed - I remember you saying that the make_filter routine determines this (gain?) but it was much lower than the value determined using the existing method and was not used. Maybe this could be used totally separately (i.e. no other FFT analyses of the signal) to determine bits-to-remove, but then it wouldn't be lossyWAV as we know it.

In that case, maybe the algorithm determining bits_to_remove is somehow interfering with the algorithm determining the noise shaping/filtering? I mean, if the make_filter algorithm that determines the noise shaping/filtering produces lower values for bits_to_remove, then increasing bits_to_remove (which the old algorithm does) might mean that in some rare cases (as in the samples I provided) the noise shaping/filtering won't produce the desired result - precisely because more bits were removed than the filter produced by make_filter was intended for.

I don't know if any of that made sense.

lossyWAV 1.3.0 Development Thread

Reply #110
I tried 1.2.2.j -Z --adaptive default, 64, and 128 with furious.
Worst part of the 1.2.2.j encodings is from second 1.9 to 2.7: click-like artefacts and hiss.

--adaptive 64 improves upon the kind of clicks compared to --adaptive default, but makes hiss worse.
--adaptive 128 is similar, but I prefer --adaptive 64.

Looking at the error file for second 1.9 to 2.7, the error has a strange block structure in the spectrogram.

While -z --adaptive is fine for tuning purposes, it probably uses too low a bitrate.
So I also tried -Z --adaptive 64 --altpreset. The strange artefacts are gone for me, but the hiss is still ABXable, though not as easily as before.
-Z --adaptive --altpreset is better for me as the hiss has lower volume, but the last second is still not very hard to ABX.

For a comparison I also listened to the non-adaptive variant -Z --altpreset. I'm sorry to say that at the moment --adaptive doesn't improve upon the result of this.
lame3995o -Q1.7 --lowpass 17

lossyWAV 1.3.0 Development Thread

Reply #111
Thanks again for your listening efforts - I have taken SG's comments on board and will attempt to implement them over the next few days.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV 1.3.0 Development Thread

Reply #112
IMHO, you should generate better curves in the first place instead of feeding "bad" curves to the filter design routines and messing with the noise shaping loop.
Yes!

Quote
You should partition the spectrum into non-uniform subbands (for example, starting with bandwidths of 100 Hz and increasing the bandwidths up to 1000 Hz for higher frequencies) and compute tolerable noise levels in dB for each of these subbands based on the signal's power and "tonality" in those areas. As a first approximation you can use fixed SNRs depending on the frequency, ranging from 40 dB (lower frequencies) to 10 dB (above 6 kHz). Make sure that the levels between neighbouring bands don't differ too much (i.e. differences restricted to +/- 15 dB). In order to do this you might have to decrease some noise levels. Keep in mind that 0 dB corresponds to the noise level of rectangular quantization noise with +/- 1/2 LSB. So, going below, say, -20 dB hardly makes sense. Interpolate those resulting data points smoothly (for example, with a 2nd order B-Spline as "interpolator").
To be really clear, IMO (please correct me if I'm wrong here SebG), what SebG is talking about is how to get the thing working - not how to do a really good job.

With all this work, you'll have a 1980s style psychoacoustic model, which is then crippled.

I think a better way would be to offer the user 3 or 4 options (at least while beta testing):
1. no noise shaping
2. fixed noise shaping
3. dynamic noise shaping, conservative (not psy based)
4. dynamic noise shaping, aggressive (based on best psy model you can find)

If you're going to have one noise shaping filter per block (and that seems like a good start), then for a conservative approach I think you could try keeping the previous approach of using multiple FFT sizes to determine the lowest signal level in the block. For each frequency, use the lowest value found in that block across the various FFT sizes (with the scaling that was there already, and the many successful refinements that you added). Then smooth that target function, bringing the peaks down (not raising the troughs up). The number of bits you can remove is defined by the area under that target function.
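
Purely as an illustration, a sketch of that conservative per-block target (Python/numpy; the window sizes, the smoothing and the bits conversion are placeholders, and the existing skewing/scaling refinements are left out):

Code:
    import numpy as np

    def conservative_target(block, fft_lengths=(64, 128, 256, 512)):
        # Per-frequency minimum level across several FFT sizes, resampled onto a
        # common one-sided grid (normalised frequency 0..0.5).
        grid = np.linspace(0.0, 0.5, 257)
        target = np.full(grid.size, np.inf)
        for n in fft_lengths:
            for start in range(0, len(block) - n + 1, n // 2):   # 50% overlap
                seg = block[start:start + n] * np.hanning(n)
                level_db = 20.0 * np.log10(np.abs(np.fft.rfft(seg)) / n + 1e-12)
                f = np.linspace(0.0, 0.5, level_db.size)
                target = np.minimum(target, np.interp(grid, f, level_db))
        # Smooth by bringing peaks down towards neighbouring minima
        # (a running minimum never raises the troughs).
        w = 4
        smoothed = np.array([target[max(0, i - w):i + w + 1].min()
                             for i in range(grid.size)])
        # The removable bits follow from the area under the target: roughly the
        # mean level divided by ~6 dB per bit. Placeholder conversion - in
        # practice the levels would be referenced to the LSB noise floor.
        bits_to_remove = max(0.0, smoothed.mean() / 6.02)
        return smoothed, bits_to_remove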

I think any "proper" psy model should also be post-processed in this way, finding the lowest allowed noise level in each frequency band in each block, and smoothing the resulting target function so the filter is realisable and stable.


Going further, you could have multiple (successive) target filters in each lossyWAV block (though obviously only one bits_to_remove value - unless you communicate varying block size to the encoder!). Multiple target filters in one block would bring an advantage in a signal where the spectrum shape changed a lot within one block (different filter required) while the area under the spectrum didn't change much (same bits to remove required), but it wouldn't help much otherwise.

Hope this helps. Your hard work and tenacity are amazing!

Cheers,
David.

lossyWAV 1.3.0 Development Thread

Reply #113
To be really clear, IMO (please correct me if I'm wrong here SebG), what SebG is talking about is how to get the thing working - not how to do a really good job.

My suggestion was intended to be both. The part with fixed, frequency-dependent SNRs was about getting it to work and could be replaced with more elaborate approaches. The stuff about grouping frequencies into bands and estimating tolerable noise levels is what every psychoacoustic model does, unless I'm mistaken. The part about keeping the target filter's frequency response smooth still applies to more sophisticated psychoacoustic models.

If you're going to have one noise shaping filter per block (and that seems like a good start), then for a conservative approach I think you could try keeping the previous approach of using multiple FFT sizes to determine the lowest signal level in the block.

You mean the lowest level within a certain frequency region, right? Otherwise, I don't see what you'd need noise shaping for in this case.

[...] Then smooth that target function, bringing the peaks down (not raising the troughs up). The bits you can remove is defined by the area under that target function.

Right. The code I wrote even computes this "noise_gain" where log2(noise_gain) = bits_to_remove. This is sort of a by-product of the filter design computation.
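
A trivial worked example of that relationship (the numbers are invented):

Code:
    import math

    noise_gain = 16.0                        # hypothetical by-product of the filter design
    bits_to_remove = math.log2(noise_gain)   # log2(16) = 4 bits to remove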

I think any "proper" psy model should also be post-processed in this way, finding the lowest allowed noise level in each frequency band in each block, and smoothing the resulting target function so the filter is realisable and stable.

Yes. Well, "proper" models are expected to produce reasonably smooth "masking" curves due to the spreading function. Apart from "smoothness" another constraint is: Keep the noise PSD target below the signal's PSD (at least by, say, 9 dB). This ensures two things:
  • dithering won't be necessary to avoid nonlinear distortions (I think)
  • no added hiss / won't hurt predictability w.r.t. linear prediction (I'm 100% sure about that)
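
As a sketch of that constraint (Python/numpy, hypothetical names), applied after the smoothing step:

Code:
    import numpy as np

    def clamp_noise_target(noise_target_db, signal_psd_db, margin_db=9.0):
        # Keep the noise PSD target at least margin_db below the signal's PSD,
        # lowering the target wherever it would otherwise come too close.
        return np.minimum(noise_target_db, np.asarray(signal_psd_db) - margin_db)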


Going further, you could have multiple (successive) target filters in each lossyWAV block (though obviously only one bits_to_remove value - unless you communicate varying block size to the encoder!). Multiple target filters in one block would bring an advantage in a signal where the spectrum shape changed a lot within one block (different filter required) while the area under the spectrum didn't change much (same bits to remove required), but it wouldn't help much otherwise.

Yes. This noise PSD target should probably be smoothly interpolated over time to reduce artefacts due to the otherwise "hard" switching from one filter to another.

Cheers!
SG

lossyWAV 1.3.0 Development Thread

Reply #114
I wonder if the effort to interpolate IIR filters properly is worth it in this case? Do you know of some neat tricks?

As far as I can tell, the hard switch doesn't cause any artefact that breaches the upper spectral bounds of both filters. So while it's not ideal, the output remains within the bounds set by whatever model created the target filter responses.

Cheers,
David.

lossyWAV 1.3.0 Development Thread

Reply #115
I don't know if it's necessary and/or overkill. But the idea of a slowly changing filter as opposed to one that switches from block to block is appealing.

The easiest way this can be done is by smoothly adapting the target curve and feeding it to the filter design routines more than once per block. That's it. There's no need to "interpolate filter coefficients". Although, this would be possible as well. The speech coding community has some experience with this (interpolation of LAR or LSF coefficients). But in case of lossyWAV there's no need to add complicated code for interpolating filter parameters. One can invoke the filter design a couple of times with a slowly changing (over time) target curve and be done...
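
A sketch of that idea (Python; make_filter stands in for whatever filter design routine is actually used):

Code:
    import numpy as np

    def filters_for_block(prev_target_db, new_target_db, make_filter, steps=4):
        # Cross-fade the target curve over the block and redesign the filter at
        # each step, instead of interpolating filter coefficients directly.
        filters = []
        for k in range(1, steps + 1):
            a = k / steps
            target = (1.0 - a) * np.asarray(prev_target_db) + a * np.asarray(new_target_db)
            filters.append(make_filter(target))
        return filters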

Cheers!
SG

lossyWAV 1.3.0 Development Thread

Reply #116
I have made some headway regarding the implementation of the varying bin width power spectrum - but if I add +ve and -ve frequencies in the way that you suggested earlier then I get a symmetry about N/4 in the power spectrum - how is that used to create the desired shape for the filter?

Reading David and your most recent comments, it seems that 1 x 512 sample FFT > results > power spectrum > desired shape > filter is not the best way to go - maybe it should be 4 x 128 sample FFT and gradual transition of the desired shape? Should these FFT analyses overlap? I take it that the desired shape from the previous block for that channel should be taken into account.

Food for thought - and a direction in which to develop the code.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV 1.3.0 Development Thread

Reply #117
IMO you should take account of the 512 sample FFT and the 128 sample FFTs, just like you did before. It's a conservative approach.

Also IMO you should "just" grab the psy model from musepack or similar.

@SebG - yes, tried that (intermediate filters from intermediate target response) - of course it works, but it's just a series of smaller discrete blocks, rather than something that really changes smoothly. It didn't seem to be worth it. But then I was playing in MATLAB so it was all very slow, hence "worth it" might be a different threshold in real code.

Cheers,
David.

lossyWAV 1.3.0 Development Thread

Reply #118
Also IMO you should "just" grab the psy model from musepack or similar.


What would be the practical reason between using a Musepack-based-processed FLAC encode and, say, a Musepack --braindead encode? The bitrates should be roughly similar, and I imagine there will be some minor adverse effects for transcoding to another lossy format either way.
Infrasonic Quartet + Sennheiser HD650 + Microlab Solo 2 mk3. 

lossyWAV 1.3.0 Development Thread

Reply #119
Also IMO you should "just" grab the psy model from musepack or similar.

Although the Musepack psychoacoustic model was very effective, the comments from developers at the time were that it wasn't very portable. Maybe something can be reverse-engineered? OTOH maybe a much simpler model (without Clear Voice Detection) would suffice.
In theory, there is no difference between theory and practice. In practice there is.

lossyWAV 1.3.0 Development Thread

Reply #120
What would be the practical reason between using a Musepack-based-processed FLAC encode and, say, a Musepack --braindead encode?
What's the practical reason to use lossyWAV at all?

Your DAP plays FLACs?
You have more confidence in FLAC (or whatever)'s longevity?
You don't want to use quite so much space as lossless?
You like the potential for fast lossless efficient transcoding to most other lossless formats?
Fun?

Quote
The bitrates should be roughly similar
I bet they're not - I bet lossyWAV would be higher - but maybe we'll see.

Quote
and I imagine there will be some minorly adverse effects for transcoding to another lossy format either way.
Maybe. Again, hopefully we'll see.

You can do pretty much any shaping you want with lossyWAV. You probably can with Musepack too (same potential "issue" - at worst you need to throw lots of bits at it to get exactly what you want) - though with Musepack you're stuck with a filterbank which causes problems (or is inefficient) with some signals (though should be a benefit on average overall - massively so).

Anyway, it was only an example. There are other decent psy models out there. What I was trying to say was: no point coding one from scratch. No point coding one at all unless you want to!

Cheers,
David.

lossyWAV 1.3.0 Development Thread

Reply #121
 
What's the practical reason to use lossyWAV at all?

Your DAP plays FLACs?
You have more confidence in FLAC (or whatever)'s longevity?
You don't want to use quite so much space as lossless?
You like the potential for fast lossless efficient transcoding to most other lossless formats?
Fun?

I apologize, I just realized I said "reason" while I actually meant "difference". For instance, from my personal standpoint, both LossyFLAC -q7 to -q10 and Musepack -q7 to -q10 are:

— decoded by my DAP with equal efficiency;

— by definition, lossy;

— supposed to be completely transparent (I understand that exceptions do occur);

— supposed to be relatively good for transcoding as far as lossy material goes.

As you see, there is very little (if any) practical difference for me, or probably anybody else who uses Rockbox on their DAP and/or a decent software player on their PC.

I take it that the reason you suggested taking an existing psy-model for LossyWAV is avoiding duplication of effort. That's understandable. But in order to avoid further duplication of effort, I believe LossyWAV's adaptive noise-shaping/psymodel research priority should be second-generation transparency in processed-to-lossy transcoding, unlike Musepack's first-generation transparency in listening. Since Musepack was never designed for transcoding, I'm sure it's possible to outperform it in that department by making LossyWAV's output friendlier to lossy codecs, and thus increase the transparency of transcoded material beyond that of Musepack.

What do you think?

Infrasonic Quartet + Sennheiser HD650 + Microlab Solo 2 mk3. 

lossyWAV 1.3.0 Development Thread

Reply #122
...in order to avoid further duplication of effort, I believe LossyWAV's adaptive noise-shaping/psymodel research priority should be second-generation transparency in processed-to-lossy transcoding, unlike Musepack's first-generation transparency in listening.
That's an interesting idea. I'm not sure it's possible. I was going to suggest a conservative "non-psy" based approach to be safer with potential/unknown future transcoding. Either way, it's a bit of a vague / unknown / unpredictable target to hit.

I think the psy based approach should at least target first generation transparency first. I don't think even that is going to be simple. Just because you import a psy model doesn't mean it will work! e.g. the format the psy model was originally designed for will be restricted in the kind of artefacts it can add, hence the psy model may not be defensive against the kind of artefacts which the original format can't possibly add.

Cheers,
David.

lossyWAV 1.3.0 Development Thread

Reply #123
I disagree; lossyWAV is just a lossy codec that doesn't have some of the flaws of classic lossy codecs.
Its advantages are tied to the use of lossless codecs:
- potentially broader and more future-proof hardware support.
- 100% safe gaplessness (for example, despite its audio quality, Nero AAC is not 100% safe regarding gaps with some tag editors; I tested with MP3tag, and it's likely true with even more obscure tag editors)
- the ability to split & join losslessly (no classic lossy codec can do that without breaking gaps; I tested with Vorbis & Nero AAC, and I don't know & don't care about Musepack)
- the above makes lossyWAV the only lossy codec usable as a CD image + cue.

This means that transparency & bitrate are not everything in audio. Nero AAC is the best audio codec regarding the transparency point, but personally there is no way I would use it for CD encoding due to its non-native gapless implementation (I only use it for video, as I use x264 & don't want to mix standards). The additional features of an audio codec are very important in making it handy for the end user.

What this means is that the main advantage of lossyWAV is not its audio quality, but some added flexibility in particular uses. It doesn't mean that lowering the transparency point is useless, but it certainly means that comparing lossyWAV's transparency point to classic lossy codecs is pointless. Given the right bitrate, lossyWAV achieves the same or even better quality than classic lossy codecs by being more robust. (So far lossyWAV is unaffected by many classic killer samples, particularly applause in live performances.)

The conclusion is that if you're not ready to give up some bitrate for some flexibility, then you should not use lossyWAV.
If you think Musepack is handier than lossyWAV, just use Musepack.

Also, I completely disagree with the theoretical myth that lossyWAV would be well suited to further transcoding with classic lossy codecs; this might be true, but there is no real evidence of it. Furthermore, it likely depends heavily on which lossyWAV setting you used before doing the further transcoding. If you used lossyWAV near its transparency point for the first encode, I personally highly doubt that you can transcode it a second time with a classic lossy codec & get as good a result as if it were a first classic lossy encode. All of this is very theoretical & untested, but what this means to me is that because nobody really knows how lossyWAV reacts to transcoding (beyond very vague statements like "the higher the quality of the source, the better the transcoding"), optimizing it for this use is insane.

The last time I tested lossyWAV, a transparent encode was at minimum ~380 kbps, and any classic lossy codec at this bitrate is not particularly bad at transcoding either... so thinking that lossyWAV would be better for transcoding due to a theoretical advantage is a misunderstanding of why transcoding is bad: transcoding is crap due to multiple generation losses. If you cannot ABX this generation loss after one generation, then this advantage is ZERO. What is the use of a theoretical, non-ABXable benefit? Transcode even more? You're kidding - with many generations of loss, transcoding lossy to lossy is crap, no matter whether the source is lossyWAV. So if lossyWAV is theoretically better suited to transcoding than other classic lossy codecs, then great, but I want lossyWAV to be a first-class lossy codec first, even if that lowers its theoretical "transcodability".

lossyWAV 1.3.0 Development Thread

Reply #124
LossyWAV is something totally new that always makes me re-think the definition of "lossy". It can explore the use of a psychoacoustic model in a way far different from traditional lossy codecs. Really excited to see how far it can go.