Full Version: lossyWAV Development
Nick.C
QUOTE(halb27 @ Mar 3 2008, 09:01) *
Your new v0.8.0 settings are very attractive to me.
A well-spaced differentiation in quality parameters IMO, and everybody's needs should be satisfied by one of these settings.
Casual listening to v0.8.0 -7 is not revealing any glaring problems, so I'm happy!
2Bdecided
I take back my previous concerns. With Skew (is it fixed internally at 36?) and SNR, it's much harder to make the quality fall off a cliff, at least for samples where the lowest bins are at lower (more audible) frequencies. I'm guessing any "problems" will be for samples where the lowest bins are at higher frequencies (typically less audible).

I'm really impressed with the way all this tuning has come together - well done Nick.C, halb27, and other listeners.

You do realise that you've engineered a kind of crude psychoacoustic model? wink.gif

Cheers,
David.
Nick.C
QUOTE(2Bdecided @ Mar 3 2008, 12:04) *
I take back my previous concerns. With Skew (is it fixed internally at 36?) and SNR, it's much harder to make the quality fall off a cliff, at least for samples where the lowest bins are at lower (more audible) frequencies. I'm guessing any "problems" will be for samples where the lowest bins are at higher frequencies (typically less audible).

I'm really impressed with the way all this tuning has come together - well done Nick.C, halb27, and other listeners.

You do realise that you've engineered a kind of crude psychoacoustic model? wink.gif

Cheers,
David.
Oops - that wasn't what was meant to happen!! ohmy.gif It does seem to work though. Skew is indeed fixed at 36dB.

I think the final element which has allowed the bitrate to be reduced to the level that it has at v0.8.0 -7 is the addition of the variable maximum_bits_to_remove.

Very happy with the results - will move to v0.8.1 RC3 after a couple of days delay for problem reports.
The Sheep of DEATH
QUOTE(2Bdecided @ Mar 3 2008, 12:04) *
I take back my previous concerns. With
It's much harder to make the quality fall off a cliff...
You do realise that you've engineered a kind of crude psychoacoustic model? wink.gif


I wonder how it will sound at 200kbps. Has there been any experimentation at that low a bitrate? I'm fairly certain it would be inferior to the average mp3, but I'm starting to get curious as to just how much the bitrate can be lowered...

So here's my suggestion. Why not go all the way to the bottom of the bitrate barrel, and tune your way up? That's what Aoyumi did/does with Vorbis, which gave it (literally) the best lossy quality in the world. Apparently, you can scale up the changes you make in the lower bitrates to the higher ones, and all bitrates would end up with the benefit.

The point is, it's much easier to catch and tune for artifacts at low bitrates. Once tuned, though, the tuning would apply to practically all bitrates, making all quality levels better...see what I'm saying? There's no way I can abx 350kbps, but if you sent me down to 200, I could, and we can "tune things up." wink.gif

[edit]On a different note, on many files, the difference between -7c and -6c is under 7kbps. This doesn't seem like what was intended...
Nick.C
QUOTE(The Sheep of DEATH @ Mar 4 2008, 00:37) *
QUOTE(2Bdecided @ Mar 3 2008, 12:04) *
I take back my previous concerns. With
It's much harder to make the quality fall off a cliff...
You do realise that you've engineered a kind of crude psychoacoustic model? wink.gif
I wonder how it will sound at 200kbps. Has there been any experimentation at that low a bitrate? I'm fairly certain it would be inferior to the average mp3, but I'm starting to get curious as to just how much the bitrate can be lowered...

So here's my suggestion. Why not go all the way to the bottom of the bitrate barrel, and tune your way up? That's what Aoyumi did/does with Vorbis, which gave it (literally) the best lossy quality in the world. Apparently, you can scale up the changes you make in the lower bitrates to the higher ones, and all bitrates would end up with the benefit.

The point is, it's much easier to catch and tune for artifacts at low bitrates. Once tuned, though, the tuning would apply to practically all bitrates, making all quality levels better...see what I'm saying? There's no way I can abx 350kbps, but if you sent me down to 200, I could, and we can "tune things up." wink.gif

[edit]On a different note, on many files, the difference between -7c and -6c is under 7kbps. This doesn't seem like what was intended...
I do not really want to try to go that low.... Some of the albums I've processed using v0.8.0 -7 are coming in at about 280kbps - with no glaring artifacts. I think that the main objectives of the development process have been met (or exceeded) and I am content with the current -7.

Overall, as the encoded processed file will carry the file extension of the encoder, I want to make sure that the quality of any processed output will not negatively skew public opinion against the lossless encoder.

I too am interested in "how low can we go?" - so I'll post beta v0.8.1 with a revised -nts maximum value.

On the -7c / -6c bitrate delta, I think it means that we are approaching a limit imposed by the combination of the parameters used to maintain quality, and therefore it is working perfectly. Always remember, lossyWAV is pure VBR.

lossyWAV beta v0.8.1 attached to post #1 in this thread.

From a test using my 53 problem sample set:

CODE
|-----|-----------|-----------|-----------|
| SNR |  NTS=18   |  NTS=21   |  NTS=24   |
|-----|-----------|-----------|-----------|
|   6 | 305.8kbps | 295.2kbps | 287.8kbps |
|   7 | 307.3kbps | 297.1kbps | 289.9kbps |
|   8 | 309.2kbps | 299.2kbps | 292.3kbps |
|   9 | 311.2kbps | 301.6kbps | 294.9kbps |
|  10 | 313.6kbps | 304.2kbps | 297.8kbps |
|  11 | 316.3kbps | 307.3kbps | 301.1kbps |
|  12 | 319.7kbps | 311.1kbps | 305.2kbps |
|  13 | 323.8kbps | 315.6kbps | 310.1kbps |
|  14 | 328.3kbps | 320.6kbps | 315.4kbps |
|  15 | 333.2kbps | 326.0kbps | 321.1kbps |
|-----|-----------|-----------|-----------|
From which, -snr 15 -nts 18 and -snr 14 -nts 21 might be reasonable. I listened to -snr 6 -nts 24 and it was awful and -snr 9 -nts 24 wasn't much better.... I would consider the lower limit for -snr to be 12 and the upper limit for -nts to be 21.
2Bdecided
Don't forget Sheep that lossyWAV can only add spectrally flat noise. If you push it far enough, you'll just end up with something that's a very complex way of delivering a 5-bit LPCM file!

Tuning at a point where you can hear the noise, and then cranking the bitrate up, does have merit. However, it makes more sense when the noise is shaped to match the music. lossyWAV doesn't do that. It still makes some sense, however.

Cheers,
David.
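
To illustrate the point about ending up with a low-bit LPCM file, here is a minimal sketch of what removing bits from a sample amounts to. It is not lossyWAV's actual code: the helper name `remove_bits`, the round-half-up rule and the clipping behaviour are all assumptions made for illustration. Rounding every 16-bit sample to a multiple of 2^11 leaves only 5 significant bits.

```python
def remove_bits(sample: int, bits: int, word_length: int = 16) -> int:
    """Round a signed PCM sample to the nearest multiple of 2**bits.

    Hypothetical helper for illustration only; the rounding and clipping
    rules are assumptions, not taken from lossyWAV.
    """
    if bits <= 0:
        return sample
    step = 1 << bits
    # Round half up to the nearest multiple of step.
    rounded = ((sample + step // 2) // step) * step
    # Keep the result inside the signed range of the word length.
    highest = ((1 << (word_length - 1)) - 1) // step * step
    lowest = -(1 << (word_length - 1))
    return max(lowest, min(highest, rounded))


samples = [12345, -12345, 100, -32768, 32767]
# Removing 11 of 16 bits from every sample: effectively 5-bit audio.
processed = [remove_bits(s, 11) for s in samples]
errors = [p - s for p, s in zip(processed, samples)]
```

The `errors` list is the added noise. Without any shaping it is spread roughly evenly across the spectrum, which is why pushing the number of removed bits too far is heard as plain hiss.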
shadowking
I don't think we should go lower than -7 at this stage. My guess is that -7, and maybe higher settings, can be ABXed on a quiet passage with the volume cranked right up. It's not normal listening but it's something to consider. Quality will collapse below 240k or somewhere near. With WavPack and DualStream it's possible to get good output @ 235k, especially on louder music, but audible hiss / noise in quiet passages will be there and not hard to hear on some critical samples. I don't know how lossyWAV will sound with a quality collapse - it could be spurts of offensive noise rather than just hiss. 280k can yield mostly transparent results I think, but 235k is pushing it to the limit and lossyWAV doesn't need a bad rep. There are better solutions at < 250k.
halb27
QUOTE(shadowking @ Mar 4 2008, 15:26) *

I don't think we should go lower than -7 at this stage. ...

Exactly what I am thinking. We've reached an average bitrate of ~310 kbps with very good quality, and quality drops more than bitrate when trying to achieve significantly more - at least with the current techniques.

Nick, I think it was me who made you stop from further investigating the noise shaping approach. But that was in another situation. I still wouldn't like a development with a weak basis when it's up to the -3 or -2 quality region, especially as this approach isn't intrinsically safe - other than using -skew and -snr or the RMS oriented max_to_remove_bits which only make the basic approach more defensive. But now things have changed and there's interest in going rather low in bitrate while allowing the utmost quality to be missed a bit. Moreover I think it's safe to say the techniques used so far have matured. In this situation I'd like to encourage you to continue with what you once started in case you are interested.
Nick.C
QUOTE(shadowking @ Mar 4 2008, 13:26) *
I don't think we should go lower than -7 at this stage. My guess is that -7, and maybe higher settings, can be ABXed on a quiet passage with the volume cranked right up. It's not normal listening but it's something to consider. Quality will collapse below 240k or somewhere near. With WavPack and DualStream it's possible to get good output @ 235k, especially on louder music, but audible hiss / noise in quiet passages will be there and not hard to hear on some critical samples. 280k can yield mostly transparent results I think, but 235k is pushing it to the limit and lossyWAV doesn't need a bad rep. There are better solutions at < 250k.
I hear what you are saying - especially about not needing a bad reputation.....

The hiss on quiet passages may already be mitigated by the variable maximum_bits_to_remove which takes into account the RMS value of the codec_block being processed.

Off at a tangent, at the moment there are 3 spreading-function strings for -1, -2 and -3 (-4 to -7 being copies of -3). As the spreading-function string from -3 has done so well for -3 to -7, is there any merit in making all the spreading function strings the same as -3?

If this happened, then I could envisage a modification where quality could be specified between 0 and 1 where 0 = -7 and 1 = -1, using say 3 decimal points resolution, with 0.5 equating to the current -3.

Also, would it be beneficial to shift to 3 FFT analyses for quality presets -1? Possibly if quality<0.5 then FFT Analyses = 2, if quality>=0.5 then FFT Analyses=3.

Using the -3 spreading function, the revised -1 would produce 504.1kbps for my 53 problem sample set using the original 4 FFT analyses (501.3kbps with 3 FFT analyses) and the revised -2 would produce 468.2kbps using 3 FFT analyses.
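
The continuous quality scale suggested above can be sketched as follows. Only the three anchor points (0 = -7, 0.5 = -3, 1 = -1), the 3-decimal resolution and the FFT analyses rule come from the post; the linear interpolation between anchors and the function names are assumptions for illustration, not a description of any lossyWAV release.

```python
def equivalent_preset(quality: float) -> float:
    """Map a 0..1 quality value onto the existing preset numbers.

    Anchors from the post: 0 -> -7, 0.5 -> -3, 1 -> -1.
    Linear interpolation between the anchors is an assumption.
    Returns the preset number as a positive float (3.0 means -3).
    """
    q = round(quality, 3)  # 3 decimal places of resolution
    if not 0.0 <= q <= 1.0:
        raise ValueError("quality must lie between 0 and 1")
    if q < 0.5:
        return 7.0 - (q / 0.5) * 4.0       # 0 -> 7, 0.5 -> 3
    return 3.0 - ((q - 0.5) / 0.5) * 2.0   # 0.5 -> 3, 1 -> 1


def fft_analyses(quality: float) -> int:
    """Number of FFT analyses: 2 below quality 0.5, otherwise 3."""
    return 2 if round(quality, 3) < 0.5 else 3
```

Because the three anchors do not lie on one straight line, a single linear map from 0..1 to -7..-1 would put 0.5 at -4 rather than -3, hence the two segments.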
skamp
The way I see it, there are basically three ranges of bitrates in mainstream music: 64-320 kbps (the upper limit being that of MP3 CBR); 600-1000 kbps (lossless codecs); lossy codecs such as WavPack Hybrid, OptimFROG DualStream and lossyWAV would fill the gap in-between quite nicely, IMO. I don't see much point in competing in two fields where there's already quite a lot of competition.

At 320 kbps, it doesn't take more space than the highest quality MP3's that some people swear by, so if it's transparent and more suitable for transcoding than psycho-acoustic codecs, I'm happy with it.
Nick.C
QUOTE(halb27 @ Mar 4 2008, 13:36) *
QUOTE(shadowking @ Mar 4 2008, 15:26) *
I don't think we should go lower than -7 at this stage. ...
Exactly what I am thinking. We've reached an average bitrate of ~310 kbps with very good quality, and quality drops more than bitrate when trying to achieve significantly more - at least with the current techniques.

Nick, I think it was me who made you stop from further investigating the noise shaping approach. But that was in another situation. I still wouldn't like a development with a weak basis when it's up to the -3 or -2 quality region, especially as this approach isn't intrinsically safe - other than using -skew and -snr or the RMS oriented max_to_remove_bits which only make the basic approach more defensive. But now things have changed and there's interest in going rather low in bitrate while allowing the utmost quality to be missed a bit. Moreover I think it's safe to say the techniques used so far have matured. In this situation I'd like to encourage you to continue with what you once started in case you are interested.
My noise shaping attempt was in retrospect agricultural to say the least, including a bit of guesswork - it was quite rightly consigned to the recycler. I would really like to be able to understand how noise shaping works and, more importantly, how to implement it in this context - however, I haven't yet found any sources which are understandable to me.

To use noise shaping which relates to the music may be an infringement of the patents David mentioned some time ago however.
GeSomeone
QUOTE(shadowking @ Mar 4 2008, 14:26) *

My guess is that -7 and maybe higher settings can be ABXed on a quiet passage with the volume cranked right up. [..] Quality will collapse below 240 k or somewhere near.
[..] 280 k can yield mostly transparent results I think, but 235k is pushing it to the limit and lossywav doesn't need a bad rep.

Although this could be true, there is a bit of guessing involved. smile.gif
Two points to keep in mind:
- (as 2Bdecided keeps telling us) the bitrates are not fixed, so the bitrate for a loud track can be quite different from that for a quieter track (280k may be OK for one track while another might need 380k).
- lossyWAV does a good job of avoiding problems in quiet passages.

I agree there is no sense in having an awful sounding pre-set at the achieved bit rates. It seems that 0.8.0b hit a fairly good range of workable settings.
halb27
QUOTE(Nick.C @ Mar 4 2008, 15:44) *

... As the spreading-function string from -3 has done so well for -3 to -7, is there any merit in making all the spreading function strings the same as -3? ...

I see sense in having the spreading a little bit more demanding with -2 and especially -1, because these settings are meant to provide a certain safety margin. I wouldn't put this only into the -nts value.
IMO there's no need for a change, but instead of changing the spreading I'd rather use 3 analyses instead of 4 with -1 and maybe just 2 with -2. This would speed things up, and I don't think that many analyses are really necessary.

I personally don't like a continuous quality scale but prefer it the way it is. Discrete values make me feel better as the quality details are more transparent.
2Bdecided
QUOTE(Nick.C @ Mar 4 2008, 13:50) *

My noise shaping attempt was in retrospect agricultural to say the least, including a bit of guesswork - it was quite rightly consigned to the recycler. I would really like to be able to understand how noise shaping works and, more importantly, how to implement it in this context - however, I haven't yet found any sources which are understandable to me.
Do you want me to dig out my fixed noise shaping version? I think it worked properly. It was a long time ago!

Cheers,
David.

Nick.C
QUOTE(2Bdecided @ Mar 4 2008, 17:11) *
QUOTE(Nick.C @ Mar 4 2008, 13:50) *
My noise shaping attempt was in retrospect agricultural to say the least, including a bit of guesswork - it was quite rightly consigned to the recycler. I would really like to be able to understand how noise shaping works and, more importantly, how to implement it in this context - however, I haven't yet found any sources which are understandable to me.
Do you want me to dig out my fixed noise shaping version? I think it worked properly. It was a long time ago!

Cheers,
David.
That would be wonderful - I can understand your code smile.gif .
2Bdecided
Nick,

Here it is. Hope it's some use to you. I'm sure SebG could explain noise shaping pretty well.

No claims that this is correct, but it seems to work. It's "optimised" for debugging, not reading or running!

NOTE: This is only provided to demonstrate fixed noise shaping. Don't use it to encode anything - it's a hack of two old versions and the rest of the code probably doesn't work properly.

Note too that I don't think it handles zero bits to remove properly. Without dither, it's easy to get limit cycles in this instance.

You'll have to figure out how much noise shaping "buys" you - obviously it depends on the input signal, which is why I didn't use fixed noise shaping - but it's probably useful if you're aiming for lower bitrates.

Cheers,
David.
Nick.C
QUOTE(2Bdecided @ Mar 5 2008, 16:43) *
Nick,

Here it is. Hope it's some use to you. I'm sure SebG could explain noise shaping pretty well.

No claims that this is correct, but it seems to work. It's "optimised" for debugging, not reading or running!

NOTE: This is only provided to demonstrate fixed noise shaping. Don't use it to encode anything - it's a hack of two old versions and the rest of the code probably doesn't work properly.

Note too that I don't think it handles zero bits to remove properly. Without dither, it's easy to get limit cycles in this instance.

You'll have to figure out how much noise shaping "buys" you - obviously it depends on the input signal, which is why I didn't use fixed noise shaping - but it's probably useful if you're aiming for lower bitrates.

Cheers,
David.
Thanks very much David, I'll try to get my teeth into it tonight....
carpman
Hi all,

Does anyone know if there are issues using FLCDrop with the latest version of lossyWAV? The reason I ask is all these new settings. In FLCDrop, AFAIK, there's just the old 1, 2, 3 and I was wondering if those command switches are still relevant, with -3c and -7a et al.?

Thanks.
C.

jesseg
I'm 2 inches from releasing an updated version of the batch file and the front end. So far, the changelog looks like this, but it's not guaranteed final yet. wink.gif

CODE
lFLCDrop Change Log:
v1.2.0.5
- presets updated to -1 through -7
- all presets create correction files, except custom

lFLC.bat Change Log:
v1.0.0.7
- added automatic functionality for the -merge option
- new variable in custom preset to enable/disable automatic merging
- custom preset defaults match normal -2 preset functionality


I'm just dealing with a possible bug (or screw up on my part) for the automatic -merge function, and then merging that code into the custom preset section, and it should be fully updated and synced with current lossyWAV "goings-on".

re: the automatic merge function: if the FLAC file to decode has custom metadata, it will check the decoded WAV file for the lossyWAV "tag". If it's a lossyWAV file, it will see if a .lwcdf.flac exists and decode it to .lwcdf.wav; if no .lwcdf.flac exists, it will check for a .lwcdf.wav, or else exit. In the first two cases, where a lossyWAV correction file exists, it will ultimately run the -merge option and delete the two lossyWAV files (the .wav files, not the source .flac files).

[edit] yep, already thought of a needed change to the changelog... to add a custom preset variable to toggle the deleting of the pre-merged .lossy.wav files. and i also realized that i'm not handling the encoding of a .lwcdf.wav file to .lwcdf.flac file (if it exists) when encoding an already lossy .wav file. wowzahs. wink.gif now i'm more than 2 inches away. tongue.gif [/edit]
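
The automatic merge decision described in this post can be sketched like this. It is a hypothetical illustration, not the contents of lFLC.bat: the function name `plan_merge`, the action tuples and the way file existence is passed in are assumptions, and only the .lwcdf.flac / .lwcdf.wav / .lossy.wav extensions come from the post.

```python
def plan_merge(base: str, existing: set, has_lossywav_tag: bool) -> list:
    """Return the ordered actions for one decoded file.

    base             -- file name without the .lossy.wav extension
    existing         -- names of files known to exist alongside it
    has_lossywav_tag -- True if the decoded WAV carries the lossyWAV "tag"
    """
    if not has_lossywav_tag:
        return []  # an ordinary WAV: nothing to merge
    actions = []
    if base + ".lwcdf.flac" in existing:
        # First case: the correction file is still FLAC-encoded.
        actions.append(("decode", base + ".lwcdf.flac", base + ".lwcdf.wav"))
    elif base + ".lwcdf.wav" not in existing:
        return []  # no correction file at all: exit
    # Either case with a correction file ends in a merge and a clean-up
    # of the two .wav files (the source .flac files are left alone).
    actions.append(("merge", base + ".lossy.wav", base + ".lwcdf.wav"))
    actions.append(("delete", base + ".lossy.wav"))
    actions.append(("delete", base + ".lwcdf.wav"))
    return actions
```

For example, `plan_merge("track01", {"track01.lwcdf.flac"}, True)` yields a decode step followed by the merge and two deletes, while an empty `existing` set yields no actions.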
carpman
QUOTE(jesseg @ Mar 5 2008, 19:17) *

now i'm more than 2 inches away


Thanks jesseg for the update. Regardless of how many inches, I shall wait for your new release.

Good luck with it.

C.

Nick.C
QUOTE(jesseg @ Mar 5 2008, 19:17) *
re: the automatic merge function: if the FLAC file to decode has custom metadata, it will check the decoded WAV file for the lossyWAV "tag". If it's a lossyWAV file, it will see if a .lwcdf.flac exists and decode it to .lwcdf.wav; if no .lwcdf.flac exists, it will check for a .lwcdf.wav, or else exit. In the first two cases, where a lossyWAV correction file exists, it will ultimately run the -merge option and delete the two lossyWAV files (the .wav files, not the source .flac files).
Thanks for the PM: -merge function now appears to be working (if both files are in the same place.....).

I've had a play with the method David supplied for noise shaping and early though it is, I'd like to get a second opinion from better ears.

Static noise shaping has been employed and is not optional (at this time - I'll make it optional later, v0.8.1 is still available). I have listened to -7 -nts 30 -snr 12 and it's "acceptable" but I have limited allowable volume (kids in bed) and would like more ears to listen in. For my 53 problem sample set it produces 301.8kbps(!).

lossyWAV beta v0.8.2 attached to post #1 in this thread.
The Sheep of DEATH
Interesting. You might be aware of this, but the first post says 0.83 is actually available already (unannounced! laugh.gif), but no download links other than 0.6.7rc2 are available!

Maybe you're just in the process of updating? cool.gif
Nick.C
QUOTE(The Sheep of DEATH @ Mar 6 2008, 15:02) *
Interesting. You might be aware of this, but the first post says 0.83 is actually available already (unannounced! laugh.gif), but no download links other than 0.6.7rc2 are available!

Maybe you're just in the process of updating? cool.gif
blush.gif Oops - what a mistake.

lossyWAV beta v0.8.3 attached to post #1 in this thread.
The Sheep of DEATH
I've noticed that turning on shaping (originally just -7c), the resulting flac is actually 1.5% larger. Is this intentional? Or are tweaked snr and nts options a must with shaping?

I tried -7 -nts 30 -snr 12 -shaping, but quality was very scratchy (read: added noise) on the piano sample I tested with. In terms of artifacts, -snr 12 -nts 21 without shaping actually produced the better result on this sample, at roughly the same bitrate.

Maybe I got a b0rked build? I guess I can upload the sample a bit later. Cheers!
SebastianG
Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149]; which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But this isn't actually the noise shaping filter in this case. 1-z^-1*H(z) is. It's common and popular to write the transfer function of noise shaping filters as 1-z^-1*H(z). So, in case you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter, you need to use freqz([1 -b]) for the FIR cases. Since you're removing the leading coefficient and inverting signs, you just need to skip this part for the "book filter".

You'll see that the response of the filter isn't that bad after all. Its deviation from the one I was suggesting is within +/-5 dB at nearly all frequencies.
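
For readers without MATLAB, the freqz([1 -b]) step can be reproduced with a few lines of plain Python. This is a minimal sketch: the coefficients are the "book" values quoted above, while the function name, the chosen frequencies and the 44.1 kHz sample rate are assumptions for illustration.

```python
import cmath
import math

# H(z) coefficients from "the book", as quoted in the post.
BOOK_H = [2.033, -2.165, 1.959, -1.590, 0.6149]

# The actual noise shaping filter is 1 - z^-1 * H(z), i.e. [1 -b].
SHAPING = [1.0] + [-h for h in BOOK_H]


def fir_response_db(coeffs, omega):
    """Magnitude in dB of sum(c[k] * z^-k) at z = exp(j * omega)."""
    z_inv = cmath.exp(-1j * omega)
    value = sum(c * z_inv ** k for k, c in enumerate(coeffs))
    return 20.0 * math.log10(abs(value))


SAMPLE_RATE = 44100.0  # assumed; the thread mostly deals with 16/44 material
response = [
    (freq, fir_response_db(SHAPING, 2.0 * math.pi * freq / SAMPLE_RATE))
    for freq in (0.0, 1000.0, 4000.0, 10000.0, 16000.0, 22050.0)
]
```

Each entry of `response` pairs a frequency in Hz with the noise gain in dB; negative values are frequencies where the shaped noise is pushed below the level of unshaped noise.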

Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
CODE

1 -1.1474 z^-1 +0.5383 z^-2 -0.3520 z^-3 +0.3475 z^-4
-----------------------------------------------------  =
1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4


            2.2061 -0.4707 z^-1 -0.2534 z^-2 -0.6213 z^-3
1 - z^-1 -----------------------------------------------------
         1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4

The new numerator is simply a-b with the leading zero removed (polynomial division + factoring out -z^-1). This form has its advantages when it comes to implementing noise shaping. The following image is a "DSP circuit picture" explaining how noise shaping can be done:
[Image: DSP circuit diagram of noise shaping]
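
The rewritten transfer function can be checked numerically. The sketch below uses only the coefficients given in the CODE block above; the variable and function names are made up for illustration. It recomputes the new numerator from the two coefficient lists and confirms that both forms give the same value at a few points on the unit circle.

```python
import cmath

# Numerator and denominator of the suggested filter, as posted.
a = [1.0, -1.1474, 0.5383, -0.3520, 0.3475]
b = [1.0, 1.0587, 0.0676, -0.6054, -0.2738]

# a/b = 1 - z^-1 * h_num/b, so h_num is (b - a) without the leading zero.
h_num = [bk - ak for ak, bk in zip(a[1:], b[1:])]


def poly(coeffs, z_inv):
    """Evaluate sum(c[k] * z^-k) for a given value of z^-1."""
    return sum(c * z_inv ** k for k, c in enumerate(coeffs))


max_difference = 0.0
for omega in (0.3, 1.0, 2.5):
    z_inv = cmath.exp(-1j * omega)
    direct = poly(a, z_inv) / poly(b, z_inv)
    rewritten = 1.0 - z_inv * poly(h_num, z_inv) / poly(b, z_inv)
    max_difference = max(max_difference, abs(direct - rewritten))
```

Rounded to four decimals, `h_num` comes out as 2.2061, -0.4707, -0.2534, -0.6213, matching the numerator in the second form.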


Still, I think the use of fixed shaping for this purpose is very limited. You could do much better with some easy signal adaptive filters like H(z)/A(z) where H(z) is some fixed filter and 1/A(z) is the LPC synthesis filter for the current frame or something like that.


Cheers,
SG
2Bdecided
QUOTE(SebastianG @ Mar 6 2008, 18:23) *

Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149]; which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But this isn't actually the noise shaping filter in this case. 1-z^-1*H(z) is. It's common and popular to write the transfer function of noise shaping filters as 1-z^-1*H(z). So, in case you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter, you need to use freqz([1 -b]) for the FIR cases. Since you're removing the leading coefficient and inverting signs, you just need to skip this part for the "book filter".
Thank you. I didn't know that's what was being quoted. It's amazing it works as well as it does!

I don't need that step for any of the filters then (just drop the leading ones before typing them in!). As I said, the code is hacked from another version, which does need that process.

QUOTE
Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
I'm laid up in bed with a cold. I'm not even going to try to follow this now!

QUOTE
The following image is a "DSP circuit picture" explaining how noise shaping can be done:
[Image: DSP circuit diagram of noise shaping]
I think what I've done might be equivalent. In no fit state to know for sure!

QUOTE
Still, I think the use of fixed shaping for this purpose is very limited. You could do much better with some easy signal adaptive filters like H(z)/A(z) where H(z) is some fixed filter and 1/A(z) is the LPC synthesis filter for the current frame or something like that.
Yes, I know, I've done it, it's patented by Sony. We've had this conversation, haven't we? wink.gif

Cheers,
David.
SebastianG
QUOTE(2Bdecided @ Mar 6 2008, 19:55) *

QUOTE
Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
I'm laid up in bed with a cold. I'm not even going to try to follow this now!
QUOTE
The following image is a "DSP circuit picture" explaining how noise shaping can be done:
[Image: DSP circuit diagram of noise shaping]
I think what I've done might be equivalent. In no fit state to know for sure!

I haven't checked how noise shaping is implemented in lossyFLAC.m. So, if you say your implementation is equivalent to what's shown in the picture and you require the coefficients for H(z), then you need to use the numerator and denominator coefficients of the rewritten transfer function, because removing the leading one doesn't do it for IIR filters...

QUOTE(2Bdecided @ Mar 6 2008, 19:55) *

Yes, I know, I've done it, it's patented by Sony. We've had this conversation, haven't we? wink.gif

Yeah, that rings a bell. But I can't remember agreeing on whether the patent really applies or not.

Cheers,
SG
Nick.C
QUOTE(SebastianG @ Mar 6 2008, 18:23) *
Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149]; which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But this isn't actually the noise shaping filter in this case. 1-z^-1*H(z) is. It's common and popular to write the transfer function of noise shaping filters as 1-z^-1*H(z). So, in case you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter, you need to use freqz([1 -b]) for the FIR cases. Since you're removing the leading coefficient and inverting signs, you just need to skip this part for the "book filter".

You'll see that the response of the filter isn't that bad after all. Its deviation from the one I was suggesting is within +/-5 dB at nearly all frequencies.

Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
CODE
1 -1.1474 z^-1 +0.5383 z^-2 -0.3520 z^-3 +0.3475 z^-4
-----------------------------------------------------  =
1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4


            2.2061 -0.4707 z^-1 -0.2534 z^-2 -0.6213 z^-3
1 - z^-1 -----------------------------------------------------
         1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4
The new numerator is simply a-b with the leading zero removed (polynomial division + factoring out -z^-1). This form has its advantages when it comes to implementing noise shaping. The following image is a "DSP circuit picture" explaining how noise shaping can be done: [Image: DSP circuit diagram of noise shaping]

Still, I think the use of fixed shaping for this purpose is very limited. You could do much better with some easy signal adaptive filters like H(z)/A(z) where H(z) is some fixed filter and 1/A(z) is the LPC synthesis filter for the current frame or something like that.


Cheers,
SG
Okay, I admit to being a bit baffled at the moment....

I looked up noise shaping in Wikipedia and found
CODE
y(n) = x(n)+A.E(x(n-1))+B.E(x(n-2))+C.E(x(n-3))+D.E(x(n-4))+
E.E(x(n-5))+F.E(x(n-6))+G.E(x(n-7))+H.E(x(n-8))+I.E(x(n-9))
I also found some code which was using a filter with 9 coefficients, so I implemented the noise shaping in lossyWAV like that, i.e. output = input - coeff[0..8] x quantization_error[0..-8], where quantization_error = output - input.

Initially, this kept crashing until I divided all coefficients by coeff[0] and then disregarded coeff[0] (per David's code).

Looking again at the Wikipedia article, it appears that I have omitted to include dither in the calculation.

I still feel as if I'm groping in the dark here and would gratefully accept any advice, pointers, etc.
botface
Hi,
I'm completely new to the forum - this is my first post so please excuse me if I'm posting in the wrong place or whatever

I'm fascinated and quite excited by lossywav and have done some testing with v0.6.7_RC2. I have not found any obvious problems using 16/44 input files but I can't get it to run with 24 bit files. I've read the item that says that most testing has been done with 16/44 but the wiki says that it can handle up to 32/48 and I'm very keen to use it on 24 bit files as all of the lossless codecs give relatively poor results in terms of file size with 24 bit files.

I have tried files generated by Adobe Audition in "24 bit packed int (type 1 - 24 bit)". This causes lossyWAV to error instantly with the message "FMT chunk wrong size".

I have also tried files in "24 bit packed int (type 1 - 20 bit)". lossyWAV manages to open the file and recognises the format as 48.00kHz; 2ch.; 20 bit, but it then fails with a Windows error message "lossyWAV.exe has encountered a problem and needs to close. We are sorry for the inconvenience."

As I said, I'm not sure if I'm posting in the right place but I would like very much to help with testing if you think it might be useful. I should point out though that I am not very technical, I'm just a music lover. In fact this is the first time I've run anything using cmd - I've always relied on GUI front ends

Botface
Nick.C
QUOTE(botface @ Mar 6 2008, 21:43) *
Hi,
I'm completely new to the forum - this is my first post so please excuse me if I'm posting in the wrong place or whatever

I'm fascinated and quite excited by lossywav and have done some testing with v0.6.7_RC2. I have not found any obvious problems using 16/44 input files but I can't get it to run with 24 bit files. I've read the item that says that most testing has been done with 16/44 but the wiki says that it can handle up to 32/48 and I'm very keen to use it on 24 bit files as all of the lossless codecs give relatively poor results in terms of file size with 24 bit files.

I have tried files generated by Adobe Audition in "24 bit packed int (type 1 - 24 bit)". This causes lossyWAV to error instantly with the message "FMT chunk wrong size".

I have also tried files in "24 bit packed int (type 1 - 20 bit)". lossyWAV manages to open the file and recognises the format as 48.00kHz; 2ch.; 20 bit, but it then fails with a Windows error message "lossyWAV.exe has encountered a problem and needs to close. We are sorry for the inconvenience."

As I said, I'm not sure if I'm posting in the right place but I would like very much to help with testing if you think it might be useful. I should point out though that I am not very technical, I'm just a music lover. In fact this is the first time I've run anything using cmd - I've always relied on GUI front ends

Botface
Hi there,

lossyWAV will only work with PCM integer values (4 to 32 bit as in wiki article, *not* 32bit float). These are packed out to the nearest byte and stored. I am unsure what type of audio data your values apply to. [edit] If the FMT chunk is the wrong size (i.e. not integer values) then lossyWAV will exit. [/edit]

Sorry not to be much help.

Nick.

[edit] ps. Please could you post a sample (<=30 seconds in length) for me to test with? It would be very much appreciated. [/edit]
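
For anyone hitting the same "FMT chunk wrong size" message: one possible cause (an assumption, not something confirmed in this thread) is a file written in the WAVE_FORMAT_EXTENSIBLE layout, whose fmt chunk is 40 bytes long instead of the 16 bytes used by plain PCM. The extensible layout also stores a separate "valid bits" field, which is how a 20 bit signal can sit in a 24 bit container. Below is a minimal Python sketch for inspecting the chunk; read_fmt_chunk is a hypothetical helper and has nothing to do with lossyWAV's own parser.

```python
import io
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def read_fmt_chunk(stream):
    """Return basic format information from the 'fmt ' chunk of a RIFF/WAVE stream."""
    riff, _, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no fmt chunk found")
        chunk_id, size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            break
        stream.seek(size + (size & 1), 1)  # skip chunk; chunks are word-aligned
    body = stream.read(size)
    if len(body) < 16:
        raise ValueError("fmt chunk shorter than 16 bytes")
    tag, channels, rate, _, _, bits = struct.unpack("<HHIIHH", body[:16])
    info = {
        "fmt_size": size,
        "format_tag": tag,
        "channels": channels,
        "sample_rate": rate,
        "container_bits": bits,
        "valid_bits": bits,
    }
    if tag == WAVE_FORMAT_EXTENSIBLE and len(body) >= 40:
        # Offset 18: valid bits per sample; offset 24: first two bytes of
        # the SubFormat GUID, which carry the underlying format tag.
        (info["valid_bits"],) = struct.unpack("<H", body[18:20])
        (info["format_tag"],) = struct.unpack("<H", body[24:26])
    return info
```

Calling read_fmt_chunk(open("a.wav", "rb")) on a problem file and looking at fmt_size, container_bits and valid_bits should show quickly whether the file is plain 16-byte PCM or the 40-byte extensible variant.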
SebastianG
QUOTE(Nick.C @ Mar 6 2008, 21:27) *

Okay, I admit to being a bit baffled at the moment....

It's probably the z-domain thingy. It takes a while to wrap one's head around it.

QUOTE(Nick.C @ Mar 6 2008, 21:27) *

I looked up noise shaping in Wikipedia and found
CODE
y(n) = x(n)+A.E(x(n-1))+B.E(x(n-2))+C.E(x(n-3))+D.E(x(n-4))+
E.E(x(n-5))+F.E(x(n-6))+G.E(x(n-7))+H.E(x(n-8))+I.E(x(n-9))
I also found some code which was using a filter with 9 coefficients so I implemented the noise shaping in lossyWAV like that, i.e. output = input - coeff[0..8] x quantization_error[0..-8], where quantization_error = output - input.

The 1st problem with this wikipedia article is that it's not really obvious what E is. Is it the unfiltered or the filtered noise? Btw, output-input isn't the quantization error. It's the already-filtered error. So, in your case you'll get an all-pole filter which is a totally different beast than a FIR filter where the actual quantization errors are used. The difference is subtle: Note that in the picture I made I pick up the signal after the feedback and right before dither and quantization noise is added to compute the "quantization error" (unfiltered noise).
The 2nd problem with this wikipedia article is that it doesn't say anything about FIR or IIR filters and whether and/or how they can be used and what type of filter is actually described there.

That said: Regardless of what E is, their noise shaping formula is equivalent to the structure I drew where H(z) either corresponds to an all-pole-IIR or a FIR filter.

QUOTE(Nick.C @ Mar 6 2008, 21:27) *

Initially, this kept crashing until I divided all coefficients by coeff[0] and then disregarded coeff[0] (per David's code).

By "crashing" I guess you mean the filter went unstable. You probably used the coefficients in a wrong way. It might be a sign problem (sign of E is wrong) or you got the wrong E (filtered noise instead of unfiltered noise).

QUOTE(Nick.C @ Mar 6 2008, 21:27) *

Looking again at the Wikipedia article, it appears that I have omitted to include dither in the calculation.
I still feel as if I'm groping in the dark here and would gratefully accept any advice, pointers, etc.

If you omit dither you can't guarantee the quantization error to be white/uncorrelated. The noise shaping stuff still works but you may get unexpected results because the filter is supposed to be applied on white/uncorrelated noise. So, that's why at least rectangular dithering should be used.
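
A small illustration of the point about dither, as a hedged Python sketch (the helper names are made up for this example). With a constant input of 0.3 LSB, undithered rounding always produces the same error, so the error is completely determined by the signal. With rectangular dither of 1 LSB width the error varies from sample to sample and the average output matches the input.

```python
import math
import random


def quantize(value):
    """Round to the nearest integer step (1 LSB)."""
    return math.floor(value + 0.5)


def mean_output(level, count, use_dither, seed=1):
    """Average quantized output for a constant input level."""
    rng = random.Random(seed)
    total = 0
    for _ in range(count):
        # Rectangular dither: uniform in [-0.5, 0.5) LSB.
        dither = rng.random() - 0.5 if use_dither else 0.0
        total += quantize(level + dither)
    return total / count


plain = mean_output(0.3, 10000, use_dither=False)    # every sample rounds to 0
dithered = mean_output(0.3, 10000, use_dither=True)  # averages out close to 0.3
print(plain, dithered)
```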

More explanations and pseudo code following...
CODE

You might have missed some information regarding the picture I drew
X      : input signal
Y      : output signal ( = input + filtered noise )
E      : dither & quantization noise (unfiltered white noise please)
+      : is obviously mixing two signals. Note it can also be used
         for subtraction (source line(s) marked with a minus)
         Also, quantization is modelled as mixing the signal with
         errors.
[z^-1] : This is a simple filter: A delay of one sample
[H(z)] : This is any filter you like to use

So, suppose you have some given filter coefficients for H(z):
b[] = {b[0],b[1],b[2],...,b[n]}; // array, indexed starting at 0
a[] = {  1 ,a[1],a[2],...,a[m]}; // array, we don't need a[0]
The index actually corresponds to the power of 1/z for the z-domain
interpretation, 'b' holds the numerator coefficients and 'a' holds
the denominator coefficients.

x[k] and y[k] are the input and output samples.

We also need some filter memory with exactly max(n+1,m) samples. Let's
write fifo[0] for the last sample we added to the fifo, fifo[1] was
the last sample in the previous loop and so on...

Then, the inner loop over 'k' would look like this:
{
   wanted_temp = x[k] + fifo[0] * b[0]
                      + fifo[1] * b[1]
                      + ..............
                      + fifo[n] * b[n];
   y[k] = quantize( wanted_temp + dither );
   qerror_temp = wanted_temp - y[k];
   new_fifo_sample = qerror_temp - fifo[0] * a[1]
                                 - fifo[1] * a[2]
                                 - ..............
                                 - fifo[m-1] * a[m];
   fifo_add( fifo, new_fifo_sample );
   // Now: fifo[0] == new_fifo_sample
}

For implementing H(z) I used the direct form II structure where the delay-line is shared among the recursive and non-recursive filter parts.

The 4th order filter I was suggesting for 24->16 bit word length reduction @ 44 kHz sampling frequency leads to the following coefficients for H(z):
b[0..3] = { 2.2061 , -0.4707 , -0.2534 , -0.6213 };
a[1..4] = { 1.0587 , 0.0676 , -0.6054 , -0.2738 };
Again: H(z) is NOT the transfer function of the noise shaper, it is G(z) = 1 - z^-1 * H(z).
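
The pseudo code above can be turned into something runnable fairly directly. The following Python sketch is one reading of it, not a reference implementation: the function name noise_shape, the use of a deque for the fifo, the 1 LSB quantization step and the rectangular dither are all assumptions made for this example.

```python
import math
import random
from collections import deque


def noise_shape(x, b, a, use_dither=True, seed=0):
    """Quantize x to integers, shaping the error with G(z) = 1 - z^-1 * H(z).

    b holds b[0..n] (numerator of H), a holds a[1..m] (denominator, a[0] = 1
    omitted). Samples are assumed to be scaled so that 1.0 equals one LSB.
    """
    rng = random.Random(seed)
    size = max(len(b), len(a), 1)            # max(n+1, m) samples of memory
    fifo = deque([0.0] * size, maxlen=size)  # fifo[0] is the newest entry
    y = []
    for sample in x:
        wanted = sample + sum(bk * fifo[k] for k, bk in enumerate(b))
        dither = rng.random() - 0.5 if use_dither else 0.0
        out = math.floor(wanted + dither + 0.5)  # quantize(wanted + dither)
        qerror = wanted - out                     # unfiltered error, sign as above
        new_sample = qerror - sum(ak * fifo[k] for k, ak in enumerate(a))
        fifo.appendleft(new_sample)               # now fifo[0] == new_sample
        y.append(out)
    return y


# The 4th order coefficients quoted above, applied to a short test tone.
B = [2.2061, -0.4707, -0.2534, -0.6213]
A = [1.0587, 0.0676, -0.6054, -0.2738]
tone = [100.0 * math.sin(2 * math.pi * 1000 * k / 44100) for k in range(64)]
shaped = noise_shape(tone, B, A)
```

A quick sanity check is the first order case b = [1], a = []: then G(z) = 1 - z^-1, and the running sum of (output - input) can never exceed half an LSB in magnitude when dither is off.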

Note: This post comes with no warranty and might contain errors. smile.gif

Cheers,
SG
Nick.C
QUOTE(SebastianG @ Mar 6 2008, 23:29) *
[...]
Huge thanks, Sebastian - It will take me some time to get my head round it but I will endeavour to implement it when I get back from a few days away....
botface
QUOTE(Nick.C @ Mar 6 2008, 22:03) *

[...]

Nick,
I've tried to send you a test file a couple of times but my posts just don't seem to be there. I'm assuming the file was too large as the attach procedure took ages. So, here's another, smaller file. It was recorded from vinyl at 32/48 and saved as "24 bit packed int (type 1 - 24 bit)".

Let me know if you need anything else

botface
jesseg
CODE
------------------------------------------------------------------------------
lFLCDrop v1.2.0.5
lFLC.bat for lFLCDrop v1.0.0.7
------------------------------------------------------------------------------

lFLCDrop Change Log:
v1.2.0.5
- presets updated to -1 through -7
- all presets always create correction files, except custom
- "Delete Source Files" option removed

lFLC.bat Change Log:
v1.0.0.7
- added a new set of variables for decoding
- added automatic functionality for the -merge option
- added support for auto-merging legacy lossyWAVs with proper naming convention
- added automatic encoding of .lwcdf.wav while encoding an already lossy .wav
- custom preset defaults now match -2 (default) frontend preset functionality


Let me know if you encounter any bugs. The batch file is just getting to the level of complexity (10.4KB!) where there may be combinations of logic in the code that I just haven't thought to test fully. But it should all be working without bugs, and there's error checking built into everything, so the main thing is that the logic would end up doing something that doesn't seem like it's what should happen.

After the command-line options (if any) for noise-shaping settle down, I'll do a release to support those additions, and I'll include documentation on which command-line options to send to lFLC.bat for encoding & decoding from other software or batch files. That way people can implement things like tagging through batch files in EAC, call lFLC.bat for the dirty work of encoding, and then tag afterwards. Feel free to use the methods in lFLC.bat for creating your own. emot-science.gif


Enjoy cool.gif
carpman
@jesseg --- Thanks for the update!

Just been running:

lFLCDrop v1.2.0.5
lFLC.bat for lFLCDrop v1.0.0.7

First thing I noticed was that after encoding the correction file for a single WAV, the DOS window prompts "press any key to continue". Is that supposed to happen? I'd prefer it just encoded the whole batch without interruption.

Also, is there any way you could create an option whereby the user specifies a directory (browse/create directory) for the correction files?

e.g.

Source Folder/ [inputs] *.wav , [outputs] *.lossy.flac
Source Folder/Corrections Files/ [outputs] *.lwcdf.flac

C.
jesseg
I re-zipped the directory and re-uploaded it. Somehow I had removed that pause at the last second, but forgot to re-zip it before uploading. My bad, thanks for catching it. smile.gif It was in the exit, so it would have happened no matter what you were trying to do. Oops. laugh.gif

[edit]
And re: a sub-folder of current folder option, could be added, but it would only be controllable through lFLC.bat, not through the frontend - unless I make my own frontend. And if I do or anyone else does, I can imagine that it's not going to rely on a batch file at all.
[/edit]
halb27
QUOTE(botface @ Mar 8 2008, 18:17) *

... So, here's another, smaller file. ... [Budapest_10_secs.wav]

I have no problem at all with your file.
First I renamed your file to a.wav, called 'lossywav a.wav' from cmd.exe, and got a wonderful 24 bit 48 kHz a.lossy.wav file.
Then I used my standard lossyFLAC bat file with foobar on your Budapest_10_secs.wav, and this too yielded a perfect lossy.flac result.
Did you try plain 'lossywav a.wav'?
botface
QUOTE(halb27 @ Mar 9 2008, 20:04) *

[...]

Funnily enough I am now able to process the file without problems either. I also have no problems with the latest beta. I've also successfully processed a 24/88.2 file. I can't imagine what went wrong the first time.

Thanks for trying it anyway.
2Bdecided
QUOTE(SebastianG @ Mar 6 2008, 19:21) *
I haven't checked how noise shaping is implemented in lossyFLAC.m. So, if you say your implementation is equivalent to what's shown in the picture and you require the coefficients for H(z), then you need to use the numerator's and denominator's coefficients of the rewritten transfer function, because removing the leading one doesn't do it for IIR filters...
I think I cheated - can you take a look and tell me if it works or not?

Also, I think several of us would really appreciate it if you could spend some time writing a good page on noise shaping for the HA wiki. (If you don't have the time, just ignore our questions on here!).

I couldn't find a single decent reference to IIR filters in noise shaping, hence my guess at how to do it.

The other problem is that the explanations that exist are often written for mathematicians. I suppose engineers and programmers should be able to understand such explanations, but I usually find them lacking. On the one hand, I want to understand at a high level what's happening, and on the other hand I want to understand bit-by-bit what's happening. Many explanations walk a fine line down the middle, leaving both of these unclear to me.

Cheers,
David.
P.S. It wasn't a cold - it was/is a chest infection. Still laid up. sick.gif
SebastianG
QUOTE(2Bdecided @ Mar 11 2008, 11:40) *

Also, I think several of us would really appreciate it if you could spend some time writing a good page on noise shaping for the HA wiki.

I'm on it.

edit: I finished the article. Still waiting for wiki write access, though.

QUOTE(2Bdecided @ Mar 11 2008, 11:40) *

P.S. It wasn't a cold - it was/is a chest infection. Still laid up. sick.gif

Ouch! Hope you get well soon!

Cheers,
SG
Nick.C
Right, I'm back from a few days away.....

I've tried to implement the method SebastianG so kindly posted and I'm getting unexpected (even more hiss) results when I use it. I'm posting an excel fragment which will show how I've implemented it, for comment / criticism.....

As well as that, I've been re-thinking the spreading function and have realised that the current method takes into account certain values more often than it should because of the relationship between bin-width and the frequency bands used (some bands have the same start bin at short FFT lengths due to bin-width). So, I'm in the process of rewriting the spread function and will include it as a -newspread parameter to allow back to back comparison.

I'm going to try to puzzle my way through the input / output directory problem as well.

[edit] Having transcribed the function to excel, I seem to have identified and corrected an error in my implementation of the noise shaping function. I'm listening to -7 -nts 36 -snr 0 -shaping at the moment and it's not bad at all (for DAP purposes)........ [/edit]

[edit2] David, I take it that I should re-calculate the reference_threshold values with noise shaping activated to get the full benefit? [/edit2]
Nick.C
lossyWAV beta v0.8.4 attached to the first post in this thread.
Table of processed bitrates, for my 53 problem sample set, using lossyWAV v0.8.4 with and without -shaping & -newspread.
CODE
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| -shaping | -newspread |    -1     |    -2     |    -3     |    -4     |    -5     |    -6     |    -7     |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    n     |     n      | 543.5kbps | 494.6kbps | 433.9kbps | 408.2kbps | 385.6kbps | 365.4kbps | 348.1kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    y     |     n      | 560.1kbps | 518.3kbps | 466.8kbps | 445.8kbps | 427.5kbps | 411.9kbps | 399.2kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    n     |     y      | 568.0kbps | 533.9kbps | 462.9kbps | 442.6kbps | 400.9kbps | 383.8kbps | 352.7kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    y     |     y      | 581.4kbps | 552.0kbps | 491.4kbps | 475.0kbps | 441.7kbps | 428.4kbps | 403.8kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
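
To put a number on the cost of -shaping, the relative increase per preset can be read off the first two rows of the table. A short Python sketch using only the figures above (without -newspread); the variable names are just for this example:

```python
presets = ["-1", "-2", "-3", "-4", "-5", "-6", "-7"]
plain_kbps = [543.5, 494.6, 433.9, 408.2, 385.6, 365.4, 348.1]
shaping_kbps = [560.1, 518.3, 466.8, 445.8, 427.5, 411.9, 399.2]

# Percentage bitrate increase caused by -shaping for each quality preset.
overhead = {
    preset: (shaped / plain - 1.0) * 100.0
    for preset, plain, shaped in zip(presets, plain_kbps, shaping_kbps)
}

for preset in presets:
    print(f"{preset}: +{overhead[preset]:.1f}%")
```

The increase grows steadily from the highest quality preset to the lowest, i.e. the lower the starting bitrate, the larger the relative cost of the fixed shaping filter on this sample set.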
SebastianG
QUOTE(Nick.C @ Mar 14 2008, 10:09) *

I've tried to implement the method SebastianG so kindly posted and I'm getting unexpected (even more hiss) results when I use it.
[...]
[edit] Having transcribed the function to excel, I seem to have identified and corrected an error in my implementation of the noise shaping function. I'm listening to -7 -nts 36 -snr 0 -shaping at the moment and it's not bad at all (for DAP purposes)........ [/edit]

Still, the noticeable hiss could be explained. The Fletcher-Munson equal-loudness curves have different shapes at different levels. The ATH-derived noise shaping filter is only a special case for low noise levels. So, at higher noise levels the noise shaping filter might expose the high frequency part of the noise noticeably, which is why I think the use of this kind of fixed filter for lossyWAV is rather limited.

Cheers!
SG
Nick.C
QUOTE(SebastianG @ Mar 14 2008, 15:29) *
[...]
The hiss I experienced in the previous build was *very* pronounced, in beta v0.8.4 I can't hear anything wrong with the output at all when using -shaping.

The only minor disappointment when using -shaping is, as David said previously, the bitrate increases quite dramatically.

I transcoded my Mike Oldfield collection (261 tracks) this evening using -7a -nts 30 -snr 6 -shaping and got an average bitrate of 340kbps. I've listened to several of the tracks and am very pleased with the results.
Nick.C
Thinking about the added bitrate due to noise shaping, are there some 2 or 3 coefficient filters which might be useful as a compromise between the quality and high bitrate of SebastianG's filters and no noise shaping but lower bitrate?

[edit] I'm not going to go any further with -7a -nts 30 -snr 6 -shaping and have reverted to -7a -shaping. I've converted 1643 tracks and the average bitrate is 374kbps.

There may be some merit in revising the skewing value when noise shaping is enabled (or even when -newspread is enabled) - however, this would take a bit of work from those who have ABXed during settings testing (and would require the -spf parameter to make a re-appearance). [/edit]
SebastianG
QUOTE(Nick.C @ Mar 15 2008, 18:22) *

Thinking about the added bitrate due to noise shaping, are there some 2 or 3 coefficient filters which might be useful as a compromise between the quality and high bitrate of SebastianG's filters and no noise shaping but lower bitrate?


There's a simple way of softening minimum phase filters:
Take the coefficients from the noise transfer function (N)
[1 b1 b2 b3 b4 ... ] (numerator)
[1 a1 a2 a3 a4 ... ] (denominator)
and create a new set of coefficients like this:
[1 b1*s b2*s^2 b3*s^3 b4*s^4 ... ] (numerator)
[1 a1*s a2*s^2 a3*s^3 a4*s^4 ... ] (denominator)
where s=1 leads to the original filter and s=0 to N(z)=1 which is no noise shaping at all.
All values in between are also fine.
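
As a hedged Python sketch of the softening step (function names are invented for this example): soften scales the k-th coefficient of a list [1, c1, c2, ...] by s^k. Because the noise transfer function given earlier in the thread is G(z) = 1 - z^-1 * H(z), the same softening can also be applied directly to the H(z) coefficients, where b[k] picks up a factor s^(k+1) and a[k] a factor s^k; soften_h does that.

```python
def soften(coeffs, s):
    """Scale [1, c1, c2, ...] to [1, c1*s, c2*s**2, ...]."""
    return [c * s ** k for k, c in enumerate(coeffs)]


def soften_h(b, a, s):
    """Apply the same softening to H(z), given b[0..n] and a[1..m] as lists.

    b[k] multiplies z^-(k+1) in z^-1 * H(z), so it is scaled by s**(k+1);
    the list 'a' starts at a[1], so list index k corresponds to s**(k+1) too.
    """
    new_b = [bk * s ** (k + 1) for k, bk in enumerate(b)]
    new_a = [ak * s ** (k + 1) for k, ak in enumerate(a)]
    return new_b, new_a


B = [2.2061, -0.4707, -0.2534, -0.6213]
A = [1.0587, 0.0676, -0.6054, -0.2738]
half_b, half_a = soften_h(B, A, 0.5)
```

With s = 1 both functions return the coefficients unchanged, and with s = 0 everything except the leading 1 vanishes, which matches the two limiting cases described above.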

However, you should seriously think about adaptive filters at this stage. Maybe 2B can shed some more light on the alleged danger of patent infringement. I hardly think this is an issue. Adaptive spectral noise shaping isn't big news. Pretty much every speech codec does it including Speex, by the way.

You're already very close to it: You're doing spectral analysis, psychoacoustic modelling and have a working noise shaper implementation. The only thing that's missing now is code to compute the filters. Jean-Marc Valin (jmspeex) and Monty wrote a paper about how Speex can benefit from Vorbis' psychoacoustic model. The same thing applies to lossyWAV. I don't remember how Monty and Jean-Marc did it but I guess it's something like computing the autocorrelation of the optimal noise shaping filter's impulse response via iFFT and feeding the result to the Levinson-Durbin algorithm, which would give you all the denominator's coefficients (a1, a2, ...) for an all-pole noise shaping filter (b1=b2=...=0).
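
A rough Python sketch of the iFFT + Levinson-Durbin idea, under stated assumptions: the desired noise power spectrum is given as a full, symmetric N-point array, the inverse transform is written out as a plain cosine sum instead of a real FFT, and the function names are invented for this example. It is not the method from the Speex/Vorbis paper, just the textbook recursion.

```python
import math


def autocorrelation_from_power(power, lags):
    """Inverse DFT of a symmetric N-point power spectrum, lags 0..lags."""
    n = len(power)
    return [
        sum(p * math.cos(2 * math.pi * j * k / n) for j, p in enumerate(power)) / n
        for k in range(lags + 1)
    ]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1*z^-1 + ... so that 1/|A|^2 follows the spectrum."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err


# Toy target: the power spectrum of 1 / (1 - 0.5*z^-1), sampled at 64 points.
n_points = 64
rho = 0.5
target = [
    1.0 / (1.0 - 2.0 * rho * math.cos(2 * math.pi * j / n_points) + rho * rho)
    for j in range(n_points)
]
r = autocorrelation_from_power(target, 2)
coeffs, residual = levinson_durbin(r, 2)   # close to [1, -0.5, 0]
```

If the resulting all-pole filter 1/A(z) is to be used as the noise transfer function G(z) = 1 - z^-1 * H(z) from earlier in the thread, the algebra gives H(z) the same denominator and the numerator b[k] = a[k+1].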

Cheers!
SG
Nick.C
QUOTE(SebastianG @ Mar 15 2008, 21:10) *
[...]
Thanks for the pointer - I'll have a play with it, maybe even allow -shaping to take a supplementary value in the range 0..1 as you said above.

From memory, David was very reluctant to publish his code which included adaptive filtering. Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?

As an aside (and I know that looking at the spectrum in foobar is not any way to evaluate anything....) I looked at the spectral output for a lossyWAV correction file (replaygained +45dB or so) and almost all of the signal was in the high end of the spectrum - so it "looks" like my implementation of your noise shaping filter works!
SebastianG
QUOTE(Nick.C @ Mar 15 2008, 22:18) *

Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?


No. The iFFT+LevinsonDurbin approach should be quite fast. But the resulting filters aren't the best ones which is why I'm currently trying to understand how this can be combined with frequency warping. I have a stack of papers about this on my desk waiting to be read by me. ;-)


QUOTE(Nick.C @ Mar 15 2008, 22:18) *

so it "looks" like my implementation of your noise shaping filter works!

Cool!

Cheers!
SG
Nick.C
QUOTE(SebastianG @ Mar 15 2008, 21:54) *
[...]
I've added the supplementary parameter to -shaping in the range 0..1 and at 0.5 the added bitrate due to noise shaping is significantly reduced. I'll do a bit more testing with a view to posting v0.8.5 tomorrow.

[edit] Most of your post flew right over my head.... However, whatever can be added to lossyWAV to improve the quality of the output is well worth the effort - many thanks again! [/edit]

[edit2] 3556 tracks processed using -7a -shaping 1.000, 372kbps average bitrate...... [/edit2]
carpman
Hi,

Something struck me about dB level and lossy.wav performance which may well have implications as to how to get the best out of LossyWAV.

To date I've used MP3Gain (for MP3) and WavGain (for WAV prior to lossless encoding), as I've wanted my files to play at the same level regardless of the player (I use foobar, so I could get foobar to do this - but I'm not always listening to my files on my system). Anyway, the results of my very small test suggest that, for lossy.FLAC files, encoding the original WAV rather than the WavGained WAV would be a good idea:

Using:
lFLCDrop.v1.2.0.5.lFLC.bat.v1.0.0.7
lossyWAV beta v0.7.9
FLAC 1.2.1

Test:
2 copies of the same file (original.wav and wavgained.wav), the only difference being that the latter has been through Wav Gain and is 4.55 dB lower in volume.

SETTINGS: lossy.wav -3, FLAC -5

original.wav (93.55 dB) [FLAC-5 = 690kbps, lossy.FLAC = 475kbps]
<edited to make sense>
lossy.FLAC is 31% smaller than the FLAC. </edit>

wavgained.wav (89.00 dB) [FLAC-5 = 626kbps, lossy.FLAC = 477kbps]
<edit>lossy.FLAC is 24% smaller than the FLAC.</edit>

So this tells me that if I use the original and use foobar to look after the replay gain function, my lossy.FLAC collection would be approx 2/3 of the size of my FLAC collection.

But if I WavGain my files prior to encoding (which I had been doing) my lossy.FLAC collection would be approx 3/4 of the size of my FLAC collection.

That's a substantial difference.
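
The percentages above can be re-derived from the four bitrates, and the drop in the plain FLAC bitrate is about what a 4.55 dB gain reduction would predict. A short Python sketch; the 44.1 kHz stereo format and the rule of thumb of 20*log10(2), roughly 6.02 dB per bit, are assumptions that are not stated in the post.

```python
import math

flac_original, lossy_original = 690, 475    # kbps, original.wav
flac_gained, lossy_gained = 626, 477        # kbps, wavgained.wav

saving_original = (1 - lossy_original / flac_original) * 100
saving_gained = (1 - lossy_gained / flac_gained) * 100

# Attenuating by 4.55 dB removes roughly 4.55 / 6.02 bits from every sample,
# which a lossless codec no longer has to store.
db_per_bit = 20 * math.log10(2)
bits_saved = 4.55 / db_per_bit
predicted_flac_drop = bits_saved * 44100 * 2 / 1000   # kbps, assumed 44.1 kHz stereo
observed_flac_drop = flac_original - flac_gained

print(round(saving_original), round(saving_gained))
print(round(predicted_flac_drop, 1), observed_flac_drop)
```

The lossy.FLAC bitrate barely moves (475 vs 477 kbps), while the FLAC bitrate falls by roughly the predicted amount, so the smaller percentage saving on the WavGained file comes almost entirely from the lossless file shrinking.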

If this is not news and everyone already knows this then fine, but it will be useful later to make clear to users that this is the case, as obviously it has implications regarding which method of replay gain one goes for.

C.