Full Version: lossyWAV Development
jesseg
I see what you're saying, and I think what you did was either switch around the wording or the numbers you used. I think you switched around the wording, because it would make sense that the WavGained version would be compressed more easily, since the original least significant bits are removed and get re-quantized into a new least significant bit - thereby having fewer most significant bits actually being used and FLAC being able to compress it more efficiently.

The problem with doing this is twofold, in my opinion:
1 - with WavGain, you're losing least significant bits that might not necessarily be insignificant.
2 - WavGain doesn't have a correction file, so there's no way to ever get a 1:1 copy of the original again. If you're using normal FLAC, and the ReplayGain built into it, you can still enjoy the benefits of the volume equalization, and still be able to generate a 1:1 copy of the original source file at any time. And if ReplayGain code ever becomes more transparent in audio quality, or if there is ever a more accurate ReplayGain algorithm designed, you can enjoy the benefits of that by re-running ReplayGain over your libraries.

Also I noticed that the WavGained source file was slightly larger after lossyWAV+FLAC. I wonder if someone else knows why that might have been. My initial thought was that lossyWAV thought the quantization noise introduced by WavGain was a slightly quieter noise-floor than in the original?
carpman
QUOTE(jesseg @ Mar 17 2008, 04:24) *

I see what you're saying, and I think what you did was either switch around the wording or the numbers you used. I think you switched around the wording, because it would make sense that the WavGained version would be compressed more easily, since the original least significant bits are removed and get re-quantized into a new least significant bit - thereby having fewer most significant bits actually being used and FLAC being able to compress it more efficiently.

The edit was simply that in the original I said the "lossy.FLAC is 69% smaller than the FLAC" when what I'd meant was that its size was 69% of the FLAC. Instead I kept the "smaller" and made it 31%. That's all the edit was.

QUOTE(jesseg @ Mar 17 2008, 04:24) *

Also I noticed that the WavGained source file was slightly larger after lossyWAV+FLAC.

Yeah, I thought that was strange.

C.

Nick.C
I've been testing beta v0.8.5 using the -shaping <n> parameter (0 to 1 in 0.05 steps) to process my 53 problem sample set and here are the results:
CODE
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Shaping |    -1     |    -2     |    -3     |    -4     |    -5     |    -6     |    -7     |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|  0.000  | 543.5kbps | 494.6kbps | 433.9kbps | 408.2kbps | 385.6kbps | 365.4kbps | 348.1kbps |
|  0.050  | 543.7kbps | 495.0kbps | 434.5kbps | 408.9kbps | 386.5kbps | 366.7kbps | 349.7kbps |
|  0.100  | 543.9kbps | 495.3kbps | 435.0kbps | 409.6kbps | 387.3kbps | 367.6kbps | 350.9kbps |
|  0.150  | 544.2kbps | 495.7kbps | 435.6kbps | 410.4kbps | 388.3kbps | 368.8kbps | 352.2kbps |
|  0.200  | 544.4kbps | 496.2kbps | 436.3kbps | 411.3kbps | 389.3kbps | 370.1kbps | 353.7kbps |
|  0.250  | 544.8kbps | 496.8kbps | 437.2kbps | 412.3kbps | 390.6kbps | 371.6kbps | 355.5kbps |
|  0.300  | 545.2kbps | 497.4kbps | 438.2kbps | 413.5kbps | 392.1kbps | 373.4kbps | 357.6kbps |
|  0.350  | 545.7kbps | 498.1kbps | 439.2kbps | 414.8kbps | 393.6kbps | 375.2kbps | 359.8kbps |
|  0.400  | 546.2kbps | 498.9kbps | 440.4kbps | 416.2kbps | 395.4kbps | 377.3kbps | 362.2kbps |
|  0.450  | 546.7kbps | 499.8kbps | 441.7kbps | 417.8kbps | 397.3kbps | 379.6kbps | 364.8kbps |
|  0.500  | 547.5kbps | 500.9kbps | 443.3kbps | 419.7kbps | 399.5kbps | 382.4kbps | 368.3kbps |
|  0.550  | 548.2kbps | 502.0kbps | 444.9kbps | 421.5kbps | 401.7kbps | 385.0kbps | 371.3kbps |
|  0.600  | 549.1kbps | 503.3kbps | 446.6kbps | 423.5kbps | 403.9kbps | 387.4kbps | 374.1kbps |
|  0.650  | 550.1kbps | 504.7kbps | 448.6kbps | 425.7kbps | 406.2kbps | 389.9kbps | 376.7kbps |
|  0.700  | 551.1kbps | 506.2kbps | 450.7kbps | 428.0kbps | 408.7kbps | 392.3kbps | 379.0kbps |
|  0.750  | 552.3kbps | 507.8kbps | 452.9kbps | 430.4kbps | 411.3kbps | 395.0kbps | 381.6kbps |
|  0.800  | 553.5kbps | 509.6kbps | 455.2kbps | 432.9kbps | 413.9kbps | 397.8kbps | 384.6kbps |
|  0.850  | 554.9kbps | 511.4kbps | 457.7kbps | 435.6kbps | 416.7kbps | 400.7kbps | 387.7kbps |
|  0.900  | 556.5kbps | 513.5kbps | 460.4kbps | 438.6kbps | 419.9kbps | 404.1kbps | 391.3kbps |
|  0.950  | 558.2kbps | 515.8kbps | 463.4kbps | 442.0kbps | 423.5kbps | 407.8kbps | 395.2kbps |
|  1.000  | 560.1kbps | 518.3kbps | 466.8kbps | 445.8kbps | 427.5kbps | 411.9kbps | 399.2kbps |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
It is interesting that adding shaping has less of an effect at the higher bitrate end of the quality spectrum than at the lower end.
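To put a number on that observation, here is a small sketch that takes the first (-shaping 0) and last (-shaping 1) rows of the table above and computes the relative bitrate increase per quality preset. The dictionary and function names are just illustrative; the figures are copied from the table.

```python
# Bitrates in kbps, copied from the -shaping 0.000 and -shaping 1.000 rows above.
shaping_0 = {"-1": 543.5, "-2": 494.6, "-3": 433.9, "-4": 408.2,
             "-5": 385.6, "-6": 365.4, "-7": 348.1}
shaping_1 = {"-1": 560.1, "-2": 518.3, "-3": 466.8, "-4": 445.8,
             "-5": 427.5, "-6": 411.9, "-7": 399.2}


def relative_increase(preset):
    """Percentage bitrate increase going from -shaping 0 to -shaping 1."""
    base = shaping_0[preset]
    return 100.0 * (shaping_1[preset] - base) / base


for preset in shaping_0:
    print(f"{preset}: +{relative_increase(preset):.1f}%")
```

On these figures the increase is roughly 3% at -1 and roughly 15% at -7.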

Please disregard the -newspread parameter as added to beta v0.8.4, it had a bug in it which, when rectified, produces the same results as the existing method (although that in itself was a bit of a surprise....). However, the means by which it arrives at the result is likely to be quicker once optimised in IA-32/x87, so I'll replace the existing code in due course.
Nick.C
lossyWAV beta v0.8.5 attached to post #1 in this thread.
2Bdecided
Patent:
http://patft.uspto.gov/netacgi/nph-Parser?...RS=PN/5,204,677

Post:
http://www.hydrogenaudio.org/forums/index....st&p=512342

I haven't reviewed it properly.

carpman, yes ReplayGain before encoding is a nice efficiency boost. Only apply negative gains. Put it within lossyWAV itself to avoid (extra) dithering.

Cheers,
David.
The Sheep of DEATH
All of this brings up 2 questions:
1. How is the quality of -7 -shaping 1.000 compared to -4 -shaping 0 at similar bitrate (or to -5 -shaping 0.5, etc)?
2. How can we use shaping to get at lower bitrates than -7 -shaping 0? Are tweaking snr and such necessary, or will there be a -8 or -9 added in the future, or something else?
SebastianG
QUOTE(The Sheep of DEATH @ Mar 18 2008, 18:04) *

1. How is the quality of -7 -shaping 1.000 compared to -4 -shaping 0 at similar bitrate (or to -5 -shaping 0.5, etc)?

That's a good question.

QUOTE(The Sheep of DEATH @ Mar 18 2008, 18:04) *

2. How can we use shaping to get at lower bitrates than -7 -shaping 0?

The answer is much simpler than its implementation: Adaptive noise shaping driven by a good psychoacoustic model.

The increased bitrate with activated shaping (currently) is due to the quantization noise that is added in a spectral area where there usually is no music. This leads to a worse prediction gain. The method I was sketching in one of the first posts does the opposite. It adds quantization noise "under" the signal exploiting the masking effect. This approach won't decrease the predictability too much since the spectral shape of the signal is preserved.

Cheers,
SG
carpman
1. Strange results (at least to a non-technical person)

I've been running a few tests with WavGain & lossyWAV, taking into account what jesseg and 2Bdecided have said.

I had thought that "1.wav" encoded via FLAC Drop then decoded to WAV would give me "1.lossy.wav" (i.e. the result of the lossyWAV processor). (For why I was doing this, see 2 below.)

But when I encoded 1.lossy.wav (without any other processing) back to FLAC using foobar and latest flac.exe (1.2.1) at -5, the file was much larger (522kbps) than the FLAC Dropped 1.lossy.flac (475kbps).

Can someone explain why?

Additionally, I copied the "1.lossy.wav" and WavGained it and then encoded that to FLAC using foobar and flac.exe (1.2.1) and the file was much, much larger (628kbps). I'm assuming this is to do with WavGain undoing some of the work done by lossyWAV?

2. Request for help:

I was trying to get foobar to convert to lossy.wav (rather than direct to lossy.flac) but kept getting a can't flush file error.

CODE
Error flushing file (Object not found) : file://C:\Documents and Settings\[...my edit...]\test1.lossy.wav


I'd set it up as per wiki. The only difference was the batch file, which I edited to leave out the FLAC encoding (though I didn't really know what I was doing).

CODE
@echo off
D:\lossywav\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet


Can someone tell me where I've gone wrong.

Thanks

C.
jesseg
QUOTE(carpman @ Mar 18 2008, 20:24) *
But when I encoded 1.lossy.wav (without any other processing) back to FLAC using foobar and latest flac.exe (1.2.1) at -5, the file was much larger (522kbps) than the FLAC Dropped 1.lossy.flac (475kbps).

Can someone explain why?
Because you're not using the -b 512 option in your FLAC.exe command to force FLAC to use a 512 sample block size for compression. That's my best guess.


QUOTE(carpman @ Mar 18 2008, 20:24) *
Additionally, I copied the "1.lossy.wav" and WavGained it and then encoded that to FLAC using foobar and flac.exe (1.2.1) and the file was much, much larger (628kbps). I'm assuming this is to do with WavGain undoing some of the work done by lossyWAV?
Yes. When it re-quantizes the sample values, I'm pretty sure it doesn't care at all about creating new LSB all the way to the bit-floor. If it did, it would be compromising its own quality.


QUOTE(carpman @ Mar 18 2008, 20:24) *
CODE
@echo off
D:\lossywav\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet


Can someone tell me where I've gone wrong.
When you use a full path, you have to use the full filename, like so:
CODE
D:\lossywav\lossyWAV.exe %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet

[edit]
but now after looking at the error, i'm not sure that's the problem either.
[/edit]
Nick.C
QUOTE(2Bdecided @ Mar 18 2008, 11:04) *
Patent:
http://patft.uspto.gov/netacgi/nph-Parser?...RS=PN/5,204,677

Post:
http://www.hydrogenaudio.org/forums/index....st&p=512342

I haven't reviewed it properly.

carpman, yes ReplayGain before encoding is a nice efficiency boost. Only apply negative gains. Put it within lossyWAV itself to avoid (extra) dithering.

Cheers,
David.
I'll need to read up on ReplayGain to see how I would implement it in lossyWAV. However, I can't see any simple way of linking tracks together into albums, so I think that the lossyWAV ReplayGain implementation would only calculate / use track gain (which, if you processed a whole album as one file [as I do], would in effect be album gain).

I've been optimising the code again and the FFT unit is now about 95% IA-32/x87 and is a little bit faster / smaller.

Awaiting feedback on -shaping : how does it sound? Is it worth the extra bitrate? Does anyone have any ideas for alternate filters?

QUOTE(jesseg @ Mar 19 2008, 03:02) *
QUOTE(carpman @ Mar 18 2008, 20:24) *
Additionally, I copied the "1.lossy.wav" and WavGained it and then encoded that to FLAC using foobar and flac.exe (1.2.1) and the file was much, much larger (628kbps). I'm assuming this is to do with WavGain undoing some of the work done by lossyWAV?
Yes. When it re-quantizes the sample values, I'm pretty sure it doesn't care at all about creating new LSB all the way to the bit-floor. If it did, it would be compromising its own quality.
It certainly will - WavGaining a lossy.wav file will almost certainly destroy all the carefully zeroed lsb's....
QUOTE(jesseg @ Mar 19 2008, 03:02) *
QUOTE(carpman @ Mar 18 2008, 20:24) *
CODE
@echo off
D:\lossywav\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet
Can someone tell me where I've gone wrong.
When you use a full path, you have to use the full filename, like so:
CODE
D:\lossywav\lossyWAV.exe %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet

[edit]
but now after looking at the error, i'm not sure that's the problem either.
[/edit]
I think that the problem is that %1 (%s from foobar) = "temp"+32 (or so) random hex characters+".wav" and %d is the expected output filename. If you don't rename %1 to %2 (%d from foobar) then foobar will not find the file named by %d and will give you this error.... So, I think adding:
CODE
ren %~n1.lossy.wav %2
inserted as the last line might fix your problem.
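For anyone who prefers a script to a batch file, the same idea can be sketched in Python. This is only an illustration under the assumption described above: that lossyWAV writes its output next to the working directory as the input's base name plus ".lossy.wav", and that foobar expects to find the file named by its destination argument. The function names are hypothetical and nothing here is part of lossyWAV or foobar.

```python
import os
from pathlib import Path


def lossy_name(source):
    """Mirror the batch expression %~n1.lossy.wav: source stem plus '.lossy.wav'."""
    return Path(source).stem + ".lossy.wav"


def hand_back_to_foobar(source, destination, work_dir="."):
    """Rename the processed file to the name foobar expects (%d, i.e. %2).

    Assumes the processed file already exists in work_dir.
    """
    produced = Path(work_dir) / lossy_name(source)
    os.replace(produced, destination)
```

For a temporary input called temp1A2B.wav, `lossy_name` gives temp1A2B.lossy.wav, which is the file the `ren` line renames.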
2Bdecided
You can't run lossyWAV and then WaveGain, as Nick has explained.

Nick, what you need is to run WaveGain, grab the album gain value, and then use that to scale the data within lossyWAV in floating point or the 24-bit domain, but still only outputting 16-bits. I would heretically suggest not using dither at the output, but people can if they want. I put an option into the MATLAB version "always dither LSB" to ensure that even if you didn't remove any bits, the output was still dithered - that's the correct way of applying a gain change, but I would disable it by default.

There was some code or script that allowed you to run WaveGain, grab the scaling value, and use it when encoding with lame. It was in this thread but the links have died...
http://www.hydrogenaudio.org/forums/index....0637&st=150
...you need something like that to pass the value to lossyWAV - I don't think you should try to integrate ReplayGain itself into lossyWAV because just having track mode available is kind of useless.
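As a rough illustration of the "grab the gain value and scale by it" step, the sketch below converts a gain in dB into a linear scale factor using the standard amplitude relation scale = 10^(dB/20), and ignores positive gains as suggested earlier in the thread. The function name is hypothetical; how the value would actually be read from WaveGain and handed to lossyWAV is not shown.

```python
def gain_db_to_scale(gain_db):
    """Convert a gain in dB to a linear amplitude factor, never above 1.0."""
    if gain_db >= 0.0:
        # Only negative gains are applied, so positive gains leave the data unscaled.
        return 1.0
    return 10.0 ** (gain_db / 20.0)


print(gain_db_to_scale(-6.0))   # a little over 0.5
print(gain_db_to_scale(+2.5))   # 1.0, positive gain ignored
```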

Cheers,
David.
spies
QUOTE(2Bdecided @ Mar 19 2008, 03:44) *

Nick, what you need is to run WaveGain, grab the album gain value, and then use that to scale the data within lossyWAV in floating point or the 24-bit domain, but still only outputting 16-bits.

I would absolutely love to see this functionality added to lossyWAV! I envision it being an input option just like the --scale option in lame.

Depending on how this is implemented within the lossyWAV code, would it be possible to undo the gain processing with the correction file? That would be totally awesome!

All my flacs already have replaygain value tags. So that could be another source of the gain value for lossyWAV. I suppose that functionality would be part of the flac to lossyWAV to flac script i.e. lFLCDrop.
jesseg
Yep, I can for sure add track ReplayGain to the flac commands, and ability to turn that on/off in the custom section. I don't see why ReplayGain should be implemented with lossyWAV any other way than in the final codec used, unless the final codec doesn't support it. (and then, should you even be using it, if you like ReplayGain?)

I don't really see a way to have a correction file to undo the WavGain "distortion" without it being a huge file after lossless compression. As it is now, lossyFLAC + lwcdfFLAC is still smaller than a plain FLAC, especially using lossyWAV -1 preset.

Nick.C
QUOTE(spies @ Mar 19 2008, 16:41) *
QUOTE(2Bdecided @ Mar 19 2008, 03:44) *
Nick, what you need is to run WaveGain, grab the album gain value, and then use that to scale the data within lossyWAV in floating point or the 24-bit domain, but still only outputting 16-bits.
I would absolutely love to see this functionality added to lossyWAV! I envision it being an input option just like the --scale option in lame.

Depending on how this is implemented within the lossyWAV code, would it be possible to undo the gain processing with the correction file? That would be totally awesome!

All my flacs already have replaygain value tags. So that could be another source of the gain value for lossyWAV. I suppose that functionality would be part of the flac to lossyWAV to flac script i.e. lFLCDrop.
Your wish is my command....

-scale <n> parameter implemented which takes a value in the range 0 to 1 and scales the input WAV data by that amount. -scale is compatible with -correction and -merge will combine both files to re-create the lossless master.

*WARNING* filesizes may get large.... when I had a test with -scale 0.5 -correction using my 53 problem sample set, I got a combined filesize for 53 lossy.flac and 53 lwcdf.flac files of 93.1MB, compared to 69.3MB for the 53 lossless originals. Interestingly, the lwcdf.flac file is not too bad to listen to...

lossyWAV beta v0.8.6 attached to post #1 in this thread.
spies
QUOTE(Nick.C @ Mar 19 2008, 14:11) *
Your wish is my command....

-scale <n> parameter implemented which takes a value in the range 0 to 1 and scales the input WAV data by that amount. -scale is compatible with -correction and -merge will combine both files to re-create the lossless master.

*WARNING* filesizes may get large.... when I had a test with -scale 0.5 -correction using my 53 problem sample set, I got a combined filesize for 53 lossy.flac and 53 lwcdf.flac files of 93.1MB, compared to 69.3MB for the 53 lossless originals. Interestingly, the lwcdf.flac file is not too bad to listen to...

lossyWAV beta v0.8.6 attached to post #1 in this thread.

Wow Nick, that was fast, thanks! Can't wait to try it out when I get home.

Quick question though, when using the -scale option are you dithering at the output? I think I would prefer it not to be dithered or at least make it an option of the -scale command as suggested by David.

I would not be surprised to find that the change from 69.3MB to 93.1MB would be caused by scaling and dither, but would be quite surprised if it was caused by just scaling alone. I wonder what the result would be with ReplayGain scale values as opposed to using a scale factor of 0.5 for your test suite.
carpman
Thanks to everyone for their answers and patience - on the WavGain issue. That's cleared up some confusion on my part.

@ Nick, thanks, haven't had time yet, but I'll give that a go " -- ren %~n1.lossy.wav %2"

While lossyWAV is on the Replay Gain issue:

I use replay gain on a track by track basis. I tried jesseg's suggestion of using FLAC's internal replay gain, the problem I had was that I needed to switch on replay gain processing in Foobar for it to have the desired effect. Foobar automatically adjusted all my (non-tagged-replay-gained) files which are less or greater than 89dB (because I set them that way).

It's very possible I'm missing something obvious here, but just not obvious to me.

What I would like from lossyWAV is either:
a) a way to set the value at the lossy.wav stage (i.e. 1.2 dB below 89dB - as is possible with WavGain) --- and it looks like Nick may have already achieved this, or
b) be able to manually alter the replay gain value in FLAC's vorbis comments? (is this already possible?)

Slightly OT (not lossyWAV specific):
Is it possible to scan all the files in my collection with foobar and then manually adjust the track replay gain value to 0.0000, so these files aren't affected by the gain processing? If I do this then I will be able to use the FLAC Tag replay gain method and keep the lossy.flacs at their original volume and thus get the most out of lossyWAV. ---- I think that makes sense.

Thanks
C.

Nick.C
QUOTE(spies @ Mar 20 2008, 01:02) *
Wow Nick, that was fast, thanks! Can't wait to try it out when I get home.

Quick question though, when using the -scale option are you dithering at the output? I think I would prefer it not to be dithered or at least make it an option of the -scale command as suggested by David.

I would not be surprised to find that the change from 69.3MB to 93.1MB would be caused by scaling and dither, but would be quite surprised if it was caused by just scaling alone. I wonder what the result would be with ReplayGain scale values as opposed to using a scale factor of 0.5 for your test suite.
No dithering is employed at all (yet) in lossyWAV. All WAV data related calculations (apart from the final difference for correction values) are performed using 64-bit real values; the only rounding used is for bit-reduction. Using -7 -shaping 1.0 -scale 0.5, the lossy & lwcdf files totalled 98.5MB - it's almost not worth using correction at all; merely keep a lossless FLAC copy and transcode to lossy.FLAC for an additional lossy copy.
2Bdecided
If the correction file is just the difference between the original, and the scaled lossy version, then it'll be huge. You shouldn't do it that way. If you think about it (or even if you don't!), you are storing the entire data twice, since the correction file is also a scaled lossy version!

What you should try is this:

First, you must store the scale somewhere. Don't lose it. It's vital. ("keep foreign metadata" in correction file?)

Then...

lossy = original * scale + quantisation noise

correction = original - (lossy / scale)

merged = (lossy / scale) + correction
= (lossy / scale) + original - (lossy / scale)
= original!

It's more complicated, and you're going to have to check there are no differential rounding errors (i.e. lossy/scale gives the same result each time, whatever that happens to be), but it's far more efficient. You won't double the file size this way.
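The identities above can be checked with a toy model. This is a sketch only: `make_lossy` stands in for lossyWAV's bit reduction by simply zeroing a fixed number of LSBs after scaling, clipping is ignored, and all function names are made up for illustration. The point is that as long as lossy/scale is rounded the same way when creating the correction and when merging, the original comes back exactly and the correction stays small.

```python
import random


def make_lossy(original, scale, bits_removed):
    """Scale samples, then round to a multiple of 2**bits_removed (toy bit reduction)."""
    step = 1 << bits_removed
    return [int(round(s * scale / step)) * step for s in original]


def unscale(lossy, scale):
    """lossy / scale, rounded to an integer sample the same way every time."""
    return [int(round(v / scale)) for v in lossy]


def make_correction(original, lossy, scale):
    """correction = original - (lossy / scale)"""
    return [o - u for o, u in zip(original, unscale(lossy, scale))]


def merge(lossy, correction, scale):
    """merged = (lossy / scale) + correction"""
    return [u + c for u, c in zip(unscale(lossy, scale), correction)]


random.seed(0)
original = [random.randint(-32768, 32767) for _ in range(1000)]
lossy = make_lossy(original, scale=0.5, bits_removed=3)
correction = make_correction(original, lossy, scale=0.5)
print(merge(lossy, correction, scale=0.5) == original)  # exact reconstruction
print(max(abs(c) for c in correction))                  # correction stays small
```

The scale value has to travel with the correction data, as noted above, since both `make_correction` and `merge` need it.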



I think you need to separate "ReplayGain applied to make lossyWAV more efficient", and "ReplayGain applied for whatever people use ReplayGain for" (!). I'm not sure. It depends how people use it.

I would use the AlbumGain (negative ones only) with lossyWAV. If people are already using ReplayGain anyway, I would pass through all the ReplayGain data (with appropriate adjustment, because the original is now quieter) to include in the final FLAC. If people aren't using ReplayGain normally, then there's nothing to pass through - just apply the negative album gains and leave it at that.

Cheers,
David.
Nick.C
I've just realised that in my haste to implement -scale, I have omitted to correctly scale codec_blocks which are not having any bits removed whatsoever. Big problem.

I am working on implementing David's recent scale-then-only-store-the-scaled-difference method and will post v0.8.7 asap.

!!Please do not use -scale in beta v0.8.6!!
Nick.C
QUOTE(2Bdecided @ Mar 20 2008, 11:10) *

If the correction file is just the difference between the original, and the scaled lossy version, then it'll be huge. You shouldn't do it that way. If you think about it (or even if you don't!), you are storing the entire data twice, since the correction file is also a scaled lossy version!

What you should try is this:

First, you must store the scale somewhere. Don't lose it. It's vital. ("keep foreign metadata" in correction file?)

Then...

lossy = original * scale + quantisation noise

correction = original - (lossy / scale)

merged = (lossy / scale) + correction
= (lossy / scale) + original - (lossy / scale)
= original!

It's more complicated, and you're going to have to check there are no differential rounding errors (i.e. lossy/scale gives the same result each time, whatever that happens to be), but it's far more efficient. You won't double the file size this way.
Ouch my poor head..... I'm getting *close* to the right answer as you detailed above - however I need to get to grips with x87 rounding (again)......
Nick.C
lossyWAV beta v0.8.7 attached to post #1 in this thread.
Nick.C
I've been focussing on speed-ups and have been working on the FFT unit in particular. I tripped over a method of calculating a real FFT of length 2N using a complex FFT of length N. As the FFT in use in lossyWAV is complex, it seems attractive to use this method. However, I am having trouble "untangling" the results to form the result of the real analysis from the complex analysis. I'll keep working on it as I think it will speed up the processing by about 25% overall.
Nick.C
I've finally managed to crack the problem I was having in implementing the 2N real FFT in an N Complex fft speedup - the improvement is between 20% and 25%.
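For readers curious about the "untangling" step, here is a minimal sketch of the textbook technique: pack the even-indexed real samples into the real part and the odd-indexed samples into the imaginary part of an N-point complex sequence, run one complex FFT, then separate the two spectra using conjugate symmetry. This is plain Python written for clarity, not the IA-32/x87 code used in lossyWAV, and the function names are illustrative.

```python
import cmath


def fft(z):
    """Recursive radix-2 complex FFT; len(z) must be a power of two."""
    n = len(z)
    if n == 1:
        return list(z)
    even = fft(z[0::2])
    odd = fft(z[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_fft_2n(x):
    """Bins 0..N of the DFT of a real sequence of length 2N, via one N-point complex FFT."""
    n = len(x) // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(n)]
    Z = fft(z)
    X = []
    for k in range(n + 1):
        zk = Z[k % n]
        zc = Z[(n - k) % n].conjugate()
        even = (zk + zc) / 2     # DFT of the even-indexed samples
        odd = (zk - zc) / 2j     # DFT of the odd-indexed samples
        X.append(even + cmath.exp(-1j * cmath.pi * k / n) * odd)
    return X
```

Bins above N follow from conjugate symmetry of a real signal's spectrum, so only N+1 bins are returned.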

I have however found a problem with the correction / merge / scale combination for 24 bit files - this will be investigated and beta v0.8.8 will be posted.

I played about with an 8 bit WAV file and I am going to remove 8 bit processing as removing any bits from an 8 bit WAV will most probably produce foul results.....
Nick.C
lossyWAV beta v0.8.8 attached to post #1 in this thread.
Mitch 1 2
In my own tests, lossyWAV 0.8.8 is significantly faster than version 0.8.7.

Would it make sense to drop all of the extra FFT analysis options (-Xa/b/c), in favour of an "-exhaustive" (or "-e") parameter, which would perform analysis passes using all suitable FFT sizes?
Nick.C
QUOTE(Mitch 1 2 @ Mar 28 2008, 03:57) *
In my own tests, lossyWAV 0.8.8 is significantly faster than version 0.8.7.
biggrin.gif
QUOTE(Mitch 1 2 @ Mar 28 2008, 03:57) *
Would it make sense to drop all of the extra FFT analysis options (-Xa/b/c), in favour of an "-exhaustive" (or "-e") parameter, which would perform analysis passes using all suitable FFT sizes?
I ran some tests (process my 53 problem sample set (125.1MB) 5 times, discard the highest and lowest times, and take the average of the remaining 3) on a 2.0GHz Core2Duo (single instance, nothing else running.....) and got the following for v0.8.8:
CODE
|======|==================|==================|
|  QS  | Time/Rate v0.8.8 | Time/Rate v0.8.7 |
|======|==================|==================|
|  -7  | 15.71s / 47.34x  | 20.47s / 36.33x  |
|  -7a | 21.29s / 34.93x  | 27.84s / 26.71x  |
|  -7b | 27.13s / 27.41x  | 35.97s / 20.67x  |
|  -7c | 32.44s / 22.92x  | 43.44s / 17.12x  |
|======|==================|==================|
So, I *think* that I would rather leave the 3 options in place as the extra analyses still have a major effect on the processing time / rate. All tests were carried out with the input files cached in memory to ignore read latency.
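The measurement procedure described above (five runs, drop the fastest and slowest, average the rest) and the "x realtime" rate can be sketched as follows. The conversion from 125.1MB to playing time is an assumption on my part: it treats the set as 16-bit, 44.1 kHz stereo, reads MB as 2^20 bytes and ignores WAV headers. Under that assumption the rates in the table are reproduced closely.

```python
def trimmed_mean(times):
    """Drop the fastest and slowest run, then average the remaining ones."""
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)


def speed_factor(audio_bytes, seconds, bytes_per_second=176400):
    """Playing time divided by processing time (assumes 16-bit / 44.1 kHz stereo)."""
    return (audio_bytes / bytes_per_second) / seconds


sample_set_bytes = 125.1 * 1024 * 1024  # 125.1MB read as MiB, headers ignored

print(round(speed_factor(sample_set_bytes, 15.71), 2))  # compare with the -7 row, v0.8.8
print(round(speed_factor(sample_set_bytes, 20.47), 2))  # compare with the -7 row, v0.8.7
```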
halb27
Now that the machinery has changed quite a bit I tried to abx my problem sample set
Atemlied, badvilbel, bibilolo, Blackbird/Yesterday, bruhns, dither_noise_test, eig, fiocco, furious, harp40_1, herding_calls, keys_1644ds, Livin_In_The_Future, S37_OTHERS_MartenotWaves_A, triangle-2_1644ds, trumpet, Under The Boardwalk.

My personal transparency level is where quality level -4 is now. So I tried to abx quality level -4, and I can only say I can't abx any problem. The only thing worth mentioning is a very weak suspicion that Atemlied is not totally transparent (at the very moment when the 'music' starts [sec. 0.0-1.6]), but my abx result doesn't back this up at all.

So everything is great also with the changed machinery which brought a surprisingly high speed increase (from memory - didn't do a real comparison).
Because of the good speed I also encoded my regular track and my problem sample set to learn about average bitrate for the various quality levels:

quality    regular/problem set [kbps]
     -1              467/561
     -2              418/518
     -3              372/472
     -4              346/447
     -5              325/421
     -6              306/397
     -7              291/375

These are very good properties IMO.
Even for a purist who wants to keep the basic principle and uses -2, the average bitrate for regular music is only slightly above 400 kbps. The bitrate for the problem set is about 100 kbps higher on average, which IMO is more than enough of a safety margin (with earlier and less sophisticated lossyWAV versions, ~470 kbps on average for my problem set was necessary to make them transparent).
-3 and -4 are the perfect solutions IMO for the non-purists struggling for transparency.
From -5 up there is an increasing risk of arriving at non-transparent results, but judging from practical listening quality is still very good (just tried [no abxing] some samples from my regular track set using -6 and was very content).

Wonderful work, Nick. Congratulations.
shadowking
Thanks for the tests halb27. Good to hear we have a strong 300..350k range. Bitrates are also looking good with -2 giving archive quality at half the normal bitrate of lossless.
halb27
QUOTE(Nick.C @ Mar 15 2008, 23:18) *

... As an aside (and I know that looking at the spectrum in foobar is not any way to evaluate anything....) I looked at the spectral output for a lossyWAV correction file (replaygained +45dB or so) and almost all of the signal was in the high end of the spectrum - so it "looks" like my implementation of your noise shaping filter works!

Finally I tried -shaping too and also looked at the correction file's spectrum. Noise is less audible than without shaping, so it works well. Noise gathers mainly in the highest spf frequency zone and above (12.4+ kHz). Because of this, bitrate is often expected to be higher than without shaping in order to arrive at the same S/N ratio in the highest frequency zone.

Some proposals on the bitrate bloat issue:

a) on a per block basis decide which shaping yields the higher number of bits to remove:
shaping 0 or shaping 1 (or shaping x, y, z, ....).
I came to this idea because shaping 1 does not always yield a bitrate bloat. For dither_noise_test shaping 0 yields 705 kbps, whereas shaping 1 yields 295 kbps (when used together with -7), and I couldn't hear a problem.
Sure this means at least doubling the encoding time. Moreover maybe the changing of the noise shaping is audible (but the same argument applies as towards the sudden noise increase and decrease when the anti-clipping strategy goes to work: as long as the noise is hidden we shouldn't mind).
Maybe an autoshaping strategy like this is most adequate: start for the first block with a low shaping value like shaping = 0.2.
For any current block: try the shaping used for the last block, as well as two shaping values that add resp. subtract a certain delta from the last shaping value. Always use the shaping from the three possible values which maximizes the number of bits to remove.
This is the basic principle. In order to save some work, checking for changing the shaping value is not necessarily done on every block or with both directions. For instance with the current block it is checked only whether an increase of the shaping value is useful. On the next block the check goes in the opposite direction: check only whether decreasing the shaping value is useful, and so on interchangingly possibly increasing resp. decreasing shaping. Things like that. Maybe the frequency of the changes can trigger pauses for the shaping checking. When changes are rare it's not useful to do the checking with every block.

b) Maybe a more direct approach is more efficient: decide on the spectrum of the signal which shaping to use. When there's a lot of HF in the music, putting the noise into the HF region should be a good thing. Not quite so when there's no HF present in the music to hide the noise.

c) Maybe giving the potential to shift the noise also towards low frequencies instead of only shifting up may be useful with the approaches of a) or b).

d) Think of quality level -7 as of targeting at rather low bitrate lovers accepting some compromise. So for -7 soften the accuracy requirements for the highest spf frequency zone. For instance don't check this zone at all other than for the 64 sample FFT analyses (so far this shouldn't seriously hurt). If necessary: use a spreading of 5 or even 6 instead of 4 for the highest frequency zone in the 64 sample FFT analyses (this hurts - a bit for a spreading of 5, more so when going higher).
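Proposal a) can be sketched as a simple per-block search. This is only an outline of the idea as described: `analyse(block, shaping)` is a hypothetical hook returning the number of bits to remove for a block at a given shaping value, and the start value and delta are placeholders. It does not reflect how lossyWAV is actually structured.

```python
def choose_shaping(previous, bits_to_remove, delta=0.1):
    """Pick, among previous, previous-delta and previous+delta, the shaping giving most bits."""
    candidates = [previous]
    for value in (previous - delta, previous + delta):
        clamped = min(1.0, max(0.0, value))  # keep shaping within 0..1
        if clamped not in candidates:
            candidates.append(clamped)
    # max() returns the first maximum, so ties keep the previous value.
    return max(candidates, key=bits_to_remove)


def autoshape(blocks, analyse, start=0.2, delta=0.1):
    """Return one shaping value per block, each at most delta away from the last."""
    shaping = start
    chosen = []
    for block in blocks:
        shaping = choose_shaping(shaping, lambda s: analyse(block, s), delta)
        chosen.append(shaping)
    return chosen
```

Checking only one direction per block, or pausing the search when changes are rare, would be variations on `choose_shaping` with fewer candidates.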
Nick.C
QUOTE(halb27 @ Mar 29 2008, 16:34) *
Some proposals on the bitrate bloat issue:

a) on a per block basis decide which shaping yields the higher number of bits to remove:
shaping 0 or shaping 1 (or shaping x, y, z, ....).
I came to this idea because shaping 1 does not always yield a bitrate bloat. For dither_noise_test shaping 0 yields 705 kbps, whereas shaping 1 yields 295 kbps (when used together with -7), and I couldn't hear a problem.
Sure this means at least doubling the encoding time. Moreover maybe the changing of the noise shaping is audible (but the same argument applies as towards the sudden noise increase and decrease when the anti-clipping strategy goes to work: as long as the noise is hidden we shouldn't mind).
Maybe an autoshaping strategy like this is most adequate: start for the first block with a low shaping value like shaping = 0.2.
For any current block: try the shaping done with the last block, as well as two shaping values that add resp. subtract a certain delta from the last shaping value. Always use the shaping from the three possible values which maximizes the number of bits to remove.
This is the basic principle. To save some work, the check for changing the shaping value need not be done on every block or in both directions. For instance, on the current block only check whether increasing the shaping value is useful; on the next block check only whether decreasing it is useful, and so on, alternating between the two directions. Things like that. Maybe the frequency of the changes can trigger pauses in the shaping check: when changes are rare, it's not useful to check every block.

b) Maybe a more direct approach is more efficient: decide on the spectrum of the signal which shaping to use. When there's a lot of HF in the music, putting the noise into the HF region should be a good thing. Not quite so when there's no HF present in the music to hide the noise.

c) Maybe giving the potential to shift the noise also towards low frequencies instead of only shifting up may be useful with the approaches of a) or b).

d) Think of quality level -7 as targeting low-bitrate lovers who accept some compromise. So for -7, soften the accuracy requirements for the highest spf frequency zone. For instance, don't check this zone at all other than for the 64 sample FFTs (so far this shouldn't seriously hurt). If necessary, use a spreading of 5 or even 6 instead of 4 for the highest frequency zone in the 64 sample FFT analyses (this hurts: a bit for a spreading of 5, more so when going higher).
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. I wonder if a variable shaping which is 1.0 at 8 or more bits to remove and 0.0 at 0 bits to remove, i.e. a resolution of 0.125 shaping per bit-to-remove, might be effective?

I'll try it out this evening and if it works at all, I'll post a new beta, probably with a parameter "-autoshape" which will be compatible with -shaping <n>. That is, the -shaping value will be treated as a minimum: if autoshape says -shaping 0.125 but -shaping has been specified as 0.5, then shaping will be in the range 0.5 to 1.
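
As a rough sketch of the mapping described above (Python, purely illustrative, not the lossyWAV source): shaping rises by 0.125 per bit-to-remove, saturates at 1.0 from 8 bits upwards, and a user-supplied -shaping value acts as a floor. The function and parameter names are made up for the example.

```python
def autoshape(bits_to_remove: int, min_shaping: float = 0.0) -> float:
    """Map bits-to-remove to a shaping factor in [min_shaping, 1.0]."""
    per_bit = 0.125  # 1.0 spread evenly over 8 bits-to-remove
    automatic = min(1.0, max(0, bits_to_remove) * per_bit)
    # A -shaping value given on the command line is treated as a minimum.
    return max(min_shaping, automatic)
```

For example, `autoshape(1, 0.5)` stays at the 0.5 floor, while `autoshape(6, 0.5)` gives 0.75.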
halb27
QUOTE(Nick.C @ Mar 29 2008, 19:12) *

Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. ...

I see, and now that you write it I remember SebastianG's remark. If this is so, maybe shifting noise downwards in a controlled way makes things easier for the predictor. It's often helpful with WavPack lossy at rather low bitrates.
Nick.C
QUOTE(halb27 @ Mar 29 2008, 17:22) *
QUOTE(Nick.C @ Mar 29 2008, 19:12) *
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. ...
I see, and now that you write it I remember SebastianG's remark. If this is so, maybe shifting noise downwards in a controlled way makes things easier for the predictor. It's often helpful with WavPack lossy at rather low bitrates.
That would take a new noise shaping function. SebastianG very kindly donated his 44.1kHz and 48kHz functions to get noise shaping working in lossyWAV, but I have no idea how to derive a new one which ideally would push noise above 20kHz and below, say, 10Hz.
halb27
QUOTE(Nick.C @ Mar 29 2008, 19:33) *

That would take a new noise shaping function. SebastianG very kindly donated his 44.1kHz and 48kHz functions to get noise shaping working in lossyWAV, but I have no idea how to derive a new one which ideally would push noise above 20kHz and below, say, 10Hz.

I see, but maybe some day you will come across such a function.

Another idea for the low bitrate lovers:
What about lowpassing to 17 kHz or so before letting the lossless codec do its job? I guess that brings the bitrate bloat down a bit. I'll try that using sox.

Good Lord: BS of course as this destroys the work of lossyWAV.
halb27
Quite a pity, but interesting:

While with low bitrate settings the bitrate bloat is very remarkable when using -shaping 1, both in an absolute and, more so, in a relative sense (+25% for -7 with my regular full length track set), the absolute and relative difference is lower for the high quality settings (+12% for -2, +9% for -1).
Listening to the correction file of a -1 encoding, the noise is so much less audible with -shaping 1 that it may be desirable to use -shaping 1, especially with quality -1.
Bourne
-
Nick.C
QUOTE(Bourne @ Mar 29 2008, 19:18) *
I have a question for Nick C.

The resulting processed WAV file is smaller than the original ?
Eg. I could burn an entire lossyWAV album in WAV/PCM format without taking the whole space it would with the standard WAV?
I'm afraid not - all lossyWAV does is zero LSBs in each sample as required. It does not change the bit depth of the samples and therefore does not change the size of the output file, other than adding a 'fact' chunk with the lossyWAV processing information near the beginning of the WAV file.
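
To illustrate why the file size stays the same, here is a minimal Python sketch of clearing the low bits of a sample. It is an assumption-laden toy: the real program chooses the number of bits per block and how it rounds is not described in this thread, so plain truncation is used here, and `zero_lsbs` is a made-up name.

```python
def zero_lsbs(sample: int, bits: int) -> int:
    """Clear the lowest `bits` bits of a signed integer PCM sample."""
    mask = ~((1 << bits) - 1)  # e.g. bits=3 -> ...11111000
    return sample & mask


original = 1234
processed = zero_lsbs(original, 3)

# Both values still need the same 16-bit container in a WAV file;
# only a lossless codec afterwards can exploit the zeroed bits.
size_before = len(original.to_bytes(2, "little", signed=True))
size_after = len(processed.to_bytes(2, "little", signed=True))
```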
Bourne
-
Nick.C
QUOTE(Bourne @ Mar 29 2008, 20:33) *
so it's the lossless codec that takes advantage over the processed WAV...
Exactly right, David mentioned this in the first post in his original thread.
Nick.C
lossyWAV beta v0.8.9 attached to post #1 in this thread.

QUOTE(shadowking @ Mar 29 2008, 14:35) *
Thanks for the tests halb27. Good to hear we have a strong 300..350k range. Bitrates are also looking good with -2 giving archive quality at half the normal bitrate of lossless.
I remember your comments in the first page of the thread and I am glad that we are getting close to your desired 340kbps...... :)

Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.
halb27
QUOTE(Nick.C @ Mar 29 2008, 23:14) *

... Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.

Sounds promising. I'm curious what it looks like with regular music.
Nick.C
QUOTE(halb27 @ Mar 29 2008, 22:38) *
QUOTE(Nick.C @ Mar 29 2008, 23:14) *
... Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.
Sounds promising. I'm curious what it looks like with regular music.
I've had a thought - the current implementation of -autoshape is analogous to how maximum-bits-to-remove worked prior to the use of the RMS value of the codec-block (expressed in bits) to determine the variable maximum-bits-to-remove. I am working on a variant which will take into account the RMS value of the codec-block at the same time - should be posted tomorrow.
halb27
Anyway it looks very promising already:

-7 -autoshape => 325/385 kbps (my regular/problem set)
-4 -autoshape => 369/450 kbps (my regular/problem set)
-2 -autoshape => 432/519 kbps (my regular/problem set)

Again, penalty is lower the higher the quality setting.

For a fair comparison I encoded 'Livin_In_The_Future' with -5 as well as -7 -autoshape.
Looking at the spectrum, noise behavior is better with -7 -autoshape up to ~ 9 kHz.
This is a valuable extension to the effect of the skewing machinery which keeps noise especially low up to ~ 3 kHz.
So the entire range of the fundamentals is kept within pretty low noise this way.
Of course this isn't a judgement about audible quality in the end.
I listened to the correction files of -5 and -7 -autoshape and, as expected, the coloured -autoshape noise of the -7 encoding is less audible than the white noise of -5, though it's higher in amplitude.
Again this doesn't really tell about quality for quality levels using a positive -nts value.
With -nts 0 or negative however I think the quality control mechanism makes sure everything is fine (assuming the control mechanism is really working, and we don't have a reason to doubt that).

Dynamic
QUOTE(jesseg @ Mar 19 2008, 19:26) *

As it is now, lossyFLAC + lwcdfFLAC is still smaller than a plain FLAC, especially using lossyWAV -1 preset.


Sorry to revive a 2-week-old quote, but I've been away from the forums for quite some time. The above comment surprised me.

I'd be surprised if the lossless combination of (lossyFLAC plus correction files) would on average come out to occupy less disk space than plain FLAC for the same music. I say this because, for example, the combination (Wavpack hybrid plus correction file) is usually a little larger than plain Wavpack lossless, and this seems understandable because you're sacrificing some efficiency to enable you to split the total information content into a playable but smaller file plus a near-random noise correction file.

If it were generally true, then you've just found a way to improve the lossless compression ratio of FLAC - an unlikely result (certainly in comparison to optimal lossless FLAC settings and block lengths), but potentially valuable if true.

As an aside, I'm loving your work, everyone. This project looks rather exciting.

I would be fascinated to probe the boundaries of how good lossyWAV is as a transcoding source for conventional lossy encoders - i.e. to store my music on my PC in, say, lossyFLAC or lossyWV format, then transcode for portable devices on demand (I've already considered applying RG Album Gain before using lossless compression, accepting a theoretical but inaudible SNR degradation on overly-loud albums in exchange for reducing the bitrate).

As an approach to verifying transcoding robustness, I wonder about choosing, say, known LAME problem samples and encoding those from original WAV versus lossyWAV sources. If the artifact behaviour stays broadly similar, is that a valid reassurance that lossyWAV at setting X makes a robust source for transcoding so long as no other problems have been found with the problem samples that affect Wavpack Lossy, such as atemlied? Even for problems that are fixed in newer LAME versions, I guess one could use an older version of LAME and a newer one to check that the original artifact and the fixed version are substantially unchanged when using lossyWAV.

This approach might then help to guide the choice of quality setting for those who desire a transcoding source. I presume lossyWAV with quality -2, for example, would even work well as source material for transcoding down in quality into lossyWAV quality -7 for the PDA-DAP low battery drain approach because the changes made should barely affect the measured noise floor compared to the original WAV, and you're going much more aggressive anyway on that second pass to quality -7.

Best regards,
Dynamic
2Bdecided
Great work Nick.

Re: the bitrate bloat due to "shaping" - surely the whole point of shaping the noise is so that you can add more of it while maintaining the same level in the audible band? So when you enable shaping, you should also add a threshold shift. (Sorry if you're doing this already!).

Re: lower bitrate than normal FLAC: lossyFLAC uses a different default block size. Sometimes the output is smaller than the default FLAC blocksize, sometimes it's larger - maybe this is what's happening? Overall, what you say is correct: lossy+correction should be larger than lossless, but it would be nice if it wasn't! ;)

Re: transcodability: I tested with mp3 problem samples very early on. It would be worth re-testing with the current version. I found you can't be too aggressive if you want to avoid any audible difference (e.g. -1!), but if "different but not worse" is good enough, you can use normal settings (which at the time was -2!). That horrible "trumpet" sample was quite revealing.

Cheers,
David.
Nick.C
QUOTE(2Bdecided @ Apr 1 2008, 17:23) *
Great work Nick.

Re: the bitrate bloat due to "shaping" - surely the whole point of shaping the noise is so that you can add more of it while maintaining the same level in the audible band? So when you enable shaping, you should also add a threshold shift. (Sorry if you're doing this already!).

Re: lower bitrate than normal FLAC: lossyFLAC uses a different default block size. Sometimes the output is smaller than the default FLAC blocksize, sometimes it's larger - maybe this is what's happening? Overall, what you say is correct: lossy+correction should be larger than lossless, but it would be nice if it wasn't! ;)

Re: transcodability: I tested with mp3 problem samples very early on. It would be worth re-testing with the current version. I found you can't be too aggressive if you want to avoid any audible difference (e.g. -1!), but if "different but not worse" is good enough, you can use normal settings (which at the time was -2!). That horrible "trumpet" sample was quite revealing.

Cheers,
David.
Thanks David, I'm glad you like it!

Re: Bitrate Bloat - I'm not doing that at present as my ears are not really sensitive enough, but if anyone wants to do some testing, I would suggest something along the lines of the following, taking into account the relationship between -snr and -nts:

CODE
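  // Default -nts value per quality level (index 1 = quality -1 ... index 7 = quality -7).
  quality_noise_threshold_shifts    : array[1..Quality_Levels] of Integer = (-3,0,3,6,9,12,15);

  // Default -snr value for the same quality levels.
  quality_signal_to_noise_ratio     : array[1..Quality_Levels] of Integer = (24,22,20,19,18,17,16);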


So, if I was going to go further, I would initially add 3 to -nts for every 1 taken from -snr, i.e. -8 = -nts 18 -snr 15; -9 = -nts 21 -snr 14; etc.

I tried "-nts 30 -snr 11 -autoshape" and with my problem set it doesn't sound particularly bad - probably a starting point.
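
The extrapolation rule above (add 3 to -nts for every 1 taken from -snr) can be written out as a small Python sketch. The two tuples copy the defaults from the CODE block; `settings_for` and the idea of numbering levels beyond 7 are illustrative assumptions, not existing lossyWAV options.

```python
from typing import Tuple

NTS = (-3, 0, 3, 6, 9, 12, 15)       # -nts defaults for quality -1 .. -7
SNR = (24, 22, 20, 19, 18, 17, 16)   # -snr defaults for quality -1 .. -7


def settings_for(level: int) -> Tuple[int, int]:
    """Return (-nts, -snr) for quality level `level` (1 means -1)."""
    if level < 1:
        raise ValueError("level must be >= 1")
    if level <= len(NTS):
        return NTS[level - 1], SNR[level - 1]
    extra = level - len(NTS)  # number of steps beyond -7
    # Each extra step: +3 on -nts, -1 on -snr.
    return NTS[-1] + 3 * extra, SNR[-1] - extra
```

Under this rule a hypothetical level 12 works out to -nts 30 -snr 11, the same pair as in the test above.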

Re: Transcodability - lossyWAV does not allow re-processing of an already processed file. I would prefer to keep it that way, although if the 'fact' chunk is removed, the program will not be able to tell the difference.

I found a small bug in the noise shaping code and also some quite nice speedups (approx 7% to 10%), so:

lossyWAV beta v0.9.0 attached to post #1 in this thread.
Nick.C
I have run some new tests (processing my 53 problem sample set (125.1MB) on a 2.0GHz Core2Duo, single instance, nothing else running) and got the following for beta v0.9.0:
CODE
|======|==================|==================|
|  QS  | Time/Rate v0.8.8 | Time/Rate v0.9.0 |
|======|==================|==================|
|  -7  | 15.71s / 47.34x  | 14.34s / 51.86x  |
|  -7a | 21.29s / 34.93x  | 19.30s / 38.54x  |
|  -7b | 27.13s / 27.41x  | 24.56s / 30.27x  |
|  -7c | 32.44s / 22.92x  | 29.47s / 25.23x  |
|======|==================|==================|
All tests were carried out with the input files cached in memory to ignore read latency.
halb27
Thank you, Nick.
I'd like to do some abxing using -autoshape, but because abxing isn't so much fun I've been waiting for your version which takes the RMS value into account.
Does v0.9.0 contain this feature?
Nick.C
QUOTE(halb27 @ Apr 1 2008, 21:22) *
Thank you, Nick.
I'd like to do some abxing using -autoshape, but because abxing isn't so much fun I've been waiting for your version which takes the RMS value into account.
Does v0.9.0 contain this feature?
I tried to implement the -autoshape taking into account the RMS value and the bitrate went through the roof. I may re-visit it, but I think the -autoshape in v0.9.0 is fairly robust: 0% shaping at 0 bits-to-remove and 100% shaping at (bits-per-sample - 3) bits-to-remove. Using -7 -autoshape -snr 11 -nts 30, my 53 problem sample set ends up at 327.3kbps, and the quality is not too bad - a starting point as I said above.
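
For readers who want to picture the v0.9.0 behaviour, here is a minimal Python sketch of the two endpoints given above. Only the endpoints come from this post; the straight line between them is an assumption, and `autoshape_v090` is a made-up name.

```python
def autoshape_v090(bits_to_remove: int, bits_per_sample: int = 16) -> float:
    """Shaping fraction: 0.0 at 0 bits, 1.0 at (bits_per_sample - 3) bits."""
    full = bits_per_sample - 3  # 13 for 16-bit audio
    clamped = min(max(bits_to_remove, 0), full)
    # Assumed linear ramp between the two stated endpoints.
    return clamped / full
```

For 16-bit audio this reaches full shaping at 13 bits-to-remove, compared with 8 in the earlier 0.125-per-bit idea.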
halb27
OK, I'll try to abx my usual problem samples with v0.9.0 -7 -autoshape. I'll also search for other tracks looking for hiss or other HF problems.
Nick.C
QUOTE(halb27 @ Apr 1 2008, 21:39) *
OK, I'll try to abx my usual problem samples with v0.9.0 -7 -autoshape. I'll also search for other tracks looking for hiss or other HF problems.
Many thanks!