Full Version: lossyWAV Development
2Bdecided
Axon,

I share your unease at the way pseudo-psychoacoustics have been arrived at for lossyWAV. I wouldn't put it any stronger than that though. I don't have the time to get involved, and am very grateful to Nick and halb27 for pushing this forward with such enthusiasm.

QUOTE(Axon @ Nov 28 2007, 07:42) *
It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.
The basic algorithm is just "find the noise floor, and quantise at or below it".

The fundamental flaw in my implementation was that it couldn't "see" dips in the noise floor at low frequencies which are audible to human listeners - so it would happily fill them with noise. The "resolution" I used wasn't sufficient for low frequencies. The solution is either to skew the results, or modify the spreading, or both (I haven't taken the time to figure out which is the "right" approach) - the current version does both, to great effect. The reason my original script got away with it most of the time is because there are very few recordings where the noise floor is lowest at low frequencies - normally, the lower limit is at a high frequency, so inaccuracies in estimating it at low frequencies have no effect on the result for most recordings.
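To illustrate the resolution problem described above, here is a minimal sketch of how spreading can hide a narrow dip in the noise floor. It assumes that "spreading" means averaging the dB values of adjacent FFT bins before taking the minimum; the post does not spell out the exact method, and `spread_minimum` and the example levels are hypothetical.

```python
def spread_minimum(levels_db, spread_len):
    """Average each run of spread_len adjacent bins, then return the lowest average.

    Assumption: spreading is a plain moving average over dB values.
    The real lossyWAV spreading function may differ.
    """
    if not 1 <= spread_len <= len(levels_db):
        raise ValueError("spread_len must be between 1 and the number of bins")
    averages = [
        sum(levels_db[i:i + spread_len]) / spread_len
        for i in range(len(levels_db) - spread_len + 1)
    ]
    return min(averages)


# Hypothetical noise floor with one narrow 10 dB dip among 40 dB bins.
floor_db = [40.0, 40.0, 10.0, 40.0, 40.0]

print(spread_minimum(floor_db, 1))  # 10.0: the dip is seen
print(spread_minimum(floor_db, 3))  # 30.0: the dip is averaged away
```

With the longer spreading length the estimated floor sits 20 dB above the real dip, so noise quantised "at or below" that estimate would fill the dip.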

There was also a bug in later lossyFLAC MATLAB scripts which caused it to analyse the tail end of the "noise it had just added to the previous block" when assessing the noise floor of the current block. Nick spotted that, and corrected it in his code. I haven't generated a "fixed" MATLAB version.


The obvious "extras" for lossyWAV are a hybrid/lossless mode (quite possible), and a noise-shaped mode (already implemented, but not released for IP reasons). Finally, it might make sense to delineate between a proper psychoacoustic model (borrow one?) and a non-psychoacoustic implementation (close to now, but tamed a little).


btw Nick, I don't have any objections to you leaving switches in the final release for testing - just hide them well away in the depths of the manual! And please don't feel like you have to respect my wishes or anything - you've well and truly adopted my baby now! ;)

Cheers,
David.
Nick.C
Thanks David, I'll look after her..... As to switches, I agree with the consensus that they should remain, although hidden from the attentions of casual users. I would also probably limit the input ranges so that truly awful results can be avoided.

A hybrid / lossless mode is totally possible - either at the same time as the processing, or as a standalone program. If I venture down the piping route, it would have to be at the same time.

I corrected the Matlab script as well as my code and posted it as LossyFLAC6_x (I think).

All the best, and thanks again.

Nick.
2Bdecided
QUOTE(Nick.C @ Nov 30 2007, 14:46) *
I corrected the Matlab script as well as my code and posted it as LossyFLAC6_x (I think).
Yes, you did, thanks. I didn't get a chance to merge the fix back into what I had.

It would be interesting to put all your tweaks into the noise shaping version, but the wait for (a) time and (b) the IP to expire means I'm looking at, er, sometime after I retire! (I'm currently 30-something!). I think I'll just release what I have when the IP expires and let someone else play with it. It would be so cool to have the option of going from true lossless to virtually lossless to high VBR mp3-like lossy (but with fewer problem samples) in the one codec.

Mind you, you're pretty much there already, without the noise shaping!

Cheers,
David.
GeSomeone
QUOTE(Nick.C @ Nov 30 2007, 13:33) *

Attached File Spread___Skew.zip ( 7.99k )

It is very hard to see the effect of a parameter change because of the random Log FFT output.

Having read 2Bdecided's comment it might be best to ditch the -0 settings as they emulate a flawed implementation.
Nick.C
You could copy the random number column and paste it in place as values to fix it. That would allow you to see the effects more clearly on a static example. Try looking again at the relativities between the two lines for minimum and the two lines for average....
jesseg
QUOTE(GeSomeone @ Nov 30 2007, 11:32) *
Having read 2Bdecided's comment it might be best to ditch the -0 settings as they emulate a flawed implementation.


I agree to some extent. But perhaps add a command-line switch, something like -allowbadsettings, which would allow people to use -0 and would also lift the limits on the restricted settings. This would of course be another great option to hide deeeep in the manual.
Nick.C
The -0 setting is no longer required as it can be re-created from the relevant parameters. -clipping, -dither and -allowable will also be removed at the next revision.

I have started the coding for correction files and can now create a WAV file (.lwcdf.WAV : lossyWAV correction data file) of the difference between the source and bit_removed data. It's basically just hiss and compresses less well than the lossy.WAV file.
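A minimal sketch of the correction-file idea described above, under stated assumptions: samples are plain integers, bit removal is rounding to the nearest multiple of 2^bits, and clipping is ignored. The function names are hypothetical and this is not the Delphi implementation.

```python
def remove_bits(samples, bits):
    """Round each sample to the nearest multiple of 2**bits (no clipping handled)."""
    step = 1 << bits
    return [((s + step // 2) // step) * step for s in samples]


def make_correction(source, lossy):
    """Correction data: the difference between source and bit-removed samples."""
    return [s - l for s, l in zip(source, lossy)]


def restore(lossy, correction):
    """Adding the correction back to the lossy samples recovers the source."""
    return [l + c for l, c in zip(lossy, correction)]


source = [1000, -1234, 37, -5]
lossy = remove_bits(source, 3)
correction = make_correction(source, lossy)

print(lossy)       # [1000, -1232, 40, -8]
print(correction)  # [0, -2, -3, 3]
assert restore(lossy, correction) == source
```

The correction values are small and noise-like, which fits the observation that the correction file is "basically just hiss".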

There's still a lot to do on the correction file side of things, but it shouldn't be too difficult - just time-consuming.

I'm a bit concerned about going down the route of two WAV files (one lossy, one lwcdf): if a WAV file is processed more than once, what happens if the wrong correction file is added to the lossy file? Probably something not too good......

@Halb27: I've narrowed down my variations to -3: -snr 18 -skew 36 -nts 6 -spf 22235-22236-22347-22358-22469 -fft 10001 -cbs 512.

This permutation yields 34.77MB / 392.2kbps on my 53 sample set.
halb27
QUOTE(Nick.C @ Dec 2 2007, 00:20) *

I'm a bit concerned about going down the route of two WAV files (one lossy, one lwcdf): if a WAV file is processed more than once, what happens if the wrong correction file is added to the lossy file? Probably something not too good....

I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.
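As a rough sketch of that suggestion: the correction data could carry a checksum of the lossy audio it belongs to, and the restore step could refuse a mismatched pair. The use of MD5, the dictionary layout and all names here are assumptions for illustration only; nothing in the thread fixes a format.

```python
import hashlib


def checksum(pcm_bytes):
    """Hex digest of the raw lossy PCM data (MD5 chosen only as an example)."""
    return hashlib.md5(pcm_bytes).hexdigest()


def make_correction_record(lossy_pcm, correction_pcm):
    """Bundle the correction data with the checksum of its matching lossy data."""
    return {"lossy_md5": checksum(lossy_pcm), "correction": correction_pcm}


def belongs_together(record, lossy_pcm):
    """True only if the lossy data is the one the correction was made for."""
    return record["lossy_md5"] == checksum(lossy_pcm)


first_pass = b"\x10\x00\x20\x00"    # hypothetical lossy PCM bytes
second_pass = b"\x10\x00\x40\x00"   # same track processed a second time
record = make_correction_record(first_pass, b"\x01\x00\xff\xff")

print(belongs_together(record, first_pass))   # True
print(belongs_together(record, second_pass))  # False
```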

Other than that I'm having a hard time with listening tests resulting from your -snr -215 approach.
I easily found that there's no magic with negative snr values: for my sample sets -snr -215/-100/-10/0 all gave the same average bitrate, and the result of -snr 10 was close by. So it's just the same machinery as with positive snr values: modifying the FFT min if the snr offset from the FFT average is lower. With -snr -215 or similar there's simply no modification of the FFT min, and -snr -215 simply works as if there was no snr machinery at all.

-3 -snr -215 yields 313/430 kbps with my regular/problem samples set. While this is welcome with regular tracks, it looks a bit low with the problem samples.
I listened to it (to get used to problems I started with -nts 16), and I added more problem samples. The result wasn't good with badvilbel, bibilolo, bruhns, dithernoise_test, eig, furious, keys_1644ds, utb. There are clear artifacts/distortions audible. Sure, that was with an insane setting of -nts 16 for a warm-up.
Using -nts 9 and -nts 6 improves a lot, the distortion-like noise is gone, I'd even call the results 'acceptable', but I can still ABX furious, dithernoise_test, keys, utb, and badvilbel.

My usual approach for improving is to bring bitrate up for the problem set but to a minor degree with the regular set. From current -3 setting and previous experience I know a '1' instead of the '2' for the first frequency zone of the 1024 sample FFT should do the job. It does, but only for the statistics, my listening experience yielded pretty much the same not totally satisfying quality.

That's my current state. The interesting question is: if -2 -snr -215 is a bit poor for some problems, what is the most effective way to improve? Maybe a higher -skew value will do it, or maybe just the basic thing of the entire machinery: a lower -nts value (which would match the idea of going back a bit to the pure basics). Or maybe the snr machinery really does play an essential part in preserving quality (after all, the current -3 quality is very good). Quite interesting questions, but the answers will take some time.

And of course I'll try your new suggestion for the -3 setting.
Nick.C
QUOTE(halb27 @ Dec 2 2007, 10:20) *

I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.
Not sure how I will achieve this inside a WAV file....
QUOTE(halb27 @ Dec 2 2007, 10:20) *
Other than that I'm having a hard time with listening tests resulting from your -snr -215 approach.
I easily found that there's no magic with negative snr values: for my sample sets -snr -215/-100/-10/0 all gave the same average bitrate, and the result of -snr 10 was close by. So it's just the same machinery as with positive snr values: modifying the FFT min if the snr offset from the FFT average is lower. With -snr -215 or similar there's simply no modification of the FFT min, and -snr -215 simply works as if there was no snr machinery at all.
That was exactly the point, to be able to switch off the -snr setting.
QUOTE(halb27 @ Dec 2 2007, 10:20) *
-3 -snr -215 yields 313/430 kbps with my regular/problem samples set. While this is welcome with regular tracks, it looks a bit low with the problem samples.
I listened to it (to get used to problems I started with -nts 16), and I added more problem samples. The result wasn't good with badvilbel, bibilolo, bruhns, dithernoise_test, eig, furious, keys_1644ds, utb. There are clear artifacts/distortions audible. Sure, that was with an insane setting of -nts 16 for a warm-up.
Using -nts 9 and -nts 6 improves a lot, the distortion-like noise is gone, I'd even call the results 'acceptable', but I can still ABX furious, dithernoise_test, keys, utb, and badvilbel.

My usual approach for improving is to bring bitrate up for the problem set but to a minor degree with the regular set. From current -3 setting and previous experience I know a '1' instead of the '2' for the first frequency zone of the 1024 sample FFT should do the job. It does, but only for the statistics, my listening experience yielded pretty much the same not totally satisfying quality.

That's my current state. The interesting question is: if -2 -snr -215 is a bit poor for some problems, what is the most effective way to improve? Maybe a higher -skew value will do it, or maybe just the basic thing of the entire machinery: a lower -nts value (which would match the idea of going back a bit to the pure basics). Or maybe the snr machinery really does play an essential part in preserving quality (after all, the current -3 quality is very good). Quite interesting questions, but the answers will take some time.

And of course I'll try your new suggestion for the -3 setting.
I've come to the realisation that the -snr setting is what (along with -skew and -nts) makes -3 so acceptable. Before -snr, we didn't have a way to stop minimum values which were close to the average introducing noise close to the average. Now we do - if we set -snr to 21 then we will never add noise above the average -21dB level.
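The -snr rule as described in these posts can be sketched as a simple cap: the level used for deciding added noise is the FFT minimum, unless that minimum lies closer than -snr dB to the FFT average. The function name and the example levels are hypothetical, and how the result is then turned into bits_to_remove is not shown here.

```python
def noise_ceiling_db(fft_min_db, fft_avg_db, snr_db):
    """Level that added noise must not exceed, per the -snr description.

    Uses the FFT minimum unless it is within snr_db of the FFT average,
    in which case (average - snr_db) is used instead.
    """
    return min(fft_min_db, fft_avg_db - snr_db)


# Hypothetical block: minimum only 10 dB below the average.
print(noise_ceiling_db(50.0, 60.0, 21.0))    # 39.0: capped at average - 21 dB
# Minimum already far below the average: -snr changes nothing.
print(noise_ceiling_db(20.0, 60.0, 21.0))    # 20.0
# -snr -215 never lowers the minimum, i.e. the machinery is switched off.
print(noise_ceiling_db(50.0, 60.0, -215.0))  # 50.0
```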

I think that -3 is close to finished - I await your listening results with anticipation!

Nick.
halb27
QUOTE(Nick.C @ Dec 2 2007, 22:01) *

QUOTE(halb27 @ Dec 2 2007, 10:20) *

I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.
Not sure how I will achieve this inside a WAV file....

Depends on the overall procedure. I guess you want to compress the correction file (though the compression ratio may be small - which just says that lossyWAV is working efficiently), so the final representation of the correction file won't have a WAV format. If you compress by your own method you can take care of the checksum easily, and if you use FLAC or similar, you can use tags to store the checksum of the lossyWAV result.

I've finished my investigations on -3.
First I wave-edited all the old and new serious problem samples so that they consist only of the problematic spots. This way I hope to get more meaningful statistics. With current -3 the average bitrate of this problem essence is 464 kbps.
When using -3 -snr -215 I got good, but not perfect results qualitywise, and I tried already without success to increase quality by using a spreading length of 1 for the lowest frequency zone of the 1024 sample FFT.
Next I tried to improve by using a higher -skew value. But this also doesn't bring the solution: using -3 -snr -215 -skew 44 yields an average bitrate of 422 kbps for my problem essence which is too low.
Next I lowered the -nts value, and -3 -snr -215 -nts 3 yields a bitrate of 444 kbps for my problem essence. I listened to it and was content with the result though to me it's a bit much on the cutting edge as my furious result was 7/10 and I also have the suspicion that utb isn't perfect though my ABX results don't back this up. With my regular sample set the average bitrate is 344 kbps which is nearly identical to the 345 kbps of current -3. Qualitywise the current -3 setting is more secure IMO, so I prefer it.
Then I used your new -3 proposal, but with the -spf value of current -3, that is I used -3 -snr 18, and the statistics are 331 kbps for my regular set, and 445 kbps for my problem essence. Listening to the problems showed that nearly everything is fine to me with the exception of dithernoise_test, which was easy to ABX 10/10 due to one spot where the noise-like sound suddenly changes in the lossyWAV result, unlike the original. With utb I again have the suspicion that it's not totally correct, though I couldn't ABX it and thus may be wrong.
Finally I tried your very new -3 proposal -3 -snr 18 -spf 22235-22236-22347-22358-22469. dithernoise_test is better now, it was harder for me to abx, and I arrived at 8/10. For utb my suspicion for being not perfect is gone.

So your new proposal is within the quality demand which to me is fine for -3 though it's on the cutting edge. But that's just my listening with my old ears to not very many samples (considered to be extraordinarily problematic though). The average bitrate for my regular sample set is 335 kbps which is only 10 kbps lower than that of current -3. Average bitrate however of the problem essence is 446 kbps, and that's 18 kbps less than that of current -3.
So we lose a lot more kbps in the problem area where a higher degree of kbps is wanted than we gain in the regular area. Once sensitive for especially dithernoise_test I tested it again with current -3, and everything is fine to me. As is utb.

So in the end IMO we should stick with current -3. An average bitrate of ~350 kbps for regular music is very good I think, and it seems we can't do essentially better with our weaponry without sacrificing safety margin to a considerable extent.
What the investigation has shown is that -snr has its own specific part in preserving quality. It's not just an amplification of the merits of the -skew option.
Nick.C
QUOTE(halb27 @ Dec 2 2007, 21:01) *
So in the end IMO we should stick with current -3. An average bitrate of ~350 kbps for regular music is very good I think, and it seems we can't do essentially better with our weaponry without sacrificing safety margin to a considerable extent.
What the investigation has shown is that -snr has its own specific part in preserving quality. It's not just an amplification of the merits of the -skew option.
Thank you very much my friend for spending a lot of time on settings validation. I was nearly at the same conclusion when you posted. Therefore, -3 is fixed - permanently (unless we find a particularly awkward sample......).

I am tidying up the code and removing redundant parameters. Will post beta v0.5.5 tonight or tomorrow.

Thanks again.

Nick.
GeSomeone
QUOTE(halb27 @ Dec 2 2007, 22:01) *

So your new proposal is within the quality demand which to me is fine for -3 though it's on the cutting edge.

Isn't that exactly where -3 should be? And -2 being "transparent as far as could be determined"?

QUOTE
The average bitrate for my regular sample set is 335 kbps which is only 10 kbps lower than that of current -3. Average bitrate however of the problem essence is 446 kbps, and that's 18 kbps less than that of current -3.

3% to 4% extra compression is something lossless codecs would have to work very hard for, so nothing to give away easily, except for a reason of course.
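For reference, the "3% to 4%" figure can be checked against the bitrates halb27 reported (345 vs 335 kbps for the regular set, 464 vs 446 kbps for the problem essence). The helper name is just for illustration.

```python
def saving_percent(current_kbps, proposed_kbps):
    """Bitrate saving of the proposed setting relative to the current one."""
    return 100.0 * (current_kbps - proposed_kbps) / current_kbps


regular = saving_percent(345, 335)  # regular sample set
problem = saving_percent(464, 446)  # problem essence

print(f"regular set: {regular:.1f}%")      # regular set: 2.9%
print(f"problem essence: {problem:.1f}%")  # problem essence: 3.9%
```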

Thanks, for your testing and observations.
Nick.C
I take on board what you're saying, but I agree with Halb27 that we're aiming for transparency at -3 with increasing resilience at -2 and -1. The initial aim of the process was to "slightly" reduce bitrate - what we have currently with -3 is significant reduction using the interplay of -nts, -skew and -snr. Maybe -3 -snr 18 -nts 7.5 would produce adequate results, maybe not. However, while there's only really Halb27 doing the ABX'ing, I will unconditionally accept his opinion.

Anyway,

lossyWAV beta v0.5.5 attached: Superseded.

-allowable, -dither, -clipping and -overlap removed;

Reference_threshold values used to determine bits_to_remove from calculated minimum_value have been re-calculated. Very slight increase in bitrate (406.9 v0.5.4 vs 407.3 v0.5.5 for my 53 sample set).

Code tidied.

CODE
lossyWAV beta v0.5.5 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme settings [4xFFT] (-cbs 512 -nts -2.0 -skew 36 -snr 21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default settings [3xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact settings [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

Standard Options:

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Specific Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
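
The -fft and -spf argument formats in the usage text above can be decoded as in this small sketch. It follows only what the help text states (five FFT lengths, binary switching, 5 x 5 hex characters with zero excluded); the function names are hypothetical and this is not the Delphi parser.

```python
FFT_LENGTHS = (64, 128, 256, 512, 1024)


def parse_fft_mask(mask):
    """Turn a 5-character binary string into the list of selected FFT lengths."""
    if len(mask) != 5 or any(c not in "01" for c in mask):
        raise ValueError("expected 5 characters, each 0 or 1")
    return [n for n, c in zip(FFT_LENGTHS, mask) if c == "1"]


def parse_spf(spf):
    """Turn '22235-22236-...' into 5 lists of 5 spreading values (1..15)."""
    groups = spf.split("-")
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError("expected 5 groups of 5 characters")
    values = [[int(c, 16) for c in g] for g in groups]
    if any(v == 0 for g in values for v in g):
        raise ValueError("zero is excluded")
    return dict(zip(FFT_LENGTHS, values))


print(parse_fft_mask("01001"))  # [128, 1024]
print(parse_fft_mask("10001"))  # [64, 1024]
print(parse_spf("22235-22236-22347-22358-2246C")[1024])  # [2, 2, 4, 6, 12]
```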
halb27
QUOTE(GeSomeone @ Dec 3 2007, 18:52) *

... 3% to 4% extra compression is something lossless codecs would have to work very hard for, so nothing to give away easily, except for a reason of course. ...

Please keep in mind that we have not had a large amount of testing so far, and I did ABX dithernoise_test 8/10 with my 58 year old ears for this 10 kbps saving setting. We are not in the situation of lossless codecs where lossless is lossless after all, but also with -3 IMO we should be pretty safe qualitywise, because otherwise there's no good distinction from mp3 etc. If after years of lossyWAV usage a sample should come up which isn't totally transparent but has a negligible issue, this is an acceptable situation for -3 IMO, but we should take some care not to be in this situation at the very start of lossyWAV. Not for the advantage of just having an average bitrate of 340 kbps instead of 350.

As -nts is an official option you can easily save some kbps by increasing the -nts value as Nick said if you prefer to be a little bit adventurous.
jesseg
newer version of lFLCDrop, check the last page(s)
Nick.C
It's competition time for all the graphically creative users out there.... As the wiki is now up and running (many thanks to Mitch 1 2!), complete with Foobar2000 converter settings, I/we need an icon for lossyWAV.

Answers on the back of used large denomination currency of your choice ;) to: this thread.....
halb27
It's not very important, but those who use lossyWAV together with FLAC may find this useful:

Synthetic Soul found already that FLAC -5 yields nearly the same file size as -8. I can confirm and extend this:

For FLAC used in our context, in many respects it makes nearly no difference whether we use -8, -5, or -3.
What's important to many tracks is the -m parameter (defaulted with -8 and -5, but not with -3).
To a small degree also the -e parameter makes a difference (defaulted with -8, but not with -5 and -3).

So -8, -5 -e, or -3 -m -e all yield an identical file size in a practical sense (at least with my test set), and -3 -m -e is the fastest encoding procedure among these.
If you allow for another option -3 -m -e -r 2 speeds things up a bit more while not really sacrificing file size (with my test set).
Dropping -e speeds things up further. File size increases a bit more noticeably than with the -m -e variants, but to most users it's probably still negligible. Use -3 -m for an amazing speed (together with -r 2 if you like), or -5. File size for -3 -m and -5 usually is identical in a practical sense.

Keep in mind though that with these speed settings overall encoding time is dominated by lossyWAV. So it may not be wise to hunt for the ultimate FLAC speed.
Mitch 1 2
I can also confirm it. The increase in speed justifies the consistently negligible (<1kbps) increase in bitrate.
Nick.C
Good find Halb27, thanks for the confirmation Mitch 1 2! So, for those on a time budget, flac -3 -e -m -r 2 -b 512 is the way to go.....

On my 53 sample set, this increases the bitrate from 407.5 (-8) to 413.4 (-3 -e -r 2 -m -b 512) - Fast though......
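
For anyone scripting this, a small sketch that builds the fast FLAC command line quoted above. The function name and the file name are hypothetical, and it assumes a `flac` executable on the PATH when the command is actually run.

```python
import shlex


def fast_flac_command(wav_path):
    """Argument list for the fast setting: flac -3 -e -m -r 2 -b 512 <file>."""
    return ["flac", "-3", "-e", "-m", "-r", "2", "-b", "512", wav_path]


cmd = fast_flac_command("track.lossy.wav")
print(shlex.join(cmd))  # flac -3 -e -m -r 2 -b 512 track.lossy.wav

# To run it (requires flac to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```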
halb27
QUOTE(Nick.C @ Dec 5 2007, 09:56) *

...On my 53 sample set, this increases the bitrate from 407.5 (-8) to 413.4 (-3 -e -r 2 -m -b 512) - Fast though......

That's not negligible to me, but I hope that's due to the nature of your more or less problematic snippets set (guess that's still your 53 sample set). With full sized regular music as Mitch_1_2 said I expect the difference to be <1 kbps on average.

If somebody finds that on a real life sample set of several full length tracks difference is > 1 kbps please let us know. For getting the precise difference we can look at the total size of the files under consideration. I expect difference to be ~0.1%.
Nick.C
I'll run a "real-world" conversion test - the same as my previous set, about 10 albums and revert.
halb27
QUOTE(Nick.C @ Dec 5 2007, 10:18) *

...
Jean Michel Jarre - Oxygene / 773kbps / 454kbps / 372kbps / 377kbps
...So, overall an average of 850kbps / 410kbps / 350kbps / 351kbps

Thanks for your test, Nick. So in an overall sense FLAC -3 -m -e -r 2 is fine IMO, though it's quite interesting that with an album like Oxygene things aren't totally satisfying.
Do you mind trying FLAC -3 -m -e -r 3 and FLAC -3 -m -e on Oxygene?
Nick.C
Apologies, using revised calculated constants for Reference_Threshold for beta v0.5.5, Oxygene has increased to 372kbps, and 5kbps increase with -3 -e -m -r 2 -b 512. I forgot I did the last comparison using a previous version. I will do it again with vanilla -3 / -8.

Artist - Album / FLAC / lossyFLAC -2 / lossyFLAC -3 / lossyFLAC -3 with FLAC -3 -e -m -r 2 -b 512

CODE
AC/DC - Dirty Deeds Done Dirt Cheap    / 781kbps / 398kbps / 331kbps / 332kbps
B52's - Good Stuff                     / 993kbps / 408kbps / 361kbps / 362kbps
David Byrne - Uh-Oh                    / 937kbps / 398kbps / 344kbps / 345kbps
Fish - Songs From The Mirror           / 854kbps / 384kbps / 336kbps / 336kbps
Gerry Rafferty - City To City          / 802kbps / 400kbps / 338kbps / 338kbps
Iron Maiden - Can I Play With Madness  / 784kbps / 422kbps / 371kbps / 372kbps
Jean Michel Jarre - Oxygene            / 773kbps / 454kbps / 372kbps / 377kbps
Marillion - The Thieving Magpie        / 790kbps / 404kbps / 344kbps / 344kbps
Mike Oldfield - Tr3s Lunas             / 848kbps / 421kbps / 365kbps / 366kbps
Scorpions - Best Of Rockers N' Ballads / 922kbps / 421kbps / 354kbps / 354kbps


So, overall an average of 850kbps / 410kbps / 351kbps / 351kbps

I'm not worried about one spurious result - Oxygene, after all, is a fairly specific type of music.
halb27
QUOTE(Nick.C @ Dec 5 2007, 12:24) *

...
So, overall an average of 850kbps / 410kbps / 351kbps / 351kbps

This matches perfectly my experience with -3 -e -m -r 2 -b 512 as well as that of Mitch 1 2 as of his post.
You're right: we shouldn't care too much about specific music, especially as the result isn't extraordinarily bad.

Thanks again for your test.
Nick.C
Having listened to the comments on noise shaping, I had a look on wikipedia and found the basic principles.

As I already have a mechanism to store the difference between the original sample and the bit_removed sample, I have some of a noise shaping algorithm already in place.

The coefficients have so far eluded me.

One simple possibility that springs to mind is to start with zero at the codec block / channel start and then add the first difference then divide by two. Then add the next difference and divide by two. And so on.
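
The running value described above is a one-pole low-pass of the difference signal: s = (s + d) / 2, starting from zero at each codec block / channel start. A minimal sketch of just that recursion follows; how the value is then fed back into bit removal is not stated in the post and is left out, and the function name is hypothetical.

```python
def running_half_average(differences):
    """Running value s_n = (s_(n-1) + d_n) / 2, starting from s_0 = 0."""
    state = 0.0
    states = []
    for d in differences:
        state = (state + d) / 2
        states.append(state)
    return states


# A constant difference of 4 is approached gradually.
print(running_half_average([4, 4, 4]))   # [2.0, 3.0, 3.5]
# Alternating differences are strongly attenuated.
print(running_half_average([4, -4, 4]))  # [2.0, -1.0, 1.5]
```

Slowly varying differences pass through while rapidly alternating ones are reduced, which is the low-pass behaviour of this recursion.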

We'll see how it sounds.
jesseg
[edit]
nasty 1st version logo removed, check the 1st post on the next page for the new one.
[/edit]
halb27
QUOTE(Nick.C @ Dec 6 2007, 00:29) *

Having listened to the comments on noise shaping, I had a look on wikipedia and found the basic principles.

... I have some of a noise shaping algorithm already in place.

The coefficients have so far eluded me.

One simple possibility that springs to mind is to start with zero at the codec block / channel start and then add the first difference then divide by two. Then add the next difference and divide by two. And so on.

We'll see how it sounds.

You're moving on fast. Wonderful!
When considering noise shaping: into what frequency range do you want to put the noise?
Nick.C
QUOTE(halb27 @ Dec 6 2007, 08:17) *
QUOTE(Nick.C @ Dec 6 2007, 00:29) *
Having listened to the comments on noise shaping, I had a look on wikipedia and found the basic principles.

... I have part of a noise shaping algorithm already in place.

The coefficients have so far eluded me.

One simple possibility that springs to mind is to start with zero at the codec block / channel start and then add the first difference then divide by two. Then add the next difference and divide by two. And so on.

We'll see how it sounds.
You're moving on fast. Wonderful!
When considering noise shaping: into what frequency range do you want to put the noise?
Ah, that's the problem - I don't yet know how to determine that.
Nick.C
I've implemented a simplistic bit_removal noise shaping function, enabled with the -shaping parameter. As yet, I haven't re-calculated the reference_threshold values, nor have I included any dithering - I need advice from someone with more of a clue than myself. I'm going to re-read the Wikipedia article and a PDF I found.

lossyWAV beta v0.5.6 attached. Superseded.

CODE
lossyWAV beta v0.5.6 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme settings [4xFFT] (-cbs 512 -nts -2.0 -skew 36 -snr 21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default settings [3xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact settings [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

Standard Options:

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Specific Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-shaping      enable fixed shaping using bit_removal difference of previous
              samples [value = brd(-1)/(2^1)+brd(-2)/(2^2)+...+brd(-n)/(2^n)];
              default=off
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Please have a listen to this and let me know......
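
For reference, a small sketch of how the -fft "binary switching" string in the help text above maps to FFT lengths, plus the stated -cbs constraint. The function names are hypothetical and this is not lossyWAV's own parsing code (which is Delphi); it only mirrors the rules as written in the help text.

```python
FFT_LENGTHS = (64, 128, 256, 512, 1024)


def decode_fft_mask(mask):
    """Return the FFT lengths switched on by a 5-character 0/1 string."""
    if len(mask) != len(FFT_LENGTHS) or set(mask) - {"0", "1"}:
        raise ValueError("expected five characters, each 0 or 1")
    return [n for bit, n in zip(mask, FFT_LENGTHS) if bit == "1"]


def valid_codec_block_size(n):
    """Check the -cbs rule from the help text: 512<=n<=4608, n mod 32=0."""
    return 512 <= n <= 4608 and n % 32 == 0


# The three quality presets as listed in the help text.
for preset, mask in (("-1", "11011"), ("-2", "10101"), ("-3", "10001")):
    print(preset, decode_fft_mask(mask))
```

Decoded this way, the preset masks give 4, 3 and 2 FFT lengths respectively, which agrees with the [4xFFT] / [3xFFT] / [2xFFT] labels in the help text.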
halb27
QUOTE(Nick.C @ Dec 6 2007, 10:47) *

QUOTE(halb27 @ Dec 6 2007, 08:17) *
...
When considering noise shaping: into what frequency range do you want to put the noise?
Ah, that's the problem - I don't yet know how to determine that.

Well, I can't contribute other than with these thoughts:
Usually noise would be most welcome, IMO, in the frequency range > 16 kHz, because we're not sensitive there. But with our approach of -nts +6, which means reduced control in the high frequency area, this may be a bit dangerous. Maybe noise in the > 18 kHz region could do it. But for the sake of tweeters it may be wise to do only mild noise shaping.
Maybe doing it the other way around (noise in the < 3 kHz range) is the better way to go, because we have better control there thanks to the work of -skew, -snr and -spf.

Anyway, as you have already provided details like -snr and a positive -nts value, which have turned out to be very advantageous, I have full confidence you will arrive at a good result.
Nick.C
QUOTE(halb27 @ Dec 6 2007, 13:55) *
Anyway, as you have already provided details like -snr and a positive -nts value, which have turned out to be very advantageous, I have full confidence you will arrive at a good result.
I tried -3 -shaping -snr 18 -nts 15 and got 31.28MB / 352.9kbps for my 53 sample set - quite reasonable on my DAP.
Nick.C
QUOTE(Nick.C @ Dec 6 2007, 14:55) *
I tried -3 -shaping -snr 18 -nts 15 and got 31.28MB / 352.9kbps for my 53 sample set - quite reasonable on my DAP.
Artist - Album / FLAC / lossyFLAC -2 / lossyFLAC -3 / lossyFLAC -3 & FLAC -3 -e -m -r 2 -b 512 / lossyFLAC -3 -shaping -snr 21 -nts 15 & FLAC -3 -e -m -r 2 -b 512:

CODE
AC/DC - Dirty Deeds Done Dirt Cheap    / 781kbps / 398kbps / 331kbps / 332kbps / 294kbps
B52's - Good Stuff                     / 993kbps / 408kbps / 361kbps / 362kbps / 329kbps
David Byrne - Uh-Oh                    / 937kbps / 398kbps / 344kbps / 345kbps / 315kbps
Fish - Songs From The Mirror           / 854kbps / 384kbps / 336kbps / 336kbps / 306kbps
Gerry Rafferty - City To City          / 802kbps / 400kbps / 338kbps / 338kbps / 300kbps
Iron Maiden - Can I Play With Madness  / 784kbps / 422kbps / 371kbps / 372kbps / 334kbps
Jean Michel Jarre - Oxygene            / 773kbps / 454kbps / 372kbps / 377kbps / 316kbps
Marillion - The Thieving Magpie        / 790kbps / 404kbps / 344kbps / 344kbps / 307kbps
Mike Oldfield - Tr3s Lunas             / 848kbps / 421kbps / 365kbps / 366kbps / 322kbps
Scorpions - Best Of Rockers N' Ballads / 922kbps / 421kbps / 354kbps / 354kbps / 318kbps


So, overall an average of 850kbps / 410kbps / 351kbps / 351kbps / 314kbps

Also, Mitch 1 2 has indicated that values of -nts in excess of 15 are acceptable while maintaining the -snr 21 value.
[edit] Using -3 -nts 48 -snr 21 -shaping, I get 363.6kbps on my 53 sample set. [/edit]
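
As a cross-check of the table above, here is a short sketch that takes the plain (unweighted) mean of each column. The column order follows the header line of the table. The unweighted means come out within a kbps or two of the quoted overall averages; a small gap like that would be expected if the overall figures are computed from total size and total playing time rather than per album, but that is an assumption, as the post does not say how they were derived.

```python
# Per-album bitrates in kbps, copied from the table above.
rows = {
    "AC/DC - Dirty Deeds Done Dirt Cheap": (781, 398, 331, 332, 294),
    "B52's - Good Stuff": (993, 408, 361, 362, 329),
    "David Byrne - Uh-Oh": (937, 398, 344, 345, 315),
    "Fish - Songs From The Mirror": (854, 384, 336, 336, 306),
    "Gerry Rafferty - City To City": (802, 400, 338, 338, 300),
    "Iron Maiden - Can I Play With Madness": (784, 422, 371, 372, 334),
    "Jean Michel Jarre - Oxygene": (773, 454, 372, 377, 316),
    "Marillion - The Thieving Magpie": (790, 404, 344, 344, 307),
    "Mike Oldfield - Tr3s Lunas": (848, 421, 365, 366, 322),
    "Scorpions - Best Of Rockers N' Ballads": (922, 421, 354, 354, 318),
}

columns = list(zip(*rows.values()))           # one tuple per table column
means = [sum(c) / len(c) for c in columns]    # unweighted per-column mean
print([round(m, 1) for m in means])

# Relative saving of the -shaping column against the same FLAC settings
# without shaping (last two columns).
saving = 1 - means[4] / means[3]
print(f"{saving:.1%}")
```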
halb27
Sounds very interesting, but I'd really like to know a bit about where the noise goes.

I also have a problem with the target and how it fits into what we have so far. So far we have the quality targets -3, -2, and -1, which should all be transparent (-1 in an overkill sense, -2 with a certain but not overkill safety margin, -3 with only a minor safety margin).
What's the target when using noise shaping?
Most important:
Do we still want transparency with a certain, though small, safety margin (equivalent to: should we use it as -3 with the current meaning of -3)?
Or do we want something like a -4 which should be transparent nearly all the time but is allowed to be non-transparent, though only in an acceptable way, on rare occasions?
Or should we use the meaning I just described for a potential -4 for our final -3, and readjust the internal details of -2 and -1 so that the new -2 is somewhere between the current -3 and -2, and the new -1 is somewhere between the current -2 and -1?
Nick.C
QUOTE(halb27 @ Dec 7 2007, 08:29) *
Sounds very interesting, but I'd really like to know a bit about where the noise goes.

I also have a problem with the target and how it fits into what we have so far. So far we have the quality targets -3, -2, and -1, which should all be transparent (-1 in an overkill sense, -2 with a certain but not overkill safety margin, -3 with only a minor safety margin).
What's the target when using noise shaping?
Most important:
Do we still want transparency with a certain, though small, safety margin (equivalent to: should we use it as -3 with the current meaning of -3)?
Or do we want something like a -4 which should be transparent nearly all the time but is allowed to be non-transparent, though only in an acceptable way, on rare occasions?
Or should we use the meaning I just described for a potential -4 for our final -3, and readjust the internal details of -2 and -1 so that the new -2 is somewhere between the current -3 and -2, and the new -1 is somewhere between the current -2 and -1?
I know what you mean about wanting to know where the noise goes.

I am hoping that you will / have had a play about with the -shaping parameter and also -snr / -nts to find a good compromise.

If we can get transparency using -shaping on the existing problem samples below the current -3 bitrate, then I think that we should revise -3. If not, then I would not be averse to the introduction of a carefully crafted -4 quality setting which would be "very-nearly-transparent-on-problem-samples" if there was a noticeable reduction in bitrate compared to the existing -3.
Mitch 1 2
@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening, I've found that I can't hear any difference between files processed with this setting and the originals. Nick also found that he couldn't hear the difference. The -snr 21 default setting seems to be preventing audible distortion, even with the maximum nts value.
halb27
QUOTE(Mitch 1 2 @ Dec 7 2007, 12:03) *

@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening, I've found that I can't hear any difference between files processed with this setting and the originals. Nick also found that he couldn't hear the difference. The -snr 21 default setting seems to be preventing audible distortion, even with the maximum nts value.

Yes I will.
'Casually listening'? That was my very question about what we should target.
I'll hold that back for the moment, and just try it and see what it sounds like.
-nts 48 is a huge value, though even without noise shaping -nts 20 isn't really bad except for problem samples.
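
To put the -nts values being discussed on a common scale: one bit of resolution corresponds to 20*log10(2), about 6.02 dB. The sketch below converts a threshold shift in dB to the equivalent number of bits. How lossyWAV actually turns its threshold into bits to remove is not shown in this thread, and -snr can limit the result, so treat the numbers as an upper-bound rule of thumb under that assumption, not as the bits actually removed.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit of resolution


def db_to_bits(db):
    """Convert a level shift in dB to the equivalent number of bits."""
    return db / DB_PER_BIT


for nts in (6.0, 15.0, 20.0, 30.0, 48.0):
    print(f"-nts {nts:+.1f} dB is roughly {db_to_bits(nts):.2f} bits")
```

On that scale -nts 48 is a shift of nearly 8 bits, which is consistent with the observation above that something else (the -snr limit) must be doing most of the work at that setting.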
halb27
QUOTE(Mitch 1 2 @ Dec 7 2007, 12:03) *

@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening ...

Oops, I didn't read carefully: I missed 'without noise shaping'.
Anyway, I did listen carefully to my regular sample set using -3 -nts 48, and to me too the quality is OK. I even did some ABXing on several spots and couldn't find a difference.
But it's different with spots that are hard to encode. I proved that already for -nts 30 and -nts 20. And my regular set yielded 320 kbps with -3 -nts 48.
If we allow for really bad results, even if only on rare occasions, we're better off using Vorbis, AAC, MPC, or MP3 in the 200 kbps and below range.

Anyway I'll test the -shaping version.
Nick.C
QUOTE(halb27 @ Dec 7 2007, 18:58) *
QUOTE(Mitch 1 2 @ Dec 7 2007, 12:03) *
@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening ...
Oops, I didn't read carefully: I missed 'without noise shaping'.
Anyway, I did listen carefully to my regular sample set using -3 -nts 48, and to me too the quality is OK. I even did some ABXing on several spots and couldn't find a difference.
But it's different with spots that are hard to encode. I proved that already for -nts 30 and -nts 20. And my regular set yielded 320 kbps with -3 -nts 48.
If we allow for really bad results, even if only on rare occasions, we're better off using Vorbis, AAC, MPC, or MP3 in the 200 kbps and below range.

Anyway I'll test the -shaping version.
I don't think that we want to go into the <300kbps range for normal music - there's plenty of good quality competition there. Thinking about it, I don't really want to implement a -4 quality setting if it's going to let through artefacts. Going back to the beginning, the stated aim is transparency for all quality settings.

That said, I've been playing with -3 -shaping -nts 18 -snr 18 and I can't notice any problems at all.

Maybe a reasonable target for a -shaping setting would be a bitrate slightly below the existing -3 setting.
halb27
I tried -shaping -snr 18 -nts 15.
My regular sample set was encoded with an average bitrate of 308 kbps, and for my problem sample essence it was 425 kbps.
With the problem sample set I can easily ABX keys_1644ds (9/10). I also have the suspicion that furious and utb aren't totally transparent, but I'm not the one who can prove it.
So far not so bad.

Then I decided to listen to some regular music, and it was the very first track (Blackbird, Yesterday from The Beatles: Love, sec. 31.2-34.4) that didn't sound fine to me. I tried to ABX it and got 7/7, then 8/10.
It's an inaccuracy with the voice, so I don't think noise shaping moves noise into the high frequency range.
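
For reading ABX scores like the ones above, a short sketch of the usual one-sided binomial calculation: the probability of getting at least that many trials right by pure guessing. The function name is made up for illustration. Note that a score picked out part-way through a run (such as 7/7 before continuing to 8/10) overstates the evidence compared with a score from a run whose length was fixed in advance.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers out of `trials`
    when every answer is a 50/50 guess."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for correct, trials in ((9, 10), (8, 10), (7, 10), (7, 7)):
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```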
Nick.C
QUOTE(halb27 @ Dec 7 2007, 21:17) *
I tried -shaping -snr 18 -nts 15.
My regular sample set was encoded with an average bitrate of 308 kbps, and for my problem sample essence it was 425 kbps.
With the problem sample set I can easily ABX keys_1644ds (9/10). I also have the suspicion that furious and utb aren't totally transparent, but I'm not the one who can prove it.
So far not so bad.

Then I decided to listen to some regular music, and it was the very first track (Blackbird, Yesterday from The Beatles: Love, sec. 31.2-34.4) that didn't sound fine to me. I tried to ABX it and got 7/7, then 8/10.
It's an inaccuracy with the voice, so I don't think noise shaping moves noise into the high frequency range.
Okay, back to the drawing board with -shaping then.... I'll need to research fixed shaping coefficients. Thanks for the listening time.

[edit] lossyWAV beta v0.5.7 attached: Superseded. modified (even simpler) noise shaping feedback function. [/edit]
halb27
My results for v0.5.7 -snr 18 -nts 15 -shaping:

My regular sample set comes out at a very good 293 kbps.
The Blackbird, Yesterday problem has gone for me, and I don't have a suspicion on furious any more.
However I abxed utb 7/8 (and ended up 7/10), eig 7/10, and bruhns 9/10. There's a rather strong inaccuracy with bruhns at ~sec. 7.

Other than that I agree with you, Nick, that we should have only 3 quality parameters, and that -3 should be transparent to the best of our experience when we go final. If we really do reach a final average bitrate of ~300 kbps for -3, I personally see no need to talk about a safety margin for -3.
If we really arrive at that bitrate for -3, we should readjust -2 and -1 IMO: -2 being near the current -3 but a little more demanding, and -1 being closer to where -2 is now (but definitely with nts <= 0).

Just an idea: you are doing static noise shaping right now, and the noise shaping machinery is supposed to be simple, that is, it shifts noise up or down in frequency. In the case of shifting up: wouldn't it be more or less equivalent (or maybe at least a clearer approach) to allow for a weakened noise threshold in the 12+ kHz range?
Nick.C
QUOTE(halb27 @ Dec 8 2007, 08:00) *
My results for v0.5.7 -snr 18 -nts 15 -shaping:

My regular sample set comes out at a very good 293 kbps.
The Blackbird, Yesterday problem has gone for me, and I don't have a suspicion on furious any more.
However I abxed utb 7/8 (and ended up 7/10), eig 7/10, and bruhns 9/10. There's a rather strong inaccuracy with bruhns at ~sec. 7.

Other than that I agree with you, Nick, that we should have only 3 quality parameters, and that -3 should be transparent to the best of our experience when we go final. If we really do reach a final average bitrate of ~300 kbps for -3, I personally see no need to talk about a safety margin for -3.
If we really arrive at that bitrate for -3, we should readjust -2 and -1 IMO: -2 being near the current -3 but a little more demanding, and -1 being closer to where -2 is now (but definitely with nts <= 0).

Just an idea: you are doing static noise shaping right now, and the noise shaping machinery is supposed to be simple, that is, it shifts noise up or down in frequency. In the case of shifting up: wouldn't it be more or less equivalent (or maybe at least a clearer approach) to allow for a weakened noise threshold in the 12+ kHz range?
I still don't really know where the noise is going. If you could try -3 -shaping -snr 21 -nts 15, I feel that this may be better. Selective -nts parameter for the bin in which the min value is found is possible to implement - I'll have a think and revert tonight.
[JAZ]
I got a bit lost lately with the addition of "noise shaping", so I'm going to give my thoughts, in case any of them is useful:

Usual objective of noise shaping: reduce the effect (noise) of a produced artifact (usually when applying dither), changing (shaping) it from white noise (flat) to a curve that is less perceptible.

lossyWAV tries to reduce the bit depth of a portion of audio, so that the lossless encoder can benefit and reduce the bitrate demands.
Right now, lossyWAV works like this: it runs FFTs of different lengths to verify the requirements at different resolutions, can define a minimum signal to (quantization) noise margin, and has the skew function to correct a misinterpretation of the FFT analysis. I have to admit that I don't completely know what the -spf function does (does it affect the analysis, or the generated audio?). It now also has the noise shaping function to reduce the artifacts in some bad cases.

Context of noise shaping within lossyWAV: lossyWAV artefacts are the consequence of a reduced SNR, caused by the bit depth reduction. This amounts to quantization of a signal, to which dithering should possibly be applied. This means, then, applying noise shaping in the context of dither, and as such, noise shaping determines where the dither noise goes.

Consequences: Two things to have in mind:
a) noise shaping uses the fact that we're less sensitive to higher frequencies, but the lower the bit depth, the lower the SNR is.
b) noise is generally hard to compress losslessly, and more so in the higher frequencies.

From the above: it should be used only where the engine detects that the lowest signal (in the block being processed) compared to the quantization level is too low, and while ensuring the bit depth is not too small (I recall reading here that applying noise shaping at 8 bits is already not recommended).


Conclusion: I believe that right now you are only doing dither, not noise shaping. Shaping is the output of a filter, with white noise as input, similar to a notch or band filter (at least, the way I understand it). You should find out whether the problems you're trying to fix are really on soft signals, or on strong signals.
If the latter, then the problem really is too small a bit depth, and there you should not apply noise shaping.
SebastianG
QUOTE(Nick.C @ Dec 7 2007, 22:21) *

Okay, back to the drawing board with -shaping then.... I'll need to research fixed shaping coefficients. Thanks for the listening time.

Ok, I see there's an increased interest in using noise shaping. I'm not sure whether I understand what you are trying to do and why -- and please forgive me for not following the discussion too closely. To be honest, it looks a bit like groping in the dark. In case you have an idea of what "noise shaping" is actually supposed to do in your case, I might be able to help show you how it could be implemented. In case you don't, you might wanna try my very first suggestion (see the first page of 2B's lossy flac thread).

Cheers!
SG
Nick.C
QUOTE(SebastianG @ Dec 8 2007, 14:00) *
QUOTE(Nick.C @ Dec 7 2007, 22:21) *
Okay, back to the drawing board with -shaping then.... I'll need to research fixed shaping coefficients. Thanks for the listening time.
Ok, I see there's an increased interest in using noise shaping. I'm not sure whether I understand what you are trying to do and why -- and please forgive me for not following the discussion too closely. To be honest, it looks a bit like groping in the dark. In case you have an idea of what "noise shaping" is actually supposed to do in your case, I might be able to help show you how it could be implemented. In case you don't, you might wanna try my very first suggestion (see the first page of 2B's lossy flac thread).

Cheers!
SG
I'd be the first to admit that I'm groping in the dark when it comes to noise shaping. Thanks for the reminder about your post on the first page of the original thread. I'll go and re-read and try to interpret / formulate an algorithm to enable implementation.

I really do need help with noise shaping, I'm a noob when it comes to audio processing - the offer is very welcome SebastianG!

@[JAZ]: Nothing apart from the window function (Hanning) affects the FFT analyses themselves. Any other parameters, i.e. -skew, -spf, -snr & -nts, modify the process of taking the FFT output and working out the "lowest" signal for that particular analysis. At the moment, there is no dither in the process at all, only rounding on bit-reduction.

What I'm really looking for is, as has been said above, a method of shifting the noise to the >16kHz band.

Any aid in comprehending this difficult topic would be greatly appreciated.
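
As a starting point for "shifting the noise upwards", here is a minimal sketch of textbook first-order error feedback applied to bit removal by rounding. It is not lossyWAV's -shaping function and not SebastianG's suggestion from the original thread; the function name, the fixed single coefficient and the sample values are all assumptions for illustration. The rounding error of the previous sample is subtracted from the current sample before rounding, so the error in the output becomes e[n] - e[n-1], which cancels at low frequencies and is largest near half the sample rate.

```python
def reduce_bits(samples, bits_to_remove, shaped):
    """Round integer samples to multiples of 2**bits_to_remove.

    With shaped=True, the rounding error of the previous sample is
    subtracted from the current sample before rounding (first-order
    error feedback). Clipping is not handled in this sketch.
    """
    step = 1 << bits_to_remove
    half = step // 2
    err = 0  # rounding error carried over from the previous sample
    out = []
    for x in samples:
        target = x - err if shaped else x
        y = ((target + half) // step) * step  # round to nearest multiple
        err = y - target
        out.append(y)
    return out


samples = [5] * 16  # hypothetical block: a small constant offset
plain = reduce_bits(samples, 4, shaped=False)
shaped = reduce_bits(samples, 4, shaped=True)
print(sum(plain) / len(plain))    # plain rounding loses the offset entirely
print(sum(shaped) / len(shaped))  # feedback keeps the average close to 5
```

The trade-off is that the noise is moved, not removed: if the rounding error behaves like white noise, this filter raises the total noise power by about 3 dB while pushing it towards the top of the band, which is where the earlier concerns about reduced control at high frequencies and about tweeters come in.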

halb27
We have a problem.

I tried v0.5.7 -snr 21 -nts 15 -shaping according to your proposal, Nick. It yields 318 kbps with my regular set and 444 kbps with my problem essence set.
I listened to the beginning of Blackbird, Yesterday, and quality was very fine to me.
I started to try to abx the problems from my last test, and used eig as the first example. abx result was 9/10.

For comparison I also tried to ABX plain -3. Now that I am used to the kind of problem (smearing, a kind of echo), I was able to ABX plain -3 at 9/10 as well, though to me its quality is better.

I think we should fix this before continuing down the noise shaping route.
I am not very sensitive to temporal resolution problems, so it would be very kind if somebody could help test lossyWAV on samples known to be prone to pre-echo with MP3 etc.
Nick.C
QUOTE(halb27 @ Dec 8 2007, 20:27) *
We have a problem.

I tried v0.5.7 -snr 21 -nts 15 -shaping according to your proposal, Nick. It yields 318 kbps with my regular set and 444 kbps with my problem essence set.
I listened to the beginning of Blackbird, Yesterday, and quality was very fine to me.
I started to try to abx the problems from my last test, and used eig as the first example. abx result was 9/10.

For comparison I also tried to ABX plain -3. Now that I am used to the kind of problem (smearing, a kind of echo), I was able to ABX plain -3 at 9/10 as well, though to me its quality is better.

I think we should fix this before continuing down the noise shaping route.
I am not very sensitive to temporal resolution problems, so it would be very kind if somebody could help test lossyWAV on samples known to be prone to pre-echo with MP3 etc.
Bummer...... I agree that -shaping may need to wait until we have ironed out the problems with this newly discovered artefact. It may be possible that it can be dealt with using existing settings, however I am beginning to think that removing -dither was not the best idea that I've ever had.

Out of interest, how does -3 -shaping sound with this smearing sample? [edit2] More importantly, does the smearing exist at vanilla -2? [/edit2]

[edit] And a (very) belated thank you to jesseg for taking the time to come up with an icon. I love the rows of bits - the font is a bit too curly for my taste, but I'll go with a consensus opinion. I still haven't worked out how to change the default console icon in Delphi though...... [/edit]
halb27
v0.5.7 -2: eig is fine to me.
v0.5.7 -3 -nts 0: some partial results and the final result: 4/5, 5/7, 6/8, 6/10. So though I couldn't ABX it according to the final result, I expect it is not transparent. But as the effect is extremely subtle to me, and given the very special nature of this artificial sample, I personally can live with it for -3. But a lot more experience with potential temporal resolution problems would be most welcome (for instance with the castanets sample and other percussion instruments).
v0.5.7 -3 -shaping: eig is OK to me, but testing bruhns I got 9/10. As the problem sounds to me like an artefact in the HF range, I guess shifting noise up is working.

I don't want to spoil the party, but for the next period of time (never say never) I'm not in the mood for testing noise shaping.

As for the classical non-noise-shaping mode, we should try to work out -nts, -skew, -snr, -fft, and -spf default values by moving the current values to more defensive ones, so that samples like eig sound transparent.
I'll take my part in this.
sundance
Nick.C,
QUOTE
I still haven't worked out how to change the default console icon in Delphi though......

In Delphi 7 it's "Project | Options | Application" to apply a custom symbol. Hope you'll find it since this is from a German version of Delphi 7.

.sundance.
Nick.C
QUOTE(sundance @ Dec 9 2007, 13:01) *
Nick.C,
QUOTE
I still haven't worked out how to change the default console icon in Delphi though......

In Delphi 7 it's "Project | Options | Application" to apply a custom symbol. Hope you'll find it since this is from a German version of Delphi 7.

.sundance.
Thanks sundance - I'll try that this evening.

@Halb27: I think there might be some benefit in reducing the C at the end of the 1024 fft spf to, say, 9, to reduce the number of bins being averaged.

It may be that a more conservative approach to HF spreading will allow -shaping to become more useful.
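
To make the proposed change concrete, a small sketch that parses a -spf string into per-FFT values and applies the C-to-9 change to the last character of the 1024-sample group. The helper names are hypothetical, and the only thing assumed about the meaning of each character is what the post above says: a larger value means more bins being averaged. The example string is the -3 preset from the v0.5.6 help text.

```python
FFT_LENGTHS = (64, 128, 256, 512, 1024)


def parse_spf(spf):
    """Split a 5x5 hex -spf string into {fft_length: [five values]}."""
    groups = spf.split("-")
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError("expected five groups of five characters")
    table = {}
    for n, group in zip(FFT_LENGTHS, groups):
        values = [int(ch, 16) for ch in group]  # raises on non-hex characters
        if 0 in values:
            raise ValueError("zero is excluded")
        table[n] = values
    return table


def format_spf(table):
    """Inverse of parse_spf: build the 5x5 hex string again."""
    return "-".join(
        "".join(format(v, "X") for v in table[n]) for n in FFT_LENGTHS
    )


spf = parse_spf("22235-22236-22347-22358-2246C")
spf[1024][-1] = 9  # the proposed change: C (12) down to 9
print(format_spf(spf))
```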