Full Version: lossyWAV 1.2.0 Development Thread
Nick.C
Following the release of lossyWAV 1.1.0b, I feel it is (again) time to kick off development of the next minor release.

Items currently on the list for inclusion in 1.x.0:

1.2.0: Implementation of SG's new noise shaping method;
1.2.0: Checking of S (=L-R) channel for matrix surround content;
1.2.0: Revisit the spreading function;
If you have any ideas, suggestions, code optimisations, etc, please post them here.

Link to the hydrogenaudio wiki article

lossyFLAC resultant bitrates:
CODE
+----------------------------------------------------------------------------------------------------------+
|10 Album Test Set |   FLAC   | --insane |--extreme |--standard|--portable|  --zero  | --nasty  | --awful  |
+------------------+----------+----------+----------+----------+----------+----------+----------+----------+
|lossyWAV 1.1.0b   | 854kbit/s| 632kbit/s| 548kbit/s| 463kbit/s| 376kbit/s| 285kbit/s| -------- | -------- |
+------------------+----------+----------+----------+----------+----------+----------+----------+----------+
|lossyWAV 1.1.4n   | 854kbit/s| 639kbit/s| 556kbit/s| 471kbit/s| 383kbit/s| 288kbit/s| 230kbit/s| 200kbit/s|
+------------------+----------+----------+----------+----------+----------+----------+----------+----------+
|  --altpreset     | 854kbit/s| 624kbit/s| 534kbit/s| 451kbit/s| 363kbit/s| 275kbit/s| 224kbit/s| 198kbit/s|
+----------------------------------------------------------------------------------------------------------+

+----------------------------------------------------------------------------------------------------------+
|55 Problem Samples|   FLAC   | --insane |--extreme |--standard|--portable|  --zero  | --nasty  | --awful  |
+------------------+----------+----------+----------+----------+----------+----------+----------+----------+
|lossyWAV 1.1.0b   | 782kbit/s| 654kbit/s| 583kbit/s| 508kbit/s| 425kbit/s| 321kbit/s| -------- | -------- |
+------------------+----------+----------+----------+----------+----------+----------+----------+----------+
|lossyWAV 1.1.5b   | 782kbit/s| 654kbit/s| 584kbit/s| 510kbit/s| 427kbit/s| 325kbit/s| 259kbit/s| 218kbit/s|
+------------------+----------+----------+----------+----------+----------+----------+----------+----------+
|  --altpreset     | 782kbit/s| 626kbit/s| 567kbit/s| 508kbit/s| 442kbit/s| 391kbit/s| 354kbit/s| 322kbit/s|
+----------------------------------------------------------------------------------------------------------+


Suggested foobar2000 converter setup:

lossyFLAC:
CODE
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.flac
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|c:\"program files"\bin\flac - -b 512 -5 -f -o%d
Format is: lossless or hybrid
Highest BPS mode supported: 24
lossyTAK:
CODE
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.tak
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|c:\"program files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24
lossyWV:
CODE
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.wv
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|c:\"program files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24

There is a known problem within foobar2000 (although more likely to do with cmd.exe itself) when running an executable within the cmd.exe command line from a path which includes spaces. The suggested fix for this is to enclose the element of the path which contains spaces within double quotation marks ("), e.g. c:\"program files"\directory_where_executable_is\executable_name
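
As an illustration of the quoting fix described above, here is a minimal Python sketch that wraps only the path elements containing spaces in double quotes and then assembles the lossyFLAC Parameters string from this post. The helper names (quote_spaced_elements, lossyflac_parameters) are made up for this example; the command-line flags are copied from the converter setup above, not invented.

```python
def quote_spaced_elements(path):
    """Wrap each backslash-separated element that contains a space in double quotes."""
    parts = path.split("\\")
    return "\\".join(f'"{p}"' if " " in p else p for p in parts)


def lossyflac_parameters(bin_dir, preset="--standard"):
    """Build the foobar2000 'Parameters' line for the lossyFLAC setup shown above.

    bin_dir is the (hypothetical) directory holding lossywav and flac.
    """
    lossywav = quote_spaced_elements(bin_dir + "\\lossywav")
    flac = quote_spaced_elements(bin_dir + "\\flac")
    return f"/d /c {lossywav} - {preset} --silent --stdout|{flac} - -b 512 -5 -f -o%d"


if __name__ == "__main__":
    print(lossyflac_parameters(r"c:\program files\bin"))
```

Only the element with the space is quoted, which matches the c:\"program files"\... form used in the setups above.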

Change log 1.1.5c: 21/11/09
Minor revision to internal setting for --altpreset.

Change log 1.1.5b: 20/11/09
Major revision to internal setting for --altpreset.

Change log 1.1.5a: 18/11/09
Bugfix: Correction to high sample-rate processing.

Change log 1.1.4s: 07/11/09
Bugfix: manual --limit setting not working as it should.

Change log 1.1.4r: 03/11/09
Bugfix: shaping in altpreset mode was artificially limited to 50% (only affected -q 6.5 and above).

Change log 1.1.4q: 02/11/09
Reversion to use of previous noise pre-calculated constant;
Shaping now OFF by default. To enable shaping use -s or --shaping, without a parameter for automatic shaping or with a value 0<=n<=1 for user specified shaping.

Change log 1.1.4p: 22/10/09
Mutual exclusivity of shaping, hilimit and altpreset removed;
Added noise pre-calculated constant removed in favour of improved derived formula;
--altpreset parameter now also -t.

Change log 1.1.4n: 27/09/09
Mutual exclusivity of shaping, hilimit and altpreset corrected.

Change log 1.1.4m: 26/09/09
--postanalyse function removed;
--limit changed to --hilimit and --lolimit;
--altpreset parameter introduced which changes default behaviour for shaping and hilimit.
[shaping = 0.5*(max(0,q/10)+max(0,q/10)^2.584962); -q 0 = 0; -q 5 = 0.3333; -q 10 = 1]
[hilimit = round((14000 + 2000 * max(0,q/10)) / samplerate * 64) * (64/samplerate)]
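
The --altpreset shaping formula in the 1.1.4m entry can be checked with a few lines of Python. This is only a sketch of the formula as written in the change log (the function name is made up); it reproduces the three quoted values, since 2.584962 is approximately log2(6) and so 0.5^2.584962 is approximately 1/6.

```python
def altpreset_shaping(q):
    """Shaping factor from the 1.1.4m change log: 0.5 * (x + x^2.584962), x = max(0, q/10)."""
    x = max(0.0, q / 10)
    return 0.5 * (x + x ** 2.584962)


if __name__ == "__main__":
    for q in (0, 5, 10):
        print(f"-q {q}: shaping = {altpreset_shaping(q):.4f}")
```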

Change log 1.1.4k: 24/08/09
--postanalyse function modified to use existing spreading function.

Change log 1.1.4j: 23/08/09
--limit lower range changed to 10000Hz.

Change log 1.1.4h: 22/08/09
--limit lower range changed to 14500Hz.

Change log 1.1.4g: 20/08/09
--maxsnr removed. -p or --postanalyse parameter implemented. Using this parameter checks the noise level of the correction data and compares to the low value derived from the associated source audio. If the correction noise (i.e. that of the difference signal) is greater than the source audio low value then the bits_to_remove value is reduced for the codec-block until the added noise is lower. Code further tidied. -F or --fftw parameter removed as FFTW dll is now automatically used if found (slight speed-up makes this the fastest way to go). Stack error fixed which occurs when libfftw3-3.dll v3.2.2 is used (newly released).

Change log 1.1.4f: 24/07/09
Bug in --maxsnr parameter fixed. Bug in pure Delphi compile fixed.

Change log 1.1.4e: 22/07/09
Major code redevelopment - more units, hopefully clearer. New parameter: -Y, --maxsnr <n> which allows specification of difference between maximum FFT result and added noise. Maxsnr works with both default spreading and --sortspread. Link to FFTW Windows DLL download page.

Change log 1.1.4d: 07/06/09
Bug fixed whereby lossyWAV would crash if 'libfftw3-3.dll' could not be initialised. If --fftw parameter is used and the DLL cannot be found then lossyWAV will revert to the existing FFT routines and output a warning. Link to FFTW Windows DLL download page.

Change log 1.1.4c: 05/06/09
FFTW can now be optionally used for FFT analyses in lossyWAV. Use of FFTW requires the presence of "libfftw3-3.dll" on the host computer, somewhere on the path and the addition of -F or --fftw to the lossyWAV command line. FFT (Delphi and assembler) further optimised. General code tidy-up. Link to FFTW Windows DLL download page.

Change log 1.1.4b: 14/05/09
FFT (Delphi and assembler) further optimised. Radix-4 FFT implemented in assembler and Delphi and Radix-8 in Delphi. Significant speedup of Delphi FFT throughput.
General code tidy-up.

Change log 1.1.4a: 05/05/09
--sortspread parameter no longer takes an additional parameter, now on/off;
spreading function changed slightly - now properly computes old and new averages separately;
FFT Real routine corrected as was giving wrong signs of some complex output values (did not affect magnitude of results);

Change log 1.1.3k: 30/04/09
Fault-finding release #1 to attempt to determine cause of WINE incompatibility. (Successful!!)

Change log 1.1.3j: 15/04/09
--sortspread parameter modified (again), now takes a parameter between 0 and 7, 2 is equivalent to beta 1.1.3i.
--centre parameter removed.
Reference_threshold tables removed in favour of direct calculation of the level of added noise due to bitdepth reduction using derived formula.

Change log 1.1.3i: 07/04/09
New --sortspread parameter modified (again). Bitrate matched with default spreading for my 55 problem sample set. Will revise table for my 10 Album Test Set.

Change log 1.1.3h: 05/04/09
New --sortspread parameter modified.
Removed - bug found.

Change log 1.1.3g: 02/04/09
New --sortspread parameter introduced for testing purposes.

Change log 1.1.3f: 31/03/09
New --centre and --underlap <n> parameters introduced for testing purposes; Revised source.

Change log 1.1.3e: 18/03/09
Removal of old and new spreading functions in favour of variant; Code tidy up - speed improvements for pure delphi compile; Revised source.

Change log 1.1.3d: 05/03/09
Bug fix (would crash with a range error sometimes); Speedup of --varspread code. Revised source.

Change log 1.1.3c: 24/02/09
Introduction of -V or --varspread parameter to enable variant spreading function - a hybrid between the old and the new. Revised source.

Change log 1.1.3b: 23/02/09
Bug-fix: high sample rates with 1.1.3 would cause a range-check error or random results. Revised source.

Change log 1.1.3: 22/02/09
Integration of data structures used in new and old spreading functions. Source release.

Change log 1.1.2j: 18/02/09
Implementation of -O or --oldspread parameter to enable the use of the spreading function used in v1.1.0b instead of the revised version currently under development. This gives very slightly different results to v1.1.0b as is to be expected due to the revision of the reference-threshold constants at beta v1.1.1d.

Change log 1.1.2i: 12/02/09
Addition of a -N or --nasty (-q -2.0) and -A or --awful (-q -4.0) to allow extremely low quality levels to be explored.

Change log 1.1.2h: 12/02/09
Addition of a -N or --nasty (-q -2.0) to allow extremely low quality levels to be explored.
Removed: Bug to be fixed.

Change log 1.1.2g: 10/02/09
Addition of a -r or --randombits parameter to randomise the zeroed lsbs.

Change log 1.1.2f: 09/02/09
Further modification to the spreading_function.

Change log 1.1.2e: 06/02/09
Further modification to the noise shaping process - first attempt to attenuate noise-shaping where bits_to_remove is zero for a particular codec block.

Change log 1.1.2d: 05/02/09
Further modification to the noise shaping process - audio data now no longer scaled prior to noiseshaping.

Change log 1.1.2c: 04/02/09
Further modification to the noise shaping process - noise shaping performed even when no bits removed.

Change log 1.1.2b: 03/02/09
Repair of the noise shaping process - now continuous for each channel rather than treating each codec-block totally separately;

Change log 1.1.2: 28/01/09
Code optimisations and data optimisations;
Revisions to the spreading function;

Change log 1.1.1e: 30/09/08
Interim beta, with source as reversion to Delphi complete (with conditional define to re-enable all IA-32/x87 code).

Change log 1.1.1d: 10/09/08
Further revision to the simplified spreading function - slightly higher bitrates than 1.1.1c but I'm happier with the method;
Reference-threshold constants re-calculated using more iterations (2^(32-fft-bit-length) iterations, i.e. 512K iterations for 8192 sample FFT and 128M iterations for 32 sample FFT) and for the first time taking into account FFT-result values less than 1. This only really affects bits-to-remove values between 1 and 7, which is in line with my expectation when I made the change to the noise-calculation method;
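
The iteration counts quoted in the 1.1.1d entry follow directly from the 2^(32-fft-bit-length) rule, reading "fft-bit-length" as log2 of the FFT length (an interpretation that matches both quoted figures). A small Python check, with a made-up function name:

```python
def threshold_iterations(fft_length):
    """Iterations used per FFT length: 2^(32 - log2(fft_length)); fft_length must be a power of two."""
    fft_bits = fft_length.bit_length() - 1  # log2 for a power of two
    return 2 ** (32 - fft_bits)


if __name__ == "__main__":
    print(threshold_iterations(8192) // 1024, "K iterations for an 8192 sample FFT")
    print(threshold_iterations(32) // (1024 * 1024), "M iterations for a 32 sample FFT")
```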

Change log 1.1.1c: 02/09/08
Further revision to the simplified spreading function;
Dither removed;

Change log 1.1.1b: 26/08/08
Revision to the simplified spreading function. All bin "averages" now calculated taking into account a variable proportion of bins to either side, i.e. "average" = (fft_result[i]+(fft_result[i-1]+fft_result[i+1])*factor)/(1+2*factor), where factor = 0.0 at 20Hz and 1.0 at 16kHz, with linear interpolation for intermediate values.

Change log 1.1.1a: 25/08/08
Fundamental simplification of spreading function methodology put forward for comment. All bin "averages" now calculated taking into account a fixed proportion of bins to either side, i.e. "average" = (fft_result[i]+(fft_result[i-1]+fft_result[i+1])*factor)/(1+2*factor), where factor = 0.26 in this case;
FFT result overall averaging now carried out prior to the spreading function rather than at the same time;
Reference_threshold constants revised slightly.
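
The weighted "average" from the 1.1.1a and 1.1.1b entries can be sketched in Python as follows. This is not the lossyWAV source (which is Delphi and assembler), only an illustration of the quoted formula. Two things are assumptions of the sketch: edge bins reuse their own value in place of the missing neighbour, and the factor is clamped to 0.0 below 20Hz and 1.0 above 16kHz. All function names are made up.

```python
def spread(fft_result, factor_for_bin):
    """average[i] = (r[i] + (r[i-1] + r[i+1]) * factor) / (1 + 2 * factor)."""
    n = len(fft_result)
    out = []
    for i in range(n):
        f = factor_for_bin(i)
        # Assumption: at the edges, the missing neighbour is replaced by the bin itself.
        left = fft_result[i - 1] if i > 0 else fft_result[i]
        right = fft_result[i + 1] if i < n - 1 else fft_result[i]
        out.append((fft_result[i] + (left + right) * f) / (1 + 2 * f))
    return out


def fixed_factor(_i):
    """1.1.1a: the same factor (0.26) for every bin."""
    return 0.26


def make_linear_factor(samplerate, fft_length, lo_hz=20.0, hi_hz=16000.0):
    """1.1.1b: factor 0.0 at 20Hz rising linearly to 1.0 at 16kHz (clamped outside, by assumption)."""
    def factor(i):
        hz = i * samplerate / fft_length
        t = (hz - lo_hz) / (hi_hz - lo_hz)
        return min(1.0, max(0.0, t))
    return factor


if __name__ == "__main__":
    bins = [4.0, 0.0, 4.0, 4.0]
    print(spread(bins, fixed_factor))
    print(spread(bins, make_linear_factor(44100, 64)))
```

With factor 0.0 a bin is left untouched, and with factor 1.0 it becomes a plain three-bin mean, so the single low bin in the example is pulled up more strongly at high frequencies.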

Change log 1.1.0b: 03/08/08
FFT lengths will now increase for higher sample-rate audio, i.e. 88.2/96kHz, 176.4/192kHz and 352.8/384kHz;
improved logfile output and --detail output;
reference threshold constants for rectangular dither and triangular dither have been calculated so added noise should be the same for dither off and any dither level between 0 and 1 - the number of bits-to-remove will however reduce with "increasing" dither.
Nick.C
Some questions:
  • Do we need dither?
  • Do we need 32-bit integer processing?
  • Do we need the capability to create correction files?
I ask as these all add to the time taken to process files (even if the options themselves are not selected).

Comments / criticisms / brickbats welcomed as before.

I will acknowledge the usefulness of the correction file as a quick and automatic way of generating the difference signal between the lossless original and processed output (for scaling=1 only).
botface
QUOTE (Nick.C @ Aug 25 2008, 13:36) *
Some questions:
  • Do we need dither?
  • Do we need 32-bit integer processing?
  • Do we need the capability to create correction files?
I ask as these all add to the time taken to process files (even if the options themselves are not selected).

Comments / criticisms / brickbats welcomed as before.

I will acknowledge the usefulness of the correction file as a quick and automatic way of generating the difference signal between the lossless original and processed output (for scaling=1 only).

Nick,
I have never been sure when/if dither should be used and if so, how much. I've tried with and without and can't hear any difference, so from my perspective it isn't needed.

Again personally, I can't see a need for 32 bit integer. As a CoolEdit user I would be more interested in 32 bit float but in any event I'd actually be unlikely to use it.

Similarly correction files. I've never used them and don't suppose I ever will as I get transparent (to me) output from lossyWav.

I actually like lossyWav as it is. It does exactly what I need. I can see that others' needs might be different to mine though.

The only thing I'd like to see is a nice GUI front end - I've mentioned that a couple of times before. I don't mean to go on about it but it would make life a bit easier for me as a non-techie.
halb27
QUOTE (Nick.C @ Aug 25 2008, 14:36) *
Some questions:
  • Do we need dither?
  • Do we need 32-bit integer processing?
  • Do we need the capability to create correction files?
I ask as these all add to the time taken to process files (even if the options themselves are not selected).

Comments / criticisms / brickbats welcomed as before.

I will acknowledge the usefulness of the correction file as a quick and automatic way of generating the difference signal between the lossless original and processed output (for scaling=1 only).

My opinion on the questions:

a) dithering is not needed
b) I like to see further support for 24 bit depth input files (I replaygain with foobar using 24 bit WAV output files). I cannot imagine anybody needs a bit depth of 32 bit. In case that's what your question is about.
c) I like to be able to listen to the error signal. I don't need the correction file for reconstructing the original signal.

Great that you're still struggling so much to improve lossyWAV.
I'm just a bit sceptical about the new spreading approach. It changes the machinery in a quite significant way at the low frequency end, and I think we can be very content with the current machinery. Changing the machinery would mean we throw away the experience we have so far with lossyWAV's quality (and though the experience situation isn't optimal we do have some experience), and start experiencing quality again from the zero point (more or less). Doesn't spreading just mean averaging over a certain number of bins? With this in mind I wouldn't care whether or not the virtual center of the bins involved is identical with one of the real bins. At least this is how I understand the new spreading idea. Sure there are numerous ways of doing the averaging, but are there expectations for a real benefit when going the new way?
sauvage78
I was waiting for this thread.

I wish I could answer your questions but:
1: I don't know what lossyWAV can gain from dithering (the same as MP3? A supposedly more "natural" background noise? I always thought dithering was meant to soften the effect of frequency destruction, so I don't get its use with lossyWAV)
2: I don't get what you meant, but my CPU is 32 bits & my audio is 16/24 bits
3: I already said I didn't use correction files in the other thread

I disagree with halb27 on the spreading function: if you must refrain from experimentation because you fear breaking the machine, lossyWAV will never progress; you just need to be sure it is worth it before making it a full release.
Also I don't need a GUI personally; even if one existed, I would use F2K. I agree it would be better than F2K for noobs allergic to the command line, but it should be a lossyWAV/FLAC GUI or the noob will end up with a big WAV file, asking himself the purpose of such a useless codec. So maybe a fork of Speek's FLAC frontend ... but not a lossyWAV GUI alone ...

Note: I will most likely convert my whole lossless collection to lossyWAV after the 1.2.0 release, so I hope it will be VERY good.
halb27
QUOTE (sauvage78 @ Aug 25 2008, 20:14) *
... you just need to be sure it is worth it before making it a full release. ...

That's the problem.
Nick.C
One possible variant is to use 1 as a spreading value where 1.1.0b did and for all the values which exceed 1 use something else (1<value<2), i.e.
((2,2,2,2,2,2,2,2),(2,2,2,2,2,2,2,2),(2,2,2,2,2,2,2,2),(1,2,2,2,2,2,2,3),(1,1,2,2,2,2,2,3),(1,1,2,2,2,2,3,4))
goes to
((SC,SC,SC,SC,SC,SC,SC,SC),(SC,SC,SC,SC,SC,SC,SC,SC),(SC,SC,SC,SC,SC,SC,SC,SC),(1,SC,SC,SC,SC,SC,SC,SC),(1,SC,SC,SC,SC,SC,SC,SC),(1,1,SC,SC,SC,SC,SC,SC))
this should go some way to alleviate any concerns with respect to reducing quality as less averaging = lower minima.
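
The substitution described above (keep every 1, replace every value that exceeds 1 with a constant SC where 1<SC<2) can be written as a small Python sketch. The value 1.5 used for SC here is purely a placeholder for illustration; no actual value is given in the post.

```python
# Spreading table as used in 1.1.0b (copied from the post above).
OLD_SPREADING = (
    (2, 2, 2, 2, 2, 2, 2, 2),
    (2, 2, 2, 2, 2, 2, 2, 2),
    (2, 2, 2, 2, 2, 2, 2, 2),
    (1, 2, 2, 2, 2, 2, 2, 3),
    (1, 1, 2, 2, 2, 2, 2, 3),
    (1, 1, 2, 2, 2, 2, 3, 4),
)

SC = 1.5  # hypothetical placeholder; the post only requires 1 < SC < 2


def substitute_spreading(table, sc=SC):
    """Keep values of 1 and replace every value greater than 1 with sc."""
    return tuple(tuple(v if v <= 1 else sc for v in row) for row in table)


if __name__ == "__main__":
    for row in substitute_spreading(OLD_SPREADING):
        print(row)
```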
Axon
I'll second a request for 32-bit float.
halb27
QUOTE (Nick.C @ Aug 25 2008, 22:18) *
One possible variant is to use 1 as a spreading value where 1.1.0b did and for all the values which exceed 1 use something else (1<value<2), i.e.
((2,2,2,2,2,2,2,2),(2,2,2,2,2,2,2,2),(2,2,2,2,2,2,2,2),(1,2,2,2,2,2,2,3),(1,1,2,2,2,2,2,3),(1,1,2,2,2,2,3,4))
goes to
((SC,SC,SC,SC,SC,SC,SC,SC),(SC,SC,SC,SC,SC,SC,SC,SC),(SC,SC,SC,SC,SC,SC,SC,SC),(1,SC,SC,SC,SC,SC,SC,SC),(1,SC,SC,SC,SC,SC,SC,SC),(1,1,SC,SC,SC,SC,SC,SC))
this should go some way to alleviate any concerns with respect to reducing quality as less averaging = lower minima.

If I understand it correctly you want to stay conservative when including more bins in the averaging compared to what we have now by applying a considerably smaller weight to the bins which are off-center to the highest degree (so a weight of >0.5 for the center bin of the 3 bin spreading replacing current 2 bin spreading?). Sounds good though I still can't see the potential advantage and why you aren't content with current spreading.
Nick.C
I am re-examining each major processing component in turn - it's the turn of the spreading function....

I've modified the spreading function so that at the bin corresponding to 20Hz the range is 1.0 and at 16kHz it is 2.0, with linear interpolation for intermediate bins.

lossyWAV beta 1.1.1b attached to post #1 in this thread.

[edit]
The consensus (and what David and SebastianG said earlier) seems to be that dither is not required within lossyWAV.

On the processing of 32-bit integer samples, I'll leave it in at the moment, but I don't think that there are many packages that would output them in favour of 32-bit float samples. I don't know if the method would work on 32-bit float samples - I have a feeling it would be difficult to determine how many bits precision to remove from a float value - unless it was a simple "reduce a 32-bit float value (23-bit mantissa) to a 24-bit float value (15-bit mantissa) by brute force...." process.

It seems that some people like the correction file for analysis rather than reversion to lossless - maybe the --merge parameter can go?
[/edit]
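
To make the "brute force" idea in the edit above concrete, here is a Python sketch that reduces a 32-bit float's 23-bit mantissa to 15 bits by zeroing the 8 least significant mantissa bits. It only illustrates the speculation in the post and is not something lossyWAV does; the function name and the choice to truncate rather than round are assumptions.

```python
import struct


def truncate_mantissa(x, drop_bits=8):
    """Zero the lowest drop_bits bits of the IEEE-754 single-precision mantissa of x."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))      # reinterpret float32 as uint32
    mask = 0xFFFFFFFF ^ ((1 << drop_bits) - 1)                # clear the low mantissa bits
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -23  # smallest float32 step above 1.0
    print(truncate_mantissa(x))                 # detail below 2^-15 is lost
    print(truncate_mantissa(1.0 + 2.0 ** -15))  # still representable with 15 bits
```

Because the exponent is untouched, the precision removed scales with the signal level, which is the difficulty raised in the post: the number of bits to remove is not a single per-block value as it is for integer samples.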
halb27
QUOTE (Nick.C @ Aug 26 2008, 09:26) *
... maybe the --merge parameter can go?

As for my needs: yes.
2Bdecided
The noise floor already "floats" in 32-bit float.

What do FLAC and other lossless encoders do wrt floating point data and "wasted bits"?

Depending on that, the appropriate lossyWAV processing could be tricky but useful, or pointless.

I only ever use 32-bit float files as intermediate files. Sometimes I archive them as-is, so I can re-work the project later. lossyWAV might be useful here, though TBH I haven't even bothered FLACing them because it's so rare that I do this. Other people might do this on a daily basis!

I have no experience of 32-bit integer audio files. 48-bit integer is common in audio processing (DSP IIR filtering etc), but never as an output.


IMO dither can go, if having the option available is slowing down processing even when it's not used.


If "Implementation of SG's new noise shaping method" means dynamic noise shaping, then depending on how aggressively you do this, it might be worth changing from rectangular spreading functions to something else entirely. I'm pointing this out because you might spend a long time playing with the current spreading function, only to dump it soon after. What you have is a narrow (fractional) version of something vaguely related to the ERB (equivalent rectangular bandwidth) scale - I reckon one day you'll end up with something which is a narrow (fractional) version of something vaguely related to overlapping critical band filters.

I can't help feeling that there's no more or less reason to have reconstruction with lossyWAV than with wavpack lossy, apart from the currently inevitable clunkiness of it. However, if the concept is there, it's another "tick" in the format comparison table, and someone can always come along and implement a more graceful re-uniting of the lossy and correction files later, if they feel the need. If you drop this ability entirely, this possibility is removed. Whether you support the merging in lossyWAV itself is up to you - having that available can't slow down encoding though, can it?

Cheers,
David.
krmathis
Suggestion: Cross-platform code, allowing Mac OS X and GNU/Linux users to take part in the fun...
But that may be too huge of a task?
Axon
QUOTE (2Bdecided @ Aug 26 2008, 06:33) *
The noise floor already "floats" in 32-bit float. What do FLAC and other lossless encoders do wrt floating point data and "wasted bits"? Depending on that, the appropriate lossyWAV processing could be tricky but useful, or pointless.


FLAC doesn't even support floating point IIRC.

One could go both ways with that, and say that either lossyWAV has no need for floating point support, or that it provides a very nice way to gracefully encode the floating-point data. I'm leaning towards the latter.

The only issue I'd see otherwise is how to handle +0dBFS samples. I have no suggestions on how to handle that except perhaps to optionally right-shift the output by a few bits and scribble the gain down in the tags.

QUOTE
I only ever use 32-bit float files as intermediate files. Sometimes I archive them as-is, so I can re-work the project later. lossyWAV might be useful here, though TBH I haven't even bothered FLACing them because it's so rare that I do this. Other people might do this on a daily basis!
I would preferably record my vinyl transcriptions in floating point on principle alone.

QUOTE
I have no experience of 32-bit integer audio files. 48-bit integer is common in audio processing (DSP IIR filtering etc), but never as an output.
Heh, if 32-bit float is going to be supported, why not 64-bit floating point too? It's a negligible code change.


In principle, the binning process used to establish critical band responses could be circumvented through clever frequency mapping. For instance, doing a frequency shift from 10kHz down to say 100Hz would mean that quantization noise that originally fit inside one bin could now fit in several. This could kick something into audibility.

Right?

What I'm getting at here is that perhaps more work should be put into tuning lossyWAV so that virtually all DSP effects/manipulations could not possibly cause an audibility difference, rather than merely ensuring that straight listening will not tease out a difference.
Nick.C
It would be possible to read 32-bit float values and write 32-bit integer values (having suitably scaled the output) - this would not change the file-size but only some of the fmt chunk information.

I'll look into it....

To fit in the range -2,147,483,648..2,147,483,647 the 32-bit float value would require to be scaled by a factor of 2^-97.

[edit] I've just been reading about the draft IEEE-754r standard and there will be a 16-bit float value in the range +/-1.## x 2^-15 to +/-1.## x 2^14 with a mantissa of 10 bits. This seems to open up the possibility of 11 bit precision in a 2^30 range, or taking what we know about lossyWAV into account effectively stores a 32-bit integer in a 16-bit float (albeit with reduced precision - but reduced precision is not proving to be a problem).
Axon
A complete mapping of the floating point domain is unnecessary unless HDR techniques start creeping in from the video realm to the audio realm (which is rather unlikely). All I'd anticipate would be desired would be a bit shift from 0-4 bits if that.

... Not like I have any kind of valid use for that feature, so feel free to ignore it.
SebastianG
QUOTE (Nick.C @ Aug 25 2008, 13:16) *
1.2.0: Implementation of SG's new noise shaping method

Yay!
If you like to get a Matlab version of the code I sent you click here.

QUOTE (Nick.C @ Aug 25 2008, 13:16) *
1.2.0: Revisit the spreading function

Can you shed some more light on what's currently happening in this regard? What exactly is fft_result[k], why is there an averaging and what happens after the averaging?

QUOTE (Nick.C @ Aug 27 2008, 08:40) *
To fit in the range -2,147,483,648..2,147,483,647 the 32-bit float value would require to be scaled by a factor of 2^-97.

IIRC digital full scale is usually +/- 1.0 in float formats. So, in case you want to convert it to 24 bit ints, you should scale the floats by 2^23.

Cheers,
SG
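
The two scalings discussed above can be compared in a short Python sketch. With float full scale at +/-1.0, multiplying by 2^23 maps samples onto the 24-bit integer range. The clamp is an assumption of this sketch, needed because +1.0 would otherwise land one step above the largest 24-bit value. The 2^-97 factor, by contrast, maps the entire representable float32 range (just under 2^128) into 32-bit integers. Function and constant names are made up.

```python
INT24_MIN = -(2 ** 23)
INT24_MAX = 2 ** 23 - 1

# Largest finite IEEE-754 single-precision value: (2 - 2^-23) * 2^127.
FLOAT32_MAX = (2 - 2.0 ** -23) * 2.0 ** 127


def float_to_int24(x):
    """Scale a float sample (full scale +/-1.0) by 2^23 and clamp to the 24-bit range."""
    v = round(x * 2 ** 23)
    return max(INT24_MIN, min(INT24_MAX, v))


if __name__ == "__main__":
    print(float_to_int24(0.5), float_to_int24(1.0), float_to_int24(-1.0))
    # The whole float32 range only just fits in 32-bit integers after scaling by 2^-97.
    print(FLOAT32_MAX * 2.0 ** -97 < 2 ** 31)
```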
GeSomeone
I fail to see how anyone that goes to the lengths of saving audio in floating point format would want to use lossyWav on it.
The first step would be to convert to something like 24bit integer IMO.
Nick.C
QUOTE (SebastianG @ Aug 27 2008, 14:06) *
Can you shed some more light on what's currently happening in this regard? What exactly is fft_result[k], why is there an averaging and what happens after the averaging?
The FFT_result array is created by taking the magnitude of the raw results of the complex fft analysis and multiplying by the corresponding skewing value.

These results have always been averaged over a number of bins to remove zero or very low single bins. The most recent method now only takes into account a proportion of the bins on either side of the target bin rather than bins one or two bins away from the target bin. I feel that this will still remove single low bins but will possibly be better than the former method.
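
As a rough illustration of the first step described above (magnitude of the complex FFT output multiplied by a per-bin skewing value), here is a Python sketch using a naive DFT from the standard library. The actual skewing values, any windowing, and which bins are kept are not given in this post, so the all-ones skew table and the use of only the first half of the bins are assumptions. The averaging step that follows is not shown.

```python
import cmath


def fft_result(samples, skew):
    """Return |X[k]| * skew[k] for the first half of the bins of a naive DFT of samples."""
    n = len(samples)
    out = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        out.append(abs(acc) * skew[k])
    return out


if __name__ == "__main__":
    block = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]  # energy concentrated in bin 2
    print([round(v, 6) for v in fft_result(block, [1.0] * 4)])
```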

btw, thanks very much for the Matlab method - I can read matlab, not C!

I see what you mean about 32-bit floats. However the easiest way would be to convert to 32-bit integers (in the first instance) - maybe 24-bit integers later.
2Bdecided
QUOTE (GeSomeone @ Aug 27 2008, 14:49) *
I fail to see how anyone that goes to the lengths of saving audio in floating point format would want to use lossyWav on it.
The first step would be to convert to something like 24bit integer IMO.
Not at all - often when I use 32-bit floats (in CEP), it's so I don't have to worry about clipping. It's not always about the quality at all - sometimes it's simple convenience.

Cheers,
David.
Nick.C
lossyWAV beta 1.1.1c attached to post #1 in this thread.

Further minor changes to the spreading-function, resulting in the following bitrates for my 10 album test set:
CODE
|===============|==========|==========|==========|==========|==========|==========|
|    Version    |   FLAC   | --insane |--extreme |--standard|--portable|  --zero  |
|===============|==========|==========|==========|==========|==========|==========|
|lossyWAV 1.1.0b| 854kbit/s| 632kbit/s| 548kbit/s| 463kbit/s| 376kbit/s| 285kbit/s|
|---------------|----------|----------|----------|----------|----------|----------|
|lossyWAV 1.1.1c| 854kbit/s| 627kbit/s| 542kbit/s| 457kbit/s| 373kbit/s| 281kbit/s|
|===============|==========|==========|==========|==========|==========|==========|
krmathis
Guess nobody liked my suggestion.
Oh well! ...at least it was worth a try...
Nick.C
I love the idea, I just don't have the development platforms or the experience to carry out the conversion.
sauvage78
I just spent the last 15 min testing LossyWAV V1.1.0b vs. LossyWAV V1.1.1c Beta at -q 1 (Ginnungagap), in order to see if there was any regression before the big noise shaping jump. Personally I couldn't hear any major regression/progression, so I guess the serious things for 1.2 can start now.

foo_abx 1.3.3 report
foobar2000 v0.9.5.5
2008/09/03 17:58:44

File A: C:\Documents and Settings\Sauvage.S\Bureau\Nouveau dossier\02- Ginnungagap Test Sample (Lossywav)b.lossy.tak
File B: C:\Documents and Settings\Sauvage.S\Bureau\Nouveau dossier\02- Ginnungagap Test Sample (Lossywav)o.lossy.tak

17:58:44 : Test started.
18:00:35 : 01/01 50.0%
18:01:43 : 02/02 25.0%
18:03:09 : 02/03 50.0%
18:03:54 : 03/04 31.3%
18:05:52 : 04/05 18.8%
18:07:42 : 05/06 10.9%
18:10:19 : 05/07 22.7%
18:11:52 : 06/08 14.5%
18:22:29 : 06/09 25.4%
18:22:57 : Test finished.

----------
Total: 6/9 (25.4%)

Edit1: Even if I reached 5/6 at the beginning I couldn't really tell what I was listening to (I mean inside the area of the usual artefact) ... so I think it was lucky guessing ...

Edit2: To be 100% sure it was lucky guessing I spent 10 min this morning redoing the test at -q 0 & I failed even more clearly, so for me there is no regression/improvement except the very small but welcome kbps gain.

foo_abx 1.3.3 report
foobar2000 v0.9.5.5
2008/09/04 16:27:22

File A: C:\Documents and Settings\Sauvage.S\Bureau\Nouveau dossier\02- Ginnungagap Test Sample (Lossywav)b.lossy.tak
File B: C:\Documents and Settings\Sauvage.S\Bureau\Nouveau dossier\02- Ginnungagap Test Sample (Lossywav)o.lossy.tak

16:27:22 : Test started.
16:29:22 : 01/01 50.0%
16:30:29 : 02/02 25.0%
16:34:43 : 03/03 12.5%
16:35:57 : 03/04 31.3%
16:36:35 : 03/05 50.0%
16:38:30 : 03/06 65.6%
16:38:42 : Test finished.

----------
Total: 3/6 (65.6%)
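
The percentages in the two ABX logs above are consistent with the probability of scoring at least that many correct answers by pure guessing (a one-sided binomial tail with p = 0.5). A short Python check, with a made-up function name:

```python
from math import comb


def abx_guess_probability(correct, trials):
    """Probability of getting at least `correct` right out of `trials` by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


if __name__ == "__main__":
    for correct, trials in ((6, 9), (3, 6), (5, 6)):
        p = abx_guess_probability(correct, trials)
        print(f"{correct}/{trials}: {p * 100:.1f}%")
```

The 5/6 score reached part-way through the first test corresponds to roughly 10.9%, which is the intermediate value shown in that log.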
Antonski
QUOTE (Nick.C @ Aug 25 2008, 14:16) *
Suggested foobar2000 converter setup:

lossyFLAC:
CODE
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.flac
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|c:\"program files"\bin\flac - -b 512 -5 -f -o%d
Format is: lossless or hybrid
Highest BPS mode supported: 24
lossyTAK:
CODE
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.tak
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|c:\"program files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24
lossyWV:
CODE
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.wv
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|c:\"program files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24


Just out of curiosity, why is there no example for lossyAPE (Monkey's Audio)?
Sorry if this has already been mentioned somewhere, the lossyWAV threads are a bit too long :)

~
Nick.C
Monkey's Audio does not make use of a "wasted-bits" feature the way FLAC, TAK, WavPack, WMA-Lossless, etc. do. Therefore there is no space-saving benefit in using lossyWAV with Monkey's Audio, ALAC, etc.
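
To make the "wasted-bits" idea concrete: if every sample in a block has the same number of zero low-order bits, a codec with this feature can store that count once and encode the samples with fewer bits each. A minimal sketch of the detection step (the function name `wasted_bits` is illustrative, not taken from any codec's source):

```python
def wasted_bits(block):
    """Count the low-order bits that are zero in every sample of the block."""
    combined = 0
    for sample in block:
        combined |= sample  # a low bit stays 0 only if it is 0 in all samples
    if combined == 0:
        return 0  # all-zero block: nothing to shift out, treated as 0 in this sketch
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count

# Samples that are all multiples of 8 carry 3 wasted bits each.
print(wasted_bits([8, 16, -24]))
```

lossyWAV creates such blocks on purpose by zeroing low-order bits, so a codec that ignores them gains nothing.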
krmathis
QUOTE (Nick.C @ Sep 2 2008, 15:27) *
I love the idea, I just don't have the development platforms or the experience to carry out the conversion.

Fair enough, especially the lack of experience part.
But regarding development platform I am quite sure all you need is a GNU/Linux distro with GCC, ...
Nick.C
And a knowledge of C which I don't have - lossyWAV is Delphi & IA-32 assembler....
[JAZ]
QUOTE (Nick.C @ Sep 4 2008, 17:25) *
And a knowledge of C which I don't have - lossyWAV is Delphi & IA-32 assembler....


http://www.lazarus.freepascal.org/

I briefly tried it and one of the first problems is dealing with the "uses Windows" import.
But you should be able to work around that.

If I've read it correctly, it runs on several platforms and *is able to cross-compile* (as in compiling on Windows for Linux).
Nick.C
Thanks [JAZ], that's certainly worth a look - if it means that other platforms can be accessed simply by changing my compiler I'll give it a try!
servimo
QUOTE
lossyFLAC:
CODE
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.flac
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|c:\"program files"\bin\flac - -b 512 -5 -f -o%d
Format is: lossless or hybrid
Highest BPS mode supported: 24
What happens if I use lossywav - --standard and flac -8? Will I get more compression? And why not use it in the default config for foobar2000? Is there some problem with it?

I did some tests on a small file (an acoustic guitar FLAC file); this is what happened:

original file:
7.741.294 bytes bitrate 572kbps

lossywav - --standard flac -5:
5.722.908 bytes bitrate 423kbps

lossywav - --standard flac -8: (here I lost the tags(?). I don't know if I did something wrong here, but in the next conversion the tags were all there.)
5.713.914 bytes bitrate 422kbps

lossywav - --insane flac -8: (the size is increased)
8.029.224 bytes bitrate 593kbps
sauvage78
servimo:
RTFM
-b 512
servimo
I should have done it this way: FLAC -> WAV -> lossyFLAC
Is that right?
Sorry :) I didn't see that this is a development thread and that there are other threads. I was just looking for some explanation of lossyWAV.
Nick.C
servimo, the foobar2000 settings work perfectly - if you're having problems, check the converter settings carefully. Please ensure that the "-b 512" remains in the flac element of the command line as this is what ensures optimal flac block length during encoding.

Alternatively, if you create a batch file containing the following:
CODE
@if exist "%1" flac -d "%1" --stdout --silent|lossywav - --stdout --standard|flac - -b 512 -o "%~n1.lossy.flac" --silent && tag --fromfile "%1" "%~n1.lossy.flac"
and drag-n-drop single files onto it then that should also work.
halb27
QUOTE (servimo @ Sep 5 2008, 05:04) *
...lossywav - --insane flac -8: (the size is increased)...


a) -b 512 usually is essential as sauvage78 said.

b) You encoded a solo instrument. This is the situation where lossyWAV + FLAC doesn't come out well.
If you can use WavPack or TAK instead of FLAC the situation is better.
I have a series of tracks like that, but as they form a very minor portion of my total collection I don't care
and keep using FLAC. It doesn't sound good that in these cases lossless WavPack yields smaller files than
lossyWAV + FLAC, but in practice it's insignificant. Moreover, though in theory lossyWAV + FLAC is lossy, in
these cases the error of the procedure is extremely close to zero, if not actually zero.
[JAZ]
QUOTE (servimo @ Sep 5 2008, 05:04) *
What happens if I use lossywav - --standard and flac -8 ?


The usage of flac -5 vs any higher compression was chosen because of the nature of FLAC.
Higher settings usually improve compression, just because they can work on a bigger chunk of samples. But lossyWAV needs to work on small chunks so that it can maximize the reduction of bits (that's why the -b 512 setting is required for optimum results), so the gains are few, while the encoding time increases.
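
A rough illustration of why the encoder block length matters, assuming the number of removed bits is decided per 512-sample block (which is what the -b 512 advice implies). The helper names and the sample data are invented for this sketch:

```python
def trailing_zero_bits(block):
    """Low-order bits that are zero in every sample of the block."""
    combined = 0
    for sample in block:
        combined |= sample
    if combined == 0:
        return 0
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count

def bits_saved(samples, block_size):
    """Total bits an encoder could drop when it splits `samples`
    into blocks of `block_size` and exploits the zero bits per block."""
    total = 0
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        total += trailing_zero_bits(block) * len(block)
    return total

# Made-up data: 4 bits removed in the first 512 samples, 1 bit in the next 512.
first_half = [16 * (i % 7 + 1) for i in range(512)]
second_half = [2 * (i % 7 + 1) for i in range(512)]
samples = first_half + second_half

print(bits_saved(samples, 512))   # blocks aligned with the bit removal
print(bits_saved(samples, 1024))  # one long block is limited by its worst part
```

A block that spans two differently processed regions can only exploit the smaller of the two bit counts, so longer encoder blocks tend to give back part of the saving.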
servimo
In all those tests I kept the -b 512. I didn't change anything else except the parts in bold in my post above, --standard and -5. It just happened that when I used -8 for FLAC the tags were not copied across. Nothing more...
I have a DVD with various albums in WAV format and I encoded some of them to lossyFLAC and I like the result. The compression and what I hear are very good. The only thing I think I could notice is that MP3 is more muffled (Google translation) than lossyFLAC, and it reminds me a little of the AAC conversion when I hear it.
As I said above, I will keep using these default settings; there is not much gain at all if I use FLAC's best compression.
uart
QUOTE (servimo @ Sep 5 2008, 12:56) *
The only thing I think I could notice is that MP3 is more muffled (Google translation) than lossyFLAC, and it reminds me a little of the AAC conversion when I hear it.


So far only one person has posted transcode listening tests, and they had trouble hearing any difference in the MP3s, even on problem samples and with the "lossywav -P" setting. See http://www.hydrogenaudio.org/forums/index....showtopic=65637

My understanding was that (theoretically at least) the lack of psychoacoustics should make it a very good format for transcoding. Can anyone confirm that this is correct?
halb27
QUOTE (uart @ Sep 6 2008, 18:01) *
...My understanding was that (theoretically at least) the lack of psychoacoustics should make it a very good format for transcoding. Can anyone confirm that this is correct?

It is expected to be a very good format for transcoding, though only one test backs this up so far. That test was done at the low quality setting -q 1; since people who intend to transcode are expected to use a higher quality setting, even this single test should give some confidence in lossyWAV.
uart
I'm new to lossyWAV and I've just started testing it with a few files. So far I really like it. With the "lossywav - S" setting I'm getting about half the file size of my previous "monkey audio -extreme" files. :)

A couple of questions though.

1. When I make a correction file it only seems to contain a small amount of noise, without even a vestige of the original music. I'm guessing this is a good thing. Is that small noise in the correction file exactly equal to the added noise in the lossyWAV file, or is the correspondence between the two more indirect?

2. I see that currently people are mostly interested in the correction file for the purpose of inspection only. Say, however, that I wanted to keep the correction file for archiving purposes: the correction file seems to compress much more poorly than the actual lossyWAV output, so the total storage is larger than with lossless (TAK or Monkey's Audio). Does anyone know if it would be possible (at least in theory) to make the correction file more compressible, or perhaps for a compressor that "understood it" to compress it more efficiently?
uart
HELP! I'm getting double file extensions on all my lossy.tak files converted from foobar (latest 0.9.5.5).

I used the following guide to set up a custom command line encoder in foobar : http://wiki.hydrogenaudio.org/index.php?title=LossyWAV

It works perfectly except that the output files are named like "my_song_title.lossy.tak.lossy.tak" and I can't figure out why the extension is doubled. Can anybody help?

BTW, here's the exact code as per the guide. I have exactly this except for different path names where appropriate.

CODE
lossyTAK settings:

Encoder: c:\windows\system32\cmd.exe
Extension  : lossy.tak
Parameters : /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|
             c:\"program files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24
Nick.C
uart:

1: The correction file is made up of the difference between the lossless original and the bit-removed samples. Essentially it is white noise, louder where more bits have been removed.
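
In other words, original = lossy + correction, sample by sample. A simplified sketch of that relationship, assuming plain rounding to a multiple of 2**bits with no clipping or noise shaping (the rounding rule and the function name `remove_bits` are assumptions for illustration, not lossyWAV's actual code):

```python
def remove_bits(sample, bits):
    """Round a sample to the nearest multiple of 2**bits, so that its
    lowest `bits` bits become zero."""
    step = 1 << bits
    return ((sample + step // 2) // step) * step

original = [1003, -517, 42, 2999]
bits = 3  # pretend 3 bits were removed for this block

lossy = [remove_bits(s, bits) for s in original]
correction = [o - l for o, l in zip(original, lossy)]
restored = [l + c for l, c in zip(lossy, correction)]

print(lossy)
print(correction)
print(restored == original)
```

The correction values are bounded by half the step size, so they grow as more bits are removed, which is why the correction file is louder in those places.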

2: The compressibility of the correction file has long been an issue - I suppose if we could find a compressor which would handle it better we might not need lossyWAV at all!

Transcoding problem: try deleting the contents of the "extension" box in the converter settings in foobar2000, then retype "lossy.tak".
uart
Thanks for the info Nick.

QUOTE (Nick.C @ Sep 6 2008, 11:14) *
Transcoding problem: try deleting the contents of the "extension" box in the converter settings in foobar2000, then retype "lossy.tak".


No that didn't fix it.

BTW, if I remove the "lossy.tak" from the extension box then the conversion fails. When I put it back in it works, but with the double extensions.

When I right-click (in foobar's playlist) the file I want to convert and select "convert to..." etc., just before it does the conversion it pops up a "save as" dialog, and the filename in the dialog is "my_song_name.lossy.tak". However, when I click save and it starts converting, the converter states that the destination is "my_song_name.lossy.tak.lossy.tak".

If I edit the save dialog and delete the ".lossy.tak" extension before I hit save then it names the file correctly. That is, if I make the save file dialog read just "my_song_name" without any extension, then when I hit save it adds the correct extension just once. This works OK but I have to do it manually every time.

BTW, I have my file system settings (in Windows XP) set to display file extensions (the default Windows XP setting is not to do so, but I'm guessing that many others like myself enable it). Could that be a problem?
Nick.C
I have all file extensions visible by choice too.

How are you naming your files in the converter?

I use:
CODE
[[%album artist% - ][$char(91)%date%$char(93)] %album%\][%discnumber%-]%tracknumber% - %artist% - %title%
for single tracks and:
CODE
[%album artist% - ][$char(91)%date%$char(93) ]%album%[ - CD%discnumber%]
for albums.
uart
UPDATE.

Previously I've just been testing this with a single file. Just now I tested it with multiple files selected for conversion. Whereas a single file displays the actual filename in the save dialog box, in the case of multiple files it only displays the destination folder name. Well, what do you know, it names the multiple files correctly.

I'd still be interested to know if there's a fix, but at least the workaround is not so bad now that I know I only need to do it (that is, delete the extension in the save dialog box before clicking save) when converting single files.

QUOTE (Nick.C @ Sep 6 2008, 11:38) *
I have all file extensions visible by choice too.

How are you naming your files in the converter?

I use:
CODE
[[%album artist% - ][$char(91)%date%$char(93)] %album%\][%discnumber%-]%tracknumber% - %artist% - %title%
for single tracks and:
CODE
[%album artist% - ][$char(91)%date%$char(93) ]%album%[ - CD%discnumber%]
for albums.


I haven't edited those fields. They currently read,

Single track
CODE
[%list_index% ]%title%


Album Images
CODE
[%album artist% - ]%album%


Is that a problem?
Nick.C
Your track / album naming strings shouldn't be a problem. I am at a loss with respect to a solution as I have never encountered this phenomenon using foobar2000.
uart
Thanks anyway Nick. I've just cut and pasted your filename settings into foobar and they work fine. With your settings it no longer puts the filename in the save dialog box, it just puts the folder name and everything works fine. I guess I'll have to learn the syntax of those foobar settings if I want to change anything there, otherwise I'll just keep your setting.
foosion
My theory is that some part of the software checks whether the file name you have set already has the extension .lossy.tak, but compares it against .tak only. Of course, that doesn't match, so it will helpfully append .lossy.tak to the file name. Since this occurs only when converting single files, I suspect that the culprit may be the standard Windows "Save As" dialog, but I'm not sure.
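
A sketch of how such a check could produce the doubled extension. This only models the theory above; it is not foobar2000 or Windows code, and both function names are hypothetical:

```python
import os

def append_extension_suspected(name, wanted="lossy.tak"):
    """Suspected behaviour: only the last extension of the file name
    ("tak") is compared with the full configured one ("lossy.tak")."""
    last_ext = os.path.splitext(name)[1].lstrip(".")
    if last_ext.lower() != wanted.lower():
        return name + "." + wanted  # mismatch, so the extension is appended again
    return name

def append_extension_expected(name, wanted="lossy.tak"):
    """Expected behaviour: compare the whole configured extension."""
    if name.lower().endswith("." + wanted.lower()):
        return name
    return name + "." + wanted

print(append_extension_suspected("my_song_name.lossy.tak"))
print(append_extension_suspected("my_song_name"))
print(append_extension_expected("my_song_name.lossy.tak"))
```

This would also explain why deleting the extension in the dialog gives a correctly named file: with no extension present, the comparison fails once and the extension is added once.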
sauvage78
As I wanted to know whether it would be useful to include a lowpass filter in lossyWAV (in the hope of saving some space), I did a quick test with Adobe Audition:

Album: Darkness, The - 2003 - Permission To Land:

CDImage Original + Tak -p2e ==> 286 MB
CDImage Lowpass 20 kHz + Tak -p2e ==> 280 MB
CDImage Original + Lossywav Portable + Tak -p2e ==> 97.3 MB (102 092 800 bytes)
CDImage Lowpass 20 kHz + Lossywav Portable + Tak -p2e ==> 97.3 MB (102 054 995 bytes)

Album: Fantômas - 2001 - The Director's Cut

CDImage Original + Tak -p2e ==> 246 MB
CDImage Lowpass 20 kHz + Tak -p2e ==> 243 MB
CDImage Original + Lossywav Portable + Tak -p2e ==> 97.3 MB (102 080 296 bytes)
CDImage Lowpass 20 kHz + Lossywav Portable + Tak -p2e ==> 97.3 MB (102 090 392 bytes)

Needless to say, it is completely useless :( ... but I still wanted to share the result with everyone ...
Gow
Does anyone know of a way to set up a lossy WMA-L string for foobar2000? The Zune 80 supports WMA-Lossless, and I read that WMA-L also works with lossyWAV, so I figured I would go lossy WMA-L and replace the MP3/M4A library that I sync my Zune with.

Though so far it looks like I am going to have to convert to another lossy WAV format and then to WMA-Lossless if I use just foobar2000.