Full Version: lossyWAV Development
jesseg
well, i got bored again...

IPB Image

It's really easy to change the two colors without having to redo anything else, so if you have any ideas, let me know. And again, if anyone wants to take this and run with it, please PM me and I'll provide it in whatever format you need.

I'll make another icon based on this too, perhaps tomorrow. :)

[edit]
click here to see it on different colored backgrounds
All of those are actually the exact same PNG file as the one I put in this post. :)
[/edit]

[edit2]
here's the logo, "naked", if you wanna see it alone.

IPB Image

[/edit2]
halb27
QUOTE(Nick.C @ Dec 9 2007, 15:10) *

...@Halb27: I think there might be some benefit in reducing the C at the end of the 1024 fft spf to, say, 9, to reduce the number of bins being averaged.

IMO that's the right direction. I did some first trials, though not with the 1024 sample FFT but with the 64 sample FFT, whose resolution is fine IMO for judging the highest frequency zones and whose good time resolution may be essential for samples like eig. So far I've seen that the second highest frequency zone is the most important one for eig. 22225 yields quite a good, though not perfectly transparent, result. I'm pretty busy now, but I'll try whether 22224 (as in -2) improves things. I guess we'll also have to come down from -nts +6 a bit. We'll see.
Nick.C
QUOTE(halb27 @ Dec 10 2007, 09:00) *
QUOTE(Nick.C @ Dec 9 2007, 15:10) *
...@Halb27: I think there might be some benefit in reducing the C at the end of the 1024 fft spf to, say, 9, to reduce the number of bins being averaged.
IMO that's the right direction. I did some first trials, though not with the 1024 sample FFT but with the 64 sample FFT, whose resolution is fine IMO for judging the highest frequency zones and whose good time resolution may be essential for samples like eig. So far I've seen that the second highest frequency zone is the most important one for eig. 22225 yields quite a good, though not perfectly transparent, result. I'm pretty busy now, but I'll try whether 22224 (as in -2) improves things. I guess we'll also have to come down from -nts +6 a bit. We'll see.
I've tried -3 -spf 22234-22235-22346-22357-22468 and it raises the bitrate to 412.3kbps for my sample set. It takes about 0.1bits off the number removed from eig.
halb27
QUOTE(Nick.C @ Dec 10 2007, 11:34) *

I've tried -3 -spf 22234-22235-22346-22357-22468 and it raises the bitrate to 412.3kbps for my sample set. It takes about 0.1bits off the number removed from eig.

My trial yesterday was with -3 -spf 22225-22235-22346-22357-224FF and bits to remove for eig went down significantly (~ 1 bit in the critical first seconds). So I think the 2nd highest frequency zone is essential here, maybe the highest zone as well. Average bitrate of regular music did not go up significantly btw.
Nick.C
QUOTE(halb27 @ Dec 10 2007, 11:48) *
QUOTE(Nick.C @ Dec 10 2007, 11:34) *
I've tried -3 -spf 22234-22235-22346-22357-22468 and it raises the bitrate to 412.3kbps for my sample set. It takes about 0.1bits off the number removed from eig.

My trial yesterday was with -3 -spf 22225-22235-22346-22357-224FF and bits to remove for eig went down significantly (~ 1 bit in the critical first seconds). So I think the 2nd highest frequency zone is essential here, maybe the highest zone as well. Average bitrate of regular music did not go up significantly btw.
You're right - it does indeed bring down eig quite a lot. How about a combination: 22225-22235-22346-22357-22468? This yields 414.0kbps for my 53 sample set.
halb27
I tried eig again using -3 -spf 22224-22236-22347-22358-2246C thus being more demanding with the two highest frequency zones at the 64 sample FFT. Now I can't abx eig any more.
But this is pretty much on the cutting edge for my listening experience, and I'm sure there are a lot of people out there with a better sensitivity towards temporal resolution problems. So I suggest we reduce the positive nts values and use -nts +3 for -3, and -nts 0 for -2.

-3 -spf 22224-22236-22347-22358-2246C -nts 3 yields 375 kbps with my regular set, -2 -nts 0 yields 422 kbps.

To me this is still a very good result, not far from the results of the current setting, and it brings us a considerable way back to the solid basis where in theory -nts should be 0.
Nick.C
QUOTE(halb27 @ Dec 10 2007, 20:36) *
I tried eig again using -3 -spf 22224-22236-22347-22358-2246C thus being more demanding with the two highest frequency zones at the 64 sample FFT. Now I can't abx eig any more.
But this is pretty much on the cutting edge for my listening experience, and I'm sure there are a lot of people out there with a better sensitivity towards temporal resolution problems. So I suggest we reduce the positive nts values and use -nts +3 for -3, and -nts 0 for -2.

-3 -spf 22224-22236-22347-22358-2246C -nts 3 yields 375 kbps with my regular set, -2 -nts 0 yields 422 kbps.

To me this is still a very good result, not far from the results of the current setting, and it brings us a considerable way back to the solid basis where in theory -nts should be 0.
I will happily agree the -spf parameter for -3.

As you can't abx any of the problem samples using -3 -spf 22224-22236-22247-22358-2246C -nts +6.0, I feel that a reduction of 3dB (0.5 bits potentially) for the -nts parameter is a bit too much.

-3 -spf 22224-22236-22347-22358-2246C -nts +3.0 results in 433.5kbps for my sample set and changing the +3.0 to +4.5 results in 422.7kbps for my sample set.

So, I suggest we use -spf 22224-22236-22347-22358-2246C -nts +4.5 for quality preset -3.
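
For reference, the "3dB (0.5 bits potentially)" equivalence follows from one bit of resolution corresponding to 20*log10(2), roughly 6.02 dB. A minimal Python sketch of that conversion, assuming the -nts shift maps linearly onto bits to remove (the function name is illustrative and not part of lossyWAV):

```python
import math

# One bit of amplitude resolution corresponds to 20*log10(2) dB (about 6.02 dB).
DB_PER_BIT = 20 * math.log10(2)

def db_to_bits(db):
    """Convert a level shift in dB to the equivalent number of bits."""
    return db / DB_PER_BIT

# The -nts values discussed in the posts above.
for shift in (6.0, 4.5, 3.0, 0.0):
    print(f"-nts {shift:+.1f} dB is roughly {db_to_bits(shift):+.2f} bits")
```

On that basis the step from +6.0 to +4.5 amounts to about a quarter of a bit, and the step from +6.0 to +3.0 to about half a bit.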
halb27
QUOTE(Nick.C @ Dec 10 2007, 23:23) *

As you can't abx any of the problem samples using -3 -spf 22224-22236-22247-22358-2246C -nts +6.0, I feel that a reduction of 3dB (0.5 bits) for the -nts parameter is a bit too much.

-3 -spf 22224-22236-22347-22358-2246C -nts +3.0 results in 433.5kbps for my sample set and changing the +3.0 to +4.5 results in 422.7kbps for my sample set.

So, I suggest we use -spf 22224-22236-22347-22358-2246C -nts +4.5 for quality preset -3.

For a real-life impression why don't you take a restricted selection of full length tracks? If bitrate goes up for problematic tracks like those in your sample set this is welcome. It's not so welcome of course with regular music.
My regular set consists of just 12 full length tracks of various musical styles, so I can get an impression very fast. I know from posted experience that my 375 kbps result is a bit low compared to other musical mixtures reported, but the difference isn't a big one. From this experience I think it's safe to say that when the result of my regular set is 375 kbps, the average bitrate is ~380 kbps.
Sure, ~380 kbps is a bit more than the ~350 kbps of the current -3 setting, but not by much IMO.
When it comes to deciding between -nts +3 and -nts +4.5, the difference is even smaller.
The reason why I dislike a rather small lowering from +6 to +4.5 is that I think small -nts steps don't have a significant effect. This is partly due to my listening experience when using insane positive nts values.
So I think in order to have a significant quality effect we shouldn't consider a delta lower than 3 for nts.
Not for the sake of saving ~15 kbps.
Of course this is because if in doubt I want to play it safe. Just my attitude towards it.
Nick.C
QUOTE(halb27 @ Dec 10 2007, 21:49) *
QUOTE(Nick.C @ Dec 10 2007, 23:23) *
As you can't abx any of the problem samples using -3 -spf 22224-22236-22247-22358-2246C -nts +6.0, I feel that a reduction of 3dB (0.5 bits) for the -nts parameter is a bit too much.

-3 -spf 22224-22236-22347-22358-2246C -nts +3.0 results in 433.5kbps for my sample set and changing the +3.0 to +4.5 results in 422.7kbps for my sample set.

So, I suggest we use -spf 22224-22236-22347-22358-2246C -nts +4.5 for quality preset -3.

For a real-life impression why don't you take a restricted selection of full length tracks? If bitrate goes up for problematic tracks like those in your sample set this is welcome. It's not so welcome of course with regular music.
My regular set consists of just 12 full length tracks of various musical styles, so I can get an impression very fast. I know from posted experience that my 375 kbps result is a bit low compared to other musical mixtures reported, but the difference isn't a big one. From this experience I think it's safe to say that when the result of my regular set is 375 kbps, the average bitrate is ~380 kbps.
Sure, ~380 kbps is a bit more than the ~350 kbps of the current -3 setting, but not by much IMO.
When it comes to deciding between -nts +3 and -nts +4.5, the difference is even smaller.
The reason why I dislike a rather small lowering from +6 to +4.5 is that I think small -nts steps don't have a significant effect. This is partly due to my listening experience when using insane positive nts values.
So I think in order to have a significant quality effect we shouldn't consider a delta lower than 3 for nts.
Not for the sake of saving ~15 kbps.
Of course this is because if in doubt I want to play it safe. Just my attitude towards it.
Tomorrow morning I'll process the 10 albums previously used for bitrate comparison using your proposal for the -3 quality preset.

I had a thought - if we end up with, say, 380kbps then that's still a bit less than OGG q 10 (circa 400kbps) and I'm not worried as it will be only about 60kbps above the upper bitrate limit for standard MP3.

I would be content with that.

Overall I am more concerned with the quality of the processed output than I am with the bitrate.

As an aside, I recently ordered a 16GB compact flash (to go with my 3 x 4GB SD Cards) for my iPAQ - lots more space for my .lossy.FLAC collection! :) Combined with Mortplayer using GSPFLAC.DLL it's working well.

[edit] Corrected OGG max bitrate [/edit]
Nick.C
QUOTE(Nick.C @ Dec 10 2007, 21:59) *
I'll process the 10 albums previously used for bitrate comparison using your proposal for the -3 quality preset. ........
lossyWAV beta v0.5.8 attached: Superseded.

Modified -1, -2 & -3 quality presets.
CODE
lossyWAV beta v0.5.8 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme settings [4xFFT] (-cbs 512 -nts -3.0 -skew 36 -snr 21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default settings [3xFFT] (-cbs 512 -nts  0.0 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact settings [2xFFT] (-cbs 512 -nts +3.0 -skew 36 -snr 21
              -spf 22224-22235-22347-22358-2246C -fft 10001)

Standard Options:

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Specific Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-shaping      enable fixed shaping using bit_removal difference of previous
              samples [value = brd(-1)/4]; default=off
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
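
The -spf and -fft formats described in the help text above can be decoded with a few lines of Python. This is only a sketch of the parsing, not lossyWAV's own code: the function names are made up, and reading each hex character as the number of FFT bins averaged in one of five frequency zones (lowest zone first) is an assumption based on the discussion earlier in the thread.

```python
# FFT lengths that the five -spf groups and the five -fft switches refer to,
# in the order given by the help text.
FFT_LENGTHS = (64, 128, 256, 512, 1024)

def parse_spf(spf):
    """Split an -spf string into {fft_length: [five values]}."""
    groups = spf.split("-")
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError("expected 5 groups of 5 hex characters")
    table = {}
    for length, group in zip(FFT_LENGTHS, groups):
        values = [int(ch, 16) for ch in group]  # '1'..'9', 'A'..'F'
        if 0 in values:
            raise ValueError("zero is excluded")
        table[length] = values
    return table

def parse_fft(mask):
    """Return the FFT lengths enabled by a 5-character binary -fft mask."""
    if len(mask) != 5 or set(mask) - {"0", "1"}:
        raise ValueError("expected 5 binary digits")
    return [n for n, bit in zip(FFT_LENGTHS, mask) if bit == "1"]

print(parse_fft("01001"))
print(parse_spf("22224-22236-22347-22358-2246C")[1024])
```

With the help text's own example, "01001" selects the 128 and 1024 sample FFTs, and the trailing "C" of the 1024 sample group decodes to 12.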
Summary of bitrates for 10 album test set.
CODE
Conversion using lossyWAV beta v0.5.8, FLAC -8
|=======================================|=========|=========|=========|=========|
|Album                                  | FLAC -8 |  lW -1  |  lW -2  |  lW -3  |
|=======================================|=========|=========|=========|=========|
|AC/DC - Dirty Deeds Done Dirt Cheap    | 781kbps | 468kbps | 417kbps | 366kbps |
|B52's - Good Stuff                     | 993kbps | 476kbps | 421kbps | 376kbps |
|David Byrne - Uh-Oh                    | 937kbps | 464kbps | 413kbps | 363kbps |
|Fish - Songs From The Mirror           | 854kbps | 451kbps | 399kbps | 357kbps |
|Gerry Rafferty - City To City          | 802kbps | 468kbps | 416kbps | 366kbps |
|Iron Maiden - Can I Play With Madness  | 784kbps | 486kbps | 437kbps | 387kbps |
|Jean Michel Jarre - Oxygene            | 773kbps | 538kbps | 475kbps | 422kbps |
|Marillion - The Thieving Magpie        | 790kbps | 473kbps | 421kbps | 373kbps |
|Mike Oldfield - Tr3s Lunas             | 848kbps | 491kbps | 436kbps | 389kbps |
|Scorpions - Best Of Rockers N' Ballads | 922kbps | 492kbps | 437kbps | 378kbps |
|=======================================|=========|=========|=========|=========|
|Average                                | 850kbps | 480kbps | 426kbps | 376kbps |
|=======================================|=========|=========|=========|=========|
|53 sample "problem" set                | 784kbps | 543kbps | 491kbps | 434kbps |
|=======================================|=========|=========|=========|=========|
halb27
Thank you, Nick.
UED77
For all of the developers involved with this project, I'd first like to compliment you for coming up with such a wonderful idea. I've been tracking this thread for a while, and am greatly impressed by the progress :D

My question relates to the identification of LossyWAV files. I'm sure it's a lot of [unnecessary?] hassle, but it might be advantageous to somehow note in a RIFF chunk the fact that the file is a LossyWAV file, and perhaps include a note of the settings used to create the file. APEv2 tags are also an option, though I hear [I have not confirmed this personally] that some software has trouble with APE tags at the end of RIFF files. RIFF mechanisms would be preferable as this information would be stored by some codecs (WavPack for sure, others?) and could be re-pasted into a file when uncompressed.

Yes, I am aware that lossily compressed audio can be decompressed to WAV files without the WAV file being tagged in any special way, but if some tagging mechanism that would be performed by LossyWAV were in effect, it would be immediately obvious that the file is not losslessly compressed.

Then again, if it's more trouble than it is worth, then forget it, but if it's doable, then it would be a nice feature to have. In an ideal world, lossless encoding tools could even read this header/footer/info tag and adjust blocksizes accordingly, allowing the user to get efficient results without personally knowing that the file has been pre-processed.

Regards,

UED77

[Edit: I just browsed back a couple pages to some posts I've previously missed and spotted a discussion about the possibility of including checksums. Unfortunately, I found the response to that a bit complicated to understand, so I don't know if it was ruled feasible or not.]
Nick.C
QUOTE(UED77 @ Dec 11 2007, 17:40) *
.... but it might be advantageous to somehow note in a RIFF chunk the fact that the file is a LossyWAV file, and perhaps include a note of the settings used to create the file....

....spotted a discussion about the possibility of including checksums....
Thanks for the input and appreciation. We're having fun with this project! :)

When I work out how the WAV format works (Halb27 did all the difficult bit by writing the WAV I/O routines) I'll try to add a check for a relevant chunk with lossyWAV data in it and if none exists, create one with something like "lossyWAV <version> <quality setting> <list of other settings that were actually used> <CRC32 of output samples>"
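
As a rough illustration of what such a chunk could look like, here is a Python sketch. It relies only on the general RIFF chunk layout (4-byte ID, 4-byte little-endian data size, data, and a pad byte when the size is odd). The chunk ID "lsyW", the field order and the function name are assumptions made for this example; nothing here reflects a format lossyWAV actually uses.

```python
import struct
import zlib

# Hypothetical chunk ID, chosen for illustration only.
CHUNK_ID = b"lsyW"

def build_info_chunk(version, quality, settings, sample_bytes):
    """Build a RIFF chunk holding a settings string and a CRC32 of the samples."""
    crc = zlib.crc32(sample_bytes) & 0xFFFFFFFF
    text = f"lossyWAV {version} {quality} {settings} {crc:08X}"
    data = text.encode("ascii")
    # RIFF chunks are word-aligned: odd-sized data gets one pad byte,
    # which is not counted in the size field.
    pad = b"\x00" if len(data) % 2 else b""
    return CHUNK_ID + struct.pack("<I", len(data)) + data + pad

chunk = build_info_chunk("0.5.8", "-3", "-nts 3.0", b"\x00\x01\x02\x03")
print(chunk[:4], len(chunk))
```

A reader that does not know the chunk ID can skip it using the size field, which is what makes an extra chunk a fairly unobtrusive way to mark a file.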

I've just removed from the code any reliance on knowing how large the file is, with a view to piped input. How I pipe in from, say, Foobar2000 and then pipe out to, say, FLAC, I haven't a clue as yet. Could it be as simple as "lossywav - <quality setting> | flac -8 - -o<output_filename>"?

Another thing on the list of things to do is to allow the parallel creation of a correction file, presumably with the same RIFF chunk in it, to allow recreation of the lossless original.

Thanks again,

Nick.
jesseg
bug:
IPB Image

bug on greyscale samples:
http://ictybtihky.com/lossywav/bug.htm

and the icon based on the bug:
http://ictybtihky.com/lossywav/lossyWAVicon.ico
The icon is hand pixeled, has transparency, comes in the 4 sizes (48, 32, 24, and 16), and clocks in at only 3.16kb. :)
Nick.C
I had a PM from stel with some concerns:

QUOTE
I've come across what I think is a problem sample. It's a bit strange in that the sample seems to play OK on one of my DAPs, but I can hear sound distortion / sound breaking up in places when played on the other DAP. What puzzles me more is that the original FLAC plays without a problem on both DAPs, and this is what leads me to believe it's an encoder problem. I first spotted the issue on beta 5.4 but I've just tried it on 5.8 and get the same results.

The DAPs are a Sansa E280 and an iAudio M5, both rockboxed, and the earphones are Shure SE530. I can only hear the problem sample on the M5. The sample is 'Groove Armada - Soundboy Rock - Lightsonic' and although I hear issues throughout the track I've noted that it definitely happens at 4.30sec. The average bitrate for this track is 528kbps when encoded using the standard -3 lossyWAV settings.

Even more annoying is that I cannot hear the distortion on my PC using an AMP & Sennheiser HD650's either.

I've also come across the problem on a different album but I need to dig this out again because I forgot to take a note when it happened.

Are you interested in investigating this further? What would you need?

I've also come across two samples from the 'Shakespears Sister\Long Live The Queens!' album where the average bitrate for -3 encoding is 696kbps & 711kbps. Could these prove useful to you?
If anyone else has had any experience of this, please add samples of up to 30 seconds of the portion of the track in question to this thread.

Many thanks,

Nick.
stel
Sorry for not posting these sooner. I've been out all day.
I've attached 10 second samples. The issue happens 5 seconds in when encoding the sample.flac file.
Sorry, the M5_issue.flac isn't great quality but you can clearly hear the issue I'm experiencing.
If you need any more info, give me a shout.
I'm going to try and find the other track I've got problems with.

Edit: Oh no, look at the post number... I'm not the devil, honest :)

Thanks
Steve
Nick.C
QUOTE(stel @ Dec 16 2007, 19:24) *
Sorry for not posting these sooner. I've been out all day.
I've attached 10 second samples. The issue happens 5 seconds in when encoding the sample.flac file.
Sorry, the M5_issue.flac isn't great quality but you can clearly hear the issue I'm experiencing.
If you need any more info, give me a shout.
I'm going to try and find the other track I've got problems with.

Edit: Oh no, look at the post number... I'm not the devil, honest :)

Thanks
Steve
Thanks for the samples - I'll have a listen....

Just a thought, but which FLAC setting are you using? It appears to me that -8 will require more CPU than, say, -3. Halb27 and Mitch 1 2 found that -3 -e -m -r 2 -b 512 works really well (only about 1kbps difference from -8), and probably takes less effort to decode.
halb27
Thanks for your sample, stel.

I can hear the distortion with your M5 sample at ~ sec.5, but not when encoding sample.flac myself using lossyWAV -3 - the same experience you have on your computer.

How can we figure out whether it's an encoder problem (the fact that lossless FLAC works fine sounds like that) or a problem specific to the iAudio M5 (the fact that on a computer and on your other DAP there is no problem sounds like that)?

I suggest you do some other encodings using -2, -1, -1 -nts 6, -1 -nts 9, -1 -nts 12, ... and report what happens.
There's a lot of clipping in this sample. Can you reduce the volume a bit by using for instance a wav editor, and have a look whether the problem remains?

Nick, I guess that when -detail reports -1 as the number of bits removed, this means no bit is removed due to clipping prevention (or does it mean that, due to clipping prevention, not all the bits were removed that could have been removed without it)?
Can you imagine there is a problem when no bit is removed due to clipping prevention in one block, and ~10 bits are removed in the next block?
Nick.C
QUOTE(halb27 @ Dec 16 2007, 21:52) *
Nick, I guess that when -detail reports -1 as the number of bits removed, this means no bit is removed due to clipping prevention (or does it mean that, due to clipping prevention, not all the bits were removed that could have been removed without it)?
Can you imagine there is a problem when no bit is removed due to clipping prevention in one block, and ~10 bits are removed in the next block?
It *shouldn't* report -1 bits removed *ever*. That means there's a bug in that bit of the code.

On your point about 0 btr in one codec_block then 10 btr in the next, I don't really know.....
halb27
QUOTE(Nick.C @ Dec 17 2007, 00:04) *

It *shouldn't* report -1 bits removed *ever*. That means there's a bug in that bit of the code.

There are a lot of -1's when using -3 -detail.
QUOTE(Nick.C @ Dec 17 2007, 00:04) *

On your point about 0 btr in one codec_block then 10 btr in the next, I don't really know.....

Just my thought when looking at the -detail report interpreting '-1' as '0 btr'.
stel
Thanks for your replies gents,
Nick, I'm using flac settings -3 -m -e -r 2 -b 512 at the moment, but I had the same problem using -5 & -8

One thing I haven't tried yet is to play the actual lossyWAV file, I will try this tonight along with your suggestions halb27.

Also, I have seen the -1 bits removed on several of my encodings in the past due to clipping prevention, are you saying this could have a large impact on the encoding?

Also, regarding the clipping, should I be using replaygain on this type of track? I've never used replaygain before because I've always been under the impression that it processes the sound so that it no longer sounds like the original.
Nick.C
QUOTE(stel @ Dec 17 2007, 07:28) *
Thanks for your replies gents,
Nick, I'm using flac settings -3 -m -e -r 2 -b 512 at the moment, but I had the same problem using -5 & -8

One thing I haven't tried yet is to play the actual lossyWAV file, I will try this tonight along with your suggestions halb27.

Also, I have seen the -1 bits removed on several of my encodings in the past due to clipping prevention, are you saying this could have a large impact on the encoding?

Also, regarding the clipping, should I be using replaygain on this type of track? I've never used replaygain before because I've always been under the impression that it processes the sound so that it no longer sounds like the original.
Replaygain (applied to the file, rather than appended) will only ever change amplitude and should only affect volume - not sound.

I'll have a look at the -1 btr issue, although, thinking about it, it should have no impact at all as when btr falls to zero then the block is merely stored and is not processed at all.

Playing the WAV should be a good test to see whether the problem lies with playback or processing.
halb27
a) Nick said a '-1' shouldn't be seen with -detail, so probably there's something wrong and he'll certainly find out. Maybe only the display is affected. We'll see.
As designed, the current clipping prevention is there to maintain quality; the downside is that the bitrate of strongly clipping tracks can get pretty high, as with your reported samples. But maybe the clipping prevention strategy has other side effects, such as no bits being removed in one block due to clipping prevention and 10 bits being removed in the next block by the normal lossyWAV mechanism, leading to distortion. I can't imagine it's like that, because that would also be audible outside of your iAudio M5 environment. But until things are clear we should keep it in mind. For clarification it would be good if you could encode a volume-reduced variant of your sample. I can do the loudness-reduced encoding in case you're not used to wave editing.

b) The replaygain procedure doesn't process the sound. It just computes a volume correction value and stores it in the file so that the playback machinery is able to adjust volume according to this value (in case the playback machinery has a replaygain feature).
The target is to have each track in a series of tracks originating from different albums at its adequate loudness.
Sound impression varies with volume due to the volume-dependent frequency characteristics of our audio perception.
The only form of 'sound processing' because of replaygain can occur on playback when there is something like a 'soft clipping prevention' feature with replaygain which usually can be switched off if not wanted.
Anyway whether or not you want to use replaygain has nothing to do with our problem here in case it should be an encoding problem.
jesseg
QUOTE(halb27 @ Dec 17 2007, 01:56) *
b) The replaygain procedure doesn't process the sound. It just computes a volume correction value and stores it in the file so that the playback machinery is able to adjust volume according to this value (in case the playback machinery has a replaygain feature).
[snip]
The only form of 'sound processing' because of replaygain can occur on playback when there is something like a 'soft clipping prevention' feature with replaygain which usually can be switched off if not wanted.


Well that's not entirely true. In fact, if the replaygain code is handled in 16bits, it can easily cause an audible effect to be heard. Given it's a portable, and probably only supports 16bit (and FIR), I would bet money that the output of the replaygain multiplier is a 16bit number, thereby truncating bits.

The first thing to do would be to find out if your player's DAC even supports 24bit output. If not, then it's at least running dithering on the end of the replaygain function (or somewhere before it's sent to the DAC), if not just truncating anyways.
halb27
QUOTE(jesseg @ Dec 17 2007, 13:48) *

QUOTE(halb27 @ Dec 17 2007, 01:56) *
b) The replaygain procedure doesn't process the sound. ...


Well that's not entirely true. In fact, if the replaygain code is handled in 16bits, it can easily cause an audible effect to be heard. Given it's a portable, and probably only supports 16bit (and FIR), I would bet money that the output of the replaygain multiplier is a 16bit number, thereby truncating bits. ....

OK, but that's not exactly what I would call sound processing.
Nick.C
QUOTE(Nick.C @ Dec 17 2007, 07:52) *
I'll have a look at the -1 btr issue, although, thinking about it, it should have no impact at all as when btr falls to zero then the block is merely stored and is not processed at all.
The bug is merely presentational - when btr is indicated as -1 then it is actually 0 and the codec_block is stored. This will be amended for beta v0.5.9
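
To make the "0 btr means the block is stored" behaviour concrete, here is a small Python sketch of bit removal with clipping prevention for 16-bit samples. It is an illustration only and not the Delphi code: rounding to the nearest multiple of 2^btr and backing off one bit at a time when a rounded sample would clip are both assumptions, and the function name is made up.

```python
def remove_bits(block, btr, lo=-32768, hi=32767):
    """Round every sample in a codec block to a multiple of 2**btr.

    If any rounded sample would leave the 16-bit range, retry with one bit
    less (assumed clipping prevention). With 0 bits left the block is
    returned unchanged, i.e. merely stored.
    """
    while btr > 0:
        step = 1 << btr
        rounded = [((s + step // 2) // step) * step for s in block]
        if all(lo <= r <= hi for r in rounded):
            return rounded, btr
        btr -= 1
    return list(block), 0

print(remove_bits([100, -100, 7], 3))   # ordinary block, 3 LSBs zeroed
print(remove_bits([32767, 12, -5], 3))  # full-scale sample forces btr to 0
```

In the second call the full-scale sample cannot be rounded without exceeding the 16-bit range, so the sketch falls back to 0 bits removed and returns the block untouched.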

As an aside, could this method be applied to 32-bit float values, i.e. round the mantissa only? And would any of the lossless codecs make use of zero LSB's in the mantissa?
2Bdecided
QUOTE(jesseg @ Dec 17 2007, 11:48) *

QUOTE(halb27 @ Dec 17 2007, 01:56) *
b) The replaygain procedure doesn't process the sound. It just computes a volume correction value and stores it in the file so that the playback machinery is able to adjust volume according to this value (in case the playback machinery has a replaygain feature).
[snip]
The only form of 'sound processing' because of replaygain can occur on playback when there is something like a 'soft clipping prevention' feature with replaygain which usually can be switched off if not wanted.


Well that's not entirely true. In fact, if the replaygain code is handled in 16bits, it can easily cause an audible effect to be heard.
No, not "easily". "Potentially".

Of course, the effects of ReplayGain are almost always audible - it changes the volume!

However, 16bits are more than enough to accomplish this transparently for most audio signals. Not all, but most.

The potential for detecting the re-quantisation (or re-dithering) exists with extremely dynamic signals, where ReplayGain reduces the volume because substantial sections are relatively loud, but the listener cranks the volume up to concentrate on some relatively quiet sections. In these sections, the dither or quantisation may be audible.
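
The re-quantisation in question can be sketched in Python as follows. This assumes a player that scales 16-bit samples by the gain and rounds the result straight back to 16 bits without dither; the function name is made up and no particular player is claimed to work this way.

```python
def apply_gain_16bit(samples, gain_db):
    """Scale 16-bit samples by gain_db and re-quantise to 16 bits (no dither)."""
    scale = 10 ** (gain_db / 20)
    out = []
    for s in samples:
        v = int(round(s * scale))
        out.append(max(-32768, min(32767, v)))  # hard clip to the 16-bit range
    return out

loud = [20000, -15000, 1000]
quiet = [3, 2, 1, 0, -1, -2]
print(apply_gain_16bit(loud, -6.0))
print(apply_gain_16bit(quiet, -6.0))
```

Loud samples keep plenty of resolution after attenuation, while samples only a few LSBs in size collapse onto very few output levels. That is the situation described above: a quiet passage in a track that has been turned down, played with the volume cranked up.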


A far greater and more common problem is owning a poor DAP with a hissy amplifier, low power etc, where ReplayGain will make everything too quiet and drop it into the noise floor.

Cheers,
David.


QUOTE(stel @ Dec 16 2007, 19:24) *
Sorry for not posting these sooner. I've been out all day.
I've attached 10 second samples. The issue happens 5 seconds in when encoding the sample.flac file.
Sorry, the M5_issue.flac isn't great quality but you can clearly hear the issue I'm experiencing.
Just to check...

M5_issue is mono. I assume your DAP is actually playing back in stereo?

Cheers,
David.
stel
OK, I've spent some time this evening playing around with different encoder settings on my problem sample and I'm kicking myself with the results. The good news is that it's not lossyWAV that's at fault...

My problem was caused by the 'Bit Depth Control' settings in foobar2000. The 'Format is:' was set to lossy and 'Highest BPS mode supported:' set to 24. Setting the format to lossless(or hybrid) fixed the issue. I guess I've encoded everything at 24bit and my DAP isn't happy about it.

Apologies to everyone who spent time looking into it.
Nick.C
QUOTE(stel @ Dec 17 2007, 21:07) *
OK, I've spent some time this evening playing around with different encoder settings on my problem sample and I'm kicking myself with the results. The good news is that it's not lossyWAV that's at fault...

My problem was caused by the 'Bit Depth Control' settings in foobar2000. The 'Format is:' was set to lossy and 'Highest BPS mode supported:' set to 24. Setting the format to lossless(or hybrid) fixed the issue. I guess I've encoded everything at 24bit and my DAP isn't happy about it.

Apologies to everyone who spent time looking into it.
Many thanks for the subsequent investigations! My iPAQ won't even play 24bit.....

[edit] Interestingly, the difference between my 53 sample test set at 16 bit and 24 bit (Foobar WAV output) is about 40kB in 38.4MB [/edit]
jesseg
QUOTE(stel @ Dec 17 2007, 15:07) *
The 'Format is:' was set to lossy and 'Highest BPS mode supported:' set to 24. Setting the format to lossless(or hybrid) fixed the issue.


You're the second person in this thread alone to run into problems from that setting. I would think they would have it set to lossless by default, but I guess not (or is it?)
Nick.C
I've been toying with implementing (and optimising in IA-32) mixed-radix FFTs (i.e. FFTs of non-power-of-two length).

At present, they're *very* slow.

Is there a feeling that changing the timeframe of the FFT analysis from 1.45msec (64 samples @ 44.1kHz) to, say, 1.497msec (66 samples @ 44.1kHz), or from 23.22msec (1024 samples @ 44.1kHz) to 20msec (882 samples @ 44.1kHz), would be beneficial?

This would also allow better handling of WAVs which are other than 44.1kHz, as the optimal FFT length could be calculated directly rather than using the existing 64 / 128 / 256 / 512 / 1024 sample lengths, e.g. 72 samples = 1.5msec @ 48kHz; 960 samples = 20msec @ 48kHz.
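
The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of the sample-count calculation, not lossyWAV code; the function names fft_length and window_ms are made up for the example.

```python
def fft_length(duration_ms, sample_rate_hz):
    """Nearest whole number of samples spanning duration_ms at sample_rate_hz."""
    return round(duration_ms * sample_rate_hz / 1000)


def window_ms(length, sample_rate_hz):
    """Duration in milliseconds covered by an FFT of the given length."""
    return 1000 * length / sample_rate_hz


if __name__ == "__main__":
    # Existing power-of-two lengths and the time they cover at 44.1kHz.
    for n in (64, 1024):
        print(n, "samples @ 44.1kHz =", round(window_ms(n, 44100), 3), "msec")

    # Lengths that would hit a target duration at each sample rate.
    for rate in (44100, 48000):
        for target in (1.5, 20.0):
            n = fft_length(target, rate)
            print(target, "msec @", rate, "Hz ->", n, "samples")
```

Note that a 1.5msec target at 44.1kHz lands between whole samples (66.15), which is why the 66 sample case works out at roughly 1.497msec rather than exactly 1.5msec.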

If it is considered to be of merit then I will progress the implementation / optimisation over the festive period, initially alongside the existing power-of-two FFT routine unless (until?) the mixed-radix method becomes acceptably fast.
halb27
The best of the possibilities that arise from such a generalization IMO comes with the long FFTs, where a finer differentiation becomes possible. At present we only have the choice between a 512 and a 1024 sample FFT.

Anyway I can't imagine it's a real benefit, even for a 48 kHz source, which makes for a ~10% difference in timing.

I once was worried about the overlapping of the 1024 sample FFT into the neighboring blocks (and I would still prefer the 5/8 partitioning I once suggested), which can also be reduced by using, say, an 882 sample FFT, but because of the very good quality we have I'm content with the way you do the FFT right now.
2Bdecided
I think it's a waste of time. The exact numbers aren't important, and even in lossy codecs (where the exact numbers are important) they still stick to fixed values irrespective of the sample rate, with reasonable results.

Cheers,
David.
Nick.C
QUOTE(halb27 @ Dec 18 2007, 14:32) *
The best of the possibilities that arise from such a generalization IMO comes with the long FFTs, where a finer differentiation becomes possible. At present we only have the choice between a 512 and a 1024 sample FFT.

Anyway I can't imagine it's a real benefit, even for a 48 kHz source, which makes for a ~10% difference in timing.

I once was worried about the overlapping of the 1024 sample FFT into the neighboring blocks (and I would still prefer the 5/8 partitioning I once suggested), which can also be reduced by using, say, an 882 sample FFT, but because of the very good quality we have I'm content with the way you do the FFT right now.
QUOTE(2Bdecided @ Dec 18 2007, 15:10) *
I think it's a waste of time. The exact numbers aren't important, and even in lossy codecs (where the exact numbers are important) they still stick to fixed values irrespective of the sample rate, with reasonable results.

Cheers,
David.
Thanks guys - I'll park it. Only the presentational bug of -1 btr to be corrected for beta v0.5.9.

-shaping may get the bullet as it is really beyond my grasp of audio processing.

As things seem to be ticking along nicely, I think I'll concentrate on producing the correction file next.
2Bdecided
I probably said this before, but I strongly advise borrowing the wavpack file format for that, as much as possible/relevant, because I'm sure David Bryant has worked through some of the issues that might be faced here. It might not be entirely applicable, but it's always good to re-use someone else's (good) work when they're happy for you to do so!

Cheers,
David.
Nick.C
QUOTE(2Bdecided @ Dec 19 2007, 10:16) *
I probably said this before, but I strongly advise borrowing the wavpack file format for that, as much as possible/relevant, because I'm sure David Bryant has worked through some of the issues that might be faced here. It might not be entirely applicable, but it's always good to re-use someone else's (good) work when they're happy for you to do so!

Cheers,
David.
My approach was going to be even more simplistic - write two WAV files, one lossy, one correction, then encode both in FLAC. I'm working on adding a FACT chunk to the WAV file - containing a null terminated string which will include lossyWAV version information, parameters used and date/time of processing (and ultimately the CRC32 of the lossless and lossy data) which would be written to both files.
Nick.C
QUOTE(Nick.C @ Dec 19 2007, 10:31) *
I'm working on adding a FACT chunk to the WAV file - containing a null terminated string which will include lossyWAV version information, parameters used and date/time of processing (and ultimately the CRC32 of the lossless and lossy data) which would be written to both files.
I've successfully implemented a mechanism to insert a 160 byte FACT chunk immediately after the FMT chunk in the WAV file. This takes the form:
CODE
fact/152/lossyWAV v0.5.9 : 20/12/2007 08:46:57
-2 -cbs 512 -nts 0.00 -snr 21.00 -skew 36.00
-spf 22224-22235-22336-12347-12358 -fft 10101
If a file has already been processed, the FACT chunk will be found and lossyWAV will exit. When encoding in FLAC the --keep-foreign-metadata switch must be used to preserve the lossyWAV FACT chunk.

Thinking about it, I should make a bit more effort and make the FACT chunk variable length (up to a sensible maximum). In this way, the total length of the FACT chunk will be 8 + string_length + (string_length and 1), i.e. an 8 byte header, the string itself, and one pad byte when the string length is odd.
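
As a rough illustration of that length rule, here is a minimal Python sketch of a variable-length chunk builder. It assumes the usual RIFF layout (4 byte chunk ID, 4 byte little-endian size, data, then a pad byte if the data length is odd) and that the size field counts the null terminator but not the pad byte. The function name build_fact_chunk is made up for the example, and how lossyWAV itself lays the bytes out is not confirmed here.

```python
import struct


def build_fact_chunk(text):
    """Return a RIFF 'fact' chunk holding text as a null-terminated string."""
    payload = text.encode("ascii") + b"\x00"          # null-terminated string
    header = b"fact" + struct.pack("<I", len(payload))  # ID + little-endian size
    pad = b"\x00" * (len(payload) & 1)                # keep the chunk word-aligned
    return header + payload + pad


if __name__ == "__main__":
    chunk = build_fact_chunk("lossyWAV v0.5.9 : 20/12/2007 08:46:57")
    string_length = len(chunk[8:].rstrip(b"\x00")) + 1
    # Total length follows 8 + string_length + (string_length and 1).
    print(len(chunk), 8 + string_length + (string_length & 1))
```

With a 152 byte string this gives 8 + 152 + 0 = 160 bytes, matching the fixed-size chunk described above.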
Nick.C
lossyWAV beta v0.5.9 attached at post 1 of the thread.

Fixed btr -1 bug;
Implementation of FACT chunk inclusion in output when processed. lossyWAV will exit if it finds a lossyWAV FACT chunk in a WAV file. FLAC requires the --keep-foreign-metadata switch to retain the FACT chunk.
jesseg
Could you have the exe return... something besides error level 1, when it actually has an error, such as the WAV file already having the lossyWAV flag? That will allow me, and others, to add handling for it into batch files and software.

Very good idea though. I'll release a new batch for iFLCDrop when I can get back a specific code for the "lossyWAV flag exists" error. With the error code returned I can make the script copy the source-file into the temp directory, and have it get encoded into FLAC anyways. :)
Nick.C
QUOTE(jesseg @ Dec 22 2007, 07:52) *
Could you have the exe return... something besides error level 1, when it actually has an error, such as the WAV file already having the lossyWAV flag? That will allow me, and others, to add handling for it into batch files and software.

Very good idea though. I'll release a new batch for iFLCDrop when I can get back a specific code for the "lossyWAV flag exists" error. With the error code returned I can make the script copy the source-file into the temp directory, and have it get encoded into FLAC anyways. :)
Very good idea, I'll start thinking about the codes needed and will post as beta v0.6.0 in a couple of days.
Nick.C
QUOTE(Nick.C @ Dec 22 2007, 11:37) *
QUOTE(jesseg @ Dec 22 2007, 07:52) *
Could you have the exe return... something besides error level 1, when it actually has an error, such as the WAV file already having the lossyWAV flag? That will allow me, and others, to add handling for it into batch files and software.

Very good idea though. I'll release a new batch for iFLCDrop when I can get back a specific code for the "lossyWAV flag exists" error. With the error code returned I can make the script copy the source-file into the temp directory, and have it get encoded into FLAC anyways. :)
Very good idea, I'll start thinking about the codes needed and will post as beta v0.6.0 in a couple of days.
Another thought - would it be useful to support, say, "lossyWAV <wavfile.wav> -check" so that the user can check for a processed file without trying to process it again?
jesseg
Yes, especially if it returned an error code if it was already processed. It would allow an application to check or batch check for lossyWAV or non-lossyWAV files... for whatever reason someone might want to do that. But even without an error code, it would be ok for non-automated or non-batch use I guess.
Nick.C
QUOTE(jesseg @ Dec 24 2007, 12:16) *
Yes, especially if it returned an error code if it was already processed. It would allow an application to check or batch check for lossyWAV or non-lossyWAV files... for whatever reason someone might want to do that. But even without an error code, it would be ok for non-automated or non-batch use I guess.
I would of course include an error code :) following your request. You can even see the rifffact chunk stored at the beginning of FLAC files thanks to Josh's latest --keep-foreign-metadata switch, and Tiny Hexer file binary viewer / editor.
singaiya
Is lossyWAV's method similar to Wavpack lossy's method? I thought it was a different approach, but then seeing mention of correction files maybe they're more similar than I thought?
Nick.C
The concept of the correction file will be similar, in that the lossless original can be reconstructed from the lossy.wav and the lwcdf.wav files by simple sample addition.

lossyWAV rounds LSBs to zero where the added noise of the rounding is calculated to be below a threshold value.
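
A toy Python sketch of the sample-addition idea, assuming 16 bit samples held as plain integers. The number of bits to remove is fixed here purely for illustration, whereas lossyWAV derives it from its noise analysis, which is not shown. The function names and the round-to-nearest choice are assumptions for the example, not lossyWAV's actual routine.

```python
def round_lsbs(sample, bits):
    """Round a 16 bit sample so that its lowest `bits` bits are zero."""
    step = 1 << bits
    rounded = ((sample + step // 2) // step) * step
    # Clamp to the largest multiple of step that still fits in 16 bits.
    max_value = (32767 // step) * step
    return max(-32768, min(max_value, rounded))


def split(samples, bits):
    """Return (lossy, correction) such that lossy + correction == samples."""
    lossy = [round_lsbs(s, bits) for s in samples]
    correction = [s - l for s, l in zip(samples, lossy)]
    return lossy, correction


def reconstruct(lossy, correction):
    """Recover the original samples by simple sample addition."""
    return [l + c for l, c in zip(lossy, correction)]


if __name__ == "__main__":
    original = [1000, -1234, 7, 32767, -32768]
    lossy, correction = split(original, bits=3)
    print(lossy, correction, reconstruct(lossy, correction) == original)
```

Because the correction is defined as the exact difference, the sum is lossless whatever rounding rule is used for the lossy part.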
Nick.C
QUOTE(Nick.C @ Dec 24 2007, 12:20) *
I would of course include an error code :) following your request. You can even see the rifffact chunk stored at the beginning of FLAC files thanks to Josh's latest --keep-foreign-metadata switch, and Tiny Hexer file binary viewer / editor.
lossyWAV beta v0.6.0 attached at post 1 of the thread.

Error code = 16 on exit if WAV file has already been processed.

-check parameter included to allow checking without trying to process the file, error code = 16 if processed. [edit] It's not clear in the parameter list that -check should only be used in the context "lossyWAV <wavfile.wav> -check", as it will always exit after determining whether a lossyWAV FACT chunk exists. [/edit]
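
For anyone scripting around this, a minimal Python sketch of how a wrapper might use the exit code. The command form and the value 16 are taken from the post above; the executable name on the path, the helper names, and the treatment of every other exit code as "not already processed" are assumptions for the example.

```python
import subprocess

ALREADY_PROCESSED = 16  # exit code reported above for an already processed file


def check_command(wav_path, exe="lossyWAV"):
    """Build the argument list for 'lossyWAV <wavfile.wav> -check'."""
    return [exe, wav_path, "-check"]


def is_already_processed(returncode):
    """Interpret the exit code; only 16 is documented in the post above."""
    return returncode == ALREADY_PROCESSED


def check_file(wav_path, exe="lossyWAV"):
    """Run the check and report whether a lossyWAV FACT chunk was found."""
    result = subprocess.run(check_command(wav_path, exe))
    return is_already_processed(result.returncode)
```

A batch script could call check_file() first and fall back to copying the source file straight to the FLAC encoder when it returns True.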
jaybeee
Thanks for the latest update Nick.C.
I personally think it's best to add download links to the first post in the thread; that way it's always easy to find :)

I use foobar, set up as per the lossyWAV wiki entry. Do you think it's possible to preserve ReplayGain tags? (I'm unsure if this is a foobar or lossyWAV issue).
Nick.C
QUOTE(jaybeee @ Dec 29 2007, 17:06) *
Thanks for the latest update Nick.C.
I personally think it's best to add download links to the first post in the thread; that way it's always easy to find :)

I use foobar, set up as per the lossyWAV wiki entry. Do you think it's possible to preserve ReplayGain tags? (I'm unsure if this is a foobar or lossyWAV issue).
David will correct me if I am wrong, but I've seen slightly different Replaygain values for the same track pre as opposed to post processing, so I don't know if retaining them is a good idea. It wasn't much though. lossyWAV does nothing at all with tags - that's all Foobar (thankfully).

I will edit the first post in this thread so that it reflects the thread's content and serves as the download point.
halb27
How do you make foobar pass the replaygain information to the resulting (say) FLAC file?
I'd love to do that as my input ape files do have replaygain information.

I personally wouldn't care about slightly incorrect replaygain values (and I can only imagine the differences are small).
The bigger problem to me with replaygain is that sometimes the values have to be corrected manually to achieve an equal loudness impression. I'd be happy if I could do these manual corrections just once in my ape files, and benefit from them whenever I encode them.
Nick.C
lossyWAV beta v0.6.1 now appended to post #1 of this thread.