Lame 3.99.5z, a functional extension
halb27
post Sep 18 2012, 23:06
Post #1

You can download it from here.

What's the functional extension?

It offers VBR quality settings -V3+ to -V0+ and -V0+eco (economic version of -V0+).


What are -Vn+ and -V0+eco good for?

They improve pre-echo behavior.
Beyond that, they combine the quality advantages of VBR (regarding pre-echo) with the quality advantages of CBR/ABR (with respect to ringing and other tonal issues).


Lame users can be classified into three categories:

a) Users who don't care about rare quality issues and/or care much about small file size.
The common way for these users to work with Lame is to use -V5, -V4, or similar.

b) Users who don't like to have obvious and especially ugly issues in their music, even when they're rare, but who care about file size as well.
The common way for these users to work with Lame is to use -V2, -V3, or similar.
CBR 192 or similar, or ABR in this bitrate range, is an alternative (but seldom used).

c) Users who want overall transparency, or at least a quality which comes close to it, and who don't care much about file size.
The common way for these users to work with Lame is to use -V0, -V1, or similar as a VBR method, or to use CBR 320 or 256. Using very high bitrate ABR is an alternative (seldom used).

For users of groups b) and c), -Vn+/-V0+eco offer significant quality advantages:

We have two major issue classes with most of the lossy codecs:
- temporal smearing (pre-echo) issues
- ringing (tremolo) and other tonal issues.

Let's look at the worst samples I know for these classes:
- eig (extremely strong temporal smearing)
- lead-voice (extreme ringing, for instance at sec. 0..2)

With samples like these, users of group b) can't be very happy when using -V2 or -V3, because the ringing issues are very obvious and ugly. The temporal smearing of eig is pretty obvious as well, especially around sec. 3. Using CBR/ABR 192 or similar is a good way to fight the ringing, but then temporal smearing is much worse than with VBR; it's really ugly.
Things don't really change when using slightly increased quality settings.

For users of group c) it's exactly the same thing, just with both the quality requirements and the quality received on a higher level.

So the traditional way of doing things isn't totally satisfactory.

Users of group b) can use -V3+ or -V2+ (recommended) and get much better results in the overall view.

Users of group c) can use -V1+ or -V0+eco (recommended) or -V0+ (recommended for the paranoid like me) and get transparency or close-to-transparency. Sure, it's impossible to prove transparency for the universe of music, but it's true for the samples mentioned. And as these are very outstanding samples within their problem classes, and because of the technical details of -Vn+ described below, it's plausible that the approach works rather universally.


How is it done?

-Vn+ uses -Vn internally (-V0+eco uses -V0), but the accuracy demands for short blocks are increased. The encoder uses short blocks when it needs good temporal resolution. The audio data bitrate is also kept rather high for long blocks, which are the ones normally used.
These audio data requirements are helpful for any kind of problem; they are not restricted to ringing or pre-echo issues.
Moreover, a strategy is used that aims to provide close to the maximum possible audio data space for short blocks.


What's the price to pay?

Compared to -Vn, the increased accuracy demands of -Vn+ raise the average bitrate. As -Vn+ aims at significant quality improvements over -V2 for really bad samples, we need an average bitrate of at least around 200 kbps.

-V3+ and -V2+ are designed for users of group b) above, and as such keep the average bitrate from rising much above 200 kbps. For my test set of various pop music, the average bitrate is 205 kbps for -V3+ and 217 kbps for -V2+.

For users of group c) I allow for the full quality and average bitrate range mp3 can offer.
-V1+ takes an average bitrate of 257 kbps for my test set, -V0+ takes 317 kbps.
-V0+eco (economic version of -V0+) takes 266 kbps. So -V0+eco comes nearly for free as -V0 takes 260 kbps for my test set.
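To put these figures in perspective, here is a small Python sketch (not part of Lame 3.99.5z) that turns the average bitrates quoted above into approximate file sizes and a relative overhead. The helper names are made up for illustration; only the kbps values come from the post.

```python
# Average bitrates (kbps) reported in the post for halb27's pop test set.
AVG_KBPS = {
    "-V3+": 205,
    "-V2+": 217,
    "-V1+": 257,
    "-V0": 260,
    "-V0+eco": 266,
    "-V0+": 317,
}


def megabytes_per_hour(kbps):
    """Approximate file size in MB (10**6 bytes) for one hour of audio."""
    return kbps * 1000 * 3600 / 8 / 1e6


def overhead_percent(setting, baseline):
    """Relative bitrate increase of `setting` over `baseline`, in percent."""
    base = AVG_KBPS[baseline]
    return 100.0 * (AVG_KBPS[setting] - base) / base


for name, kbps in AVG_KBPS.items():
    print(f"{name:8s} {kbps:3d} kbps  ~{megabytes_per_hour(kbps):6.1f} MB/hour")
print(f"-V0+eco vs -V0: +{overhead_percent('-V0+eco', '-V0'):.1f}%")
```

On these numbers -V0+eco costs only a few percent more than plain -V0, which is what "nearly for free" refers to.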

Unlike with versions I published before, mp3packer isn't really needed any more to squeeze the unused bits out of the mp3 file (with the exception of fractional settings like -V0.5+ between -V1+ and -V0+).
mp3packer brings average bitrate down by only 1 kbps maximum for -Vn+ between -V3+ and -V2+, by 1 to 2 kbps for -Vn+ between -V2+ and -V1+, and by 2 kbps for -V0+ and -V0+eco. So I think we can forget about mp3packer with these settings.


Options

--adbr_short x
sets the minimum audio data bitrate for short blocks to x [kbps] when using -Vn+ or -V0+eco, with x in the range 150..450.
Defaults are 360, 370, 420, 440, 440 kbps for -V3+, -V2+, -V1+, -V0+eco, -V0+ respectively.

--adbr_long x
sets the minimum audio data bitrate for long blocks to x [kbps] when using -Vn+ or -V0+eco, with x in the range 110..310.
Defaults are 160, 170, 215, 220, 290 kbps for -V3+, -V2+, -V1+, -V0+eco, -V0+ respectively.

--frameAnalysis
prints detailed information for each frame (L/R or M/S representation, blocktype of both granules, available audio data bits, audio data bits used, etc.). Works for both -Vn and -Vn+.
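As a rough illustration of how these options fit together, the following Python sketch assembles a command line and checks the two --adbr values against the ranges stated above. The function `build_lame_args` and the executable name `lame` are assumptions for the example (the downloaded binary may be named differently); the flags, ranges and defaults are the ones listed in this post.

```python
# Ranges and defaults as stated in the post; the helper itself is hypothetical.
ADBR_SHORT_RANGE = (150, 450)
ADBR_LONG_RANGE = (110, 310)
DEFAULTS = {  # setting: (adbr_short, adbr_long) in kbps
    "-V3+": (360, 160),
    "-V2+": (370, 170),
    "-V1+": (420, 215),
    "-V0+eco": (440, 220),
    "-V0+": (440, 290),
}


def build_lame_args(setting, infile, outfile, adbr_short=None, adbr_long=None,
                    frame_analysis=False, exe="lame"):
    """Return an argument list; `exe` is an assumed name for the encoder."""
    args = [exe, setting]
    for flag, value, (low, high) in (
        ("--adbr_short", adbr_short, ADBR_SHORT_RANGE),
        ("--adbr_long", adbr_long, ADBR_LONG_RANGE),
    ):
        if value is None:
            continue  # keep the encoder's default for this setting
        if not low <= value <= high:
            raise ValueError(f"{flag} must be in {low}..{high} kbps, got {value}")
        args += [flag, str(value)]
    if frame_analysis:
        args.append("--frameAnalysis")
    return args + [infile, outfile]


print(" ".join(build_lame_args("-V2+", "in.wav", "out.mp3", adbr_short=400)))
print("defaults for -V2+ (short, long):", DEFAULTS["-V2+"])
```

The list form can be handed to something like `subprocess.run` unchanged; leaving both --adbr values out keeps the defaults of the chosen -Vn+ setting.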


Dynamic
post Nov 6 2012, 21:08
Post #2

BFG, I can't speak for Robert, but my own thoughts are along the psymodel lines.
I think that:

a fairly large proportion of the cases where Lame 3.99.5 -Vn has problems that halb27's -Vn+ version fixes consist of a sharp transient (highly localized in the time domain but spread out in the frequency domain) occurring simultaneously with a tonal signal (highly localized in the frequency domain but spread out in the time domain).

The time-frequency tradeoff (localized in one domain means spread out in the other) is analogous to Heisenberg's Uncertainty Principle (Δt·Δf ~ constant).

The mathematics of transforms such as MDCT (or FT) means that:

if you have a long block, you have a lot of frequency bins, each of which is fairly narrow in bandwidth, allowing fairly precise reproduction of long-duration tonal signals (localized peaks in the frequency domain) even with relatively imprecise values* stored for each frequency bin (the imprecision implies lower bit-depth and hence lower bitrate). As these tonal signals are spread out in the time domain, any time-domain variation is slow enough not to need precise representation.

*for an FT these frequency-domain values are complex numbers, carrying information about both amplitude and phase (MDCT coefficients are real, with the phase information spread across neighbouring bins and overlapping blocks). Values from neighbouring bins actually interfere when transformed into the time domain, allowing reproduction of frequencies more precisely defined than the bin-width itself.

Still in a long block, if you have an event that is localized in time, such as a transient, you can reproduce it, but it requires much greater precision in the value of each frequency bin, so that the bins sum together in the time domain with the correct phase and reproduce the time-localization. Otherwise the event is smeared out like a soft noise over a longer time (which produces pre-echo and post-echo, though post-echo is more readily masked). Such precision (or bit-depth) over so many frequency bins requires a high bitrate.

An alternative is to detect these time-localized transients and split the time into, say three short blocks. There are now fewer frequency bins in each short block (each having greater bandwidth) but there's less smearing of time (the maximum smearing being the duration of the whole short block), and sufficient time-localization can be achieved with a modest precision of the values for each frequency bin, thus a modest bit-depth and bit-rate (at the expense of frequency-smearing). As time-localized signals are frequency-unlocalized (broad spectrum, noiselike) that's often not a problem.
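The long/short tradeoff can be put into numbers with a short Python sketch. It assumes CD audio at 44.1 kHz and uses the MPEG-1 Layer III block sizes (576 frequency lines per granule for a long block, three short blocks of 192 lines each); these figures come from the MP3 format rather than from this thread, and the function name is only illustrative. The time span computed is the block advance, not the overlapped window length, which is twice as long.

```python
SAMPLE_RATE = 44100   # Hz, CD audio (assumption for this example)
GRANULE = 576         # frequency lines per MP3 granule and channel


def resolution(lines_per_block, sample_rate=SAMPLE_RATE):
    """Return (time span in ms, bin width in Hz) for one MDCT block.

    A block that yields N frequency lines advances N samples in time and
    splits the band from 0 to sample_rate/2 into N bins.
    """
    dt_ms = 1000.0 * lines_per_block / sample_rate
    df_hz = (sample_rate / 2) / lines_per_block
    return dt_ms, df_hz


for name, lines in (("long", GRANULE), ("short", GRANULE // 3)):
    dt_ms, df_hz = resolution(lines)
    product = dt_ms / 1000 * df_hz
    print(f"{name:5s} block: {lines:3d} lines, {dt_ms:5.2f} ms, "
          f"{df_hz:6.2f} Hz per bin, dt*df = {product:.2f}")
```

Switching to short blocks makes the time span three times shorter and each bin three times wider, so the product of the two stays the same, which is the Δt·Δf ~ constant relation mentioned above.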

If there is a tonal (frequency-localized but time-smeared) signal to be represented within the short block that we don't think will be masked by the loud transient, its frequency can be reproduced more accurately only by increasing the precision of the values for each of the frequency bins (because the summation of interfering components of neighbouring broad-bandwidth frequency bins, when we convert back to the time domain, will then accurately preserve the frequency and phase of the tonal signal). This greater precision, as before, requires greater bit-depth to represent the values in the transform domain and thus higher bitrate.

It's this latter case that -Vn+ seems to solve, but it doesn't currently detect whether there actually IS an important tonal component that isn't masked by the transient (pre-masking and post-masking). It just assumes that there might be and, to be on the safe side, employs a much higher bitrate (much higher precision of bin values) during all short blocks.

For any encoder, with enough processing time, it should be possible to derive an extra measurement on the analysis FFT in the psymodel, but only do the check once short blocks have been triggered (and only test the check on short blocks). That check would look for tonal signals (frequency-localized) during these short blocks, and probably during the switching windows too (long->short and short->long), to determine whether any of them might not be masked entirely by the transient and whether they require higher precision in the transform-domain quantization (and thus higher bitrate) to maintain their frequency precision despite the wide bin-width. It might be possible to determine a suitable mathematical function to determine the required quantization precision from listening tests on tone+transient signals of varying relative amplitudes (and varying tone frequency ranges) and to build in enough margin of safety to account for practical limitations arising from window functions and the like, or failing that to simply determine a threshold of 'tonality' that triggers the encoder to turn up the precision to the maximum for the affected short blocks. Either way would mainly solve the problem cases without boosting bitrate for many general unproblematic short blocks, which is the efficient approach normally adopted in LAME VBR tuning.
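To make the "threshold of tonality" idea a bit more concrete, here is a toy Python sketch. It is not LAME's psymodel and not what 3.99.5z does: it swaps in spectral flatness (geometric mean over arithmetic mean of the power spectrum) as a simple stand-in tonality measure, and the function names, the 192-sample block and the threshold of 0.1 are all hypothetical. Masking by the transient, which the check described above would also have to model, is ignored here.

```python
import cmath
import math
import random


def power_spectrum(block):
    """Hann-windowed DFT power for bins 1 .. N/2-1 (plain O(N^2) DFT)."""
    n = len(block)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(block)]
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(block, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.

    Close to 0 for a tonal (peaky) spectrum, close to 1 for a flat one.
    """
    power = [p + eps for p in power_spectrum(block)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def needs_extra_precision(block, threshold=0.1):
    """Hypothetical trigger: treat the block as tonal if flatness is low."""
    return spectral_flatness(block) < threshold


N = 192  # samples, chosen to resemble one MP3 short block
tone = [math.sin(2 * math.pi * 20 * i / N) for i in range(N)]
click = [1.0 if i == N // 2 else 0.0 for i in range(N)]
tone_plus_click = [t + c for t, c in zip(tone, click)]
random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(N)]

for name, block in (("tone", tone), ("click", click),
                    ("tone+click", tone_plus_click), ("noise", noise)):
    print(f"{name:10s} flatness={spectral_flatness(block):.3f} "
          f"extra precision: {needs_extra_precision(block)}")
```

In this toy setup a bare click or noise burst leaves the trigger off, while a block that also contains a steady tone switches it on, so only those short blocks would get the higher precision. A real implementation would need tuning against listening tests, as described above.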

Robert has improved the lead-voice problem sample in the latest 3.100 alpha, which I'd have put into this category, so I'll do some keen listening tests to see what might be fixed. From a quick look, the diffs for the latest psymodel.c seem to include a good deal of stuff relating to tonality measures, so I'm hopeful that a lot of the problem samples are going to be hard to ABX using 3.100 alpha2 when I get time to try.

There remain some problems that don't fit this tonal+transient during short-block description, so halb27's -Vn+ modes will still have mileage while the psymodel hasn't fixed them.

BFG
post Nov 7 2012, 00:10
Post #3

QUOTE (Dynamic @ Nov 6 2012, 14:08)
BFG, I can't speak for Robert, but my own thoughts are along the psymodel lines.
I think that:

Thanks for the explanation Dynamic; I'll need to read through it a couple more times to ensure I fully understand it.
I have a lot of interest in this area, but not much understanding!

In the meantime, I have a (perhaps silly) question: would adding a two-pass system to LAME help the tonal or sharp attack problems in any way?
That is, if LAME already knew what the data in future frames looked like, would it be able to more accurately encode the current frames and/or anticipate tonal or sharp attack problems?