Details on VBR frame quantization with reduced accuracy / improvements
halb27
post Mar 27 2011, 23:05
Post #1





Group: Members
Posts: 2414
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



In two other threads I showed how the SNR of -V0 can be improved by using --ns-bass/alto/treble, and I provided a special version, 3.98.4m, which modifies the handling of these parameters (allowing values down to -12 and keeping away from sfb21 behavior).

In order to provide additional data space it is essential to use -b 320 -F (and it also helps to use -Y or a lowpass such as --lowpass 17.5).
But the question arises: is this enough, or do we often run out of data space when demanding a higher SNR? In that case quantization has to be done with some simplifications compared to what the VBR mechanism requires, thus lowering accuracy for the frame.

This is a question in its own right, as it applies to plain -V0 as well.

I investigated how often these out-of-data-space events occur.
For this purpose I created another 3.98.4 variant (I call it 3.98.4n) which counts these events and, when --verbose is given as an option, displays the total number of frames, the number of reduced-accuracy frames, and the corresponding percentage.
Another feature introduced with 3.98.4n is the new parameter --ns-short, which modifies --ns-bass/alto/treble behavior for short blocks. If given, it limits the short-block SNR improvement of --ns-bass and --ns-alto to the SNR improvement given by --ns-treble. The resulting --ns-bass/alto/treble values are then multiplied by the --ns-short value, which has to be in the range 0 ... 1.
The details are displayed when --verbose is used.
As an example, --ns-bass -8 --ns-alto -7 --ns-treble -6 --ns-short 0.5 yields a value of -3 for --ns-bass, --ns-alto and --ns-treble for short blocks.
3.98.4n can be downloaded (together with the changed source files) from here.
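
The short-block rule described above can be sketched in a few lines of Python. This is only one reading of the description, not code taken from 3.98.4n: the function name is made up, and it assumes that negative values mean a stricter SNR demand, so "limiting the improvement" is modelled as never letting bass/alto go below the treble value. Behavior for positive values is not described in the post and is not modelled specially.

```python
def short_block_ns(ns_bass, ns_alto, ns_treble, ns_short):
    """Effective short-block --ns-* values, as read from the description.

    Hypothetical helper: bass and alto are capped at the treble value
    (never more negative than treble), then all three are scaled by
    the --ns-short factor.
    """
    if not 0.0 <= ns_short <= 1.0:
        raise ValueError("--ns-short has to be in the range 0 ... 1")
    bass = max(ns_bass, ns_treble)   # cap the improvement at the treble level
    alto = max(ns_alto, ns_treble)
    return tuple(v * ns_short for v in (bass, alto, ns_treble))


# The example from the post: -8 / -7 / -6 with --ns-short 0.5
print(short_block_ns(-8, -7, -6, 0.5))  # (-3.0, -3.0, -3.0)
```

With --ns-short 0 all three short-block values become 0, which matches the idea of dropping the extra SNR demand on short blocks altogether.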

I tested 6 full length pop music tracks, and 8 problem sample snippets.
First I tested plain -V0.
'lead-voice' and 'herding_calls' had no accuracy reduced frames.
Apart from these tracks the accuracy reduction percentages were 0.9%, 1.3%, 4.0%, 6.1%, 6.6%, 9.8% (Wake Me Up When September Ends) for the full length tracks, and 2.1%, 2.1%, 7.5% (metropolis), 18.7% (trumpet), 23.1% (castanets), 23.3% (eig_essence) for the problem samples.

Accuracy reduction is not necessarily an issue as we know from the excellent pre-echo behavior of current Lame VBR (with respect to what's possible with mp3), and it's simply necessary in cases like eig and castanets. But percentages beyond 20% were quite astonishing to me, as was the 9.8% of a regular full-length track.

When improving -V0's SNR by increasing accuracy demands via --ns-bass/alto/treble/short while not allowing for a high percentage of accuracy reduced frames, a natural strategy is to use such values for --ns-bass/alto/treble/short that the percentage of reduced accuracy frames is similar to what we get from plain -V0 usage.

'-V0 -b 320 -F --lowpass 17.5 --ns-bass -8 --ns-alto -7 --ns-treble -6 --ns-short 1' does the job.
It doesn't matter much if '--ns-short 1' is dropped or another minor modification is made. '--lowpass 17.5' has proven to be more effective than '-Y', but things don't change dramatically if neither '-Y' nor a lowpass is used.
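
For repeated test runs it can be handy to assemble such command lines programmatically. The following is a minimal sketch that only builds and prints the argument list and does not run anything. The helper name and the file names are made up, and it assumes the executable is called `lame` and is a build such as 3.98.4n that understands --ns-short, which by the description in this post is not part of a stock LAME.

```python
import shlex


def lame_args(infile, outfile, ns=None, ns_short=None, lowpass=17.5):
    """Build an argument list for the settings discussed in this post.

    Hypothetical helper. `ns` is a (bass, alto, treble) tuple or None;
    --ns-short is only added when given.
    """
    args = ["lame", "-V0", "-b", "320", "-F"]
    if lowpass is not None:
        args += ["--lowpass", str(lowpass)]
    if ns is not None:
        bass, alto, treble = ns
        args += ["--ns-bass", str(bass),
                 "--ns-alto", str(alto),
                 "--ns-treble", str(treble)]
    if ns_short is not None:
        args += ["--ns-short", str(ns_short)]
    # --verbose makes 3.98.4n report the reduced-accuracy frame count
    args += ["--verbose", infile, outfile]
    return args


cmd = lame_args("in.wav", "out.mp3", ns=(-8, -7, -6), ns_short=1)
print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run` unchanged if one actually wants to encode.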

'lead-voice' and 'herding_calls' had no accuracy reduced frames with this setting too.
Apart from these tracks the accuracy reduction percentages were 0.5%, 0.7%, 1.0%, 3.6% (Wake Me Up When September Ends), 4.0%, 6.1% for the full length tracks, and 0.5%, 2.8%, 7.7% (metropolis), 19.0% (trumpet), 22.0% (castanets), 27.7% (eig_essence) for the problem samples.

In case the percentage of accuracy reductions in unfavorable cases is considered too high for this setting (and for plain -V0), the benefits of '-b 320 -F' (and '--lowpass 17.5' if desired) can be used with this target in mind.

Let's first look at '-V0 -b 320 -F --lowpass 17.5':

For 'Blackbird,Yesterday', 'lead-voice', 'harp40_1' and 'herding_calls' there is no accuracy reduction at all.
Apart from these tracks the percentages were 0.02%, 0.1%, 0.1%, 0.2%, 0.6% for the full length tracks, and 0.2%, 0.3%, 2.5% (eig_essence), 3.1% (trumpet), 9.8% (castanets) for the problem samples.

So with the '-b 320 -F' addition to plain -V0 we get significant improvements.

We can get significant improvements as well when we use --ns-bass/--ns-alto/--ns-treble/--ns-short.
The high percentages of accuracy reductions we had are mostly due to the higher SNR demands on short blocks. If we drop this we arrive at a much lower percentage of accuracy reductions. Of course we restrict VBR quality improvements to tonal issues this way.

'-V0 --lowpass 17.5 -b 320 -F --ns-bass -9 --ns-alto -8 --ns-treble -7 --ns-short 0' or similar is the way to go.
For 'Blackbird,Yesterday', 'lead-voice', 'harp40_1' and 'herding_calls' there is also no accuracy reduction at all.
Apart from these tracks the percentages were 0.05%, 0.2%, 0.5%, 0.8%, 1.2% for the full length tracks, and 1.2%, 1.7%, 5.7% (trumpet), 6.9% (eig_essence), 15.7% (castanets) for the problem samples.
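
To compare the four settings at a glance, the full-length-track figures reported in this post can be condensed into a mean and a worst case per setting. The sketch below only restates those numbers. The dictionary keys are shorthand labels, not literal command lines, and it assumes that 'Blackbird,Yesterday' is the one full-length track among the zero-reduction cases of the last two settings (which follows from the track counts but is not stated explicitly).

```python
from statistics import mean

# Percentages of accuracy-reduced frames for the six full-length tracks,
# copied from the post. The two "ns" settings also use -b 320 -F --lowpass 17.5.
FULL_LENGTH = {
    "plain -V0": [0.9, 1.3, 4.0, 6.1, 6.6, 9.8],
    "ns -8/-7/-6, short 1": [0.5, 0.7, 1.0, 3.6, 4.0, 6.1],
    "-b 320 -F --lowpass 17.5": [0.0, 0.02, 0.1, 0.1, 0.2, 0.6],
    "ns -9/-8/-7, short 0": [0.0, 0.05, 0.2, 0.5, 0.8, 1.2],
}


def summarize(results):
    """Return {label: (mean, worst case)} for each setting."""
    return {label: (mean(vals), max(vals)) for label, vals in results.items()}


for label, (avg, worst) in summarize(FULL_LENGTH).items():
    print(f"{label:26s} mean {avg:5.2f}%  worst {worst:4.1f}%")
```

Since the post lists the values sorted rather than per track, only such aggregate figures can be compared, not individual tracks.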

An open question is how audible (if at all) the effect of accuracy reduction due to a lack of data space is.
My problem here is that the tonal problems I am (partially) sensitive to involve no accuracy reduction. Accuracy reduction is more related to pre-echo problems, which I am not sensitive to.
To have at least a small chance I did a test with eig for the extreme setting '-V0 --lowpass 17.5 -b 320 -F --ns-bass -12 --ns-alto -12 --ns-treble -12' (44.0% accuracy reductions) and compared it with '-V0 -b 320 -F --lowpass 17.5' (2.5% accuracy reductions). With these contenders I'd say that the extreme --ns-xxx setting is worse: it was easier to ABX, and I slightly preferred the '-V0 -b 320 -F --lowpass 17.5' version. But '-V0 --lowpass 17.5 -b 320 -F --ns-bass -12 --ns-alto -12 --ns-treble -12' really isn't a useful setting.

I'd be happy if a member with good pre-echo sensitivity could try, say, '-V0 -b 320 -F --lowpass 17.5 --ns-bass -8 --ns-alto -7 --ns-treble -6 --ns-short 1' against '-V0 -b 320 -F --lowpass 17.5' and/or plain -V0.

This post has been edited by halb27: Mar 27 2011, 23:15


--------------------
lame3100m --bCVBR 300
halb27
post Mar 29 2011, 07:41
Post #2








I did a listening test with trumpet and castanets, because with these samples I also have a chance to ABX, especially as they yield accuracy-reduced frame percentages as high as 42.1% (trumpet) and 36.9% (castanets) when using '-V0 --lowpass 17.5 -b 320 -F --ns-bass -12 --ns-alto -12 --ns-treble -12'.
Again I compared with '-V0 -b 320 -F --lowpass 17.5' which yields 3.1% (trumpet) and 9.8% (castanets).

I could not ABX trumpet with either version.
I ABXed castanets 6/8 with both versions.
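
For context, one common way to read an ABX score is the one-sided binomial tail probability, i.e. the chance of getting at least that many trials right by pure guessing. The sketch below computes it for the 6/8 score; the function name is made up, and this is a standard calculation rather than something taken from the test itself.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers in `trials` by guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


print(round(abx_p_value(6, 8), 3))  # 0.145
```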

While '-V0 --lowpass 17.5 -b 320 -F --ns-bass -12 --ns-alto -12 --ns-treble -12' is really too much, it shows that even with a high proportion of accuracy-reduced frames quality isn't necessarily reduced.


onkl
post Mar 29 2011, 22:03
Post #3





Group: Members
Posts: 125
Joined: 27-February 09
From: Germany
Member No.: 67444



Why are you using the -F switch? I just made a quick test and -F doesn't seem to affect bitrate or reservoir size; -b 320 is enough.
halb27
post Mar 31 2011, 09:45
Post #4








QUOTE (onkl @ Mar 29 2011, 23:03) *
Why are you using the -F switch? I just made a quick test and -F doesn't seem to affect bitrate or reservoir size; -b 320 is enough.

You're probably right, and I didn't use it in earlier days. At some point somebody proposed it (maybe with irony in mind) in reply to one of my posts. Obsolete or not, it doesn't hurt, it is more consistent, and the encoding result is a true CBR 320 result (filled with data produced by the VBR encoding machinery).

This post has been edited by halb27: Mar 31 2011, 09:50


lvqcl
post Mar 31 2011, 16:43
Post #5





Group: Developer
Posts: 3212
Joined: 2-December 07
Member No.: 49183



QUOTE
Unlike Modo the dwarf, Sergeant Colon did know the meaning of the word 'irony'. He thought it meant "sort of like iron".


What if a song starts with silence? Without the -F switch LAME then uses 32 kbps frames, which are 10 times smaller than the 320 kbps frames used with -F. So without -F the bit reservoir can grow more slowly.

(correct me if I'm wrong...)
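
The factor of 10 can be checked with the usual MPEG-1 Layer III frame length formula (144 * bitrate / sample rate, plus an optional padding byte). The sketch below assumes 44.1 kHz material, which is not stated in the thread, and the function name is made up.

```python
def mp3_frame_bytes(bitrate_kbps, sample_rate_hz=44100, padding=0):
    """Frame length in bytes for MPEG-1 Layer III (integer part, plus padding)."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + padding


small = mp3_frame_bytes(32)    # frame size at 32 kbps
large = mp3_frame_bytes(320)   # frame size at 320 kbps
print(small, large, round(large / small, 1))  # 104 1044 10.0
```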
halb27
post Apr 1 2011, 07:39
Post #6








Yes, that's why I use -F, though I don't know whether it's essential in practice. But it's the consistent way to do it.


halb27
post Oct 8 2011, 13:23
Post #7








Just for completeness: Lame 3.98.4n is obsolete.
Lame 3.98.4x as introduced here does the job much better.

