
WMA VBR 2-pass encoding

Samples related to this message.

Originals are available on rarewares.

WMA VBR 2-pass encoding

Reply #1
Five snapshots: one for each file, showing the graphical representation of the dynamics of each FULL track (the exact small part corresponding to the 23...30 sec sample is selected in white).

As you can see, the "fake track" (which corresponds to 18 samples merged into one) is very different from all four real tracks. The loudness may explain why the four classical samples are underprivileged with WMA Std 2-pass (106 kbps on average) compared to the remaining parts of the "virtual track". Consequently the non-classical tracks benefit from the additional bitrate.

Code: [Select]
           REAL TRACK   FAKE TRACK   (kbps)
bartok       124,39       116,40
debussy      105,44        78,45
hongroise    130,93        88,72
mahler       192,70       142,66
           ________________________
mean         138,37       106,56

 

WMA VBR 2-pass encoding

Reply #2
Quote
Code: [Select]
           REAL TRACK   FAKE TRACK   REAL/FAKE   R/F ÷ MEAN R/F (130%)
           (kbps)       (kbps)
bartok       124,39       116,40        107%            82%
debussy      105,44        78,45        134%           103%
hongroise    130,93        88,72        146%           112%
mahler       192,70       142,66        135%           104%
           _____________________________________
mean         138,37       106,56        130%



You see what I've done there: added the REAL results as a percentage of FAKE,
and see how they correlate to around 130%, with an encode's maximum deviation from the mean of -18%.
It's the relative Demand factor ;-) of the added 'fake' audio (the difference between the demand factor of the substituted audio and the real audio) which informs this ratio between the achieved bitrates of the real encodes and the fake encodes.
As you have shown, there is also some random deviation on top of this (max of 18% in the samples documented).
It would also be interesting to see the results of a fake track with no added substitute audio, 2-pass targeting 128 kbps, or even better the 138 kbps achieved by the real encodes.
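
For anyone who wants to reproduce the arithmetic, here's a minimal sketch (bitrates copied from the table above; the rest is just the ratio bookkeeping):
Code: [Select]
# Reproduce the REAL/FAKE ratio columns from the table above (values in kbps).
real = {"bartok": 124.39, "debussy": 105.44, "hongroise": 130.93, "mahler": 192.70}
fake = {"bartok": 116.40, "debussy": 78.45, "hongroise": 88.72, "mahler": 142.66}

ratios = {name: real[name] / fake[name] for name in real}   # REAL/FAKE per sample
mean_ratio = sum(real.values()) / sum(fake.values())        # ~130% overall

for name, r in ratios.items():
    # each sample's ratio, and that ratio relative to the ~130% mean
    print(f"{name:10s} {r:5.0%}  {r / mean_ratio:5.0%}")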

How relevant is the random deviation? Read Ivan Dimkovic's take on the subject: http://www.hydrogenaudio.org/forums/index.php?showtopic=38723&view=findpost&p=346583

The random deviation basically balances itself out over a sufficient number of test samples. If you examine how any ABR method's bitrate allocation differs between 20-30 sec clip encodes and in-situ encodes, you'll see at least as serious deviations from the norm, and worse if you compare bitrate distribution within the samples (bit allocation at the start of the sample vs. bitrate allocation towards the end). I expect you might be shocked.

This could be taken as a problem for the listening test, but things will get difficult if it is, or unbalanced if the concern is only applied to the WMA Std option.
With a well-chosen fake audio substitute you can achieve parity of bit allocation with the 2-pass VBR method, with individual encodes tending to agree between the real and fake methods.
I think it would work best to not use an audio substitute and just 2-pass target the mean bitrate achieved by the other encoders (as an approximation of the real achieved bitrate calculated here in this small sample).

If you are going to consider individual encoding discrepancies relevant, you should really adopt Gabriel's suggestion of discarding 10 seconds or so of run-in time of the samples for VBR encodes as well as ABR (he kinda knows what he's talking about there ;-)

Anyway I take my hat off to your curiosity and labours here, good luck with resolving the issues raised.

regards,
no conscience > no custom

WMA VBR 2-pass encoding

Reply #3
Quote
[...]


Gosh, I never considered that listening tests would be this complicated. 
davidnaylor.org

WMA VBR 2-pass encoding

Reply #4
This would be a whole lot more fun if the poster gave us some release notes - hell, I am an EE and I have no idea exactly what is being discussed. I know about WMA 2-pass encoding, but the explanations of key terms and goals are completely missing:

1. "real" and "fake" encodes - what on earth do you mean? How does one create a "fake"? Why is this relevant?
2. Relative "demand" factor - care to explain? What is doing the demanding? I demand an explanation!
3. 18 samples merged into one - why? What is the goal? What samples? Merged how? Averaged? Convoluted? Added? Modulated?

All of this is very interesting, just utterly opaque as presented. In the meantime, I will sit at my desk listening to some 128kbps AAC encodes. One pass. Nothing added.

WMA VBR 2-pass encoding

Reply #5
Quote
This would be a whole lot more fun if the poster gave us some release notes - hell, I am an EE and I have no idea exactly what is being discussed. I know about WMA 2-pass encoding, but the explanations of key terms and goals are completely missing:

1. "real" and "fake" encodes - what on earth do you mean? How does one create a "fake"? Why is this relevant?

Real encodes are 2-pass bitrate-targeted encodes of full tracks, with sections of the track extracted for use in the 128 kbps listening test.
Fake encodes are 2-pass bitrate-targeted encodes of just the sections of track used for listening, pasted together one after another, with another section of 'averageish' audio pasted in at the beginning or end; the individual listening-test sections are then cut back out of the 2-pass encode.
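
Roughly, the assembly step before the 2-pass encode could look like this (a sketch only; the file names are hypothetical and it relies on the numpy and soundfile packages):
Code: [Select]
# Assemble a "fake track": listening-test sections pasted together, plus an
# 'averageish' substitute section at the end. File names are hypothetical.
import numpy as np
import soundfile as sf

section_files = ["bartok.wav", "debussy.wav", "hongroise.wav", "mahler.wav"]
sections = [sf.read(f)[0] for f in section_files]
substitute, rate = sf.read("averageish_filler.wav")

fake_track = np.concatenate(sections + [substitute])
sf.write("fake_track.wav", fake_track, rate)
# 2-pass encode fake_track.wav, then cut each section back out for the test.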
Quote
2. Relative "demand" factor - care to explain? What is doing the demanding? I demand an explanation!

Demand factor is a handle on how demanding a section of audio is for the encoder to encode: how strongly its algorithm demands bits for a specified section of audio.

I cooked it up with this generalised model of 2-pass performance at this point in the huge listening test thread:
(note: it's poorly named Demandrate here, as pointed out by sehested)
Quote
It would be wonky for the pre-pass calculated VBR setting not to be global;
assuming it is, this equation would be true:
Code: [Select]
 phraseA_Bitrate*phraseA_Duration
+phraseB_Bitrate*phraseB_Duration
+phraseC_Bitrate*phraseC_Duration
=target_Bitrate*Total_Duration (=total bit allocation)

Next define Demandrate, a kind of passage complexity estimate from the encoder's preferences: high for passages which would demand more bits, low for passages which would demand less.

phrase_Demandrate=phrase_Bitrate/target_Bitrate
phrase_Bitrate=phrase_Demandrate*target_Bitrate

Substituting the Demandrate expression for phrase_Bitrate in the previous equation...

Code: [Select]
 (phraseA_Demandrate*target_Bitrate)*phraseA_Duration
+(phraseB_Demandrate*target_Bitrate)*phraseB_Duration
+(phraseC_Demandrate*target_Bitrate)*phraseC_Duration
=target_Bitrate*Total_Duration

Divide both sides of the equation by target_Bitrate to leave:
Code: [Select]
 phraseA_Demandrate*phraseA_Duration
+phraseB_Demandrate*phraseB_Duration
+phraseC_Demandrate*phraseC_Duration
=Total_Duration
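
A quick numeric check of that identity, with made-up phrase bitrates and durations (not measurements from any real encode):
Code: [Select]
# Hypothetical phrases: (bitrate in kbps, duration in seconds).
phrases = [(180.0, 30.0), (95.0, 60.0), (128.0, 45.0)]
total_duration = sum(d for _, d in phrases)

# The 2-pass target is the duration-weighted mean of the phrase bitrates.
target_bitrate = sum(b * d for b, d in phrases) / total_duration

# Demandrate per phrase, as defined above.
demandrates = [b / target_bitrate for b, _ in phrases]

# Check: sum of Demandrate*Duration comes back to Total_Duration.
lhs = sum(dr * d for dr, (_, d) in zip(demandrates, phrases))
print(lhs, total_duration)   # both ~135.0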


Quote
All of this is very interesting, just utterly opaque as presented. In the meantime, I will sit at my desk listening to some 128kbps AAC encodes. One pass. Nothing added.

It's about whether the WMA Standard 2-pass bitrate targeting method could fairly be used in the 128 kbps listening test (since the other available WMA Std modes can't reach the test's bitrate target). For background read the other thread - careful, it is a bit of a headf*

hth
no conscience > no custom

WMA VBR 2-pass encoding

Reply #6
Quote
[...]


I believe you just had an overdose of maths...

You found the difference in percentage. guruboolez already said the difference in kbits.
You show the difference to the mean difference... Ok, now we know relatively how much the differences differ.
We got the demand factor again, which you suggest has a random deviation on top of the mean deviation.
Then, you go asking about another test, which is exactly method #2 of the original post (already out of the question), and whose deviations don't correlate at all with your current findings.

Next, you try to say that this random deviation is self-controlled, and that it magically balances itself. Wow... I knew that something random has the same chance of being anything, but not that it regulates itself...

Shocking surprise: now we know that (1-pass) ABR encoders balance themselves to reach the desired rate, but do so giving more bitrate to the start than to the end (like we do with films when they reach the credits, but that's another story). Sorry to say, but 20-30 seconds seems way too much. At most, it's the 5-10 seconds you also mentioned later.

So, solution to all of our headaches is to find a fake audio stream that simulates the reality. I'd take the reality...

Solution #2 is out of question as said some lines above.

WMA VBR 2-pass encoding

Reply #7
Quote
Quote
[...]

Gosh, I never considered that listening tests would be this complicated. 

heh, I hear that

A problem with guru's test here, and the quick conclusion, is that the substitute audio used was noticeably unideal and effectively 'stole' bits from the samples in the fake encode. We don't know how much that mismatch scaled (up or down?) the random deviations away from the results which my generalised model would have suggested.
It's noticeable that if this test was attempting to find a method of using 2-pass fairly, it wasn't quite thorough. But I have to admit that, from first appearances, the random deviations seen in this small unmanaged sample don't flatter my model.

I believe a wider examination of in-situ / cropped encoding results could make plenty of people worry. But notice how those who deal with these issues to make codecs perform as they do deprecate the importance of sample-wide bitrate variations. That's because they are random by nature, and the effects are as individual to the performance of each codec as testing one sample or another. That's one reason why result confidence increases with the number of samples used.

With many shortish, singly encoded samples used, encoder run-in performance is included in every sample, unlike real-world usage where run-in affects encoder performance just once every 3 minutes or so. What would be the significant run-in effect for each codec/mode? That effect would be repeated, to the codec's benefit or detriment, for each and every sample (if it lasted longer than the 1 sec startup skip used in ABC/HR) - unlike the random effects resulting from differences in 2-pass performance.
no conscience > no custom

WMA VBR 2-pass encoding

Reply #8
Quote
You found the difference in percentage. guruboolez already said the difference in kbits.

Those percentages and guru's quoted kbit differences don't relate to the same thing.
Edit: well, admittedly, the 3rd column does, but in a more accessible form (%);
the fourth column puts them in the context of what should be achieved (all other things being equal) if suitable substitute audio had been used in the 2-pass.
Quote
...Then, you go asking about another test, which is exactly method #2 of the original post

No. Method #2 is the samples 2-passed individually (plain to read); I suggested 2-passing them altogether, without the substitute audio added.
Quote
Next, you try to say that this random deviation is self-controlled, and that it magically balances itself. Wow... I knew that something random has the same chance of being anything, but not that it regulates itself...

Yes, that is a property of randomness. It's the reason why, for example, carbon dating works - even the reason why listening tests work.

edit: and more than that, these random deviations can be forced to add up to zero. How? The 2-pass method allocates an amount of bits {average kbps * duration}. If you 2-pass only the joined samples, and target an approximated bit allocation - the mean of the other encoders' achieved allocations - the deviations will (be forced to) cancel out toward that specifiable amount.
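
As a sanity check of that bookkeeping, here's a minimal sketch with made-up per-sample bitrates chosen only so that the total bit budget is respected:
Code: [Select]
# If the 2-pass hits its total bit allocation (target_kbps * total duration)
# exactly, the duration-weighted deviations must sum to zero. Numbers are made up.
target_kbps = 128.0
durations = [30.0, 30.0, 30.0, 30.0]              # four joined samples, in seconds
achieved  = [150.0, 110.0, 95.0, 157.0]           # any mix that respects the budget

deviation_bits = sum((a - target_kbps) * d for a, d in zip(achieved, durations))
print(deviation_bits)   # 0.0 - one sample's shortfall is another sample's surplus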
Quote
Shocking surprise, now we know that (1-pass) ABR encoders balance themselves to reach the desired rate, but do so giving more bitrate to the start than the end

ABR allocation through track time is heavily dependent on the bits already allocated to the previous passage. If it's tight (CBR-like) ABR, the relevant preceding passage is short; if it's normal, it's a fair bit longer than usual CBR. This results in more bit availability after low-bit passages and less bit availability after high-bit passages. This is why, if the preceding passages are not included, startup bit allocation can be very different from when they are; and since startup can be different, so can the following passages, in an oscillating manner, damped by the relevant ABR passage length.
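
To illustrate (a toy model only - not how any real ABR implementation works), here's a sketch where each passage's allocation leans on how far the recent average has drifted from the target:
Code: [Select]
# Toy ABR: each passage gets its 'demand' plus a correction for recent over/under-spend.
def abr_allocate(demands, target=128.0, window=4, gain=0.5):
    allocated = []
    for demand in demands:
        recent = allocated[-window:] or [target]
        drift = target - sum(recent) / len(recent)   # how far off target we've been
        allocated.append(demand + gain * drift)
    return allocated

track = [160, 90, 140, 100, 150, 95, 130, 110]   # hypothetical passage demands (kbps)
clip = track[4:]                                  # same passages, preceding audio cut off

print(abr_allocate(track)[4:])   # those passages encoded in situ
print(abr_allocate(clip))        # same passages as a standalone clip: different startup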
Quote
So, solution to all of our headaches is to find a fake audio stream that simulates the reality. I'd take the reality...

Well, so would everyone, but that's not quite possible with any of the encodes or samples, so they're trying to fairly approximate reality.
Quote
Solution #2 is out of question as said some lines above.

And so you just repeated your miscomprehension of the simplest of my statements.

I'm sorry for snapping at you earlier.
no conscience > no custom

WMA VBR 2-pass encoding

Reply #9
There's too much unnecessary mathematics in this thread. 

What guruboolez has clearly shown is that the surroundings of the sample influence how it is encoded with WMA 2-pass. We already knew this, and it would be weird if it didn't.

What now should be investigated is how important this is: are there discrepancies between real and fake using ABR (and VBR), and if so (to which the answer already is yes), how big are these compared to the WMA 2-pass method?
"We cannot win against obsession. They care, we don't. They win."

WMA VBR 2-pass encoding

Reply #10
Quote
No. Method #2 is the samples 2-passed individually (plain to read); I suggested 2-passing
them altogether, without the substitute audio added.

Then it sounds like method #3.

Quote
Yes, that is a property of randomness. It's the reason why, for example, carbon dating works - even the reason why listening tests work.

But that is only with a big sample, not our case.

Quote
Well, so would everyone, but that's not quite possible with any of the encodes or samples, so they're trying to fairly approximate reality.

But we need to know the reality to approximate to it, and the reality is the samples we want.

Quote
I'm sorry for snapping at you earlier.

Same here.

WMA VBR 2-pass encoding

Reply #11
Quote
Quote
I suggested 2-passing them altogether, without the substitute audio added.

Then it sounds like method #3.
Method #3 is with the substitute added. 

Quote
The longer C's duration (the substitute audio), the greater its Demandrate's effect on A and B's bit allocation; at 0 it ceases to have an effect.

In Guru's demonstration, method #2 basically checks the 2-pass's ability to hit a kbps target, which it does OK.
The comparison we see is the real (method #1) vs method #3, in which the substitute audio was of much stronger demand factor than the real audio, so the individual sample encodes were all thrown off by it - starved of bits. I pointed that out and showed how much the individual samples deviated from what would be expected if a suitable substitute was used, OR if no substitute was used with 2-pass targeting the mean real sample kbps, which could be approximated from the gross mean of the other codecs' encodes.
There is also a reasonable possibility that the unsuitable substitute caused exaggerated deviations in method #3, which is what a suitable-substitute or no-substitute test would investigate.
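
Here is a toy walk-through of that 'starved of bits' effect under the Demandrate model quoted above; the demand values and the substitute's share of the duration are hypothetical:
Code: [Select]
# Samples plus substitute audio C share one 2-pass budget. The global scale is
# chosen so the duration-weighted mean bitrate hits the target. Numbers are made up.
def sample_bitrate(sample_demand, sub_demand, sub_share, target=128.0):
    scale = target / ((1 - sub_share) * sample_demand + sub_share * sub_demand)
    return scale * sample_demand

print(sample_bitrate(1.0, 1.0, 0.5))   # matched substitute: samples get ~128 kbps
print(sample_bitrate(1.0, 1.6, 0.5))   # hungrier substitute: samples starved (~98 kbps)
print(sample_bitrate(1.0, 1.6, 0.0))   # no substitute (share 0): its demand ceases to matter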

Quote
you try to say that this random deviation is self-controlled, and that it magically balances itself. Wow...
Quote
Yes, that is a property of randomness. It's the reason why, for example, carbon dating works - even the reason why listening tests work....
But that is only with a big sample, not our case.
It's not a tendency only for many instances; it's present in our case or in any number of random events. But there is no number at which it becomes certain - it's just a tendency, unlike the second point I made there:
Quote
more than that, these random deviations can be (are) forced to add up to zero. How? The 2-pass method allocates an amount of bits {average kbps * duration}. If you 2-pass only the joined samples, and target an approximated bit allocation - the mean of the other encoders' achieved allocations - the deviations will (be forced to) cancel out toward that specifiable amount.
A total amount of bits is allocated with the 2-pass method (kbps * duration), and guru's method #2 showed that target is reliable, so if any one sample's allocation is deprived the other samples are boosted, and vice versa.

Quote
But we need to know the reality to approximate to it, and the reality is the samples we want.
That is an axiom (operating principle ;-) that you and guru have chosen. I've struggled to explain that, in my understanding, it is not necessarily so. Having failed to do so, I see you're sure anyway - hey, it's your understanding, your test anyway. I'll let it lie for now, and you guys can keep your fingers crossed that I don't find the time to do a proper investigation of the method's performance (skillful 2-pass approximation vs Nero's ABR)...

all the best,
no conscience > no custom