> Upload forum rules

- No over 30 sec clips of copyrighted music. Cite properly and never more than necessary for the discussion.


- No copyrighted software without permission.


- Click here for complete Hydrogenaudio Terms of Service

 
WMA VBR 2-pass encoding, identical setting but different output
guruboolez
post Dec 1 2005, 10:28
Post #1





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



Samples related to this message.

Originals are available on rarewares.

This post has been edited by guruboolez: Dec 1 2005, 10:32
guruboolez
post Dec 1 2005, 13:23
Post #2





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



Five snapshots, one per file, showing the dynamics of each FULL track (the part selected in white is the exact short section that corresponds to the 23...30 sec sample).

As you can see, the "fake track" (which corresponds to 18 samples merged into one) is very different from all four real tracks. The loudness may explain why the four classical samples are underprivileged with WMA Std 2-pass (106 kbps on average) compared to the remaining parts of the "virtual track". Consequently the non-classical tracks benefit from the additional bitrate.

CODE
Bitrate (kbps)
             REAL TRACK    FAKE TRACK
bartok         124.39        116.40
debussy        105.44         78.45
hongroise      130.93         88.72
mahler         192.70        142.66
             ________________________
mean           138.37        106.56


This post has been edited by guruboolez: Dec 1 2005, 13:53
[Attached thumbnails: five images]
ChiGung
post Dec 1 2005, 16:22
Post #3





Group: Members
Posts: 439
Joined: 9-February 05
From: county down
Member No.: 19713



QUOTE (guruboolez @ Dec 1 2005, 12:23 PM)
CODE
Bitrate (kbps)
             REAL TRACK    FAKE TRACK    REAL/FAKE    (R/F) / MEAN R/F (130%)
bartok         124.39        116.40        107%          82%
debussy        105.44         78.45        134%         103%
hongroise      130.93         88.72        146%         112%
mahler         192.70        142.66        135%         104%
             ______________________________________
mean           138.37        106.56        130%



You see what I've done there: added the REAL results as a percentage of FAKE.
See how they cluster around 130%, with a maximum deviation from the mean of -18% for any single encode.
It's the relative demand factor ;-) of the added 'fake' audio (the difference between the demand factor of the substitute audio and that of the real audio) which sets this ratio between the bitrates achieved by the real encodes and the fake encodes.
As you have shown, there is also some random deviation on top of this (a maximum of 18% in the samples documented).
It would also be interesting to see the results of a fake track with no substitute audio added, 2-pass targeting 128 kbps or, even better, the 138 kbps achieved by the real encodes.
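
The percentages in the table can be rechecked from the posted bitrates. The following is a minimal Python sketch, not part of the original post; the variable names are made up for illustration. Recomputed from the posted figures, the hongroise ratio comes out nearer 148% than the 146% listed, which does not change the overall picture of ratios clustering around 130%.

```python
# Bitrates (kbps) as posted in the thread for the four classical samples.
real = {"bartok": 124.39, "debussy": 105.44, "hongroise": 130.93, "mahler": 192.70}
fake = {"bartok": 116.40, "debussy": 78.45, "hongroise": 88.72, "mahler": 142.66}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Ratio of the mean real bitrate to the mean fake bitrate.
mean_ratio = mean(real.values()) / mean(fake.values())

# Per-sample REAL/FAKE ratio, and that ratio relative to the mean ratio.
ratios = {name: real[name] / fake[name] for name in real}
relative = {name: ratios[name] / mean_ratio for name in real}

for name in real:
    print(f"{name:10s} real/fake = {ratios[name]:6.1%}   vs mean = {relative[name]:6.1%}")
print(f"mean real/fake = {mean_ratio:.1%}")
```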

How relevant is the random deviation? Read Ivan Dimkovic's take on the subject.

The random deviation basically balances itself out over a sufficient number of test samples. If you examine how any ABR method's bitrate allocation differs between 20-30 sec clip encodes and in-situ encodes, you'll see deviations from the norm that are at least as serious, and worse if you compare the bitrate distribution within the samples (bit allocation at the start of the sample with bit allocation towards the end). I expect you might be shocked.

This could be taken as a problem for the listening test, but things will get difficult if it is, or unbalanced if it is applied only to the WMA Std option.
With a well-chosen fake audio substitute you can achieve parity of bit allocation with the 2-pass VBR method, with individual encodes tending to agree between the real and fake methods.
I think it would work best not to use an audio substitute and just 2-pass target the mean bitrate achieved by the other encoders (as an approximation of the real achieved bitrate calculated here in this small sample).

If you are going to treat individual encoding discrepancies as relevant, you should really adopt Gabriel's suggestion of discarding 10 seconds or so of run-in time from the samples for VBR encodes as well as ABR (he kinda knows what he's talking about there ;-)

Anyway I take my hat off to your curiosity and labours here, good luck with resolving the issues raised.

regards'

This post has been edited by ChiGung: Dec 1 2005, 16:31


--------------------
no conscience > no custom
naylor83
post Dec 1 2005, 19:01
Post #4





Group: Members
Posts: 204
Joined: 19-June 05
From: Uppsala, Sweden
Member No.: 22842



QUOTE (ChiGung @ Dec 1 2005, 05:22 PM)
[...]

Gosh, I never considered that listening tests would be this complicated.


--------------------
davidnaylor.org

Vorbis Q4, please. AoTuv b5, preferably.
BradPDX
post Dec 1 2005, 19:43
Post #5





Group: Members
Posts: 142
Joined: 16-August 05
From: Portland, Oregon
Member No.: 23924



This would be a whole lot more fun if the poster gave us some release notes - hell, I am an EE and I have no idea exactly what is being discussed. I know about WMA 2-pass encoding, but the explanations of key terms and goals are completely missing:

1. "real" and "fake" encodes - what on earth do you mean? How does one create a "fake"? Why is this relevant?
2. Relative "demand" factor - care to explain? What is doing the demanding? I demand an explanation!
3. 18 samples merged into one - why? What is the goal? What samples? Merged how? Averaged? Convoluted? Added? Modulated?

All of this is very interesting, just utterly opaque as presented. In the meantime, I will sit at my desk listening to some 128kbps AAC encodes. One pass. Nothing added.
ChiGung
post Dec 1 2005, 20:25
Post #6





Group: Members
Posts: 439
Joined: 9-February 05
From: county down
Member No.: 19713



QUOTE (BradPDX @ Dec 1 2005, 06:43 PM)
This would be a whole lot more fun if the poster gave us some release notes - hell, I am  an EE and I have no idea exactly what is being discussed. I know about WMA 2-pass encoding, but the explanations of key terms and goals are completely missing:

1. "real" and "fake" encodes - what on earth do you mean? How does one create a "fake"? Why is this relevant?

Real encodes are 2-pass bitrate-targeted encodes of full tracks, from which the sections used in the 128 kbps listening test are then extracted.
Fake encodes are 2-pass bitrate-targeted encodes of just the sections of track used for listening, pasted together one after another, with another section of 'averageish' audio pasted in at the beginning or end; the individual listening test sections are then cut out after the 2-pass encode.
QUOTE
2. Relative "demand" factor - care to explain? What is doing the demanding? I demand an explanation!

Demand factor is a handle on how demanding a section of audio is for the encoder to encode: how strongly its algorithm demands bits to encode a specified section of audio.

I cooked it up with this generalised model of 2-pass performance at this point in the huge listening test thread:
(note: it's poorly named Demandrate here, as pointed out by sehested)
QUOTE
It would be wonky for the VBR setting calculated in the first pass not to be global.
Assuming it is global, this equation holds:
CODE
  phraseA_Bitrate*phraseA_Duration
+ phraseB_Bitrate*phraseB_Duration
+ phraseC_Bitrate*phraseC_Duration
= target_Bitrate*Total_Duration   (= total bit allocation)

Next define Demandrate, a kind of passage complexity estimate based on the encoder's preferences: high for passages which would demand more bits, low for passages which would demand less.

phrase_Demandrate = phrase_Bitrate/target_Bitrate
phrase_Bitrate    = phrase_Demandrate*target_Bitrate

Substituting the Demandrate expression for phrase_Bitrate in the previous equation:

CODE
  (phraseA_Demandrate*target_Bitrate)*phraseA_Duration
+ (phraseB_Demandrate*target_Bitrate)*phraseB_Duration
+ (phraseC_Demandrate*target_Bitrate)*phraseC_Duration
= target_Bitrate*Total_Duration

Divide both sides of the equation by target_Bitrate to leave:
CODE
  phraseA_Demandrate*phraseA_Duration
+ phraseB_Demandrate*phraseB_Duration
+ phraseC_Demandrate*phraseC_Duration
= Total_Duration
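
The identity above can be checked numerically. This is a minimal Python sketch with made-up phrase bitrates and durations (none of these numbers come from the thread); it only assumes, as the quoted model does, that the achieved average over the whole file equals the target bitrate.

```python
def demand_rates(bitrates_kbps, durations_s):
    """Return (target bitrate, per-phrase Demandrate) for a set of phrases.

    The target is taken to be the duration-weighted average bitrate,
    i.e. total bits divided by total duration, as in the first equation.
    """
    total_duration = sum(durations_s)
    total_bits = sum(b * d for b, d in zip(bitrates_kbps, durations_s))
    target = total_bits / total_duration
    return target, [b / target for b in bitrates_kbps]


# Hypothetical phrases A, B and C: bitrates in kbps, durations in seconds.
bitrates = [96.0, 160.0, 128.0]
durations = [30.0, 20.0, 50.0]

target, demand = demand_rates(bitrates, durations)

# Duration-weighted sum of Demandrates; by the identity this equals Total_Duration.
weighted = sum(r * d for r, d in zip(demand, durations))

print(f"target bitrate       : {target:.1f} kbps")
print(f"sum(demand*duration) : {weighted:.1f} s")
print(f"total duration       : {sum(durations):.1f} s")
```

A phrase with Demandrate above 1 takes more than its time-proportional share of the bit budget, so some other phrase must sit below 1 for the weighted sum to stay equal to the total duration.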


QUOTE
All of this is very interesting, just utterly opaque as presented. In the meantime, I will sit at my desk listening to some 128kbps AAC encodes. One pass. Nothing added.

It's about whether the WMA Standard 2-pass bitrate-targeting method could fairly be used in the 128 kbps listening test (since the other available WMA Std modes can't reach the test's bitrate target). For background read the other thread, but careful, it is a bit of a headf* ;-)

hth

This post has been edited by ChiGung: Dec 1 2005, 22:07


--------------------
no conscience > no custom
[JAZ]
post Dec 1 2005, 20:32
Post #7





Group: Members
Posts: 1568
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



QUOTE (ChiGung @ Dec 1 2005, 04:22 PM)
[...]

I believe you just had an overdose of maths...

You found the difference in percentage. guruboolez already gave the difference in kbits.
You show the difference relative to the mean difference... OK, now we know how much the differences differ, relatively.
Then we get the demand factor again, which you suggest has a random deviation on top of the mean deviation.
Then you go asking about another test, which is exactly method #2 of the original post (already out of the question), and whose deviations don't correlate at all with your current findings.

Next, you try to say that this random deviation is self-controlled, and that it magically balances itself. Wow... I knew that something random has the same chance of being anything, but that it regulates itself...

Shocking surprise: now we know that (1-pass) ABR encoders balance themselves to reach the desired rate, but do so by giving more bitrate to the start than to the end (like we do with films when they reach the credits, but that's another story). Sorry to say, but that 20-30 seconds seems way too much. At most, it's the 5-10 seconds you also mentioned later.

So, the solution to all of our headaches is to find a fake audio stream that simulates reality. I'd take the reality...

Solution #2 is out of the question, as said some lines above.
ChiGung
post Dec 1 2005, 21:01
Post #8





Group: Members
Posts: 439
Joined: 9-February 05
From: county down
Member No.: 19713



QUOTE (naylor83 @ Dec 1 2005, 06:01 PM)
QUOTE (ChiGung @ Dec 1 2005, 05:22 PM)
[...]

Gosh, I never considered that listening tests would be this complicated.

Heh, I hear that ;-)

A problem with guru's test here, and its quick conclusion, is that the substitute audio used was noticeably unideal and effectively 'stole' bits from the samples in the fake encode. We don't know how much that mismatch scaled (or scaled^?+/-?) the random deviations away from the results which my generalised model would have suggested.
It's noticeable that if this test was attempting to find a method of using 2-pass fairly, it wasn't quite thorough. But I have to admit that, on first appearances, the random deviations seen in this small unmanaged sample don't flatter my model.

I believe a wider examination of in-situ vs cropped encoding results could make plenty of people worry. But notice how those who deal with these issues to make codecs perform as they do play down the importance of sample-wide bitrate variations. That's because they are random by nature, and their effects are as individual to the performance of each codec as testing one sample or another. That's one reason why confidence in the results increases with the number of samples used.

With many shortish, singly encoded samples, encoder run-in performance is included in every sample, unlike real-world usage where run-in affects encoder performance just once every 3 minutes or so. What would the significant run-in effect be for each codec/mode? That effect would be repeated, according to the codec's preference (positive or negative), for each and every sample (if it lasted longer than the 1 sec startup skip used in ABC/HR), unlike the random effects resulting from differences in 2-pass performance.


--------------------
no conscience > no custom
ChiGung
post Dec 1 2005, 21:36
Post #9





Group: Members
Posts: 439
Joined: 9-February 05
From: county down
Member No.: 19713



QUOTE ([JAZ] @ Dec 1 2005, 07:32 PM)
You found the difference in percentage. guruboolez already said the difference in kbits.

Those percentages and guru's quoted kbit differences don't relate to the same thing.
Edit: well, admittedly the third column does, but in a more accessible form (%);
the fourth column puts them in the context of what should be achieved (all other things being equal) if suitable substitute audio had been used in the 2-pass.
QUOTE
...Then, you go asking about another test, which is exactly the method #2 of the original post

No. Method 2 is the samples 2-passed individually (plain to read); I suggested 2-passing them all together, without the substitute audio added.
QUOTE
Next, you try to say that this random deviation is self controlled, and that it magically balances itself. Wow... I knew that something random has the same possibilities of being anything, but that it graduates itself...

Yes, that is a property of randomness. It's the reason why, for example, carbon dating works, and even the reason why listening tests work.

Edit: and more than that, these random deviations can be forced to add up to zero. How? The 2-pass method allocates a fixed amount of bits {average kbps*duration}. If you 2-pass only the joined samples, and target an approximated bit allocation (the mean of the other encoders' achieved allocations), the deviations will (be forced to) cancel out toward that specifiable amount.
QUOTE
Shocking surprise, now we know that (1-pass) ABR encoders balance themselves to reach the desired rate, but do so giving more bitrate to the start than the end

ABR allocation through track time is heavily dependent on the bits already allocated to the preceding passage. If it's tight (CBR-like) ABR, the relevant preceding passage is short; if it's normal ABR, it's a fair bit longer than for usual CBR. This results in more bit availability after low-bit passages and less bit availability after high-bit passages. This is why, if the preceding passages are not included, startup bit allocation can be very different from when they are, and since startup can be different, so can the following passages, in an oscillating manner, damped by the relevant ABR passage length.
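
The dependence on preceding audio described above can be illustrated with a toy rate controller. This Python sketch is purely hypothetical: the `toy_abr` function, its correction rule, the window length and the demand values are all invented for illustration and do not describe how any real ABR encoder works. It only shows that when allocation depends on recently spent bits, the same clip gets a different start-up allocation in situ than when cropped.

```python
from collections import deque


def toy_abr(demand, target_kbps, window):
    """Toy 1-pass ABR (hypothetical): each frame asks for demand*target,
    scaled down when the recent average bitrate is above target and
    scaled up when it is below."""
    recent = deque(maxlen=window)  # bitrates of the last `window` frames
    rates = []
    for d in demand:
        recent_avg = sum(recent) / len(recent) if recent else target_kbps
        rate = d * target_kbps * (target_kbps / recent_avg)
        recent.append(rate)
        rates.append(rate)
    return rates


TARGET = 128.0
preceding = [1.5] * 20  # hypothetical demanding passage just before the clip
clip = [1.0] * 10       # the clip itself, with average demand

# Same clip, encoded inside the track and then extracted, vs encoded alone.
in_situ = toy_abr(preceding + clip, TARGET, window=8)[len(preceding):]
cropped = toy_abr(clip, TARGET, window=8)

for i, (a, b) in enumerate(zip(in_situ, cropped)):
    print(f"frame {i}: in situ {a:6.1f} kbps   cropped {b:6.1f} kbps")
```

In this toy model the in-situ clip starts out short of bits because the preceding passage overspent, and the gap closes as the window fills with frames from the clip itself.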
QUOTE
So, solution to all of our headaches is to find a fake audio stream that simulates the reality. I'd take the reality...

Well, so would everyone, but that's not quite possible with any of the encodes or samples, so they're trying to approximate reality fairly.
QUOTE
Solution #2 is out of question as said some lines above.

And so you just repeated your miscomprehension of the simplest of my statements.

I'm sorry for snapping at you earlier.

This post has been edited by ChiGung: Dec 1 2005, 21:59


--------------------
no conscience > no custom
stephanV
post Dec 1 2005, 23:29
Post #10





Group: Members
Posts: 394
Joined: 6-May 04
Member No.: 13932



There's too much unnecessary mathematics in this thread.

What guruboolez has clearly shown is that the surroundings of the sample influence how it is encoded with WMA 2-pass. We already knew this, and it would be weird if they didn't.

What should be investigated now is how important this is: are there discrepancies between real and fake using ABR (and VBR), and if so (to which the answer is already yes), how big are they compared to the WMA 2-pass method?


--------------------
"We cannot win against obsession. They care, we don't. They win."
[JAZ]
post Dec 3 2005, 16:42
Post #11





Group: Members
Posts: 1568
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



QUOTE (ChiGung @ Dec 1 2005, 09:36 PM)
No. Method 2 is the samples 2passed individualy (plain to read) i suggested 2passing them altogether, without the substitute audio added.

Then it sounds like method #3.

QUOTE
Yes that is a property of randomness. Its the reason why for example that carbon dating works, even - the reason why listening tests work.

But that is only with a big sample, not our case.

QUOTE
Well so would everyone, but thats not quite possible with any of the encodes or samples, so they're trying to fairly approximate reality.

But we need to know the reality to approximate to it, and the reality is the samples we want.

QUOTE
Im sorry for snapping at you earlier.

Same here.
ChiGung
post Dec 3 2005, 22:13
Post #12





Group: Members
Posts: 439
Joined: 9-February 05
From: county down
Member No.: 19713



QUOTE ([JAZ] @ Dec 3 2005, 03:42 PM)
QUOTE (ChiGung @ Dec 1 2005, 09:36 PM)
i suggested 2passing them altogether, without the substitute audio added.

Then it sounds as method #3.

Method 3 is with the substitute added.

QUOTE (me in other thread)
The longer C's duration (the substitute audio), the greater its DemandRates effect on A and B's bit allocation, at 0 it ceases to have an effect.

In guru's demonstration, method #2 basically checks the 2-pass's ability to hit a kbps target, which it does OK.
The comparison we see is the real (method #1) vs method #3, in which the substitute audio had a much stronger demand factor than the real audio, so the individual sample encodes were all thrown off by it: starved of bits. I pointed that out and showed how much the individual samples deviated from what would be expected if a suitable substitute was used, OR if no substitute was used with 2-pass targeting the mean real sample kbps, which could be approximated from the gross mean of the other codecs' encodes.
There is also a reasonable possibility that the unsuitable substitute caused exaggerated deviations in method #3, which is what a test with a suitable substitute or with no substitute would investigate.
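
How a demanding substitute can starve the real samples, and how that effect grows with the substitute's duration, can be sketched in Python. The sketch assumes each phrase has a fixed intrinsic complexity and that the 2-pass encoder scales every phrase by one global factor so that the total bit budget is met exactly. That proportional model, the `allocate` function and all the numbers are assumptions made for illustration, not the documented behaviour of WMA.

```python
def allocate(target_kbps, phrases):
    """Split a fixed bit budget across phrases (hypothetical model).

    phrases: list of (complexity, duration_s) pairs.
    Each phrase gets complexity * scale kbps, with one global scale chosen
    so that total bits equal target_kbps * total duration.
    """
    total_duration = sum(duration for _, duration in phrases)
    weighted = sum(c * duration for c, duration in phrases)
    scale = target_kbps * total_duration / weighted
    return [c * scale for c, _ in phrases]


TARGET = 128.0
A = (1.0, 30.0)  # listening sample A: (complexity, seconds), made-up values
B = (1.2, 30.0)  # listening sample B

results = {}
for substitute_duration in (0.0, 30.0, 120.0):
    C = (1.6, substitute_duration)  # substitute audio, more demanding than A and B
    rate_a, rate_b, _ = allocate(TARGET, [A, B, C])
    results[substitute_duration] = (rate_a, rate_b)
    print(f"substitute {substitute_duration:5.0f} s: A = {rate_a:6.1f} kbps, B = {rate_b:6.1f} kbps")
```

With a substitute of zero duration the samples share the whole budget between themselves; the longer the demanding substitute, the further both samples fall below what they would otherwise get.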

QUOTE (Jaz)
you try to say that this random deviation is self controlled, and that it magically balances itself. Wow...
QUOTE (chigung)
Yes that is a property of randomness. Its the reason why for example that carbon dating works, even - the reason why listening tests work....
QUOTE (Jaz)
But that is only with a big sample, not our case.

It's not a tendency only for many instances; it's present in our case, or for any number of random events. But there is no number at which it becomes certain, it's just a tendency, unlike the second point I made there:
QUOTE (chigung)
more that that, these random deviations can be (are) forced to add up to zero. how? the 2pass method allocates an amount of bits {average kbs*duration} If you 2pass only the joined samples, and target an approximated bit allocation - the mean of other encoders achieved allocations, the deviations will (be forced to) cancel out toward that specifiable amount.

A total amount of bits is allocated with the 2-pass method (kbps*duration), and guru's method #2 showed that the target is reliable, so if any one sample's allocation is deprived, the other samples are boosted, and vice versa.

QUOTE
But we need to know the reality to approximate to it, and the reality is the samples we want.

That is an axiom (operating principle ;-) that you and guru have chosen. I've struggled to explain that, in my understanding, it is not necessarily so. Having failed to do so, I see you're sure anyway; hey, it's your understanding, and your test anyway. I'll let it lie for now, and you guys can keep your fingers crossed that I don't find the time to do a proper investigation of the methods' performance (skillful 2-pass approximation vs Nero's ABR)...

'all the best

This post has been edited by ChiGung: Dec 4 2005, 01:49


--------------------
no conscience > no custom
