
listening test at 160 kbps

To add some data to the frequently asked question “which Vorbis encoder should I use?”, I've decided to run a listening test comparing four different versions of Vorbis at 160...165 kbps:
- CVS (oggenc 2.3)
- GT3b2 tuning (based on the 1.0.1 reference code)
- aoTuV beta 2
- aoTuV beta 2 + QK tuning

The choice of settings is based directly on the useful work of phong and Mac:
http://www.hydrogenaudio.org/forums/index....ndpost&p=215978
http://www.hydrogenaudio.org/forums/index....ndpost&p=216019


I. PURPOSE OF THIS TEST

I focused the test on one problem: pre-echo (in other words, sharpness, or edge details). The conclusions are therefore limited to this single problem. Why pre-echo only? In my opinion, below the -q 6 setting, the aoTuV code (beta 2) outperforms the CVS code, with or without Garf's tuning, on all problems except one: pre-echo. The overall sound of aoTuV at mid/low settings is cleaner, with less noise, and not as fat and coarse as with the CVS code. That's just my opinion, based on personal listening tests (with classical music samples).
The pre-echo performance of aoTuV is a total mystery to me. I haven't really tested this encoder on that point (I'm more annoyed by the coarse sound of Vorbis). I've read that aoTuV includes some pre-echo tuning, so I suppose that aoTuV performs better than CVS (but by how much?). The more important question is: is aoTuV's performance comparable to Garf's nice tunings?
Garf's tuning (GT3) is very impressive on pre-echo samples. The only problem with this tuning is that it's based on the CVS code, which suffers from serious problems at -q 5...-q 5.99 (described as hiss, tonality differences, noise, coarse sound, stereo imaging...).
If aoTuV competes with GT3b2, I suppose we could conclude that it is superior overall. But if aoTuV has more smearing issues, the question of the “recommended encoder” would probably remain problematic.
I also added the hybrid encoder named aoTuV+QK to this test. I'm well aware that the implementation of the QK code on aoTuV is problematic and sometimes ruins the positive effect of the aoTuV tuning. Nevertheless, I wonder whether these negative effects could be balanced by better sharpness. The test should provide some elements of an answer.

II. SAMPLES.

For this test, I used more than 20 samples. Some of them are well known: castanets, castanets2, c44... But most of them come from my own library. I used short ones so that I could upload them. I must add that most of my samples are not as "pure" or precise as the three mentioned above: guitar, marimbas, harpsichord, drums... can't compete with castanets for sharpness. Therefore, these samples are a bit harder to ABX. But they are maybe more representative, I don't know...
In addition to the sharp, isolated attacks represented by samples like castanets or percussion, I've added some samples with micro-attacks (like fatboy, though that one isn't in this test). Lossy encoders tend to encode this kind of signal with extra noise, which is more or less annoying. Four samples correspond to this kind of signal: awe32 (well known, electronic music), Hmong (traditional Vietnamese instrument), Orion II (solo trombone, one of the most problematic "Western" instruments for lossy encoders), and Pierres Réfléchies (electronic/concrete music). I could add the beginning of the "creaking [door]" sample.
Funny thing to note: the bitrate with the Vietnamese jaw harp ("guimbarde") is terribly high: 280 kbps with the CVS encoder, and up to 461 kbps with GT3b2 at -q 5.00 (the full track is 'only' 450 kbps).

III. RESULTS




IV. CONCLUSIONS

• Without any doubt, Vorbis CVS performs poorly on pre-echo. It's a well-known problem; no need to dwell on it.

• The GT3b2 tunings are impressive and transform the original CVS code. This is especially true for "pure pre-echo samples" (like castanets), but the progress is clearly audible on other samples too. On micro-attacks, the bitrate explodes (up to +60% on Hmong.wav), but the quality follows: very nice.

• aoTuV beta 2's performance on pre-echo is simply remarkable. But the overall score needs to be broken down:
- on very sharp attacks, aoTuV suffers a lot from smearing and is not really far from the original CVS code. Pity...
- on not-too-sharp attacks, aoTuV performs very well. Pre-echo is very limited: comparable to, but probably slightly worse than, GT3b2. But since aoTuV lacks the extra brightness and the irritating noise audible with GT3b2, the quality of both encoders is similar.
- on micro-attacks, aoTuV is superior to CVS, but GT3b2 is the uncontested winner.

• aoTuV+QK: the “winner” of this pre-echo test (best overall score). The Quantum Knot modifications are very relevant for this kind of sample.
- on very sharp attacks, its performance is slightly better than GT3b2's (!) and it outperforms the aoTuV reference code.
- but on micro-attacks, quality is inferior to the original aoTuV code (to my taste; results might differ for other people).
- on not-too-sharp attacks, quality is close to aoTuV (sometimes better, sometimes worse...), and is better than GT3b2, which suffers from other problems.

• There's more to music than pre-echo. The sample “Die Schlacht.wav” is a cruel reminder of this. I expected pre-echo on attacks: the four encodings were free of this problem. But I was unpleasantly surprised by the sound of the strings, severely damaged by CVS and the GT3 tuning. The culprit may be Vorbis's problematic lossy stereo model, bad or untuned. aoTuV reduced the problem, which is nevertheless still audible... Additional tunings are therefore welcome :-)

Reply #1
Wow, you truly are a star -  thanks Guru, very interesting reading!

Reply #2
Though it's closer to a "192 kbps" test than a 160 one... Interesting nonetheless.

Reply #3
http://www.hydrogenaudio.org/forums/index....ndpost&p=215978

According to these statistics, -q 5.00 and -q 5.50 are ~160 kbps for general music (full CD encodings). On short and problematic samples, higher bitrates are common and expected.
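For readers who want to check the bitrate of their own encodings, here is a minimal sketch of the arithmetic, assuming you know the encoded file size and the sample's duration. The function name and the example numbers are made up for illustration and are not taken from the statistics linked above.

```python
def average_bitrate_kbps(size_bytes: int, duration_seconds: float) -> float:
    """Average bitrate in kbps, with 1 kbps = 1000 bits per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds / 1000


# Hypothetical example: a 10-second sample that encodes to 250,000 bytes.
print(average_bitrate_kbps(250_000, 10.0))  # 200.0
```

A short, difficult sample can land well above the nominal bitrate while a whole CD still averages out near it, which is the point made above.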

Reply #4
It is always a pleasure to read these detailed and meticulous listening tests from guruboolez.  My eyes were glued to the screen till I reached the last full stop.

On average, it seems aoTuV does a bit better than GT3b2 as the former top-scored in 10 of the 22 samples while GT3b2 only 6.  Also, my suspicion is that perhaps the hiss, coarseness, and noisy nature of CVS/GT3b2 is more annoying than smearing of transients and pre-echo, hence, coupled with harashin's findings, I think it is time for aoTuV to be the recommended coder for all q's. 

Can I have a show of hands on whether we should retire GT3b2? 

I vote to retire GT3b2 and have aoTuV as the recommended encoder at q > 5



EDIT:  Changed 20 to 22 samples 
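The tally used in this post (how many samples each encoder top-scored) can be sketched as follows. This is only an illustration: the data layout, the function name and the ratings are assumptions, and the numbers are placeholders, not the results of the test.

```python
from collections import Counter


def count_top_scores(ratings):
    """ratings maps sample -> {encoder: score}.

    Returns how many samples each encoder top-scored.
    A tie credits every encoder that shares the best score.
    """
    wins = Counter()
    for per_encoder in ratings.values():
        best = max(per_encoder.values())
        for encoder, score in per_encoder.items():
            if score == best:
                wins[encoder] += 1
    return wins


# Placeholder ratings, NOT the real scores from this test.
placeholder = {
    "sample_a": {"aoTuV": 4.0, "GT3b2": 3.0},
    "sample_b": {"aoTuV": 2.5, "GT3b2": 2.5},
}
print(count_top_scores(placeholder))
```

Counting wins ignores how large each difference is, so it is worth reading next to the average scores rather than instead of them.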

Reply #5
Quote
To add some data to the frequently asked question “which Vorbis encoder should I use?”, I've decided to run a listening test comparing four different versions of Vorbis at 160...165 kbps:
- CVS (oggenc 2.3)
- GT3b2 tuning (based on the 1.0.1 reference code)
- aoTuV beta 2
- aoTuV beta 2 + QK tuning

Very interesting to see that aoTuV ties with GT3 even on this kind of sample. I'd like to run some tests featuring my (and other people's) pre-echo samples this weekend.

Reply #6
Quote
According to these statistics, -q 5.00 and -q 5.50 are ~160 kbps for general music (full CD encodings). On short and problematic samples, higher bitrates are common and expected.

Indeed... Sorry for the useless comment 
Quote
Can I have a show of hands on whether we should retire GT3b2?

I've been using only aoTuV since Roberto's last listening test. So you have my vote.

Reply #7
Note that aoTuV is slightly inferior to GT3b2 on average if we remove the sample named "Die Schlacht": GT3b2 loses two points there on a problem which has nothing to do with pre-echo.

Before retiring GT3b2, take a look at the average score on the "pure pre-echo" files (i.e. files with very strong and sharp attacks):
- c44
- castanets
- castanets2
- cataclysmes
- clapping
- creaking

I don't have the scores in mind, but IIRC it's something like 2.3 / 5 for aoTuV and 3.6 for GT3b2.

aoTuV+QK is maybe the best Vorbis compromise at the moment: it's the sharpest encoder, and it corrects a lot of the hiss, noise, etc.
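The two comparisons made in this post (an average that leaves one sample out, and an average over a subset of files) can be sketched like this. The helper name and the ratings are placeholders for illustration only; the real scores are in the log files, not here.

```python
from statistics import mean


def average_score(scores, samples=None, exclude=()):
    """Mean score over `samples` (default: all), skipping any in `exclude`."""
    names = list(scores) if samples is None else samples
    kept = [scores[name] for name in names if name not in exclude]
    if not kept:
        raise ValueError("no samples left to average")
    return mean(kept)


# Placeholder ratings on a 1-5 scale, NOT the results of this test.
placeholder_scores = {"castanets": 2.0, "clapping": 3.0, "Die Schlacht": 4.0}

print(average_score(placeholder_scores))                            # every sample
print(average_score(placeholder_scores, exclude={"Die Schlacht"}))  # one sample removed
print(average_score(placeholder_scores, samples=["castanets", "clapping"]))  # subset only
```

With so few samples, one file can move the average noticeably, which is why the overall ranking and the "pure pre-echo" ranking can disagree.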

Reply #8
I've uploaded all samples (except 41_30) in OptimFROG format here:

ftp://ftp2.foobar2000.net/foobar/

Someone should test the archives. They may be corrupted.
I can't upload them to the HA server right now.

Reply #9
Quote
Someone should test the archives. They may be corrupted.
I can't upload them to the HA server right now.

Both archives are fine here. (12+9 *.ofr files)

Edit: wrong number

Edit2: 41_30sec, which isn't included in the archive, is available at ff123's site.

Reply #10
Quote
Note that aoTuV is slightly inferior to GT3b2 on average if we remove the sample named "Die Schlacht": GT3b2 loses two points there on a problem which has nothing to do with pre-echo.

Before retiring GT3b2, take a look at the average score on the "pure pre-echo" files (i.e. files with very strong and sharp attacks):
- c44
- castanets
- castanets2
- cataclysmes
- clapping
- creaking

I don't have the scores in mind, but IIRC it's something like 2.3 / 5 for aoTuV and 3.6 for GT3b2.

aoTuV+QK is maybe the best Vorbis compromise at the moment: it's the sharpest encoder, and it corrects a lot of the hiss, noise, etc.

Yeah, it all comes down to a compromise, but do you think that pre-echo is a more 'forgivable' problem for a lossy perceptual coder than hiss and noise?  That is why I think aoTuV is probably more 'well-rounded', if we place more emphasis on its hiss and noise suppression than on its higher pre-echo.

I think it is disappointing that aoTuV+QK regressed from aoTuV, though it shouldn't be unexpected, considering how I just slapped the two tunings together without any testing.  But it certainly means that more effort is needed to tune it.  I might have an idea on how to improve it on the micro-attack samples.

Reply #11
Quote
Yeah, it all comes down to a compromise, but do you think that pre-echo is a more 'forgivable' problem for a lossy perceptual coder than hiss and noise?

For me, without any hesitation. I have more violins than castanets in my CD library.
Nevertheless, for people who listen to a lot of sharp electronic music and are sensitive to pre-echo, GT3b2 is maybe preferable. I don't know...

In my opinion, aoTuV is more complete than GT3b2, but aoTuV+QK seems more balanced: a good trade-off between noise performance and sharpness.


EDIT: I forgot to post the log files of the test:
here

I used ff123's ABC/HR 1.1 beta for this test.

Reply #12
I am one of those people for whom the noise/hiss (at the level present in gt3b2) is less offensive than preecho.  However, I will agree with the choice to make aotuv the recommended encoder because:
- I think I'm unusually sensitive to preecho and transient smearing compared to other artifacts
- The noise problem seems to be more common than preecho on average
- The advantages of having one recommended encoder are worth it
- I'm "in the know" so if I want to use gt3b2 on my Aphex Twin albums, I will know to do that
- aotuv has been reported to fix the sometimes very annoying stereo issues with vorbis below -q 6

I guess it's also too early to rule out your experimental aotuv+QK as it did well here.  Even though it regresses compared to aotuv, it's still definitely better than stock and may represent a decent compromise between "fixing noise" and "fixing preecho".

One thing that's definitely shown by these tests is that vorbis still needs quite a bit of tuning, given the number of samples that have problems even at high -q levels (compared to e.g. mpc).
I am *expanding!*  It is so much *squishy* to *smell* you!  *Campers* are the best!  I have *anticipation* and then what?  Better parties in *the middle* for sure.
http://www.phong.org/

Reply #13
Hello.
Please answer, even though this is a silly question.
Can anyone explain what pre-echo is?
Thanks for any help or advice.

Reply #14
Quote
Please answer, even though this is a silly question.
Can anyone explain what pre-echo is?

Just look at the page on ff123's web site.

Anyway....thanks Guru for these great tests

 

Reply #15
I suggest you retest aoTuV b2 at -q6 (to compare with GT3 b2 -q5). There is certainly enough "bitrate room" and personally I find aoTuV b2 to perform a bit better on pre-echo at 6.00+ than at 5.00-5.99. To my ears, it's transparent on castanets at -q6. It has that smoothed-over sound at 5.99 and under.

EDIT: I understand why aoTuV b2 was used at -q5.5 (instead of -q6). My hypothesis, however, is that aoTuV -q6 will be better than GT3 -q5.99.

Reply #16
Thanks for replies

About the aoTuV+QK regression (compared to aoTuV): I wonder how often and under exactly what conditions it happens.
I graphically compared a solo violin piece encoded with aoTuV and aoTuV+QK. Why violin? The violin is a tonal instrument, so the QuantumKnot code shouldn't modify the file. A graphical comparison confirms that, and it *shows* that the *objective* difference is almost non-existent:
• whole file: http://membres.lycos.fr/guruboolez/AUDIO/v..._difference.png
• 10 seconds zoom: http://membres.lycos.fr/guruboolez/AUDIO/v...rence_10sec.png

The file size is practically the same (only 2 KB difference). Few samples are modified. It's really nothing, and can't affect the noise performance of the aoTuV code. Good thing.
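A comparison of this kind can also be done numerically. The sketch below is not the tool used for the graphs above; it assumes both encodings have already been decoded to 16-bit PCM WAV files with the same sample rate, channel count and alignment, and the file names in the usage comment are hypothetical.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return its samples (channels interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples


def difference_stats(a, b):
    """Return (number of differing samples, largest absolute difference)."""
    changed = 0
    peak = 0
    for x, y in zip(a, b):  # stops at the shorter of the two inputs
        d = abs(x - y)
        if d:
            changed += 1
            peak = max(peak, d)
    return changed, peak


# Usage with hypothetical file names:
# changed, peak = difference_stats(read_pcm16("violin_aotuv.wav"),
#                                  read_pcm16("violin_aotuv_qk.wav"))
```

A small count of changed samples only says that the two encoders made nearly the same decisions on that file. It says nothing about audible quality, which still has to be checked by listening.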

For comparison, here's a difference between aoTuV+QK and GT3b2:
http://membres.lycos.fr/guruboolez/AUDIO/v...GT3_aoTuVQK.png

(P.S. all encodings were done at the same setting: -q 5,00)
(P.S.2 Before someone complains about a TOS violation, I've also tried to ABX the files: no difference between aoTuV/aoTuVQK, and an obvious difference in favor of aoTuV compared to GT3b2).
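For readers new to ABX: a result is usually judged by how likely it would be to get at least that many trials right by pure guessing. The sketch below shows that calculation under the usual assumption of independent trials with a 50% chance each; the function name is made up and the trial counts are examples, not numbers from this thread.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` of `trials` right by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Example: 8 correct answers out of 8 trials.
print(abx_p_value(8, 8))  # 0.00390625
```

The smaller the value, the less plausible it is that the listener was guessing. A large value does not prove the files sound identical, only that this run did not show a difference.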

Reply #17
Quote
My hypothesis, however, is that aoTuV -q6 will be better than GT3 -q5.99.

Testing aoTuV at -q 6,22 and GT3b2 at -q 6,00 is more interesting in my opinion than testing GT3b2 at -q 5,99 against anything. I don't want to offend any developer, but the performance of CVS/GT3 below -q 6.00 is disappointing given the bitrate. Instruments generally sound fat and coarse (less so than at -q 4.00, but it's unacceptable for encodings close to 200 kbps).

Reply #18
Aotuv any day. The CVS/GT3 <Q6 noise issue for the high bitrate is bad and can be audible in normal listening. Q6 or higher differences between GT3 & AoTuv will probably only exist in abx situations  - I may be wrong though.

At least we will have *one* good general-purpose encoder in aotuv. It will help boost confidence in vorbis, I think.

Reply #19
A dumb question: why aren't the GT3, AoTuv and QK modifications available in the Xiph CVS (SVN), as an alternative choice on the command line?

AFAIK LAME has some different command-line choices for different algos.

Reply #20
QK, I would be inclined to go with the increasing number of tests showing either of the AoTuV tunings to be superior to GT3 at q5+  It is time for one Vorbis to rule them all, or something like that..
< w o g o n e . c o m / l o l >

Reply #21
Quote
Aotuv any day. The CVS/GT3 <Q6 noise issue for the high bitrate is bad and can be audible in normal listening. Q6 or higher differences between GT3 & AoTuv will probably only exist in abx situations - I may be wrong though.

One preecho sample I have is detectable (but not particularly serious) in what I would consider "normal listening" at -q 6.  It's improved (but not quite "fixed") by gt3b2 compared to aotuv and 1.0.1.  I am going to test the aotuv+qk version this weekend with this sample and some other preecho samples; I think it may have been counted out too soon!

Reply #22
Quote
Thanks for replies

About the aoTuV+QK regression (compared to aoTuV): I wonder how often and under exactly what conditions it happens.
I graphically compared a solo violin piece encoded with aoTuV and aoTuV+QK. Why violin? The violin is a tonal instrument, so the QuantumKnot code shouldn't modify the file. A graphical comparison confirms that, and it *shows* that the *objective* difference is almost non-existent:
• whole file: http://membres.lycos.fr/guruboolez/AUDIO/v..._difference.png
• 10 seconds zoom: http://membres.lycos.fr/guruboolez/AUDIO/v...rence_10sec.png

The file size is practically the same (only 2 KB difference). Few samples are modified. It's really nothing, and can't affect the noise performance of the aoTuV code. Good thing.

For comparison, here's a difference between aoTuV+QK and GT3b2:
http://membres.lycos.fr/guruboolez/AUDIO/v...GT3_aoTuVQK.png

(P.S. all encodings were done at the same setting: -q 5,00)
(P.S.2 Before someone complains about a TOS violation, I've also tried to ABX the files: no difference between aoTuV/aoTuVQK, and an obvious difference in favor of aoTuV compared to GT3b2).

That narrows the regression down a bit.  The QK component affects only short blocks, hence this agrees with the findings from your violin sample, which are mostly long blocks.  The regression must be occurring in regions of transient attacks, probably boosting the HF hiss caused by point stereo.

Reply #23
I'm back with the same test, but at a higher setting: -q 6.00...6.50. The samples are the same. I was short on time, however, and I didn't ABX the last files. For these files, the scores may be imprecise (meaning that small differences in score and ranking may be unjustified).
For those wondering about the setting used for the CVS encoder (-q 6,50): I don't have any internet access at home, and I didn't have the corresponding values found by phong when I started the test. I first encoded all files with CVS at -q 6,22, like aoTuV, but I feared that the bitrate was too low. Therefore, I decided to round the setting up to a nice 6,50. It's probably a bit too high, but quality shouldn't really change between the setting I used and the ideal one (6,36 according to phong's bitrate table).

RESULTS


Log files are here (few or even no comments):
here


CONCLUSIONS


• The CVS encoder has serious trouble with transients, even at -q 6,50 (210 kbps nominal, which is a pretty high bitrate for a lossy encoder). Something like brightness is also audible on some files, but it's not really annoying, and very far from the thickness/coarseness heard previously at -q 5,50, which considerably lowered the scores. In other words, quality improves a lot between -q 5,50 and -q 6,50; pre-echo is in my opinion the biggest problem (but not the only one) of the CVS encoder at -q 6...9 settings.

• Under these conditions, a CVS encoder tweaked for pre-echo should be impressive. And that is the case for GT3b2, which improved a lot between the two tests (-q 5.00 then -q 6.00). Incidentally, 30% of the samples were transparent in my test (I could probably find more differences with insane concentration). Interesting to note: the sample "Die Schlacht", symptomatic of the coarseness of Vorbis, was fully transparent here on violins with both CVS and GT3b2. Another interesting point: micro-attacks (creaking, Hmong, Pierres Réfléchies, Orion II) are still better with GT3b2 than with any other Vorbis encoder, though extra noise is still perceptible.

• aoTuV's performance now lies between CVS and GT3b2. It's always better than CVS (except in one case, but that was a quick test without ABX: the score is a bit imprecise, and the ranking might be wrong). But it's rarely better than, or even equal to, GT3b2. On very sharp attacks (castanets...), aoTuV's performance is a bit disappointing. We could expect more from a modern lossy encoder. On moderately sharp attacks, pre-echo is limited and not really annoying.
We can also note that the improvement in scores between the two tests is small. aoTuV -q 6,22 is just slightly better than -q 5,50, whereas the progress with CVS and GT3b2 is very impressive. It's not a real problem, quite the reverse: it shows that aoTuV's quality is more linear than CVS/GT3's, without a huge gap between -q 5,99 and -q 6,00.
Last thing: where brightness is audible with CVS/GT3, aoTuV doesn't have the problem, or reduces it (cf. Atem-Lied for a good example).

• In conclusion, aoTuV needs more tuning before it can be *fully* recommended over GT3b2. A temporary solution might be something like an aoTuV+GT3 encoder. Ready for another round?