Topic: General discussion of future public test (Read 18174 times)

General discussion of future public test

Hello,
Now that it’s clear that the Apple encoder represents the AAC standard very well, the next step will be a public multiformat test.
No particular date or plan. It’s a general discussion with the purpose of hearing opinions, points of view, and suggestions.

Items to talk about (these numbers are first approximations):
1. The number of codecs: 4 + low anchor. 4 is an affordable number for one test.
2. The selection of codecs (based on the current tally of previous votes):
http://listening-test.blogspot.com/2011/06...reparation.html

Apple LC-AAC - the best AAC encoder (though FhG was quite good too).
MP3 LAME  – 10 votes
Vorbis AoTuV – 8.5 votes.
Opus/CELT – 6 votes (still no info on when Opus 1.0 will be released).

It’s possible to change your votes.

(?) USAC (working name of the future successor to AAC). No information about its future availability.
(?) low anchor
(?) Settings, versions, etc.

3. Bitrate:
VBR 96 kbps. The last AAC public test was at 96 kbps (well, it was actually 100 kbps).
Also, I think it would be more interesting to compare MP3 at 128-135 kbps with AAC/AoTuV at 96-100 kbps. Probably a lot of people are interested in the trade-off between compatibility and compression efficiency.

4. Samples.
Last time we applied the technique of random sample selection. 20 samples.

The developers are very welcome to participate in this topic.

This test should be more interesting, as it will include codecs that are interesting for practical use.


Reply #1
Nice...

Quote
It’s possible to change your votes.


I propose to make up a big list of candidate encoders and carry out another poll.
🇺🇦 Glory to Ukraine!


Reply #2
Good.

Guys, propose the codecs you are interested in.

Feel free to enter the IRC chat room at http://webchat.freenode.net/
and type /join #Hydrogenaudio

Anyway, it's a general discussion without a particular date. Casual talk, suggestions and opinions.


Reply #3
There is a first issue if MP3 is tested at 96 kbps. MP3 has a sample rate of 32 kHz at 96 kbps, while AAC, AoTuV, and Opus use 44.1 kHz.
Sampling issue.

Possible solutions are a resampler or a higher bitrate for MP3.



Reply #4
8.5 votes? 

then:

+1 for AAC
+1 for Vorbis
+0.5 for MP3
+0.5 for WMA standard (who knows, maybe it isn't very bad...)

IMHO mp3@96kbps cannot compete with aac/vorbis@96kbps.  112 or 128 kbps MP3 is more interesting.


Reply #5
I'd like to see how Opus will perform here, so:

+1 for Opus
+1 for Vorbis
+1 for QT AAC
+0.5 WMA
+0.5 MP3@96
It's only audiophile if it's inconvenient.
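The +1 / +0.5 ballots in this thread can be tallied mechanically. A minimal sketch, using only the ballots from Replies #4 and #5; it assumes that "AAC" and "QT AAC" mean the same entry (Apple AAC), and the `tally` helper is a hypothetical name, not part of any existing tool:

```python
from collections import defaultdict

# Ballots transcribed from Replies #4 and #5.
# Assumption: "AAC" and "QT AAC" both refer to Apple AAC.
ballots = {
    "reply_4": {"Apple AAC": 1.0, "Vorbis": 1.0, "MP3": 0.5, "WMA": 0.5},
    "reply_5": {"Opus": 1.0, "Vorbis": 1.0, "Apple AAC": 1.0, "WMA": 0.5, "MP3": 0.5},
}


def tally(ballots):
    """Sum the vote weights per codec across all ballots."""
    totals = defaultdict(float)
    for votes in ballots.values():
        for codec, weight in votes.items():
            totals[codec] += weight
    return dict(totals)


totals = tally(ballots)

# Print codecs from most to least votes, ties broken alphabetically.
for codec, score in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{codec}: {score}")
```

Half votes explain how a fractional total such as 8.5 can come up.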


Reply #6
Quote
There is a first issue if MP3 is tested at 96 kbps. MP3 has a sample rate of 32 kHz at 96 kbps, while AAC, AoTuV, and Opus use 44.1 kHz.
Sampling issue.

Possible solutions are a resampler or a higher bitrate for MP3.


I'm really not a fan of testing with multiple things unequal.

The reason for this is that the results lose their meaning and mean whatever the reader wants them to mean... or at least only one side is meaningful.  If AAC at 96 beats MP3 at 128, that's potentially interesting, but if they tie or MP3 wins, then "of course, it's running at a higher rate, what did you expect?".

Of course, this is also true if the rate changes the bandpass: "It wasn't really better, people just preferred the other lowpass setting".

Bitrate cures a lot of sins too.  A little extra rate can be the difference between transparency and non-transparency.

What I would recommend is that the rates for the samples under test be kept as close to equal as possible, so long as there is still a good chance of any of them being a winner or loser.  E.g. if we think mp3 will very likely lose if run at 96k, then we should use a higher rate, but only enough to correct that.  If we run it at 192 and find that it does best, we'll have learned nothing from that effort.




Reply #7
Quote
Possible solutions are a resampler or a higher bitrate for MP3.

Another solution would be to match bitrates as has been mentioned. We *know* that the samplerate will be different - however those rating the files will rate the files as they hear them without forewarning as to the samplerate of the individual file. Of course, it could be an obvious difference....
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)


Reply #8
Not all cards support automatic switching between sample rates. For example, my E-MU Pre Tracker doesn't. And some onboard cards produce distortion when you switch between sample rates, for example my onboard sound card (ALC272). It's easy to notice with the udial.wav sample.


Reply #9
The fact that LAME resamples to 32 kHz at 96 kbps shouldn't worry us too much IMO. Maybe the lack of HF is easily noticeable, but at 96 kbps there are worse things to be afraid of for mp3. Lacking the extreme frequency range doesn't really hurt much, though it's often audible.

Whether it's useful for mp3 to participate in a 96 kbps test is a question of sample selection. With problem samples, especially pre-echo material, mp3 won't have a chance, and its participation doesn't make sense.
So if mp3 is to participate, we'd better use only regular music, preferably chosen at random.

On the other hand, I wouldn't mind if we forgot about mp3 at 96 kbps. It's really no candidate for the winner here.
lame3995o -Q1.7 --lowpass 17


Reply #10
Quote
The fact that LAME resamples to 32 kHz at 96 kbps shouldn't worry us too much IMO. Maybe the lack of HF is easily noticeable, but at 96 kbps there are worse things to be afraid of for mp3.

QuickTime at CVBR 96k surely doesn't do sample rate conversion by default, but an LPF at 15-16 kHz or so is applied anyway. AFAIK FhG at VBR q3 does the same.
In other words, "losing highs" is neither specific to LAME nor important.

As for QuickTime true VBR mode, sample rate conversion to 32k is done at quality 54 by default.
Therefore, when the test is done at quality 54, the decision must be made also for QuickTime, whether to use sample rate conversion or not.



Reply #12
Some feedback was received on #Hydrogenaudio, and I would like to share some information and answers, based on my experience with past public tests.


1. Can we go for a higher bitrate (>96 kbps)?
There were 531 results for the 64 kbps test and only 280 results for the 96 kbps AAC test, even though it was open for longer. 280 results were just barely enough to close the test.
The 64 kbps test ran for 20 days, the 96 kbps test for 35 days (hence it is more difficult).
It’s not recommended to keep a test open for more than a month or so.

So it’s hard to imagine a public test at a higher bitrate.
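To put the two participation figures on the same footing, the numbers above can be turned into results per day. A small sketch that uses only the figures quoted in this post; the `results_per_day` helper is a hypothetical name:

```python
# Figures quoted in this post for the two previous public tests.
tests = {
    "64 kbps": {"results": 531, "days": 20},
    "96 kbps": {"results": 280, "days": 35},
}


def results_per_day(results, days):
    """Average number of submitted results per day the test was open."""
    return results / days


rates = {name: results_per_day(**t) for name, t in tests.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} results/day")

# How many times faster results came in for the 64 kbps test.
ratio = rates["64 kbps"] / rates["96 kbps"]
print(f"ratio: {ratio:.2f}")
```

By this rough measure the 64 kbps test collected results more than three times as fast, which is the concern behind avoiding an even higher bitrate.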


2a) Would it be more reasonable not to discard all results of a listener who submits more invalid results than the rules allow? That is, keep the “good” results and discard only the erroneous ones?
First of all, it's worth mentioning that this question isn’t new.
The problem is that it’s impossible to know whether a result was really “good” or just a “lucky guess”. So it’s not a matter of accepting the “good” results and discarding the “bad” ones, but rather of accepting the “lucky guesses” (which are still invalid) and discarding the “unlucky guesses”.

Answer: it’s more correct to discard all results of a listener who has exceeded the limit of invalid results. Also see (º1).

2b) One could say that we can check whether the results of this particular listener correlate well enough with the results of other listeners, and if so it’s OK to accept “only the good results”.
The problem is that developers and members will later complain that the rules weren’t applied uniformly (to all listeners in the same manner).
So a simple and effective decision was made:
The rules were _strictly_ applied (as in the previous public tests at 64 and 96 kbps). No exceptions for context. If you follow the rules, your results are in. If not, they are out, but you can start again from zero. As simple as that.
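The strict rule described above can be sketched as a simple filter. This is only an illustration: the thread does not state the actual limit, so `MAX_INVALID` is a made-up value, and the `(listener, sample, is_valid)` record layout and the `accepted_results` name are assumptions.

```python
from collections import defaultdict

# Hypothetical limit; the real value used in the tests is not given here.
MAX_INVALID = 2


def accepted_results(results, max_invalid=MAX_INVALID):
    """Keep valid results only from listeners who stayed within the limit.

    results: list of (listener, sample, is_valid) tuples (assumed layout).
    A listener with more than max_invalid invalid results loses everything,
    including the results that looked "good".
    """
    invalid_count = defaultdict(int)
    for listener, _sample, is_valid in results:
        if not is_valid:
            invalid_count[listener] += 1

    excluded = {name for name, n in invalid_count.items() if n > max_invalid}
    return [r for r in results if r[2] and r[0] not in excluded]


example = [
    ("a", 1, True), ("a", 2, False), ("a", 3, True),
    ("b", 1, False), ("b", 2, False), ("b", 3, False), ("b", 4, True),
]
kept = accepted_results(example)
print(kept)
```

In the example, listener "a" stays within the limit and keeps both valid results, while listener "b" exceeds it and loses the one valid-looking result too, since it may have been a lucky guess.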



Also, my observation about the placebo effect and the interpretation of results in a public listening test.

The score 4.0 means “perceptible but not annoying”. So how should the results of the last AAC test be interpreted in the case of 4.1, 4.2, 4.3, etc.?
2-3 steps (0.2-0.3) are generally considered “very few” (it’s psychology, so I won't give an opinion here). But more than 2-3 steps aren't “very few”. So one could say that 4.2-4.3 is the threshold score where high quality begins.
I don't know if it's a coincidence, but this score (4.2-4.3) is also where the placebo effect starts to show itself more. (º1) Another interesting observation is that placebo leads to a lower score than normal. There were quite a few results where the listener gave a lower score when it was actually placebo. This leads to “flipped results”: if the average results were codec A > B > C > D, then the flipped ones (with the placebo effect) will be D > C > B > A.

(oh lord, my english  )


As for this test, we can go for MP3 at both 96 and 128 kbps. This is the time to throw ideas around.


Reply #13
+1 QT AAC
+1 Vorbis (Aotuv b603?)
+1 LAME 3.99
+0.5 Opus


Reply #14
+1 for keeping the number of codecs small. Consider only three perhaps? Four is stretching it. I really struggled with the five in the last test (AAC @ ~96 kbps [July 2011]).
+1 for using a low anchor which doesn't fall as far behind in quality as the one in the last test. That was an understandable choice back then, but I think its quality was so low that it didn't help to put things into perspective. (edit: never mind the next part, quality is probably too good to serve as a low anchor even at the lowest setting) How about using lossyWAV at rather extreme settings? Though I'm not sure it is a good idea if the low anchor has completely different types of artifacts.


Reply #15
A very low quality low anchor makes it possible to analyse the results very quickly. There were listeners who couldn't identify the low anchor in the last test. Pretty strange.
So it isn't that bad to have a bad low anchor but, yes, I agree that the low anchor should have a little more quality.


Reply #16
How about FAAC? I think it could be good as an anchor. Actually it may be a bit too good as a low anchor, but who knows?
It would be interesting to see how it performs against LAME and Vorbis.

As to the mp3 bitrate, I think there is no point in encoding mp3 below 128k, so LAME@130 vs AAC/Vorbis@96 seems reasonable to me.


Reply #17
Any chance to see Musepack competing against current encoders? Has been a while since it has been tested. Though it probably didn't change much regarding performance, I think it should be tested at lower bitrates than before, too.
It's only audiophile if it's inconvenient.


Reply #18
USAC has been "released" with the confusing name "Extended HE-AAC." As one might guess from the new name, I don't think USAC is intended to replace AAC (LC or main profile) at higher bitrates; things I've read have seemed to imply that USAC (or at least its sweet spot) is supposed to top out at 64kbps.

Once somebody can get their paws on a USAC encoder I really think we should do another 48kbps (or possibly as low as 32kbps) test; I'd imagine HE-AAC encoders have come a long way in the >5 years since the last 48kbps test, and of course Opus is an important addition. Perhaps that could be the next one after the 96kbps test?

Go ahead and include Opus without waiting for a 1.0 release. The 1.0 release will likely include few changes; the reason it's long in coming has more to do with the drawn-out formalities of standard bodies than with the technical readiness of the format.

Vorbis too, partially because of its use for html5 audio.

I strongly feel that an MP3 encoder at the target bitrate should be included whenever possible, even when the target bitrate means we're confident it will be blown out of the water by the competition. My main reason is that this makes the test results more accessible to a wider audience; people are familiar with MP3 and having it as a point of comparison helps them have a better perspective on what the results mean. It's also simply interesting to see the progress made both in newer formats and in MP3 encoders. It'd be nice to include higher-bitrate MP3 as well; the space-compatibility tradeoff is real, and while "96kbps AAC=128kbps mp3" is probably roughly accurate, it's odd that a claim that's been made so frequently doesn't seem to have been definitively put to the test.

Note that Sebastian's test of MP3 encoders at 128kbps 3 1/2 years ago ended with "The quality at 128 kbps is very good and MP3 encoders improved a lot since the last test. This was the last test conducted by me at this bitrate. It's time to move to bitrates like 96 kbps or 80 kbps." Anybody who thinks that MP3 should only be tested at 128kbps or above needs a reality check; if we test at a bitrate where a significant number of codecs are practically transparent then testing is very difficult for the participants and in the end we get no worthwhile results.

As far as the LAME sample rate thing goes, I'd say we want to provide encoders the latitude to make any decisions they want to as they try to optimize audio quality. The only real reason to worry about LAME resampling the audio is questions of how people's sound cards may deal with the difference-- so just use a high-quality resampler to make all the encoders' decoded samples the same sample rate.
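For the 32 kHz case discussed in this thread, bringing decoded files to a common rate means resampling by a rational factor. A minimal sketch of that arithmetic only, not of an actual resampler; the 44.1 kHz target is taken from the rates mentioned earlier in the thread, and `resample_ratio` is a hypothetical helper name:

```python
from fractions import Fraction

# Target rate: the 44.1 kHz that the thread gives for AAC, AoTuV and Opus.
TARGET_RATE = 44100


def resample_ratio(source_rate, target_rate=TARGET_RATE):
    """Return (up, down) so that source_rate * up / down == target_rate."""
    ratio = Fraction(target_rate, source_rate)
    return ratio.numerator, ratio.denominator


# LAME output at 96 kbps, as discussed above, is 32 kHz.
up, down = resample_ratio(32000)
print(f"upsample by {up}, downsample by {down}")
```

A rational resampler would interpolate by `up` and decimate by `down`; which resampler counts as high-quality enough is a separate choice that this sketch does not address.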


Reply #19
Quote
Anybody who thinks that MP3 should only be tested at 128kbps or above needs a reality check.

We can use LAME@96k as a low anchor then. It should be good enough not to flatten results of the other encoders.
It could be a reality check to people who think mp3@96 can compete with vorbis/aac@96k.


Reply #20
+1 Apple LC-AAC
+1 MP3 LAME @ 128-135 kbps
+1 Vorbis AoTuV
+1 Opus/CELT
I feel music enriches the soul.


Reply #21
I think it would be a good idea to run another AAC listening test first (maybe just QT vs. FhG), since the development of the Winamp encoder has been quite active.


Reply #22
Quote
I think it would be a good idea to run another AAC listening test first (maybe just QT vs. FhG), since the development of the Winamp encoder has been quite active.


Comparison of the commercial codec to the free one that was open sourced (bought by Google) would also be interesting.

Ideally I'd like to see, somewhere in the 48-96kbps range: Vorbis, FhG, FhG-free, Apple AAC, LAME (if 96kbps), Opus and USAC. Though it's likely too early for USAC.


Reply #23
Quote
Ideally I'd like to see, somewhere in the 48-96kbps range: Vorbis, FhG, FhG-free, Apple AAC, LAME (if 96kbps), Opus and USAC. Though it's likely too early for USAC.

That's quite a lot to expect from one test. How many people would rate all 140 samples?
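The figure of 140 follows from the proposed encoder list if the test keeps 20 samples, as the first post reports for the last test. A small sketch of that workload arithmetic; the sample count for the future test is an assumption, and `total_ratings` is a hypothetical helper name:

```python
# Assumption: the next test uses 20 samples, like the previous one.
SAMPLES = 20

# The seven encoders proposed in the quoted post.
proposed = ["Vorbis", "FhG", "FhG-free", "Apple AAC", "LAME", "Opus", "USAC"]


def total_ratings(n_encoders, n_samples=SAMPLES):
    """Number of encoded files a listener must rate to finish the whole test."""
    return n_encoders * n_samples


full_list = total_ratings(len(proposed))
four_plus_anchor = total_ratings(4 + 1)  # the "4 + low anchor" plan from the first post

print(full_list, four_plus_anchor)
```

Under these assumptions, the "4 + low anchor" plan asks for 100 ratings per listener, against 140 for the full wish list.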

Keep in mind that there will be extra listener fatigue from trying to distinguish between encodes which are pretty transparent, especially for those of us without stellar golden ears. A 96kbps test is going to be considerably harder on people than a 64kbps test, and even in that test, which only had five encoders, only ten people submitted results for all samples.

FhG lost the public AAC 96kbps test to Apple last year by a statistically significant margin. Yes, it's been updated in the meantime and may be competitive with Apple's encoder at 96kbps now, and maybe it's noticeably better than Apple's for HE-AAC and <80kbps bitrates, but I don't think it's an important enough reference point at 96kbps to justify its inclusion at this stage.

Not only is it too early for USAC (no quality stereo encoder publicly available yet), but everything I see supports the idea that it's not really targeted at this bitrate anyway.

Let's just be sure to plan on another lower-bitrate test in the near future -probably 64kbps again, though I'd like to see one at ~48kbps too- to look at updated HE-AAC encoders, improvement in opus since the last test at that bitrate, and USAC.


Reply #24
Quote
I think it would be a good idea to run another AAC listening test first (maybe just QT vs. FhG), since the development of the Winamp encoder has been quite active.

If there were a chance that any other encoder already performed as well as or better than Apple AAC at 96 kbps, there would be a lot of people talking about it (me too). That's not the case.

We dedicated a complete test to the AAC format last time. If I conduct the next public test, it will be multiformat (96 kbps). We had a long discussion about that here.
Nothing has changed since then. Period.