Full Version: AAC @ 128kbps listening test discussion
Hydrogenaudio Forums > Hydrogenaudio Forum > Listening Tests
rjamorim
QUOTE(rjamorim @ Feb 29 2004, 11:44 PM)
Can someone enlighten me on the origins of Velvet?
http://lame.sourceforge.net/download/samples/velvet.wav

All I know is that it was submitted by Roel (r3mix).

Does anybody know artist (Velvet Underground?), title and album of this song? Also, what would be the style (no way to figure out from just the introduction)

ff123 already enlightened me about it. Thank-you very much.

Details are available at the listening test results page.
bond
QUOTE(rjamorim @ Mar 1 2004, 06:12 PM)
Nope. I couldn't decrypt your sample 09 results. It's the only result file that gave me problems in the entire test. I sent it to schnofler so that he can investigate. Sorry about that.

damn, i shouldnt have tried to manipulate the resultfiles wink.gif
rjamorim
A VERY IMPORTANT STATEMENT

OK. It seems I f-ed up very badly this time.

First, let me specify what ISN'T wrong: The ranking values are absolutely correct, as well as the screening methodology and the statistical calculations.

What is wrong: The error bars.

I didn't check how the error bars were being drawn in the excel spreadsheet I got from ff123. I thought the plots were getting values from a certain cell, but actually the values were hard-coded in the plot building routines.

So, the error bars are to this day the same as the ones used in his 64kbps listening test. And it affects all my listening tests. Both the overall plots and the individual ones.

I can't express how sorry I am.

Tomorrow I'll start fixing all the test results pages. Until I announce the results have been fixed, please disregard them.

In case someone is in a hurry to check the corrected zoomed result plot for the AAC test:
http://pessoal.onda.com.br/rjamorim/screen2.png
The only thing that changed is that iTunes is now clearly first place and Nero is second place.

Again, I'm terribly sorry. I can already feel my credibility going down the drain. sad.gif

Kind regards;

Roberto Amorim.
ff123
QUOTE(rjamorim @ Mar 4 2004, 10:49 PM)
What is wrong: The error bars.

I didn't check how the error bars were being drawn in the excel spreadsheet I got from ff123. I thought the plots were getting values from a certain cell, but actually the values were hard-coded in the plot building routines.


The fault is also mine for not making it perfectly clear how I was drawing the error bars. Plus I violated an Excel/software rule by not using a spreadsheet as a spreadsheet should be used, instead hard-coding in the error bar values.

QUOTE
Again, I'm terribly sorry. I can already feel my credibility going down the drain. sad.gif


Your integrity is intact. Credibility is a matter of trust. If you own up to your mistakes, correct them, and prevent future ones, that goes a long way towards enhancing your credibility.

I suggest keeping both the old (incorrect) overall graphs and showing the new, corrected overall graphs side by side, to show the before and after. I think the individual sample graphs can just be replaced.

ff123

Edit: You should probably rename the old overall graph and then use the original name of the graph for the corrected one. That way, websites which link to your overall graphs will be automatically updated.
rpop
QUOTE(ff123 @ Mar 5 2004, 03:00 AM)
Your integrity is intact.  Credibility is a matter of trust.  If you own up to your mistakes, correct them, and prevent future ones, that goes a long way towards enhancing your credibility.

Your integrity is, indeed, intact. I've seen a few other listening tests online, and discussion of their results always stops soon after the tests, with the page receding into internet history. Updating these tests now goes a long way toward proving their reliability will be maintained in the future smile.gif.
Garf
QUOTE(rjamorim @ Mar 5 2004, 08:49 AM)
In case someone is in a hurry to check the corrected zoomed result plot for the AAC test:
http://pessoal.onda.com.br/rjamorim/screen2.png
The only thing that changed is that iTunes is now clearly first place and Nero is second place.

Aaaaaah, this explains my previous complaint that the graph didn't seem to align with your written statement about the test significance smile.gif

Now it does. iTunes indeed almost beats Nero by a significant margin.

As far as the moral winner is concerned, though: sad.gif
Continuum
QUOTE(Garf @ Mar 5 2004, 09:01 AM)
As far as the moral winner is concerned, though: sad.gif

huh.gif
"Moral winner"?
rjamorim
QUOTE(Garf @ Mar 5 2004, 05:01 AM)
Now it does. iTunes indeed almost beats Nero by a significant margin.

Erm.. I use Darryl's method to evaluate ranking positions.

Check, for instance, thear1 in his 64kbps test results
http://ff123.net/64test/results.html

Oggs are ranked second, according to him, although they overlap a little with MP3pro.

To put it short, I (and ff123, it seems) only consider codecs tied when one's confidence margin overlaps with the other's actual ranking. Or, to make things simpler, when more than half of the entire margins overlap.
Gabriel
QUOTE
I can already feel my credibility going down the drain


Finding, admitting, correcting your own errors only increases credibility I think.
guruboolez
Your credibility, your honesty and your honor are now stronger. Thank you.
ScorLibran
You have nothing to worry about, Roberto...your credibility is quite secure. Anyone who conducts tests like this will occasionally make a mistake. It's inevitable. You took the best approach in resolving it. Our trust in you is only higher now. smile.gif

QUOTE(rjamorim @ Mar 5 2004, 03:27 AM)
QUOTE(Garf @ Mar 5 2004, 05:01 AM)
Now it does. iTunes indeed almost beats Nero by a significant margin.

...To put it short, I (and ff123, it seems) only consider codecs tied when one's confidence margin overlaps with the other's actual ranking. Or, to make things simpler, when more than half of the entire margins overlap.

That's what I had always thought was the case, but it was just an assumption on my part (that I never communicated). Glad to know it was correct.
ff123
QUOTE(ScorLibran @ Mar 5 2004, 07:12 AM)
...To put it short, I (and ff123, it seems) only consider codecs tied when one's confidence margin overlaps with the other's actual ranking. Or, to make things simpler, when more than half of the entire margins overlap.

That's what I had always thought was the case, but it was just an assumption on my part (that I never communicated). Glad to know it was correct.

To be absolutely correct, a codec wins with 95% confidence, for that group of listeners and set of samples, when the bars do not overlap. Or to put it another way, 19 times out of 20, those results would not occur by chance. Any overlap reduces that confidence. If the bars just barely overlap, there is still quite a high likelihood that that result did not occur by chance. A reasonable way to describe this situation would be to say that the results are suggestive (if not significant). Actually, in an ideal world, the graphs would speak for themselves, and there would be no "interpretation" to cause controversy.

If this were a drug test or something else where there is a lot at stake for making the right decision, everything below 95% confidence (or whatever threshold is chosen) would not be considered to be significant.

Also, the test would be corrected for comparing multiple samples, which would make the error bars overlap more. I personally don't think it's a real big deal if the type I errors in this sort of test (falsely identifying a codec as being better than another) are higher than they would be in a more conservative analysis. But others, for example on slashdot, can (and do) complain about this sort of thing.

ff123
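
To make the multiple-comparison point above concrete, here is a rough sketch. It assumes a Bonferroni correction and a field of five codecs; the post names neither, and the function names are invented for illustration only.

```python
from math import comb


def pairwise_comparisons(codecs: int) -> int:
    """Number of distinct codec pairs that get compared."""
    return comb(codecs, 2)


def uncorrected_familywise_rate(comparisons: int, alpha: float = 0.05) -> float:
    """Chance of at least one false "A beats B" when every comparison is run
    at level alpha and the comparisons are treated as independent.
    Independence is an approximation: pairwise comparisons share data."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(comparisons: int, alpha: float = 0.05) -> float:
    """Per-comparison level that keeps the overall rate at or below alpha."""
    return alpha / comparisons


m = pairwise_comparisons(5)            # assumed codec count, for illustration
print(m)                               # 10
print(uncorrected_familywise_rate(m))  # roughly 0.40 under the independence assumption
print(bonferroni_alpha(m))             # 0.005
```

A smaller per-comparison level means a larger critical t value and therefore longer error bars, which is why a corrected analysis would show more overlap.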
Garf
I take it from the previous comment by rjamorim that 'bars' should be interpreted as 'error bars' and 'mean score marker' and not 2x 'error bars'?
ff123
QUOTE(rjamorim @ Mar 5 2004, 12:27 AM)
Check, for instance, thear1 in his 64kbps test results
http://ff123.net/64test/results.html

Oggs are ranked second, according to him, although they overlap a little with MP3pro.

In that test I used an "eyeball" method to rank the codecs when trying to determine an appropriate overall ranking. People (including me) didn't like the subjectivity involved in that method, so I changed to the method used now, which is to perform another ANOVA/Fisher LSD once the means for each music sample are determined. The assumption this method makes is that each sample is equally important to the final overall results. This may not actually be true if, for example, there are lots of people listening to some samples and only a few listening to others. Also, the choice of samples greatly affects the overall results.

But at least it seems to produce reasonable results, and it's removed the subjectivity involved in the earlier method.

QUOTE
I take it from the previous comment by rjamorim that 'bars' should be interpreted as 'error bars' and 'mean score marker' and not 2x 'error bars'?


The length of each error bar from top to bottom (mean in the middle) is equal to the Fisher LSD.

ff123
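
For readers who want to see the arithmetic, here is a minimal sketch of a Fisher LSD computed from a table of per-sample means. It assumes a randomized-block ANOVA with music samples as blocks and codecs as treatments, which may differ in detail from what the actual analysis tool does. The function name, the tiny table, and the caller-supplied t value are all illustrative assumptions, not data from the test.

```python
from math import sqrt


def fisher_lsd(scores: list[list[float]], t_crit: float) -> float:
    """Fisher LSD for a samples-by-codecs table of mean ratings.

    scores[i][j] is the mean rating of codec j on music sample i.
    t_crit is the two-sided 95% Student t value for
    (rows - 1) * (cols - 1) degrees of freedom, looked up by the caller.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Residual sum of squares after removing sample and codec effects.
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = ss_error / ((n - 1) * (k - 1))
    return t_crit * sqrt(2 * mse / n)


# Made-up table: 3 samples (rows) by 2 codecs (columns), so 2 degrees of freedom.
table = [[4.0, 3.0], [5.0, 3.0], [3.0, 3.0]]
lsd = fisher_lsd(table, t_crit=4.303)  # t for 2 degrees of freedom, from a table
print(lsd)
print(lsd / 2)  # distance from the mean to either end of the error bar
```

With, say, 12 samples and 5 codecs there would be 44 degrees of freedom, and the t value would be roughly 2.02. Treating every row equally is the same assumption noted above: each sample counts the same toward the overall result.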
JohnV
QUOTE(ff123 @ Mar 5 2004, 05:34 PM)
To be absolutely correct, a codec wins with 95% confidence, for that group of listeners and set of samples, when the bars do not overlap.  Or to put it another way, 19 times out of 20, those results would not occur by chance.  Any overlap reduces that confidence.  If the bars just barely overlap, there is still quite a high likelihood that that result did not occur by chance.  A reasonable way to describe this situation would be to say that the results are suggestive (if not significant).  Actually, in an ideal world, the graphs would speak for themselves, and there would be no "interpretation" to cause controversy.

If this were a drug test or something else where there is a lot at stake for making the right decision, everything below 95% confidence (or whatever threshold is chosen) would not be considered to be significant.

Also, the test would be corrected for comparing multiple samples, which would make the error bars overlap more.  I personally don't think it's a real big deal if the type I errors in this sort of test (falsely identifying a codec as being better than another) are higher than they would be in a more conservative analysis.  But others, for example on slashdot, can (and do) complain about this sort of thing.

ff123

Right, well, with 95% confidence for the tested 12 samples:
iTunes is better than Real,FAAC and Compaact
Nero is better than Real and Compaact

With lower confidence for the tested 12 samples:
Nero is better than FAAC (small overlap)

With even lower confidence for the tested 12 samples:
iTunes is better than Nero (a bit bigger overlap than with Nero-FAAC)

Correct?
Garf
QUOTE(ff123 @ Mar 5 2004, 06:05 PM)
The length of each error bar from top to bottom (mean in the middle) is equal to the Fisher LSD.

So there shouldn't be any overlap between error bars at all, if I get that correctly, since no overlap between error bar and mean is only half the error length. (And hence my original comment was right).
Zed
QUOTE(rjamorim @ Mar 5 2004, 12:27 AM)
QUOTE(Garf @ Mar 5 2004, 05:01 AM)
Now it does. iTunes indeed almost beats Nero by a significant margin.

Erm.. I use Darryl's method to evaluate ranking positions.

Check, for instance, thear1 in his 64kbps test results
http://ff123.net/64test/results.html

Oggs are ranked second, according to him, although they overlap a little with MP3pro.

To put it short, I (and ff123, it seems) only consider codecs tied when one's confidence margin overlaps with the other's actual ranking. Or, to make things simpler, when more than half of the entire margins overlap.

how about this one?

where is the truth?
ff123
QUOTE(Garf @ Mar 5 2004, 08:36 AM)
QUOTE(ff123 @ Mar 5 2004, 06:05 PM)
The length of each error bar from top to bottom (mean in the middle) is equal to the Fisher LSD.

So there shouldn't be any overlap between error bars at all, if I get that correctly, since no overlap between error bar and mean is only half the error length. (And hence my original comment was right).

Yes. If the error bars do not overlap, that is a difference to 95% confidence. And yes, iTunes almost beats Nero with 95% confidence.
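
The two readings debated above can be put side by side. This is a minimal sketch, assuming each bar is centered on the mean and spans one Fisher LSD from top to bottom; the function names and the numbers in the usage lines are made up and are not values from the test.

```python
def differs_at_95(mean_a: float, mean_b: float, lsd: float) -> bool:
    """Strict reading: the two bars do not overlap at all.

    Bars of half-length lsd / 2 stop overlapping exactly when the
    means are more than one full LSD apart.
    """
    return abs(mean_a - mean_b) > lsd


def tied_by_mean_rule(mean_a: float, mean_b: float, lsd: float) -> bool:
    """Looser reading from earlier in the thread: codecs are tied only when
    one codec's bar reaches the other codec's mean, i.e. the means are
    within half an LSD of each other."""
    return abs(mean_a - mean_b) <= lsd / 2


# Hypothetical scores: means 0.3 apart, LSD of 0.4.
a, b, lsd = 4.5, 4.2, 0.4
print(differs_at_95(a, b, lsd))      # False: the bars still overlap
print(tied_by_mean_rule(a, b, lsd))  # False: yet not "tied" under the looser rule
```

Differences between half an LSD and a full LSD fall into the gap between the two rules: ranked apart under the looser one, but short of 95% confidence, which is the "suggestive" zone.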
eagleray
Is there anything in the testing methodology to ensure that iTunes does not sound "better" than the original CD through the addition of some audio "sugar"?

I hope the experts around here do not think this is too off the wall. For that matter I don't know if there is a way to make any recording sound "better" than the original.
ff123
QUOTE(Zed @ Mar 5 2004, 08:53 AM)
this one?

where is the truth?

The biggest weakness of this test IMO is that there were only 3 samples tested, and they made it even worse by combining them into one medley. Other problems: IIRC, people were asked to rank the codecs from best to worst, not to compare and rate against a known reference. I believe the reference was hidden as one of the samples to be ranked.

But the 3 sample medley is really the killer. They would have been much better off distributing lots of different samples (with that amount of listeners they could have distributed 50 different samples with ease) to determine which codec is better overall.

ff123
rjamorim
Hello.

Thank-you very much for your support smile.gif

I have been correcting the plots (will upload them later) and so far, it seems very few will change:

-At the first AAC@128kbps test, it only becomes more clear that QuickTime is the winner.
-At the Extension test, it seems Vorbis and WMAPro are no longer tied to AAC and MPC, and now share second place. I'll leave it to others to discuss.
-The 64kbps test results stay the same: Lame wins, followed by HE AAC, then MP3pro, then Vorbis. LC AAC, Real and WMA are still tied at fifth place, and FhG MP3 is still way down the graph.
-The MP3 test stays the same as well.

Regards;

Roberto.
ff123
QUOTE(eagleray @ Mar 5 2004, 09:12 AM)
Is there anything in the testing methodology to ensure that iTunes does not sound "better" than the original CD through the addition of some audio "sugar"?

I hope the experts around here do not think this is too off the wall.  For that matter I don't know if there is a way to make any recording sound "better" than the original.

Yes, the listener is asked to rate the sample against the reference. The reference is 5.0 by default, so any difference, even if it "sounds better" than the reference must be rated less than 5.0

ff123
Zed
QUOTE(ff123 @ Mar 5 2004, 09:16 AM)
But the 3 sample medley is really the killer.  They would have been much better off distributing lots of different samples (with that amount of listeners they could have distributed 50 different samples with ease) to determine which codec is better overall.

but small number of the ears is also the killer i guess...
ff123
QUOTE(Zed @ Mar 5 2004, 09:28 AM)
QUOTE(ff123 @ Mar 5 2004, 09:16 AM)
But the 3 sample medley is really the killer.  They would have been much better off distributing lots of different samples (with that amount of listeners they could have distributed 50 different samples with ease) to determine which codec is better overall.

but small number of the ears is also the killer i guess...

They had about 3000 listeners for both the 64 kbit/s and 128 kbit/s tests. If they had distributed 50 separate samples instead of the one medley, they could have gotten more than 50 listeners per sample. That's more than enough to make a statistical inference. In fact, one can do quite well with far fewer.

ff123
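
The arithmetic behind that estimate, as a quick sketch. The listener and sample counts are the approximate figures from the post; the rating standard deviation of 1.0 is purely an assumption for illustration.

```python
from math import sqrt

listeners = 3000  # approximate figure quoted in the post
samples = 50      # number of separate samples proposed in the post

per_sample = listeners // samples
print(per_sample)  # 60


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean rating from n independent listeners."""
    return sd / sqrt(n)


# Assumed rating spread of 1.0 point; compare 60 listeners with only 15.
print(standard_error(1.0, per_sample))
print(standard_error(1.0, 15))
```

Because the standard error shrinks with the square root of the listener count, quartering the listeners only doubles the uncertainty, which is consistent with the remark that far fewer listeners can still do quite well.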
Garf
The test also seems at least 1.5 years old. Lots has happened in that time with AAC.
ScorLibran
QUOTE(Garf @ Mar 5 2004, 01:07 PM)
The test also seems at least 1.5 years old. Lots has happened in that time with AAC.

I was going to say the same thing.

Plus, it seems the results of that test have been recycled from time to time over the past year-and-a-half. I'm sure I've seen them multiple times now.

After at least one major update has been made to a codec involved in a test, the test results will be out of date (at least when discussing the current version of a format). Only recent results can be trusted, and only until such a code change occurs again.

(This is in addition to the other shortcomings of the aforementioned test.)
guruboolez
QUOTE(ScorLibran @ Mar 5 2004, 07:21 PM)
Plus, it seems the results of that test have been recycled from time to time over the past year-and-a-half.  I'm sure I've seen them multiple times now.

I could confirm that.
dewey1973
I think rjamorim might take offense to this...

http://www.macworld.co.uk/news/top_news_item.cfm?NewsID=8097

QUOTE
Apple's iTunes has emerged victorious in a series of listening tests run by the CD Freaks Web site.


gun2.gif
indybrett
Holy cow. Will Apple be using Roberto's test results in commercials now wink.gif
rjamorim
QUOTE(dewey1973 @ Mar 5 2004, 03:49 PM)
I think rjamorim might take offense to this...

http://www.macworld.co.uk/news/top_news_item.cfm?NewsID=8097

QUOTE
Apple's iTunes has emerged victorious in a series of listening tests run by the CD Freaks Web site.


gun2.gif

[bad language removed by moderation]

This is too much. I already got pissed when Slashdot claimed my tests were conducted by Hydrogenaudio. But this is ridiculous!
smok3
QUOTE(dewey1973 @ Mar 5 2004, 08:49 PM)
I think rjamorim might take offense to this...

probably a silly mistake, i took a liberty and send email to the news editor (Jonathan Evans, http://www.macworld.co.uk/contact/ ) with a request to fix the error.
bidz
QUOTE(smok3 @ Mar 5 2004, 11:48 AM)
QUOTE(dewey1973 @ Mar 5 2004, 08:49 PM)
I think rjamorim might take offense to this...

probably a silly mistake, i took a liberty and send email to the news editor (Jonathan Evans, http://www.macworld.co.uk/contact/ ) with a request to fix the error.

Me too biggrin.gif
indybrett
QUOTE(bidz @ Mar 5 2004, 02:50 PM)
QUOTE(smok3 @ Mar 5 2004, 11:48 AM)
QUOTE(dewey1973 @ Mar 5 2004, 08:49 PM)
I think rjamorim might take offense to this...

probably a silly mistake, i took a liberty and send email to the news editor (Jonathan Evans, http://www.macworld.co.uk/contact/ ) with a request to fix the error.

Me too biggrin.gif

Me too, and I got a reply. Guess his mailbox is getting busy smile.gif
Garf
Their interpretation of the result is also factually wrong, and different from what roberto wrote.
JohnV
QUOTE(rjamorim @ Mar 5 2004, 09:10 PM)
This is too much. I already got pissed when Slashdot claimed my tests were conducted by Hydrogenaudio. But this is ridiculous!

Hehe, I think the problem is that they want to refer to some known web site. Rarewares is not so known really. And since you don't want to mention Hydrogenaudio's strong support anymore in the results page in providing test discussion and test participants (in fear of sites likes Slashdot referring to the test as HA test), this is what happens.. wink.gif

I'm sure CD-Freaks don't mind though.. laugh.gif
indybrett
Look again...

http://www.macworld.co.uk/news/top_news_item.cfm?NewsID=8097
rjamorim
QUOTE(JohnV @ Mar 5 2004, 04:54 PM)
And since you don't want to mention Hydrogenaudio's strong support anymore in the results page in providing test discussion and test participants (in fear of sites likes Slashdot referring to the test as HA test), this is what happens.. wink.gif

Blah. Bollocks. I give lots of credit to HydrogenAudio at the listening tests start page. And they insisted on keeping the CDfreaks link even after I pointed out they were wrong (they also ruined their HTML, heh. morons). So, it's just that they seem to want to advertize Freaks, in some way or another.
JohnV
QUOTE(rjamorim @ Mar 6 2004, 01:56 AM)
QUOTE(JohnV @ Mar 5 2004, 04:54 PM)
And since you don't want to mention Hydrogenaudio's strong support anymore in the results page in providing test discussion and test participants (in fear of sites likes Slashdot referring to the test as HA test), this is what happens.. wink.gif

Blah. Bollocks. I give lots of credit to HydrogenAudio at the listening tests start page. And they insisted on keeping the CDfreaks link even after I pointed out they were wrong (they also ruined their HTML, heh. morons). So, it's just that they seem to want to advertize Freaks, in some way or another.

Well, at the same time did you ask them to change the "CD-Freaks is reporting" to "Hydrogenaudio is reporting"?
This is not the first time Freaks have taken credit for something they have no business with whatsoever..
tigre
Zed's ranting moved here.
rjamorim
QUOTE(rjamorim @ Mar 5 2004, 04:10 PM)
[bad language removed by moderation]

Duh...
http://cornelldailysun.com/articles/11061/

Anyway: Already fixed the plots for the first AAC test and the 128kbps extension test. Working on the 64kbps test plots now.

Edit: 64kbps results are up too. Will work on MP3 128 and AACv2 later. Stay tuned...
rjamorim
QUOTE(JohnV @ Mar 5 2004, 09:08 PM)
Well, at the same time did you ask them to change the "CD-Freaks is reporting" to "Hydrogenaudio is reporting"?

Nope, because I didn't know they would insist on mentioning CDfreaks. I guessed they would replace it with a link to rjamorim.com. It seems to me they deliberately want to give some sort of credit to CDfreaks.
music_man_mpc
QUOTE(rjamorim @ Mar 7 2004, 12:50 PM)
It seems to me they deliberately want to give some sort of credit to CDfreaks.

That's absolutely absurd. Does CDfreaks have some affiliation with Apple?
rjamorim
QUOTE(music_man_mpc @ Mar 7 2004, 08:34 PM)
That's absolutely absurd.  Does CDfreaks have some affiliation with Apple?

Not that I know of. But why insist on mentioning them in their article? CDfreaks weren't the only ones announcing it; it was also announced by Afterdawn, Slashdot etc.
music_man_mpc
QUOTE(rjamorim @ Mar 7 2004, 03:47 PM)
Not that I know of. But why insist on mentioning them in their article? CDfreaks weren't the only ones announcing it; it was also announced by Afterdawn, Slashdot etc.

That's what I was saying, I meant it was absurd of them, not you smile.gif. Sorry for not being totally clear.
bond
i just noted that the results of the last aac test still point to the old, not corrected plots

this old plot gets displayed by default:
(rarewares.hydrogenaudio.org/rja/plot12z.png)
which is the one i created right after the test and surely hasnt the corrected error bars

still, you seem to have already uploaded the correct plots here:
(www.rjamorim.com/test/64test/plot12z.png)
but the result page doesnt link to it
rjamorim
OK, after lots of interpretation, I understood what you are saying.

Keep in mind this is the AAC test thread. And you mention "the results of the last aac test", and right under you link graphics of the 64kbps test. There was a knot in my brain ("what is he talking about after all?") smile.gif

Fixed it now. Thanks for reporting.
bond
me stupid blink.gif
DeepDose
I encode my audio cds with NERO mp4 @ 387kps....then listen with my 5.1...and thrash out to some insane sounding euphoric sound!