Second in the series of 128 tests
Ruse
post Jan 14 2002, 10:17
Post #26





Group: Members
Posts: 136
Joined: 10-November 01
From: AUS
Member No.: 433



In biological systems, I suppose it is possible to get unusual sensitivities, freak performances and critical failings. I have read of a human hearing defect where a person hears a different pitch in each ear: to use that subject to develop an audio coding system wouldn't be useful.

It is more useful to look at attributes and responses that can be categorised as standard subject response. To do otherwise would be to study atypical human perception and disease.

For developing perceptual audio coding systems, one should be able to identify and categorise artifacts that "typical" listeners will recognise and dislike. I think ff123 has shown that most of his listening group responded in a similar fashion to the artifacts produced by the codecs. This must represent the standard response to artifacts by the human ear/brain system. Some listeners will respond differently, but they are better removed from the testing group on the basis of outlier performance.


--------------------
Ruse
____________________________
Don't let the uncertainty turn you around,
Go out and make a joyful sound.
Garf
post Jan 14 2002, 11:12
Post #27


Server Admin


Group: Admin
Posts: 4853
Joined: 24-September 01
Member No.: 13



QUOTE
Originally posted by Ruse
Why don't you analyse and publish the results without listener 28 for comparison purposes? There must be statistical validity of some type for excluding "wonky" data. I think the plots you have shown above indicate that listener 28 is an "outlier".

Can't you just exclude him on the basis of being more than 2 standard deviations from the mean?


No. The analysis that was used doesn't have a concept of 'standard deviation' anyway, and removing data is always a very tricky thing to do; it isn't even generally accepted as possible in a statistically valid way.

Note that this listener would have passed even if post-screening had been used. He is a valid data point. Us not liking what the data says doesn't change that.

--
GCP
ff123
post Jan 21 2002, 21:21
Post #28


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



I've been getting some help from Rich Ulrich in sci.stat.math in identifying outliers, and it appears that the statistic to use is the "corrected item-total correlation," or the (Pearson) correlation of each rater with the average for all the other raters.

For example, using this statistic, Monty has a correlation coefficient of 0.86, and Joerg (listener 28) has a value of -0.81.

A large negative value (near -1.0) indicates a preference that runs strongly counter to the general trend.

I will be performing a sub-analysis in the near future for those listeners (there are 9 of them) who are highly and positively correlated.
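For anyone who wants to compute this statistic themselves, here is a minimal stdlib-only sketch of the corrected item-total correlation as described above (each rater correlated against the mean of all the other raters). The ratings matrix in the usage example is made up for illustration, not data from this test:

```python
# Corrected item-total correlation: for each rater, the Pearson r
# between that rater's scores and the mean score of all *other* raters.
# Stdlib-only sketch; any example data is illustrative only.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def corrected_item_total(ratings):
    """ratings: list of per-listener score lists (same items, same order)."""
    out = []
    for i, row in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != i]
        # Column-wise mean over the remaining listeners.
        mean_others = [sum(col) / len(col) for col in zip(*others)]
        out.append(pearson(row, mean_others))
    return out
```

A listener like number 28 above would show up with a strongly negative value, while listeners tracking the group consensus land near +1.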

ff123
ff123
post Jan 22 2002, 05:30
Post #29


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Subanalysis based on the nine listeners who were highly correlated with each other (r > 0.7). These were the following:

CODE
listener    r
  1       0.86
  2       0.95
  6       0.80
 10       0.86
 14       0.84
 18       0.82
 19       0.96
 23       0.86
 27       0.92


Resampling analysis as follows:

CODE
Means:

mpc      ogg      lame     aac      wma8     xing
 4.63     4.09     3.61     3.36     2.11     2.04

                           Unadjusted p-values
        ogg      lame     aac      wma8     xing
mpc      0.022*   0.000*   0.000*   0.000*   0.000*
ogg        -      0.043*   0.003*   0.000*   0.000*
lame       -        -      0.270    0.000*   0.000*
aac        -        -        -      0.000*   0.000*
wma8       -        -        -        -      0.772

Each '.' is 1,000 resamples.  Each '+' is 10,000 resamples
.........+

                            Adjusted p-values
        ogg      lame     aac      wma8     xing
mpc      0.077    0.001*   0.000*   0.000*   0.000*
ogg        -      0.114    0.011*   0.000*   0.000*
lame       -        -      0.465    0.000*   0.000*
aac        -        -        -      0.000*   0.000*
wma8       -        -        -        -      0.773
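As a rough illustration of what one cell of these tables involves, here is a generic two-sample permutation test in Python. This is a sketch of the general resampling idea, not necessarily the exact scheme used for this analysis, and the Bonferroni bound below is a simple stand-in for whatever multiple-comparison adjustment produced the "adjusted" table:

```python
# Permutation test for one pairwise codec comparison: shuffle the pooled
# ratings many times and count how often the shuffled mean difference is
# at least as large as the observed one. Illustrative sketch only.
import random

def perm_pvalue(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for mean(a) - mean(b)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:len(a)]) / len(a) -
                   sum(pooled[len(a):]) / len(b))
        if diff >= observed:
            hits += 1
    return hits / n_resamples

def bonferroni(p, n_comparisons):
    """Simplest possible multiple-comparison adjustment (stand-in)."""
    return min(1.0, p * n_comparisons)
```

With ratings as far apart as mpc's and xing's above, the permutation p-value comes out near zero even after adjustment.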


ff123
ff123
post Jan 22 2002, 08:39
Post #30


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Going back to dogies.wav, the listener corrected item-total correlations were:

1: 0.63
2: 0.70
3: 0.72
4: 0.71
5: 0.70
6: 0.76
7: 0.69
8: 0.74
9: 0.71
10: 0.70
11: 0.71
12: 0.81
13: 0.73
14: 0.71

All the listeners on this data set were fairly well correlated.

ff123
ff123
post Jan 22 2002, 17:37
Post #31


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Added the subanalysis to the report, maybe not in time for the latest slashdot discussion, though.

http://ff123.net/128test/interim.html

ff123
mithrandir
post Jan 22 2002, 17:39
Post #32





Group: Members
Posts: 669
Joined: 15-January 02
From: SE Pennsylvania
Member No.: 1032



QUOTE
CODE
Means:

mpc      ogg      lame     aac      wma8     xing
 4.63     4.09     3.61     3.36     2.11     2.04

These results correlate rather closely to my experience with these codecs overall.
Jon Ingram
post Jan 22 2002, 18:12
Post #33





Group: Members
Posts: 315
Joined: 29-September 01
Member No.: 53



This is all very interesting, and this way of removing outliers seems exactly what you would want when developing audio codecs: the goal is to develop something that sounds best to the normal listener.

FF123, what happens to the significance information when you perform the same procedure on the other samples in your test?
ff123
post Jan 22 2002, 23:30
Post #34


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE
FF123, what happens to the significance information when you perform the same procedure on the other samples in your test?


Unfortunately, this procedure doesn't work for rawhide.wav. This is kind of strange because I know that at one time rawhide.wav had significant results. I'd guess some sort of factor analysis is needed to pull a cluster of like-preferences out of the noise. I'll post the corrected item-total correlations later today for rawhide.wav and fossiles.wav.

ff123
ff123
post Jan 23 2002, 05:58
Post #35


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Oops. It does work for rawhide.wav. I made a mistake when calculating the statistic for that file. The correlation coefficients are listed below. If I use the same standard as for wayitis, choosing only those listeners satisfying 0.7 < r < 1.0, that would leave me with only two listeners. To get a decent group of listeners, I would have to relax the standard and include weakly correlated listeners as well (0.3 < r < 0.7).


1. -0.33
2. 0.36
4. 0.75
5. 0.61
6. 0.49
7. 0.38
8. 0.94
10. 0.54
13. -0.36
14. 0.51
16. 0.06
17. 0.43
18. 0.27
19. 0.54
20. 0.23
21. -0.01
22. 0.18
23. -0.40
24. -0.33
25. 0.01
26. -0.48

If I include all listeners with 0.3 < r < 1.0, the analysis is as follows:

CODE
Read 6 treatments, 10 samples

                           Unadjusted p-values
        ogg      wma8     mpc      lame     xing
aac      0.679    0.384    0.007*   0.006*   0.000*
ogg        -      0.646    0.020*   0.018*   0.001*
wma8       -        -      0.058    0.053    0.002*
mpc        -        -        -      0.963    0.201
lame       -        -        -        -      0.218

Each '.' is 1,000 resamples.  Each '+' is 10,000 resamples
.........+

                            Adjusted p-values
        ogg      wma8     mpc      lame     xing
aac      0.951    0.791    0.053    0.048*   0.001*
ogg        -      0.951    0.126    0.120    0.004*
wma8       -        -      0.281    0.278    0.018*
mpc        -        -        -      0.960    0.648
lame       -        -        -        -      0.648


ff123
Delirium
post Jan 23 2002, 06:03
Post #36





Group: Members
Posts: 300
Joined: 3-January 02
From: Santa Cruz, CA
Member No.: 891



ff123: I'm not sure if I'm reading your statistics correctly; do the wayitis results indicate that with a reasonable degree of certainty aac, ogg, and wma all outperformed both mpc and lame on this sample? Seems a lot different than the results for the other samples, but plausible.
ff123
post Jan 23 2002, 06:14
Post #37


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE
ff123: I'm not sure if I'm reading your statistics correctly; do the wayitis results indicate that with a reasonable degree of certainty aac, ogg, and wma all outperformed both mpc and lame on this sample? Seems a lot different than the results for the other samples, but plausible.


For wayitis, for the nine highly correlated listeners, after adjustment for multiple comparisons:

mpc is better than xing
ogg is better than xing
lame is better than xing
aac is better than xing
mpc is better than wma8
ogg is better than wma8
lame is better than wma8
aac is better than wma8
mpc is better than aac
ogg is better than aac
mpc is better than lame

with 95% confidence
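This list can be read mechanically off the adjusted p-value table: any pair with adjusted p < 0.05 is called significant, and the codec with the higher mean "wins". A sketch in Python, with the means and adjusted p-values copied from the wayitis subanalysis posted earlier in the thread:

```python
# Turning an adjusted p-value table into "X is better than Y" statements
# at the 95% level. Data copied from the wayitis subanalysis tables.
means = {"mpc": 4.63, "ogg": 4.09, "lame": 3.61, "aac": 3.36,
         "wma8": 2.11, "xing": 2.04}
adj_p = {("mpc", "ogg"): 0.077, ("mpc", "lame"): 0.001,
         ("mpc", "aac"): 0.000, ("mpc", "wma8"): 0.000,
         ("mpc", "xing"): 0.000, ("ogg", "lame"): 0.114,
         ("ogg", "aac"): 0.011, ("ogg", "wma8"): 0.000,
         ("ogg", "xing"): 0.000, ("lame", "aac"): 0.465,
         ("lame", "wma8"): 0.000, ("lame", "xing"): 0.000,
         ("aac", "wma8"): 0.000, ("aac", "xing"): 0.000,
         ("wma8", "xing"): 0.773}

def significant_wins(means, adj_p, alpha=0.05):
    """List each significant pairwise win, higher mean first."""
    wins = []
    for (a, b), p in adj_p.items():
        if p < alpha:
            better, worse = (a, b) if means[a] > means[b] else (b, a)
            wins.append(f"{better} is better than {worse}")
    return wins
```

Running this reproduces the eleven statements above; mpc vs ogg (p = 0.077) and lame vs aac (p = 0.465) correctly drop out.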

ff123
tangent
post Jan 23 2002, 10:42
Post #38





Group: Members
Posts: 674
Joined: 29-September 01
Member No.: 63



ff123, what happens if you consider only the rawhide results from the 9 listeners who "passed" the wayitis results?
ff123
post Jan 23 2002, 16:34
Post #39


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE
what happens if you consider only the rawhide results from the 9 listeners who "passed" the wayitis results?


The results wouldn't be as significant as what I posted above. For example, xiphmont has a negative correlation on rawhide. Actually, I'm a bit leery of digging out groups of people this way. Grouping together a bunch of strongly correlated people is one thing (r > 0.7). It's another to pull in weakly correlated people as well.

ff123
tangent
post Jan 28 2002, 06:14
Post #40





Group: Members
Posts: 674
Joined: 29-September 01
Member No.: 63



What about using this technique for AQ1 results?
ff123
post Jan 28 2002, 06:47
Post #41


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



I thought about that, but I need to automate the process before I apply it to AQ1. I did the others by hand.

ff123
ff123
post Jan 28 2002, 07:53
Post #42


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Ah, what the heck. I was curious.

I found the following correlations by listener, sorted from most to least correlated (I am listener 6):

CODE
listener    r
   6         0.87
  20         0.79
  17         0.74
   1         0.71
  34         0.67
  13         0.67
   7         0.63
  30         0.60
  15         0.58
  37         0.56
  11         0.54
  41         0.54
  35         0.45
   9         0.43
  16         0.42
  10         0.38
   4         0.30
  18         0.29
  39         0.08
   2         0.06
  14         0.05
  38         0.02
  25        -0.01
  23        -0.07
  36        -0.12
  29        -0.17
  32        -0.56
  28        -0.56


If I choose only the 18 listeners with at least weak positive correlation (including listener 18), I get the following results:

CODE
mpc      dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
 4.76     4.63     4.49     4.38     4.36     4.29     4.27     3.81

                           Unadjusted p-values
        dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.379    0.068    0.010*   0.007*   0.002*   0.001*   0.000*
dm-std     -      0.339    0.087    0.062    0.021*   0.015*   0.000*
dm-xtrm    -        -      0.444    0.359    0.169    0.137    0.000*
dm-ins     -        -        -      0.878    0.540    0.467    0.000*
cbr256     -        -        -        -      0.646    0.566    0.000*
abr224     -        -        -        -        -      0.908    0.001*
r3mix      -        -        -        -        -        -      0.002*

Each '.' is 1,000 resamples.  Each '+' is 10,000 resamples
.........+

                            Adjusted p-values
        dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.924    0.459    0.120    0.087    0.025*   0.020*   0.000*
dm-std     -      0.931    0.522    0.445    0.203    0.166    0.000*
dm-xtrm    -        -      0.922    0.922    0.724    0.660    0.000*
dm-ins     -        -        -      0.985    0.922    0.922    0.003*
cbr256     -        -        -        -      0.941    0.922    0.005*
abr224     -        -        -        -        -      0.985    0.021*
r3mix      -        -        -        -        -        -      0.027*


ff123
Delirium
post Jan 28 2002, 09:21
Post #43





Group: Members
Posts: 300
Joined: 3-January 02
From: Santa Cruz, CA
Member No.: 891



Again I seem to have trouble reading these charts, but would it be correct to say that this analysis does not show any statistically significant difference between MPC, dm-std, and dm-xtrm (at the high end)? Also interesting that the average for dm-std seems to be higher than that for dm-xtrm, though again there's no statistically significant difference (I think?).
Jon Ingram
post Jan 28 2002, 10:39
Post #44





Group: Members
Posts: 315
Joined: 29-September 01
Member No.: 53



QUOTE
Again I seem to have trouble reading these charts

The only statistically significant results (after resampling) were:
*everything* is better than cbr192
*mpc* is also better than r3mix and abr224.